General Overview: I'm debating the best way to perform the data conversion described below.
Input: a series of records, each containing a unique id, a category identifier, a start time, and an end time. (Times are input as sortable datetime strings.)
Output: per category, a time series recording the total number of concurrently active records at each point in time where that total changes.
So, for example:
uniqid | cat | start | end |
---|---|---|---|
1 | 1 | 2024-12-06 04:04:04 | 2024-12-06 06:04:04 |
2 | 1 | 2024-12-06 05:04:04 | 2024-12-06 06:04:54 |
3 | 1 | 2024-12-06 06:02:04 | 2024-12-06 07:02:04 |
4 | 1 | 2024-12-06 07:04:04 | 2024-12-06 08:04:04 |
Theoretical Output:
cat | time | conc |
---|---|---|
1 | 2024-12-06 04:04:04 | 1 |
1 | 2024-12-06 05:04:04 | 2 |
1 | 2024-12-06 06:02:04 | 3 |
1 | 2024-12-06 06:04:04 | 2 |
1 | 2024-12-06 06:04:54 | 1 |
1 | 2024-12-06 07:02:04 | 0 |
1 | 2024-12-06 07:04:04 | 1 |
1 | 2024-12-06 08:04:04 | 0 |
In the output, (cat, time) would be a unique tuple.
Known Gotchas:
Start times and end times are not guaranteed to be unique within a cat; starts and stops are independent events, so several records can start or stop at the same instant, and a start can coincide with a stop.
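One way to frame the consequence (a hypothetical sketch in Python, with invented same-instant timestamps): whatever structure counts the starts (+1) and stops (-1) has to net events that land on the same (cat, time), or that tuple stops being unique.

```python
from collections import Counter

# Hypothetical same-instant events within one cat: two records start and a
# third stops at exactly 09:00:00. Keying the +1/-1 modifiers by (cat, time)
# nets them, so the output keeps one row per tuple.
deltas = Counter()
deltas[(1, "2024-12-06 09:00:00")] += 1  # record A starts
deltas[(1, "2024-12-06 09:00:00")] += 1  # record B starts
deltas[(1, "2024-12-06 09:00:00")] -= 1  # record C stops
print(deltas)  # Counter({(1, '2024-12-06 09:00:00'): 1})
```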
Current Working Theory:
1. Walk through the data, expanding each record into the time-based single-column structure, with "conc" holding a modifier number (so +1 for a start, -1 for a stop).
2. Walk through the single-column structure, holding the current running total for each cat; for each entry in the column, add the modifier to the current total, then store the result both in the row and back in the current total.
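A minimal sketch of those two walks in Python, assuming records arrive as (uniqid, cat, start, end) tuples matching the example table above (the tuple shape and variable names are my own, not a fixed schema):

```python
from collections import defaultdict

# The example rows from above, as (uniqid, cat, start, end) tuples.
records = [
    ("1", 1, "2024-12-06 04:04:04", "2024-12-06 06:04:04"),
    ("2", 1, "2024-12-06 05:04:04", "2024-12-06 06:04:54"),
    ("3", 1, "2024-12-06 06:02:04", "2024-12-06 07:02:04"),
    ("4", 1, "2024-12-06 07:04:04", "2024-12-06 08:04:04"),
]

# Walk 1: expand each record into two events, +1 at start, -1 at end.
events = []
for uniqid, cat, start, end in records:
    events.append((cat, start, +1))
    events.append((cat, end, -1))

# Sortable datetime strings mean a plain tuple sort gives (cat, time) order.
events.sort()

# Walk 2: running total per cat; collapse same-instant events so
# (cat, time) stays unique (the gotcha above).
current = defaultdict(int)
output = []
for cat, time, delta in events:
    current[cat] += delta
    if output and output[-1][:2] == (cat, time):
        output[-1] = (cat, time, current[cat])  # net same-instant modifiers
    else:
        output.append((cat, time, current[cat]))

for row in output:
    print(row)
```

Run against the example rows, this prints the eight (cat, time, conc) rows from the theoretical output table.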
This feels off, because I have to walk the data twice. But I can't think of a good way to do it otherwise, because you can't do the math on an end time before knowing how many other records start or end during that record's time span. In the example above, the row for record 1's end time (06:04:04, conc 2) depends on records 2 and 3, which both start before record 1 ends.
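For what it's worth, one alternative technique for this kind of interval problem is a sweep: walk the records once per cat, sorted by start, keeping a min-heap of the end times that are still active and popping ends as later starts arrive. It isn't obviously better, since it trades the second walk for heap bookkeeping and a sort by start, and same-instant ties would still need the netting from the gotcha above. A sketch only, reusing the record shape from the previous snippet:

```python
import heapq
from itertools import groupby

def sweep(records):
    # One walk per cat over records sorted by (cat, start); `ends` is a
    # min-heap of end times for records that are still active.
    out = []
    keyed = sorted(records, key=lambda r: (r[1], r[2]))  # sort by (cat, start)
    for cat, group in groupby(keyed, key=lambda r: r[1]):
        ends, conc = [], 0
        for _uniqid, _cat, start, end in group:
            # Emit rows for records that ended at or before this start.
            while ends and ends[0] <= start:
                conc -= 1
                out.append((cat, heapq.heappop(ends), conc))
            conc += 1
            out.append((cat, start, conc))
            heapq.heappush(ends, end)
        while ends:  # drain the ends left after the last start
            conc -= 1
            out.append((cat, heapq.heappop(ends), conc))
    return out
```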