Accumulating A DateTime Series

General Overview: Debating on the best way to produce the desired results of data conversion.

Input: a series of records, containing a unique id, a category identifier, a start time, and an end time (Times are input as a sortable datetime string.)

Output: A tracking of categories that records the concurrent total of records over time.

So, for example.

uniqid cat start end
1 1 2024-12-06 04:04:04 2024-12-06 06:04:04
2 1 2024-12-06 05:04:04 2024-12-06 06:04:54
3 1 2024-12-06 06:02:04 2024-12-06 07:02:04
4 1 2024-12-06 07:04:04 2024-12-06 08:04:04

Theoretical Output:

cat time conc
1 2024-12-06 04:04:04 1
1 2024-12-06 05:04:04 2
1 2024-12-06 06:02:04 3
1 2024-12-06 06:04:04 2
1 2024-12-06 06:04:54 1
1 2024-12-06 07:02:04 0
1 2024-12-06 07:04:04 1
1 2024-12-06 08:04:04 0

(cat,time) would be a unique tuple.

Known Gotchas:
Start Time and End Time are not guaranteed to be unique within a cat. These are independent operations.

Current Working Theory:
walk through the data, creating the time based single column structure, with “conc” being a modifier number (so 1 for a start, -1 for a stop).
walk through the single column structure; hold the current value for each cat; for each entry in the column, add the modifier to the current; store the result in the row and in the current value.

This feels off, because I have to walk the data twice. But I can’t… think of a good way to do it alternately, because you can’t do the math on the endtime before knowing how many other things start or end during that time span.

I think I would first transform the data to (cat, date, date_type) where type can be start or end. Then sort that by date and then loop from top to bottom, every time you encounter a start that’s +1 for the cat, every time you encounter an end that’s -1 for the cat.

Not horribly efficient, but quite easy to follow, which is a good tradeoff in my mind.

Edit: reading your post again I think this is your proposal too. In that case I agree and I don’t think there is a better method to it as indeed you can’t know what end dates you will encounter and where.
Okay, there is an alternative I can think of, but it’s so much worse I won’t even try to describe it :sweat_smile:

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.