Select timeline in MySQL database

Hi, I need your help.

This is my MySQL table.

+-----------+-------------+---------------------+---------+
|   element | status      | hour                |   ID    |
+-----------+-------------+---------------------+---------+
|        18 | Available   | 2020-01-19 14:23:49 |   6     |
|        18 | Unavailable | 2019-09-13 18:19:47 |   5     |
|        18 | Available   | 2019-09-13 18:18:49 |   4     |
|        18 | Unavailable | 2019-09-09 08:22:45 |   3     |
|        19 | Available   | 2019-09-07 19:13:56 |   2     |
|        19 | Available   | 2019-09-03 18:13:49 |   1     |
+-----------+-------------+---------------------+---------+

Normally, for each element, the rows in this MySQL table follow an Unavailable / Available timeline: the status alternates between the two values over time.

But for element 19 the timeline shows Available / Available:

+----------+-------------+---------------------+
| element  | status      | hour                |
+----------+-------------+---------------------+
|       19 | Available   | 2019-09-07 19:13:56 |
|       19 | Available   | 2019-09-03 18:13:49 |
+----------+-------------+---------------------+

This is an anomaly.

I need to detect these cases, that is, all the rows for each element where the timeline is Available / Available.

How can I solve this?

Can you please help me?

Thank you in advance for any help.

Hi Guido1971, welcome to the forum.

Sorry, but I do not see what makes that an anomaly. (I would make the status column an integer datatype instead of text, anyway.) All rows have different id and datetime values. I understand that “19” is in two rows and both have the same “status” value. But “18” also has two rows with “available” status values.

Hi Mittineague, thank you for your reply.

Let me try to explain better.

This is the correct theoretical sequence in the timeline:

+-----------+-------------+---------------------+---------+
|   element | status      | hour                |   ID    |
+-----------+-------------+---------------------+---------+
|        18 | Available   | 2020-01-19 14:23:49 |   6     |
|        18 | Unavailable | 2019-09-13 18:19:47 |   5     |
|        18 | Available   | 2019-09-13 18:18:49 |   4     |
|        18 | Unavailable | 2019-09-09 08:22:45 |   3     |
|        19 | Available   | 2019-09-07 19:13:56 |   2     |
|        19 | Unavailable | 2019-09-03 18:13:49 |   1     |
+-----------+-------------+---------------------+---------+ 

But what I actually have for element 19 is:

+----------+-------------+---------------------+
| element  | status      | hour                |
+----------+-------------+---------------------+
|       19 | Available   | 2019-09-07 19:13:56 |
|       19 | Available   | 2019-09-03 18:13:49 |
+----------+-------------+---------------------+

And this is an error.

I need to detect these cases, that is, all the affected rows for each element.

So the logic is something like this:

When the rows are ordered by “hour” (datetime) and/or “ID”,
and grouped by “element”,
sequential values for “status” should alternate between the two possible values (see how “status” sounds like a boolean?).

You want a query to find elements where this is not the case, i.e. the corrupt elements?
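A minimal sketch of that logic, assuming MySQL 8.0 or later (window functions such as LAG() are not available in MySQL 5.7). To keep the example self-contained it runs the query against an in-memory SQLite database from Python; the table name statustable and the lowercase column names are assumptions, since the original post never names the table. LAG() attaches the previous row's status for the same element, and any row whose status equals that previous status is where the alternation breaks.

```python
import sqlite3

# Sample rows from the original post: (element, status, hour, ID).
ROWS = [
    (18, "Available", "2020-01-19 14:23:49", 6),
    (18, "Unavailable", "2019-09-13 18:19:47", 5),
    (18, "Available", "2019-09-13 18:18:49", 4),
    (18, "Unavailable", "2019-09-09 08:22:45", 3),
    (19, "Available", "2019-09-07 19:13:56", 2),
    (19, "Available", "2019-09-03 18:13:49", 1),
]

# LAG() gives each row the status of the previous row for the same element,
# ordered by hour (ID breaks ties). The first row of each element gets NULL,
# and NULL never compares equal, so it is never flagged.
QUERY = """
SELECT element, id, status, hour, prev_status
FROM (
    SELECT element, id, status, hour,
           LAG(status) OVER (PARTITION BY element ORDER BY hour, id) AS prev_status
    FROM statustable
) AS t
WHERE status = prev_status
ORDER BY element, hour
"""


def find_repeated_status(rows):
    """Return rows whose status repeats the previous status of the same element."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema mirroring the columns shown in the post.
    conn.execute(
        "CREATE TABLE statustable (element INTEGER, status TEXT, hour TEXT, id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO statustable (element, status, hour, id) VALUES (?, ?, ?, ?)", rows
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(find_repeated_status(ROWS))
# [(19, 2, 'Available', '2019-09-07 19:13:56', 'Available')]
```

On the sample data only the row with ID 2 is returned, since it is an Available that follows another Available for element 19. Selecting DISTINCT element from the same filtered derived table would list just the corrupt elements.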

Exactly !

One query to find elements where this is not the case !

Thanks!


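Is it safe to say that what you would initially like to do is identify which “element” values have more than one unmatched status value? i.e. in your example, the result might be:

+---------+-------------+----------------+
| element | count_avail | count_notavail |
+---------+-------------+----------------+
|      19 |           2 |              0 |
+---------+-------------+----------------+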
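One way such a count check might look, as a sketch under stated assumptions: the table name statustable is hypothetical, and SQLite stands in for MySQL so the example runs on its own. In a strictly alternating timeline the two counts for an element can differ by at most 1, so a difference greater than 1 is guaranteed to be a problem; a smaller difference does not prove the timeline is clean.

```python
import sqlite3

# Sample rows from the original post: (element, status, hour, ID).
ROWS = [
    (18, "Available", "2020-01-19 14:23:49", 6),
    (18, "Unavailable", "2019-09-13 18:19:47", 5),
    (18, "Available", "2019-09-13 18:18:49", 4),
    (18, "Unavailable", "2019-09-09 08:22:45", 3),
    (19, "Available", "2019-09-07 19:13:56", 2),
    (19, "Available", "2019-09-03 18:13:49", 1),
]

# Count each status per element and keep only elements whose counts differ
# by more than 1. The expressions are repeated in HAVING instead of using
# the aliases, which keeps the query portable.
QUERY = """
SELECT element,
       SUM(CASE WHEN status = 'Available' THEN 1 ELSE 0 END) AS count_avail,
       SUM(CASE WHEN status = 'Unavailable' THEN 1 ELSE 0 END) AS count_notavail
FROM statustable
GROUP BY element
HAVING ABS(SUM(CASE WHEN status = 'Available' THEN 1 ELSE 0 END)
         - SUM(CASE WHEN status = 'Unavailable' THEN 1 ELSE 0 END)) > 1
ORDER BY element
"""


def count_check(rows):
    """Return (element, count_avail, count_notavail) for unbalanced elements."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statustable (element INTEGER, status TEXT, hour TEXT, id INTEGER)"
    )
    conn.executemany("INSERT INTO statustable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(count_check(ROWS))  # [(19, 2, 0)]
```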

What he’s saying is that every state change of an element should be recorded, and that row IDs 1 and 2 in his OP represent an anomaly: the system recorded a state change to Available when the state was last recorded as changing to Available.

This… shouldn’t be too difficult… (Code untested, off the top of my head, and I’m sure there’s a better way.)

SELECT a.element,a.id,a.status,
(SELECT b.id FROM statustable b WHERE a.element = b.element AND b.hour < a.hour ORDER BY hour DESC LIMIT 1) as last_id, 
(SELECT b.status FROM statustable b WHERE a.element = b.element AND b.hour < a.hour ORDER BY hour DESC LIMIT 1) as last_status 
FROM statustable a 
WHERE a.status = a.last_status
ORDER BY a.hour DESC

Thank you for your reply.

I get this error with your SQL:

[Err] 1054 - Unknown column ‘a.last_status’ in ‘where clause’
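Error 1054 occurs because last_status is a column alias defined in the SELECT list, and the WHERE clause cannot refer to select-list aliases. A common workaround is to move the original query into a derived table and filter in the outer query, where last_status is an ordinary column. This is a sketch, not a tested fix against the real table: it keeps the correlated subqueries from m_hutley's post, assumes his table name statustable, and uses SQLite in place of MySQL so it runs standalone. MySQL requires every derived table to have an alias, hence AS t.

```python
import sqlite3

# Sample rows from the original post: (element, status, hour, ID).
ROWS = [
    (18, "Available", "2020-01-19 14:23:49", 6),
    (18, "Unavailable", "2019-09-13 18:19:47", 5),
    (18, "Available", "2019-09-13 18:18:49", 4),
    (18, "Unavailable", "2019-09-09 08:22:45", 3),
    (19, "Available", "2019-09-07 19:13:56", 2),
    (19, "Available", "2019-09-03 18:13:49", 1),
]

# The inner query computes last_id / last_status for every row; the outer
# query can then filter on them because they are real columns of t.
QUERY = """
SELECT t.element, t.id, t.status, t.last_id, t.last_status
FROM (
    SELECT a.element, a.id, a.status, a.hour,
           (SELECT b.id FROM statustable b
             WHERE b.element = a.element AND b.hour < a.hour
             ORDER BY b.hour DESC LIMIT 1) AS last_id,
           (SELECT b.status FROM statustable b
             WHERE b.element = a.element AND b.hour < a.hour
             ORDER BY b.hour DESC LIMIT 1) AS last_status
    FROM statustable a
) AS t
WHERE t.status = t.last_status
ORDER BY t.hour DESC
"""


def find_with_derived_table(rows):
    """Return rows whose status equals the element's previous status."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statustable (element INTEGER, status TEXT, hour TEXT, id INTEGER)"
    )
    conn.executemany("INSERT INTO statustable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(find_with_derived_table(ROWS))  # [(19, 2, 'Available', 1, 'Available')]
```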

This output is correct, because when count_notavail is 0 for an element, I have the problem.

Danger, Will Robinson! Consider this potential set of data:

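+---------+-------------+-------+----+
| element | status      | hour  | ID |
+---------+-------------+-------+----+
|      19 | Unavailable | 05:00 |  4 |
|      19 | Unavailable | 04:00 |  3 |
|      19 | Available   | 03:00 |  2 |
|      19 | Available   | 02:30 |  1 |
+---------+-------------+-------+----+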

Is this set a problem?
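With this data a count-only check misses the problem. The sketch below runs both checks on it, under the same assumptions as before: hypothetical table name statustable, SQLite standing in for MySQL, and the bare times from the example stored as text (they sort correctly only because they all fall on the same day; the real column is a DATETIME). The counts balance at 2 and 2, so the count check reports nothing, while comparing each row with the previous one flags IDs 2 and 4.

```python
import sqlite3

# The example data from the post above: (element, status, hour, ID).
GOTCHA_ROWS = [
    (19, "Unavailable", "05:00", 4),
    (19, "Unavailable", "04:00", 3),
    (19, "Available", "03:00", 2),
    (19, "Available", "02:30", 1),
]

# Flags elements whose Available / Unavailable counts differ by more than 1.
COUNT_QUERY = """
SELECT element
FROM statustable
GROUP BY element
HAVING ABS(SUM(CASE WHEN status = 'Available' THEN 1 ELSE 0 END)
         - SUM(CASE WHEN status = 'Unavailable' THEN 1 ELSE 0 END)) > 1
"""

# Flags each row whose status repeats the previous row of the same element.
LAG_QUERY = """
SELECT id
FROM (
    SELECT id, status, hour,
           LAG(status) OVER (PARTITION BY element ORDER BY hour, id) AS prev_status
    FROM statustable
) AS t
WHERE status = prev_status
ORDER BY hour
"""


def run_checks(rows):
    """Return (elements flagged by counts, row IDs flagged by LAG)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statustable (element INTEGER, status TEXT, hour TEXT, id INTEGER)"
    )
    conn.executemany("INSERT INTO statustable VALUES (?, ?, ?, ?)", rows)
    by_count = [r[0] for r in conn.execute(COUNT_QUERY).fetchall()]
    by_lag = [r[0] for r in conn.execute(LAG_QUERY).fetchall()]
    conn.close()
    return by_count, by_lag


print(run_checks(GOTCHA_ROWS))  # ([], [2, 4])
```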

Those column names are “alias” names, i.e. the “AS” part, and standard SQL does not allow a column alias to be referenced in a WHERE clause:
https://dev.mysql.com/doc/refman/8.0/en/problems-with-alias.html

I lean towards avoiding terseness whenever possible if there’s a chance it may compromise clarity. I chose those names in the hope of suggesting that the values involve the aggregate COUNT function:
https://dev.mysql.com/doc/refman/8.0/en/counting-rows.html

The “avail” and “notavail” names were meant to suggest the two values of the status field. So yes, they might be better named more explicitly as “count_status_available” and “count_status_unavailable”.

EDIT

Indeed, there are many potential “gotchas”. That is why I said “initially” earlier. My thought was that a quick check for a problem would be a good start, and that the “gotchas” involving a limited data set (e.g. element = #) could be dealt with next.

More detailed specs regarding exactly what kind of results are wanted would be a big help in arriving at a better solution sooner.

that is the correct approach – stop erroneous data from being inserted into the database in the first place

even if you identify the existing bad rows now, with the queries people are suggesting for you, you must still do something to prevent this from happening again
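Where you do control the database, one way to stop the bad rows at the source is a BEFORE INSERT trigger that rejects a row whose status matches the latest recorded status for the same element. This is a minimal sketch, not a drop-in for the hosted table: it is written for SQLite (using RAISE(ABORT, ...)) so it runs standalone, whereas a MySQL trigger would need its own syntax, typically SIGNAL SQLSTATE, which is not shown here. The trigger name reject_repeated_status, the table name statustable, and the third timestamp are hypothetical, and it assumes rows arrive in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statustable (element INTEGER, status TEXT, hour TEXT, id INTEGER)"
)

# Abort the insert when the new status equals the most recent status of the
# same element. For an element's first row the subquery yields NULL, the
# comparison is not true, and the row is accepted.
conn.execute("""
CREATE TRIGGER reject_repeated_status
BEFORE INSERT ON statustable
WHEN NEW.status = (SELECT status FROM statustable
                   WHERE element = NEW.element
                   ORDER BY hour DESC, id DESC LIMIT 1)
BEGIN
    SELECT RAISE(ABORT, 'status must alternate');
END
""")


def try_insert(element, status, hour, row_id):
    """Insert one row; return False if the trigger rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO statustable VALUES (?, ?, ?, ?)",
                (element, status, hour, row_id),
            )
        return True
    except sqlite3.DatabaseError:
        return False


print(try_insert(19, "Unavailable", "2019-09-03 18:13:49", 1))  # True
print(try_insert(19, "Available", "2019-09-07 19:13:56", 2))    # True
print(try_insert(19, "Available", "2019-09-08 10:00:00", 3))    # False (repeat)
```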


Thank you for the suggestion.
I’m sorry, but unfortunately this table is populated by a hosted service.
Otherwise I would have already solved it.
On this table I can only run SELECT queries; I don’t have any other privileges.

That’s too bad. My whole point of finding corrupt data is to fix the root cause so there won’t be yet more corrupt data.

If you can’t do that, it doesn’t make sense to continue pursuing a query.

IMHO, you should hand this over to whoever can fix things.