Get the lowest timestamp value of three Postgres tables

I’m over my head here. :swimmer:

I have a table that due to a miss in the code logic has some NULL timestamp fields

SELECT COUNT(id)
FROM topic_users
WHERE ( topic_users.last_visited_at IS NOT NULL
      AND topic_users.first_visited_at IS NULL )

* result = 91

Although it is impossible to determine what the actual first_visited_at timestamp should be or would have been, it is possible to get a “best (or at least better) guess” by using values when they exist in either of two other tables.

For example, if a user makes a post in a topic and later visits the topic the posts created_at timestamp value would be lower than the last_visited_at and be closer to, if not the, actual unrecorded first_visited_at

Similarly, if a user visited a topic and Liked a post and later visits the topic the post_actions created_at timestamp value would be lower than the last_visited_at and be closer to, if not the, actual unrecorded first_visited_at

I’ve cobbled together a mess of a query, and as it returns 5398 (instead of 91) I know I’m missing something

The three tables all have a user_id and a timestamp.
But the posts table is the only way to “connect” the topic_users table to the post_actions table.

topic_users.user_id
topic_users.last_visited_at
  topic_users.topic_id
post_actions.user_id
post_actions.created_at
  post_actions.post_id
posts.user_id
posts.created_at
  posts.topic_id
  posts.id

Prepare yourself …

WITH p_table AS (
SELECT posts.topic_id
 , posts.user_id
 , MIN(posts.created_at)
FROM posts
WHERE posts.topic_id IN (
  SELECT DISTINCT topic_users.topic_id
  FROM topic_users
  WHERE topic_users.posted
  AND ( topic_users.last_visited_at IS NOT NULL
         AND topic_users.first_visited_at IS NULL )
)
AND posts.user_id IN (
  SELECT DISTINCT topic_users.user_id
  FROM topic_users
  WHERE topic_users.posted
  AND ( topic_users.last_visited_at IS NOT NULL
         AND topic_users.first_visited_at IS NULL )
)
GROUP BY posts.topic_id, posts.user_id
),
pa_table AS (
SELECT post_actions.post_id
 , post_actions.user_id
 , MIN(post_actions.created_at)
FROM post_actions
WHERE post_actions.post_id IN (
  SELECT posts.id
  FROM posts
  WHERE posts.topic_id IN (
    SELECT DISTINCT topic_users.topic_id
    FROM topic_users
    WHERE ( topic_users.last_visited_at IS NOT NULL
         AND topic_users.first_visited_at IS NULL )
  )
)
AND post_actions.user_id IN (
  SELECT posts.user_id
  FROM posts
  WHERE posts.user_id IN (
    SELECT DISTINCT topic_users.user_id
    FROM topic_users
    WHERE ( topic_users.last_visited_at IS NOT NULL
         AND topic_users.first_visited_at IS NULL )
  )
)
GROUP BY post_actions.post_id, post_actions.user_id
),
tu_table AS (
SELECT topic_users.user_id
 , topic_users.topic_id
 , topic_users.last_visited_at
FROM topic_users
WHERE ( topic_users.last_visited_at IS NOT NULL
      AND topic_users.first_visited_at IS NULL )
)
SELECT COUNT(LEAST(p_table.min, pa_table.min, tu_table.last_visited_at))
FROM p_table, pa_table, tu_table
WHERE ( tu_table.user_id = p_table.user_id
    OR tu_table.user_id = pa_table.user_id )
    AND tu_table.topic_id = p_table.topic_id 

interesting scenario

were you intending to ask a question? because you didn’t :wink:

lol
I was going to begin the title with “how can I” but left it implied instead.
I should know better than to do that. .

I’ve tried various JOINs, subqueries, aliases, etc. but :frowning:

Where I should be getting 91 rows I’m getting 5398 (better than the first attempt’s 65K+ but just the same …)
So I’m obviously missing the way(s) to narrow the number of results down to the expected 91 rows.

My end goal is to be able to UPDATE the NULL first_visited_at fields with the lowest of the three timestamps. But I need to get them before I can do that.

For reference

disclaimer: i did not read that discourse discussion – glanced at it, said “nope nope nope”, and bailed

my suggestion – instead of one complex query, run three simple ones, such that they pull out the three timestamps you need and store them in a temp table, then do a MIN() on those in order to do your update

1 Like

Thanks, I hadn’t thought about using a temp table.

I’ll give it my best shot.

I’ve made some progress.

Getting the only last_visited_at fields is straightforward enough.

SELECT topic_users.topic_id
 , topic_users.user_id
 , topic_users.last_visited_at
FROM topic_users
WHERE ( topic_users.first_visited_at IS NULL
      AND topic_users.last_visited_at IS NOT NULL ) 

Getting the earliest post created_at fields was a bit more complex, but not all that bad to put together.

SELECT topic_users.topic_id
 , topic_users.user_id
 , MIN(posts.created_at)
FROM topic_users
JOIN posts
ON posts.topic_id = topic_users.topic_id 
AND posts.user_id = topic_users.user_id
WHERE topic_users.posted 
 AND ( topic_users.first_visited_at IS NULL
      AND topic_users.last_visited_at IS NOT NULL )   
GROUP BY topic_users.topic_id, topic_users.user_id  

Getting the earliest post_actions created_at was proving to be a hassle until I discovered a non-standard Postgres feature - “LATERAL”

SELECT tu.topic_id
 , tu.user_id
 , MIN(pa.created_at)
FROM ( SELECT topic_users.topic_id
             , topic_users.user_id
      FROM topic_users
      WHERE ( topic_users.first_visited_at IS NULL
            AND topic_users.last_visited_at IS NOT NULL )
      ) AS tu 
JOIN LATERAL ( SELECT posts.id
              FROM posts
              WHERE posts.topic_id = tu.topic_id 
              ) AS pt
ON true JOIN LATERAL ( SELECT post_actions.created_at
              FROM post_actions
              WHERE post_actions.post_id = pt.id 
              AND post_actions.user_id = tu.user_id 
              ) AS pa
ON true
GROUP BY tu.topic_id, tu.user_id

Now to the temp table.

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.