Finding Missing Rows Between Two Remote Databases

Here is my situation.

I have one table on on remote database server that uses a primary as “video_id”. It gets updated once a day with stats from another table on another remote database that has a column as “video_id” that should match up (but not primary).

Because of Database failures and such, the stats (second) server is missing about 500 rows of 110K rows that it should have.

I’d like to run a query to add every “video_id” that exists in the first table but does not exist in the other.

Remember each table is in a separate remote database.

Feedback appreciated.

Cheers!
Ryan

Run INSERT IGNORE queries??

True, they might fail for other reasons, but they should fill the gaps (some? all?) as long as there aren’t other underlying problems.

Running INSERTS for 110K rows to only get ~500 seems wasteful, But I don’t think it would be all that bad.

It would take a couple queries, but you could do it in three steps.

  1. Query the stats server to get a list of all the ids that are there, and save it to a file.
  2. Load the id file to a temp table on the primary table, then query for all records where the ids do not match
  3. Run the inserts for all records from step 2 into the stats table.

Thanks guys,

I went with the INSERT IGNORE approach, but came up with a way to at least indicate the row was missing from the stats table.

If the primary table that gets updated from the stats table has rows with the stats column still at ‘0’, high likelihood that the row is missing in the stats table. So I did a loop on all rows with a stat number less than ‘10’, and that returned about 600 rows, and did the INSERT IGNORE only then with those video_ids.

Cheers!
Ryan

1 Like

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.