Issues around importing a CSV file on a cron run

Hi there,

I usually get an intelligent response in the PHP Sitepoint forum, so I’ll post this conundrum here.

I’m just planning ahead for my first WordPress plugin.

It will allow affiliate CSV feeds to be downloaded and parsed on a cron run, with the data then used to create individual posts in WordPress.

There will be thousands, and potentially tens of thousands, of individual items in the CSV feeds.

For every item I will need to check whether it already exists in the database. If it doesn’t, I will create a post in the relevant category.

I would imagine that checking thousands of items against a database on a cron run will be quite time-consuming and resource-draining.

I wonder if a better method would be to create a JSON file from the relevant database tables beforehand and then to check the items against the JSON file rather than directly against the database?
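One way to picture the "check against a preloaded list" idea is below. It is only a rough sketch in Python rather than PHP, and it assumes each feed row carries a unique identifier (the `product_id` column here is a made-up name). The known IDs are loaded once, for example with a single query or from a cached file, and each row is then tested against an in-memory set instead of triggering one database query per item.

```python
import csv
import io


def find_new_rows(csv_text, existing_ids, id_column="product_id"):
    """Return the feed rows whose ID is not already known.

    existing_ids is assumed to have been loaded once up front
    (one query or one cached file), not looked up per row.
    """
    seen = set(existing_ids)
    new_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        item_id = row[id_column]
        if item_id not in seen:
            new_rows.append(row)
            # Remember the ID so a duplicate later in the same feed is skipped.
            seen.add(item_id)
    return new_rows


# Hypothetical feed: item 101 already exists, 102 appears twice.
feed = (
    "product_id,title\n"
    "101,Kettle\n"
    "102,Toaster\n"
    "103,Blender\n"
    "102,Toaster\n"
)
new_items = find_new_rows(feed, existing_ids={"101"})
new_ids = [row["product_id"] for row in new_items]
```

Whether the IDs come from a JSON file or straight from the database matters less than doing the load once per run; a JSON cache also has to be kept in sync with the database, which is an extra thing that can go stale.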

Also, if the individual items from the CSV are then fully integrated into the WordPress database as individual posts, then the user will have the ability to delete them.

If the user deletes thousands of items, they shouldn’t simply be recreated on the next cron run.

There has to be some record that these items once existed. Any ideas how I can structure my database for this? I wouldn’t have thought it was good database optimisation practice to keep records of thousands of deleted items.

PS: I tried both Chrome and Firefox, but the only way I could write into the form field was by turning off JavaScript. An issue there for the Mods to look into, perhaps.

If the user deletes thousands of items, they shouldn’t simply be recreated on the next cron run.

Can you store each CSV and run some kind of diff against yesterday’s (or the last one uploaded by that client) and only deal with the new items?
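A diff like that could be sketched roughly as follows. This is Python rather than PHP, and it assumes both files share a unique ID column (`product_id` is a hypothetical name). Only IDs that are in today's feed but not in the previous one are treated as new, so anything the user deleted that is still in the feed is never re-imported.

```python
import csv
import io


def load_ids(csv_text, id_column="product_id"):
    """Collect the set of item IDs found in one CSV feed."""
    return {row[id_column] for row in csv.DictReader(io.StringIO(csv_text))}


def diff_feeds(previous_csv, current_csv, id_column="product_id"):
    """Return (added, removed) ID sets between two versions of a feed."""
    previous_ids = load_ids(previous_csv, id_column)
    current_ids = load_ids(current_csv, id_column)
    added = current_ids - previous_ids      # only these need new posts
    removed = previous_ids - current_ids    # dropped from the feed since last run
    return added, removed


# Hypothetical feeds from two consecutive runs.
yesterday = "product_id,title\n1,Kettle\n2,Toaster\n3,Blender\n"
today = "product_id,title\n2,Toaster\n3,Blender\n4,Juicer\n"
added, removed = diff_feeds(yesterday, today)
```

The trade-off is that the previous file has to be kept for each client, and a missing or corrupted copy would make every item look new on the next run.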

Or you can keep the entries in the DB, but mark them as having been deleted/removed. That might fill up your table quite quickly, but with the right indexes (for example, one on the feed’s item identifier) you should be OK. You could possibly use a REPLACE INTO statement here, I think.
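To make the "mark as deleted" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for MySQL. The table and column names (`feed_items`, `feed_item_id`, `is_deleted`) are invented for illustration and are not part of WordPress. The table holds only the feed's item ID, the post it produced, and a flag, so a deleted item costs one narrow row rather than a whole post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feed_items (
        feed_item_id TEXT PRIMARY KEY,   -- the primary key doubles as the lookup index
        post_id      INTEGER,            -- post created for this item, NULL once deleted
        is_deleted   INTEGER NOT NULL DEFAULT 0
    )
    """
)


def should_create_post(conn, feed_item_id):
    """True only if the item has never been imported, deleted or not."""
    row = conn.execute(
        "SELECT 1 FROM feed_items WHERE feed_item_id = ?", (feed_item_id,)
    ).fetchone()
    return row is None


def record_created(conn, feed_item_id, post_id):
    """Remember that a post was created for this feed item."""
    conn.execute(
        "INSERT INTO feed_items (feed_item_id, post_id) VALUES (?, ?)",
        (feed_item_id, post_id),
    )


def mark_deleted(conn, feed_item_id):
    """Keep the row as a tombstone when the user deletes the post."""
    conn.execute(
        "UPDATE feed_items SET is_deleted = 1, post_id = NULL WHERE feed_item_id = ?",
        (feed_item_id,),
    )


record_created(conn, "101", post_id=55)
mark_deleted(conn, "101")
recreate_101 = should_create_post(conn, "101")   # item was deleted by the user
create_999 = should_create_post(conn, "999")     # item never seen before
```

One caution on REPLACE INTO: in MySQL it deletes the existing row and inserts a fresh one, so unless the statement sets the deleted flag explicitly, the flag would fall back to its default and the item would look live again. A plain INSERT guarded by the existence check, as above, avoids that.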

The cron job shouldn’t take too long, provided you optimise your code and your queries. How often will this job run? If it’s once a day, then the overhead is negligible. If it’s every minute, more brainstorming is required :wink: