I dunno if this is the correct forum for this, but here goes…
Hey guys… I have a PHP script that I’ll be running in the background on my server; basically it fetches “x” number of rows from the database & then performs several tasks on the data… each row can take between about 2 and 90 seconds to process, depending on a few factors.
So… I was wondering, server-wise, is it more beneficial to run 5 instances of this script, each processing 500 rows, or 50 instances, each processing 50 rows?
The reason I’m wondering is that I came across this quote on another forum while reading about background processes…
For example, your PHP script should put an entry into a database table and return right away. Then, a cron running every minute checks the queue and forks a process for each job.
The advantage here is that you don’t lock an apache thread up for 10 minutes.
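If I’m reading that right, the pattern would look roughly like this (all table and column names here are made up for illustration):

```php
<?php
// enqueue.php: the web-facing script just records the job & returns, so no
// Apache thread is tied up for 10 minutes while the work happens.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$payload = json_encode($_POST);
$stmt = $db->prepare("INSERT INTO jobs (payload, status) VALUES (?, 'pending')");
$stmt->bind_param('s', $payload);
$stmt->execute();
```

And the worker that cron kicks off:

```php
<?php
// worker.php: run from cron, e.g. every minute:
//   * * * * * /usr/bin/php /path/to/worker.php
function process_job($payload) { /* stands in for the slow 2-90 second part */ }

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$jobs = $db->query("SELECT id, payload FROM jobs WHERE status = 'pending' LIMIT 50");
while ($job = $jobs->fetch_assoc()) {
    $id = (int)$job['id'];
    $db->query("UPDATE jobs SET status = 'running' WHERE id = $id");
    process_job($job['payload']);
    $db->query("UPDATE jobs SET status = 'done' WHERE id = $id");
}
```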
Basically each script will grab its results & straight after getting them will “lock” each record (with a single query) by setting a field called “locked” to 1, & then once a row has been processed it “unlocks” it.
Sure… I suppose it’s possible that another script will grab one of the rows before they’ve been locked, but that’s unlikely… & even if it does happen, processing-wise it’s not such a big deal if the row is processed again.
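If I wanted to close that window completely, I suppose something like this would do it: one UPDATE both finds and flags the batch, and since MyISAM table-locks the whole statement, no other script can sneak in between (the worker_id column is just an illustration, not part of my real schema):

```php
<?php
// Claim a batch atomically: MyISAM holds a table lock for the duration of a
// single UPDATE, so one statement that both finds & flags the rows leaves no
// gap between "select" and "lock" for another script to exploit.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$token = uniqid('w', true); // unique tag for this run

$db->query("UPDATE items SET locked = 1, worker_id = '$token'
            WHERE locked = 0 LIMIT 500");

// Fetch only the rows this run claimed.
$rows = $db->query("SELECT * FROM items WHERE worker_id = '$token'");
while ($row = $rows->fetch_assoc()) {
    // ... 2-90 seconds of processing per row ...
    $db->query("UPDATE items SET locked = 0, worker_id = NULL
                WHERE id = " . (int)$row['id']);
}
```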
Well, there will actually be a few different scripts… probably 3 or 4 major ones.
Let me first of all ask you why you want the inserter script. Since this only needs to happen once, wouldn’t it be easier to just extend the normal data insertion procedure a little bit and not bother cron with it?
Not sure I completely understand what you’re asking… but basically the “inserter” script works off new data that yet another script has added to a table & goes off & does all sorts of tasks with it… the “updater” script, on the other hand, works on all the data, not just the “new” data… but since all the major tasks have already been taken care of previously, that script’s overhead is much, much lower.
Now I see your point about keeping the number of scripts down & how running lots of them may cause problems… we are running MyISAM because we need FULLTEXT support, so we don’t have row locking… but the way I have designed the database, each instance of the script never works on the same data as any of the others.
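For what it’s worth, one common way to guarantee that kind of separation (not necessarily exactly how I’ve done it) is to partition by id modulo the number of workers:

```php
<?php
// Static partitioning sketch: worker N of TOTAL only ever touches rows where
// id % TOTAL = N, so two instances can never select the same row.
$workerNum    = isset($argv[1]) ? (int)$argv[1] : 0; // e.g. php updater.php 3
$totalWorkers = 5;

$db   = new mysqli('localhost', 'user', 'pass', 'mydb');
$rows = $db->query(sprintf(
    "SELECT * FROM items WHERE id %% %d = %d AND locked = 0 LIMIT 500",
    $totalWorkers, $workerNum
));
```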
Okay, so you have two scripts: an inserter script and an updater script.
Let me first of all ask you why you want the inserter script. Since this only needs to happen once, wouldn’t it be easier to just extend the normal data insertion procedure a little bit and not bother cron with it?
As for the updater script, I personally would prefer 5 parallel scripts over 50 (actually I would prefer 1 script, but that’s not one of the options). The reason for this is twofold:
1. 50 scripts need more CPU and memory than 5 do.
2. Having 50 scripts running against one database can create transactional havoc.
The second point can be especially bad if your table’s storage engine doesn’t support row locking (for MySQL: MyISAM tables don’t, InnoDB tables do).
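For comparison, if you were on InnoDB, the claim step could be wrapped in a transaction with row locks, roughly like this (all the names here are assumptions on my part):

```php
<?php
// InnoDB sketch: FOR UPDATE locks only the matched rows, so other workers
// wait on those rows instead of on the whole table as with MyISAM.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->begin_transaction();

$res = $db->query("SELECT id FROM items WHERE locked = 0 LIMIT 50 FOR UPDATE");
$ids = array();
while ($row = $res->fetch_assoc()) {
    $ids[] = (int)$row['id'];
}
if ($ids) {
    $db->query("UPDATE items SET locked = 1 WHERE id IN (" . implode(',', $ids) . ")");
}
$db->commit();
```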
I once created a script that loaded data from external sources and inserted the retrieved data into the database. As soon as several of these scripts started running at the same time, the whole server somehow ended up in a deadlock, the load went up to ~400, and it had to be rebooted.
I sorted that by allowing just one instance of the script at any given time (using a PID file).
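A minimal sketch of that kind of PID file guard (the path is just an example, and the liveness check needs the posix extension):

```php
<?php
// Refuse to start if a previous instance of the script is still running.
$pidFile = '/var/run/updater.pid';

if (file_exists($pidFile)) {
    $oldPid = (int)trim(file_get_contents($pidFile));
    // Signal 0 sends nothing; it just tests whether the process exists.
    if ($oldPid > 0 && posix_kill($oldPid, 0)) {
        exit("Previous instance (pid $oldPid) is still running.\n");
    }
}
file_put_contents($pidFile, getmypid());

// ... do the actual work here ...

unlink($pidFile);
```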
Ok… no, not updating an entire table… each record that is processed will, after gathering its data, add several rows to another table & also update the record currently being processed with some new data.
Yes… I plan on using cron to run the process as below…
Cron will run constantly throughout the day; however, once the script has filled the database with the initial data it will have less to process over time, & the “updater” script will be doing more of the work… that script, though, doesn’t have the overheads this one has & will process records MUCH faster.
Well, this script’s processing isn’t hugely critical, but the updater script is more so… so the sooner the better for that one.
Hey, thanks for the ideas… I now see the conundrum… the fields used in the fulltext index aren’t required for the processing, however they are updated in the process… but that isn’t a problem to handle.
So your idea of moving the fulltext fields to a MyISAM table & converting the main table to InnoDB sounds like the way to go really.
Since you are using MyISAM tables, that can become problematic. The thing is, if you want to write (update or insert) to one row, MySQL has to lock the complete table in order to do so. What you’re proposing is similar to throwing a dinner party for 50 guests with lots of food but only one fork: at any given time 1 person can eat while the other 49 have to wait for it.
That can certainly happen. If you want to prevent it you could use transactions, but alas, just like row-level locking, transactions are not supported by MyISAM either.
When running the script(s), do you need the fields with the FULLTEXT index? If not, you could split those out to another table and JOIN the two tables when needed. If you do that you can create an InnoDB main table and a MyISAM secondary table, which overcomes the row-level locking problem, allows you to use transactions, and still lets you use FULLTEXT search.
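A rough sketch of what that split could look like, with all table and column names being examples:

```php
<?php
// InnoDB main table plus a MyISAM side table that holds only the FULLTEXT
// columns, joined back on the primary key.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$db->query("CREATE TABLE items (
                id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
                locked TINYINT NOT NULL DEFAULT 0
            ) ENGINE=InnoDB");

$db->query("CREATE TABLE items_text (
                item_id INT UNSIGNED NOT NULL PRIMARY KEY,
                body    TEXT NOT NULL,
                FULLTEXT KEY ft_body (body)
            ) ENGINE=MyISAM");

// FULLTEXT search still works by joining the two halves:
$res = $db->query("SELECT i.*
                   FROM items i
                   JOIN items_text t ON t.item_id = i.id
                   WHERE MATCH(t.body) AGAINST('some phrase')");
```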
Just tested this and it does work. I’m not 100% sure this solution will be efficient though. If you’re wondering about that you could head over to our MySQL forum and ask there.