Hi,
I’m currently running a PHP application on a Linux VPS. (Mods: I’ve posted here as I don’t think this is specific to PHP, but please feel free to move it accordingly.)
I’m retrieving domain information through an API and enriching it with data from various web sources, some of which I have to access via cURL because they don’t offer an API.
This didn’t pose a problem with 10,000 records, but as the number increases I can see a definite bottleneck developing.
At the most basic level, the process is to retrieve the domain information and then, for each domain:
1. run internal processing (count the number of characters, etc.)
2. cURL the PageRank information
3. cURL the WHOIS data
4. repeat 1 - 3 for the next domain
A simplified sketch of this loop is below.
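It looks roughly like this, where get_domains(), fetch_pagerank(), fetch_whois() and save_result() are just placeholder names for my API/cURL wrapper functions:

```php
<?php
// Simplified sketch of the current sequential loop.
// get_domains(), fetch_pagerank(), fetch_whois() and save_result()
// are placeholders for the real API/cURL wrappers, not actual code.
$domains = get_domains();  // pull the domain list from the API

foreach ($domains as $domain) {
    $length   = strlen($domain);          // internal processing (character count etc.)
    $pagerank = fetch_pagerank($domain);  // blocking cURL request, can take up to ~10s
    $whois    = fetch_whois($domain);     // another blocking cURL request

    save_result($domain, $length, $pagerank, $whois);
}
```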
Naturally, with each cURL request taking up to ten seconds, this is a very slow process when there are lots of domains to check.
Would it be better design practice to run the cURL jobs for PR and WHOIS as separate scripts to take advantage of ‘multi-threading’ (for example, one cron job to retrieve the domains, one for the PR and one for the WHOIS), or would this make little overall difference?
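For reference, the other option I’m aware of is PHP’s curl_multi_* functions, which run several cURL requests in parallel from within a single script. A minimal sketch of what I mean (the URLs are placeholders, not the real services I’m querying):

```php
<?php
// Minimal curl_multi sketch: fetch the PageRank and WHOIS pages for
// one domain in parallel instead of one after the other.
// The URLs below are placeholders for the real services.
$urls = [
    'pagerank' => 'https://example.com/pagerank?domain=example.org',
    'whois'    => 'https://example.com/whois?domain=example.org',
];

$mh      = curl_multi_init();
$handles = [];

foreach ($urls as $key => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // cap each request at 10 seconds
    curl_multi_add_handle($mh, $ch);
    $handles[$key] = $ch;
}

// Drive all handles until every transfer has finished.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh);  // wait for network activity instead of busy-looping
    }
} while ($active && $status === CURLM_OK);

// Collect the responses and clean up.
$results = [];
foreach ($handles as $key => $ch) {
    $results[$key] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```

I don’t know whether that, the separate cron scripts, or something else entirely is the cleaner design here.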
Thanks