I just came back from FOSDEM, where in the Perl devroom Liz Cholet did a talk on "Automating Firefox with MozREPL, AnyEvent and Coro" (the talk abstract is not very useful, and I dunno if FOSDEM will have videos later). I don't think it matters that she happened to do this with Perl, except I dunno what PHP's equivalent of Perl's AnyEvent or Python's Twisted would be.
What she had to do at her job was have a script direct Firefox to a URL, grab the page and analyse the objects found on that page. Sometimes Firefox itself also needed to be modified or queried (you could get data from the plugins in Firefox itself!), and with MozREPL you get access to Firefox's own objects too. Firefox was being driven by WWW::Mechanize::Firefox. I know there's also a Python version of Mechanize, but in your case you'd want PHP to call something like wget rather than an actual browser.
Anyway, like Cups suggested, her analysis parts were done separately from the page-loading parts. She used Perl's AnyEvent to send the page requests asynchronously and Coro for coroutines (cooperative threads), so the parts of her code that did the analysis would just be listening for the response and could start as soon as it arrived (then Firefox would get directed to the next URL... I assume she had an array of URLs). K Wolfe mentioned ftp; Liz was using telnet on port 4242 (for the demo she was using localhost, but at her work she also still used telnet).
I wonder if you could call your page URLs asynchronously, with your data-processing parts just listening for success, so that neither of these processes gets in the way of the rest of your stuff. I'm not sure it would speed up your total run time, but if one process is currently blocking the next then that might be something to look at.
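To make that idea a bit more concrete, here's a rough sketch in Python with asyncio (not Perl or PHP, and not what Liz showed, just the same general shape). The names `fetch_page`, `analyse` and `crawl` are made up for illustration, and `fetch_page` only pretends to download something; you'd swap in a real non-blocking HTTP call. The fetches run concurrently and drop their results on a queue, while the analysis part just waits on the queue and handles pages as they arrive.

```python
import asyncio


async def fetch_page(url):
    # Hypothetical stand-in for a real non-blocking download.
    # It only sleeps briefly and returns a fake page body.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


def analyse(url, body):
    # Hypothetical analysis step: here it just reports the page size.
    return (url, len(body))


async def producer(urls, queue):
    # Start all fetches concurrently; each one queues its page when done.
    async def fetch_one(url):
        body = await fetch_page(url)
        await queue.put((url, body))

    await asyncio.gather(*(fetch_one(u) for u in urls))
    await queue.put(None)  # sentinel: no more pages are coming


async def consumer(queue, results):
    # Wait for fetched pages and analyse each one as it arrives.
    while True:
        item = await queue.get()
        if item is None:
            break
        url, body = item
        results.append(analyse(url, body))


async def crawl(urls):
    queue = asyncio.Queue()
    results = []
    # Fetching and analysing run side by side on the same event loop.
    await asyncio.gather(producer(urls, queue), consumer(queue, results))
    return results


if __name__ == "__main__":
    pages = ["http://example.test/a", "http://example.test/b"]
    print(asyncio.run(crawl(pages)))
```

Whether this helps depends on where the time goes: if most of it is spent waiting on the network, overlapping the requests should help, but if the analysis itself is the slow part, a single event loop won't make it any faster.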