String modification optimization

Hey guys,

I have a string $longtext

This string contains a lot of text and varies in length (anywhere from 10,000 to 500,000 characters). I am performing about 10–15 str_replace() and preg_replace() calls on this string, like this (the actual search/replace values omitted):


$longtext = str_replace($search1, $replace1, $longtext);
$longtext = str_replace($search2, $replace2, $longtext);
$longtext = str_replace($search3, $replace3, $longtext);
$longtext = preg_replace($pattern1, $replacement1, $longtext);
$longtext = preg_replace($pattern2, $replacement2, $longtext);
$longtext = preg_replace($pattern3, $replacement3, $longtext);
$longtext = preg_replace($pattern4, $replacement4, $longtext);
$longtext = preg_replace($pattern5, $replacement5, $longtext);

But I feel like this is completely inefficient and will take a web server forever to complete, especially if many people are using it (like 10,000 users at once).

Will this ever work?

Also, what would you say is more efficient?
1.


$longtext = str_replace('"batters"','"pitchers"',$longtext);
$longtext = str_replace("'batters'","'pitchers'",$longtext);

or


$longtext = preg_replace('/("|\\')batters("|\\')/','$1pitchers$2',$longtext);

I think the first option is probably faster than the second (regexps take more processing power, I believe), and the plain string functions in PHP are very fast.
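
One easy win, if most of your replacements are plain literals: str_replace() accepts arrays for its search and replace parameters, so the whole chain of literal replacements can collapse into a single call. A quick sketch (the pairs here are placeholders for your real ones):

// str_replace() with array arguments: one call instead of many.
// Pairs are applied in array order, so later pairs see the output
// of earlier ones. These values are placeholders.
$search   = array('"batters"', "'batters'");
$replace  = array('"pitchers"', "'pitchers'");
$longtext = str_replace($search, $replace, $longtext);

It doesn’t reduce the amount of scanning PHP has to do, but it cuts the per-call overhead and keeps all the pairs in one place.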

Then again, if you have 10,000 concurrent users on your system all using this function on a half megabyte of data, you may need to rethink your application’s strategy somewhat. 10,000 concurrent users is a lot :)

What exactly are you doing to the strings? Have you tried running some benchmarks (using, for example, ab) to see what gives the best performance?
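
If you want a quick number before setting up ab, you can time just the replacement step with microtime(). A rough sketch, assuming $longtext holds one of your representative documents:

// Crude micro-benchmark: run each approach many times over the same input.
$iterations = 100;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $out = str_replace('"batters"', '"pitchers"', $longtext);
    $out = str_replace("'batters'", "'pitchers'", $out);
}
$strTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $out = preg_replace('/("|\')batters("|\')/', '$1pitchers$2', $longtext);
}
$pregTime = microtime(true) - $start;

printf("str_replace: %.4fs  preg_replace: %.4fs\n", $strTime, $pregTime);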

For that matter, if you’re applying the same replacements consistently, why not modify the text as it’s being stored, rather than as it’s retrieved? That way your concurrent users aren’t running all these functions to begin with.
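
If the text did live in your own database, the idea would look something like this at write time (save_to_db() here is a made-up stand-in for whatever storage call you actually use):

// Hypothetical write path: run the replacement chain once when the
// text is stored, so reads never have to repeat it.
function store_document($rawText)
{
    $search  = array('"batters"', "'batters'");   // your real pairs here
    $replace = array('"pitchers"', "'pitchers'");
    $clean   = str_replace($search, $replace, $rawText);
    save_to_db($clean); // placeholder, not a real function
}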

The problem is that this data is not being stored. It is pulled from an effectively unlimited number of different sources, and the content can change from one day to the next, so caching it would be unreliable because the data is dynamic.

I guess what I’m wondering is, for example, how does a web proxy server support 10,000 users at once?

I think if you have 10,000 concurrent HTTP users, PHP string replacement isn’t going to be that much of a bottleneck. HTTP overhead, network bandwidth and your database will probably rate-limit things before that happens if you aren’t load balancing across several web servers.

Guess I’m just trying to get a good idea of how many concurrent users a single web server can support doing these kinds of replacements. Basically, I’m allowing people to enter a URL; cURL grabs the HTML, and the script makes changes to the page based on what the user chooses, like making the background black or setting the font size to 24px or the text colour to red. So maybe it’s like a proxy server: it fetches the HTML, alters it with string replacements and regexps, then sends it back to the user.
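
In rough outline, my script does something like this (a simplified sketch; the style tweaks shown are just examples of the kind of rules a user can pick):

// Fetch the page the user asked for, rewrite it, send it back.
$ch = curl_init($userSuppliedUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

// Apply whatever tweaks the user chose (background colour, font size, etc.).
$html = str_replace('<body', '<body style="background: black;"', $html);
$html = preg_replace('/font-size:\s*\d+px/', 'font-size: 24px', $html);

echo $html;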

Knowing that, can you help me learn about performance and get an idea of how many users this heavy process can support?

The time taken to go and fetch the HTML with cURL will likely be orders of magnitude greater, and far more variable, than that taken to perform the string replacements regardless of which approach you take.

Have a look at the time taken by different parts of your code (search this forum for profiling PHP code with tools like Xdebug, XHProf or ab) to see which areas take the most time; those are the best targets for bringing the overall script time down, and it will also give you an idea of how your script holds up under load at the moment.
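
A zero-setup starting point, before you install Xdebug or XHProf, is to bracket the two big stages with microtime() yourself. A rough sketch ($url and the replacement pairs are assumed to come from your existing code):

// How long does the fetch take compared to the rewriting?
$t0 = microtime(true);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

$t1 = microtime(true);

$html = str_replace($search, $replace, $html); // your replacement chain here

$t2 = microtime(true);

error_log(sprintf('fetch: %.3fs, rewrite: %.3fs', $t1 - $t0, $t2 - $t1));

My bet is that the fetch line dominates.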

MediaWiki has a FastStringSearch PHP extension:

[mediawiki] Index of /trunk/extensions/FastStringSearch

Thanks for your feedback guys! Much appreciated. Last question… now that you know my goal, do you think it’d ever be possible with many users? Or would it just be able to support a few?