In reply to dotDan:
Requests finish faster = fewer threads being spawned = better performance.
Keep your pages lightweight. If you aren't familiar with CSS already, become that way.

These links may be helpful:
LiveJournal's Backend - A history Of Scaling (pdf)
phpBB tweaks for large forums
Accelerating PHP Code Performance for Oracle
As far as PHP tweaking is concerned: there's generally not much speed improvement to be gained by tinkering with the PHP code, unless the original code uses inefficient algorithms. If that's not the case, then using an opcode cache and some sort of data caching mechanism are about as much optimization as you can hope to add. But on really large sites, you're going to run into a number of non-PHP-related performance issues. Read the presentation about LiveJournal's servers to get some idea of what you may be facing in the future.
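For illustration only, here is a minimal sketch of what a "data caching mechanism" could look like, assuming a simple file-based cache. The helper name `cache_remember`, the cache directory, the key and the 60-second lifetime are all made up for this example; a real site might well use a memory cache instead.

```php
<?php
// Hypothetical helper (not from any library): return a cached value if it is
// still fresh, otherwise compute it, store it on disk, and return it.
function cache_remember($dir, $key, $ttl, $compute)
{
    $file = $dir . '/' . md5($key) . '.cache';

    // Serve from the cache file while it is younger than $ttl seconds.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return unserialize(file_get_contents($file));
    }

    $value = $compute();
    // LOCK_EX keeps concurrent requests from interleaving their writes.
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}

// Usage: the expensive callback runs once; the second call is served from disk.
$dir   = sys_get_temp_dir();
$key   = 'forum_stats_' . uniqid();   // unique key so the demo starts cold
$calls = 0;
$compute = function () use (&$calls) {
    $calls++;
    return array('threads' => 120, 'posts' => 4500);  // stand-in for a slow query
};

$first  = cache_remember($dir, $key, 60, $compute);
$second = cache_remember($dir, $key, 60, $compute);
```

The point of the sketch is only that the slow work is skipped on the second request; expiry policy and invalidation are where the real design decisions lie.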




In reply to dotDan:
That's a classic example. A List Apart: Retooling Slashdot with Web Standards




Quote:
"Already do. I built a template engine of my own that allows easy editing of templates but compiles them into PHP code, leaving all the complicated parsing to the Admin CP; those files can then be included. Probably the best way to go about this, I think?"

Loops are the bottlenecks.
You should use pull-templates instead of push-templates.
With pull-templates you have only one loop (in the template); with push-templates you have one in your database-fetch code AND one in the template.

push:
PHP Code:
while ($dat = mysql_fetch_assoc(...))
{
    $messages[] = $dat['msg'];
}
$tpl->set('messages', $messages);
HTML Code:
<ul>
<?php foreach ($messages as $message) { ?>
    <li><?= $message ?></li>
<?php } ?>
</ul>

pull:
PHP Code:
$tpl->set('messages', new MessageRetriever());
PHP Code:
class MessageRetriever
{
    function next()
    {
        $dat = mysql_fetch_assoc(...);
        // Return false once the result set is exhausted so the template loop stops.
        return $dat ? $dat['msg'] : false;
    }
}
HTML Code:
<ul>
<?php while ($msg = $messages->next()) { ?>
    <li><?= $msg ?></li>
<?php } ?>
</ul>

You could also do this without OOP. It just saves you 1/2 of your loops.




In reply to Fenrir2:
I'd just like to note that, using proper abstraction, those two can look identical from the template's point of view. Gotta love PHP5.
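One way to read that remark, sketched under assumptions: if the lazy retriever implements PHP 5's `Iterator` interface, the template can use the same `foreach` for a prebuilt array (push) and for rows fetched on demand (pull). The class name `LazyMessages`, the `render_list` function and the in-memory stand-in for the database cursor are all hypothetical, not taken from the posts above.

```php
<?php
// Hypothetical lazy retriever: implements the Iterator interface so a template
// can foreach over it exactly as it would over a plain array.
class LazyMessages implements Iterator
{
    private $fetchRow;          // callable returning the next row array, or false
    private $current = false;
    private $index = 0;

    public function __construct($fetchRow)
    {
        $this->fetchRow = $fetchRow;
    }

    public function rewind(): void
    {
        // A forward-only cursor cannot restart; this just loads the first row.
        $this->index = 0;
        $this->current = call_user_func($this->fetchRow);
    }

    public function valid(): bool
    {
        return $this->current !== false;
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->current['msg'];
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->index;
    }

    public function next(): void
    {
        $this->index++;
        $this->current = call_user_func($this->fetchRow);
    }
}

// The same template body works for both data sources.
function render_list($messages)
{
    $html = '<ul>';
    foreach ($messages as $message) {
        $html .= '<li>' . htmlspecialchars($message) . '</li>';
    }
    return $html . '</ul>';
}

// Stand-in for a database cursor (the real fetch call is not shown here).
$rows = array(array('msg' => 'first'), array('msg' => 'second'));
$fetch = function () use (&$rows) {
    $row = array_shift($rows);
    return $row === null ? false : $row;
};

$pushHtml = render_list(array('first', 'second'));   // push: prebuilt array
$pullHtml = render_list(new LazyMessages($fetch));   // pull: rows fetched lazily
```

The return type declarations need PHP 7.1 or later; on PHP 5 the same idea works with the declarations removed.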



In reply to Etnu:
I've never seen such a setup that runs vB or IPB.
Just glance at the hardware page on big-boards...
Unless it's all static html :P
The funniest is Gaia...
80+ web servers, 11 database servers (4 dedicated to forums), 3 session database servers, 1 memory cache server
Speed & scalability in mind...
If you find my reply helpful, feel free to give me a point




At the moment I am writing a replacement for my own system with scalability in mind. I've looked at the various forum boards for ideas and, to be honest, they all appear to be very inefficient. The main things I will be concentrating on are caching in files where possible and only using the database when it is absolutely required.

In reply to Daijoubu:
The Google approach. Lots and lots and lots of cheap servers.



Google runs entirely in memory too :P
I'm in the core technical team for one of Europe's largest bookmakers. I don't post to Sitepoint much, but I can't resist this opportunity.
I don't think enough developers consider the importance of good systems architecture. This goes hand-in-hand with good application design.
I come from a development background, but I decided to get into systems administration for this very reason. Here we run several high demand/high availability systems. Some of the applications we run are not designed especially well, but we can make up some performance in the live environment.
Moreover, I'm constantly complaining that the developers behind our software don't have enough knowledge of our environment - they are only capable of thinking in the "Java bubble", which results in imperfection.
There's too much advice I could give you, frankly. But pick up some books on UNIX systems architecture and you'll immediately put yourself in the top 1% of developers.
Hope this helps.
Regards,
Andy.





Slightly off topic I know, but...
Quote:
"Main things I will be concentrating on is caching in files where possible"

Consider for a moment implementing the Composite View pattern: each Post that belongs to a specific thread could be a Composite, in which case you could cache individual Posts.
You'd then only ever have to re-cache a given Post when, for example, it has been edited (as opposed to caching the whole page and then having to re-cache all of it because that one Post was edited, which is a waste of resources). But that isn't the point, is it? The point is the flexibility that the Composite gives to you, me and every other developer trying to make our lives easier.
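A rough sketch of the per-Post caching idea, with everything in it assumed for illustration: the class `PostFragmentCache`, the `render_thread` function and the in-memory array standing in for cache files are invented names, and this is only one possible way to arrange it.

```php
<?php
// Hypothetical fragment cache: each post's HTML is cached under its own id,
// so editing one post invalidates one fragment instead of the whole page.
class PostFragmentCache
{
    private $fragments = array();  // post id => rendered HTML (stand-in for files)
    public $renderCount = 0;       // counts real renders, for demonstration only

    public function render(array $post)
    {
        $id = $post['id'];
        if (!isset($this->fragments[$id])) {
            $this->renderCount++;
            $this->fragments[$id] =
                '<div class="post">' . htmlspecialchars($post['body']) . '</div>';
        }
        return $this->fragments[$id];
    }

    // Called when a post is edited: drop only that post's fragment.
    public function invalidate($id)
    {
        unset($this->fragments[$id]);
    }
}

// The thread page is a composite assembled from per-post fragments.
function render_thread(PostFragmentCache $cache, array $posts)
{
    $html = '';
    foreach ($posts as $post) {
        $html .= $cache->render($post);
    }
    return $html;
}

$cache = new PostFragmentCache();
$posts = array(
    array('id' => 1, 'body' => 'Hello'),
    array('id' => 2, 'body' => 'World'),
);

render_thread($cache, $posts);           // first view: both posts are rendered
$posts[1]['body'] = 'World (edited)';
$cache->invalidate(2);                   // the edit drops only post 2's fragment
$page = render_thread($cache, $posts);   // post 1 comes from cache, post 2 re-renders
```

After the edit, only one extra render happens; the untouched post is reused as-is.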


In reply to Daijoubu:
God himself couldn't run 11,000 users on vBulletin. The biggest problem, of course, is that you can't optimize ANYTHING with a cache. Every page view calls at least 4 or 5 evals. vB has plenty of other problems, of course; this is just the worst offender.
I personally think sites as big as Gaia should be writing custom software, though, as phpBB is really not optimized (or designed...) well enough for that kind of scale.
The biggest problem you usually face is Slurp. Slurp is the most evil being on the planet. While Googlebot will only hit you with a few dozen crawlers at any given moment, Yahoo feels that it's perfectly acceptable to send upwards of 512 crawlers *simultaneously*. There's no web server that can serve requests that fast.

Some great suggestions have been given. I'd suggest going through book 2 of The PHP Anthology and checking its information on development techniques. It is a summary of best practices in web or application coding.



In reply to Etnu:
I doubt Gaia has much code left from phpBB.
But you're right, vB really isn't as efficient as many fanboys think.

Just finished reading the slides for "LiveJournal's Backend - A history Of Scaling" listed above. Really interesting reading.
What was most revealing about the discussion was that none of the advice or solutions had anything to do with the code, per se. At least, it seems like they weren't trawling through the code doing anything other than obvious optimisations (I would imagine that things like DB query tuning and so on would be the first options when starting to scale a site).
The ~architecture~ was the most crucial factor in scaling out ... clustering, segmenting data, caching, proxies.
toby hede
Toby Hede’s Blog on Ruby, Rails, User Experience and Stuff
================================================
FiniteStateMachine - Software Development for Social Networks


In reply to tobyhede:
Better code usually creates less need for massive solutions, though. Using vBulletin as an example (because it's one piece of software whose performance issues I'm forced to deal with on a daily basis), performance can be increased 4 or 5 times with some code changes (I run a patched version of vB that uses PHP files for the templates instead of eval; it was a custom job, of course, and throughput more than doubled for those web servers).
Ultimately, being able to scale is something that should be built into every system that you ever expect will be used by the public. Internal tools are typically OK to be less performance conscious (focus on security & productivity in those areas), but public pages require more thought. Don't, for example, cache static pages in the same folders as your PHP scripts are served from (what happens if you start serving the PHP scripts from an NFS mount or something similar? Do you *really* want to be performing writes over NFS?)

Hi,
A very long time ago, before Bill Gates and Windows came on the scene, when I was programming in good old DOS with Clipper and dBase, I had a lecture on speeding up network programs. This may apply to MySQL and should be considered.
In a nutshell: when a record is deleted from or added to a table, the whole table has to be reshuffled. To get round this, instead of actually deleting a record, just change its search key value to something like ZZZ_001, ZZZ_002, etc.
When adding a record, first look to see whether you have any ZZZ_??? records; if you do, just overwrite one with the new data and the new relevant search key.
If there are no ZZZ_??? records when adding a record, then instead of adding a single record, add umpteen all at the same time.
I would be interested in some feedback from a MySQL guru.
Cheers,
John_Betong
http://www.anetizer.com/index.php?joke=157
"All I Know About Computers I Learned From My Mum"

In my experience, the best way to optimise PHP is to do it in SQL. Very few web developers have any real background in database development, and most that I come across shy away from complex SQL queries, preferring to do it in code because they feel in control that way. They end up running queries within loops instead of extracting all the data in 1 query. I'm sure you've all seen examples of this.
QUERIES INSIDE LOOPS ARE WRONG - ALWAYS AND EVERY TIME.
So the best piece of advice I can give you to optimise the existing code is to replace PHP with SQL whenever and wherever you can. SQL is a beautiful and powerful language that few web programmers begin to exploit properly. If needs be, employ a database specialist to teach you how to do it.
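To make the "queries within loops" pattern concrete, here is a small sketch that contrasts it with a single JOIN. It assumes the pdo_sqlite extension is available and uses an in-memory database; the `users` and `posts` tables and their columns are hypothetical.

```php
<?php
// Assumption: pdo_sqlite is installed. Tables and data are made up for the demo.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)');
$db->exec("INSERT INTO users VALUES (1, 'ann'), (2, 'bob')");
$db->exec("INSERT INTO posts VALUES (1, 1, 'Hello'), (2, 2, 'Scaling'), (3, 1, 'Caching')");

// Query-in-a-loop: one query for the posts, then one more query per post.
$queries = 0;
$slow = array();
$posts = $db->query('SELECT id, user_id, title FROM posts ORDER BY id')
            ->fetchAll(PDO::FETCH_ASSOC);
$queries++;
$stmt = $db->prepare('SELECT name FROM users WHERE id = ?');
foreach ($posts as $post) {
    $stmt->execute(array($post['user_id']));
    $queries++;
    $slow[] = $post['title'] . ' by ' . $stmt->fetchColumn();
}

// Single query: let the database do the join and return everything at once.
$fast = array();
$sql = 'SELECT p.title, u.name FROM posts p JOIN users u ON u.id = p.user_id ORDER BY p.id';
foreach ($db->query($sql) as $row) {
    $fast[] = $row['title'] . ' by ' . $row['name'];
}
```

Both approaches build the same list, but the loop version issues one query per post on top of the initial one, so its query count grows with the number of rows.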




Good advice about the database, however if you link too many huge tables (I mean millions of records per table) on a very frequent basis then the database will die, even with indexes. Had past experience of this from my previous job and as such I no longer normalise a database to the same level I was taught at college/university. I try to use ENUM and SET (which I find very powerful) instead of link tables where the data rarely changes, and try to always preprocess counts rather than do anything on demand. I consider the database to be the weak point of any website these days.

In reply to ticksoft:
Quite right too, Ticksoft. It's called 'domain knowledge' and is a perfectly acceptable thing to do: de-normalising your data because you know when and how it is to be used. They should have taught you that at uni as well as the basics.
The same applies to pre-processing: if the underlying data is static or only changes at known times, then storing summary data is the only intelligent thing to do.
Tools like ENUM and SET are engine specific, and again, exploiting the tools available is only sensible.
The 'weakness' of most backend databases is down to my point: most web developers know little about database design and SQL. In any other production environment, a database specialist would automatically be included in the project team; but because it's the 'Web', it is not seen as necessary, so you get 'graphic designers' trying to cope with highly technical and specialised issues. It is no wonder that they don't produce robust, reliable and efficient databases; they just do not have the training or experience to cope.


Big thanks go out to everyone posting suggestions.
Yes, it is clear most of the things will have to be done on the hardware management side. All I wanted to get at here is how to avoid stupid programming errors, like vBulletin running several eval()'s on every single page.
I know that the bottleneck often is the database. Queries in loops is not something I'd be uneducated enough to do. I'm currently aiming at around 5 queries per page and no more. I think that should be low enough to handle a good enough load.
Since this is about optimization, does anyone know how much preg_match/preg_match_all will decrease the speed of your scripts in PHP5? I know it's way better than using eregi(). And how many times can you use preg_match or preg_match_all in a script before it really starts to drag down the performance?





Xdebug will answer all these kinds of questions. (I bet you can barely measure the difference).
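Short of a full profiler run, a crude timing loop can give a first impression. This is only a sketch: `time_calls` is a made-up helper, the subject string and pattern are arbitrary, and the numbers will vary by machine and PHP version, so no result is claimed here.

```php
<?php
// Hypothetical micro-benchmark helper: time $iterations calls of a callback.
function time_calls($fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$subject    = 'Posted by dotDan on 2005-09-01';
$iterations = 100000;

// Regular-expression match for a date-like substring.
$regexTime = time_calls(function () use ($subject) {
    return preg_match('/\d{4}-\d{2}-\d{2}/', $subject);
}, $iterations);

// Plain substring search, as a baseline for comparison.
$plainTime = time_calls(function () use ($subject) {
    return strpos($subject, '2005') !== false;
}, $iterations);

printf("preg_match: %.4fs, strpos: %.4fs for %d calls\n",
    $regexTime, $plainTime, $iterations);
```

Dividing either total by the iteration count gives a per-call cost, which is the figure to compare against the time a page spends in database queries.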


In reply to John_Betong:
MySQL already does this by default (you can disable it). When you delete a key from an index, it leaves the node as null. Running OPTIMIZE TABLE removes those null nodes.

Quote:
"They end up running queries within loops instead of extracting all the data in 1 query. I'm sure you've all seen examples of this."

Mostly true, but the opposite happens frequently as well. Example:
SELECT * FROM table ORDER BY RAND() LIMIT 100;
(try that on a table with a million entries, then come and tell me it was a good idea).

Quote:
"QUERIES INSIDE LOOPS ARE WRONG - ALWAYS AND EVERY TIME."

Wrong. If I'm memory bound, but not CPU bound, this:
PHP Code:
$Start = 0;
$Inc = 1000;
while(true)
{
    // Fetch the table in chunks of $Inc rows so only one chunk is held in memory.
    $result = $db->query('SELECT * FROM table LIMIT '.$Start.','.$Inc);
    if($result->num_rows == 0)
    {
        break;
    }
    while($tmp = $result->fetch())
    {
        echo $tmp['whatever'];
    }
    $Start += $Inc;
}
is much more efficient than:
PHP Code:
// Fetches the entire table in a single result set.
$result = $db->query('SELECT * FROM table');
while($tmp = $result->fetch())
{
    echo $tmp['whatever'];
}
on large datasets.
NOT everything is more efficient to do in the database. PHP is a very fast language, and in many areas outperforms SQL by a fair margin (most notably in areas like string manipulation). The suggestion to just move things to SQL is shortsighted and misguided.
That being said, though, your heart's in the right place. I'm still baffled as to why I constantly see applications that require 8 web servers and only 1 database server.



That's not PHP's fault, it's Apache's.
And sometimes it's better to do the processing in PHP, for example to avoid a filesort.
If you can't get the query optimizer to use your index to do the sorting, you're better off doing it in PHP.
Sure, you'll have to buffer the result set and use more memory, but it's still less expensive than doing disk I/O.
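As a sketch of what "doing the sorting in PHP" might look like: the rows below stand in for a buffered, unsorted result set, and the column names are hypothetical. Whether this actually beats the database depends on the data size and the server, so it is worth measuring both ways.

```php
<?php
// Rows as they might come back from a query without ORDER BY (made-up data).
$rows = array(
    array('id' => 3, 'title' => 'Caching',  'views' => 150),
    array('id' => 1, 'title' => 'Hello',    'views' => 900),
    array('id' => 2, 'title' => 'Scaling',  'views' => 420),
);

// Sort the buffered rows in PHP by views, highest first, instead of asking
// the database to sort them.
usort($rows, function ($a, $b) {
    return $b['views'] - $a['views'];
});

$titles = array_column($rows, 'title');
```

The trade-off is the one described above: the whole result set has to sit in PHP's memory while it is sorted.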
Last edited by Daijoubu; Sep 1, 2005 at 20:26.