Page 1 of 4
Results 1 to 25 of 98

  1. #1
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Planning/Optimising code for speed

    I have a request from a client to clone a website from one country to another; the trouble is, this website is a freaking monster. It has 380,000 users and averages some 11,000 users online within any 15-minute window during the daytime. Registration is invitation-only, and it still gets some 1,000 new users a day. This isn't a global site: almost all the users are located in the same country, at most some 500 km from each other.

    Basically, I need to know everything there is to know about making PHP code run as fast as possible. Hey, I know a lot about writing fast code, but this stuff has to go beyond fast, really.

    Can I even use classes in a project like this? What is the performance hit for using them? What are some of the most performance-draining functions to avoid? What is the best accelerator I can use? Can anyone point out some tutorials about this kind of thing?

  2. #2
    SitePoint Zealot
    Join Date
    Jul 2005
    Location
    Venlo, the Netherlands
    Posts
    141
    tough job.

    some basic things, which you may already know:
    don't use more SQL resources than needed (no SELECT *; use SELECT username, id, etc. instead)

    no nested SQL (so don't run a query inside the loop over another query's results)

    use proper indexes on your database tables

    in a loop, perform the count() call before the loop (otherwise it is executed on every iteration):
    $iCnt = count($array);
    for ($i = 0; $i < $iCnt; $i++) {}

    include files only when needed

    etcetera
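A quick sketch of that nested-SQL point. This is only an illustration: the query() helper, the $userIds data and the users table are all made up.

```php
<?php
// Turning N queries-in-a-loop into one IN () query.
// query() and the users table are hypothetical stand-ins.

$userIds = array(3, 17, 42);

// Bad: one database round trip per ID.
// foreach ($userIds as $id) {
//     $row = query("SELECT username FROM users WHERE id = $id");
// }

// Better: one round trip for all IDs.
$idList = implode(',', array_map('intval', $userIds)); // builds "3,17,42"
$sql = "SELECT id, username FROM users WHERE id IN ($idList)";
// $rows = query($sql);
?>
```

The array_map('intval', ...) step also keeps stray strings out of the SQL.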

  3. #3
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    Quote Originally Posted by Vennie
    tough job.

    some basic things, which you may already know:
    don't use more SQL resources than needed (no SELECT *; use SELECT username, id, etc. instead)

    no nested SQL (so don't use a query in the for loop of another query)

    use proper indexes on your database tables

    in a loop, perform the count() call before the loop (otherwise it is executed on every iteration):
    $iCnt = count($array);
    for ($i = 0; $i < $iCnt; $i++) {}

    include files only when needed

    etcetera
    Thanks, but as you said, I already knew that; it's just good programming practice.

    One interesting resource I found with tips like that was lanzer's thread on phpBB.com about how he optimised Gaia Online: http://www.phpbb.com/phpBB/viewtopic.php?t=135383 Interesting tips like getting the content IDs first with a LIMIT query and then merely getting the content for the content IDs gathered, etc. It's a long read I need to sit through.
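As a sketch of the two-step pattern lanzer describes (query() is a made-up helper and the topics schema is invented, so season to taste):

```php
<?php
// Placeholder for a real DB call; here it only stands in for the idea.
function query($sql) { /* would run $sql and return rows */ return array(); }

// Step 1: page through the narrow, indexed column only.
$ids = query("SELECT topic_id FROM topics ORDER BY last_post DESC LIMIT 0, 25");

// Step 2: fetch the heavy columns for just those 25 rows.
$idList = implode(',', $ids);
$rows = query("SELECT topic_id, title, poster, body
                 FROM topics
                WHERE topic_id IN ($idList)");
?>
```

The win is that the LIMIT query never has to drag the big text columns through the sort.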

    Also, don't eliminate the possibility that PHP may be the wrong language for the job. If it is inappropriate to use it, don't.
    I have thought about this, but the original site does run on PHP, so the new should be able to as well.

  4. #4
    SitePoint Addict
    Join Date
    Apr 2004
    Location
    Melbourne
    Posts
    362
    Definitely make sure the joins in your SQL are in the correct order; sometimes you might be joining two very large tables together before joining on a small table. Check that you're not using cross joins unless you absolutely must. If possible, cache out some highly dynamic content.

  5. #5
    SitePoint Addict
    Join Date
    Jan 2005
    Location
    Ireland
    Posts
    349
    Maybe check out Zend Optimizer or an open source alternative (Turck MMCache). Also, don't eliminate the possibility that PHP may be the wrong language for the job. If it is inappropriate to use it, don't.

  6. #6
    SitePoint Addict
    Join Date
    May 2005
    Posts
    255
    Quote Originally Posted by Ryan Wray
    Maybe check out Zend Optimizer or an open source alternative (Turck MMCache). Also, don't eliminate the possibility that PHP may be the wrong language for the job. If it is inappropriate to use it, don't.
    Screw both of those, and use the PECL extension (APC). It's much more stable than Turck, and is free. It's maintained by Rasmus himself.

    Ultimately, premature optimization is the root of all evil.

    Here's what you do:

    1.) Build things logically. Build them modularly. Build them so that they can be refactored independently of one another.

    2.) Load test.

    3.) Optimize as needed.

    11,000 users is nothing. A single dual Xeon (3 GHz-ish, hyperthreaded) or dual Opteron (1.8 GHz) can easily handle roughly half that load without any serious optimization (from personal experience).

    Of course, hardware is cheap. I recently ordered 6 dual Opteron 248 servers with 4 SATA drives and 4 GB of RAM for about $3,200 each.

  7. #7
    SitePoint Evangelist Daijoubu's Avatar
    Join Date
    Oct 2002
    Location
    Canada QC
    Posts
    454
    Quote Originally Posted by Etnu
    Screw both of those, and use the PECL extension (APC). It's much more stable than Turck, and is free. It's maintained by Rasmus himself.

    Ultimately, premature optimization is the root of all evil.

    Here's what you do:

    1.) Build things logically. Build them modularly. Build them so that they can be refactored independently of one another.

    2.) Load test.

    3.) Optimize as needed.

    11,000 users is nothing. A single dual Xeon (3 GHz-ish, hyperthreaded) or dual Opteron (1.8 GHz) can easily handle roughly half that load without any serious optimization (from personal experience).

    Of course, hardware is cheap. I recently ordered 6 dual opteron 248 servers with 4 SATA drives and 4GB of ram for about $3200 each.
    I've never seen such a setup running vB or IPB
    Just glance at the hardware page on big-boards...
    Unless it's all static HTML :P

    The funniest is Gaia...
    80+ web servers, 11 database servers (4 dedicated to forums), 3 session database servers, 1 memory cache server
    Speed & scalability in mind...
    If you find my reply helpful, feel free to give me a point

  8. #8
    SitePoint Zealot
    Join Date
    Apr 2003
    Location
    Connecticut
    Posts
    173
    Quote Originally Posted by Daijoubu
    I've never seen such a setup running vB or IPB
    Just glance at the hardware page on big-boards...
    Unless it's all static HTML :P

    The funniest is Gaia...
    The Google approach: lots and lots and lots of cheap servers.

  9. #9
    SitePoint Addict
    Join Date
    May 2005
    Posts
    255
    Quote Originally Posted by Daijoubu
    I've never seen such a setup running vB or IPB
    Just glance at the hardware page on big-boards...
    Unless it's all static HTML :P

    The funniest is Gaia...
    God himself couldn't run 11,000 users on vBulletin. The biggest problem, of course, is that you can't optimize ANYTHING with a cache. Every page view calls at least 4 or 5 evals. vB has plenty of other problems, of course, this is just the worst offender.

    I personally think sites as big as Gaia should be writing custom software, though, as phpBB is really not optimized (or designed...) well enough for that kind of scale.

    The biggest problem you usually face is Slurp. Slurp is the most evil being on the planet. While googlebot will only hit you with a few dozen crawlers at any given moment, yahoo feels that it's perfectly acceptable to send upwards of 512 crawlers *simultaneously*. There's no web server that can serve requests that fast.

  10. #10
    SitePoint Evangelist Daijoubu's Avatar
    Join Date
    Oct 2002
    Location
    Canada QC
    Posts
    454
    Quote Originally Posted by Etnu
    God himself couldn't run 11,000 users on vBulletin. The biggest problem, of course, is that you can't optimize ANYTHING with a cache. Every page view calls at least 4 or 5 evals. vB has plenty of other problems, of course, this is just the worst offender.

    I personally think sites as big as Gaia should be writing custom software, though, as phpBB is really not optimized (or designed...) well enough for that kind of scale.

    The biggest problem you usually face is Slurp. Slurp is the most evil being on the planet. While googlebot will only hit you with a few dozen crawlers at any given moment, yahoo feels that it's perfectly acceptable to send upwards of 512 crawlers *simultaneously*. There's no web server that can serve requests that fast.
    I doubt Gaia has much code left from phpBB

    But you're right, vB really isn't as efficient as many fanboys think

  11. #11
    SitePoint Zealot
    Join Date
    Feb 2005
    Location
    UK
    Posts
    121
    In my experience, the best way to optimise PHP is to do it in SQL. Very few web developers have any real background in database development, and most that I come across shy away from complex SQL queries, preferring to do it in code because they feel in control that way. They end up running queries within loops instead of extracting all the data in one query. I'm sure you've all seen examples of this.

    QUERIES INSIDE LOOPS ARE WRONG - ALWAYS AND EVERY TIME.

    So the best piece of advice I can give you to optimise the existing code is to replace PHP with SQL whenever and wherever you can. SQL is a beautiful and powerful language that few web programmers begin to exploit properly. If need be, employ a database specialist to teach you how to do it.
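As a sketch of the idea (the users/orders schema and the query() helper are invented for illustration): a per-user query in a loop can usually be replaced by one JOIN.

```php
<?php
// Placeholder for a real DB call; it only stands in for the idea here.
function query($sql) { /* would run $sql and return rows */ return array(); }

// Instead of running one query per user inside a loop, a single JOIN
// returns every user together with their rows in one round trip:
$rows = query(
    "SELECT u.id, u.username, o.id AS order_id, o.total
       FROM users u
       JOIN orders o ON o.user_id = u.id
      ORDER BY u.id"
);
// One pass over $rows then rebuilds any per-user structure in PHP.
?>
```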

  12. #12
    SitePoint Evangelist
    Join Date
    Mar 2005
    Posts
    421
    Quote Originally Posted by Roger Ramjet
    QUERIES INSIDE LOOPS ARE WRONG - ALWAYS AND EVERY TIME.
    I've read quite a few different people say this, so I was wondering if the following qualifies as a query inside a loop.

    Say I have a class, clsPictureFinder, that has a static method called GetAllPics which returns an array containing all the picture database IDs in a certain table. I defer creating an individual Picture object until I'm actually looping through them, by passing the pictureID to the constructor, which in turn triggers another query to populate the object's properties. A query inside a loop? E.g.:
    PHP Code:
    $arrPicIDs = array();
    $arrPicIDs = clsPictureFinder::GetAllPics();
    foreach ($arrPicIDs as $key => $value) {
        $objPicture = &new clsPicture($value);
        echo $objPicture->getFilename() . '<br />';
        echo $objPicture->getOwner() . '<br />';
        echo $objPicture->showThumbnail();
        echo '<hr />';
        unset($objPicture);
    }

    Is this bad practice?

  13. #13
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Quote Originally Posted by skinny monkey
    Is this bad practice?
    No, the bad practice would be:
    PHP Code:
    foreach ($pic->getCategories() as $cat) {
        $pic->getPicsByCat($cat); // query goes on in here
        // ...
    }
    If you are going to hit the db for all categories anyway, restructure your code to hit the db only once and use the info as appropriate when you need it.
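That restructuring might look like the sketch below: one query up front, then plain array grouping in PHP. The $rows data here stands in for what a hypothetical fetch-everything query might return.

```php
<?php
// What a single "give me every (category, picture) pair" query might return:
$rows = array(
    array('category' => 'cats', 'file' => 'tabby.jpg'),
    array('category' => 'cats', 'file' => 'siamese.jpg'),
    array('category' => 'dogs', 'file' => 'beagle.jpg'),
);

// Group once in PHP; no further queries inside the loop.
$byCat = array();
foreach ($rows as $row) {
    $byCat[$row['category']][] = $row['file'];
}
// $byCat['cats'] now holds both cat pictures, $byCat['dogs'] the dog one.
?>
```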

    As with all advice, season to taste. I have an application where the inner queries run quickly, and the code was easier to write with 13 queries as opposed to one query with fancy caching and looping, so I wrote it with the multiple queries in a loop structure. I pumped the code out quicker and the users are fine with the application performance.

    The important thing is to be aware of the impact of your design choices on the application and the server.
    Jason Sweat ZCE - jsweat_php@yahoo.com
    Book: PHP Patterns
    Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
    Detestable (adjective): software that isn't testable.

  14. #14
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Wrong premise, if you don't mind me saying. Our poor brains are not so good at figuring out where the bottlenecks really are. You are better off writing your code for maintainability, then running it through a code profiler to isolate the real performance bottlenecks, and then tackling them.

    Remember: "premature optimization is the root of all evil"

  15. #15
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    Quote Originally Posted by sweatje
    Wrong premise, if you don't mind me saying. Our poor brains are not so good at figuring out where the bottlenecks really are. You are better off writing your code for maintainability, then running it through a code profiler to isolate the real performance bottlenecks, and then tackling them.

    Remember: "premature optimization is the root of all evil"
    Interesting thought. Could you recommend me a good code profiler that would show bottlenecks like that? I use some of that stuff now, but it's turning up weird results.

    And secondly, plugging at the bottlenecks is nice, but what do you do when literally all of your code is one big bottleneck? I'm looking at a million page views a day. I do believe every millisecond gained from a little tip will help. Then again, I've never tackled a project of this size.

    I'm not too concerned about maintainability. The application frontend, which needs to be optimised, is only planned to be around 100 KB (before stripping out whitespace and comments). I think that's pretty maintainable regardless of how I write the code.

    On a high-traffic site such as this, caching is key to performance, no matter the design or language.
    Caching of pages, or caching of [semi-]compiled code? Caching of pages themselves would be impossible on this sort of a site. What sort of caching mechanisms could I use?

  16. #16
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Quote Originally Posted by Edman
    Interesting thought. Could you recommend me a good code profiler that would show bottlenecks like that? I use some of that stuff now, but it's turning up weird results.
    I have used xdebug, and I have seen presentations on using APC also.

    It nearly always fingers I/O (as you would expect), but sometimes there are some surprising results. Particularly if you pay attention to the call counts, you can see where perhaps you have something in a loop where it should not be, etc.

  17. #17
    SitePoint Enthusiast
    Join Date
    Oct 2003
    Location
    norway
    Posts
    92
    Quote Originally Posted by Edman
    Interesting thought. Could you recommend me a good code profiler that would show bottlenecks like that? I use some of that stuff now, but it's turning up weird results.
    I use xdebug and am happy with it.

    Quote Originally Posted by Edman
    And secondly, plugging at the bottlenecks is nice, but what do you do when literally all of your code is one big bottleneck? I'm looking at a million page views a day. I do believe every millisecond gained from a little tip will help. Then again, I've never tackled a project of this size.
    You start caching parts of the page, or all of it if you can. Almost all of my sites are highly dynamic, but usually most pages contain numerous "regions" which are cached. If it's not dynamic (menus etc), pregenerate it.

    Quote Originally Posted by Edman
    Caching of pages, or cahcing of [semi-]compiled code? Caching of pages themselves would be impossible on this sort of a site What sort of caching mechanisms could I use?
    Both. I am sure you could cache smaller things on the pages, i.e. not everything, but perhaps give, say, your "5 latest posts" a 10-second time-to-live or whatever (which can make quite a difference once the numbers add up).

    Personally, I would definitely use eAccelerator (with its shm caching), but perhaps more importantly, memcache. On one of my sites I get around 300-400k views a day, and for me, eAccelerator seems very stable on it.
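A minimal sketch of that 10-second fragment cache using the pecl memcache extension (get_latest_posts_from_db() is a made-up stand-in for your expensive query, and the server details are assumptions):

```php
<?php
// Fragment cache with a short time-to-live via pecl/memcache.
$mc = new Memcache();
$mc->connect('localhost', 11211);       // assumes a local memcached daemon

$html = $mc->get('latest_posts');
if ($html === false) {                  // cache miss: rebuild the fragment
    $html = get_latest_posts_from_db(); // hypothetical expensive query
    $mc->set('latest_posts', $html, 0, 10); // keep it for 10 seconds
}
echo $html;
?>
```

At one request per 10 seconds instead of one query per page view, the numbers add up fast.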

  18. #18
    SitePoint Wizard
    Join Date
    Dec 2004
    Location
    USA
    Posts
    1,407
    Quote Originally Posted by sweatje
    code profiler
    What are some good profilers and how do you use them?

  19. #19
    SitePoint Enthusiast
    Join Date
    Oct 2003
    Location
    norway
    Posts
    92
    On a high-traffic site such as this, caching is key to performance, no matter the design or language.

  20. #20
    SitePoint Guru dbevfat's Avatar
    Join Date
    Dec 2004
    Location
    ljubljana, slovenia
    Posts
    684
    If you can't cache contents, at least you could cache the "half-baked" (= compiled into opcode) scripts. As suggested, Zend Optimizer does that; so does Turck MMCache and its younger brother (which I use quite a lot), eAccelerator.

  21. #21
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    I have used xdebug, and I have seen presentations on using APC also.

    It nearly always fingers I/O (as you would expect), but sometimes there are some surprising results. Particularly if you pay attention to the call counts, you can see where perhaps you have something in a loop where it should not be, etc.
    I'll give xdebug a shot; right now I'm a bit baffled as to what to do with it, but I'll get around to it

    You start caching parts of the page, or all of it if you can. Almost all of my sites are highly dynamic, but usually most pages contain numerous "regions" which are cached. If it's not dynamic (menus etc), pregenerate it.
    I already do this on all of my applications, that bit's quite common sense.

    Both. I am sure you could cache smaller things on the pages, i.e not everything but perhaps give say your "5 latest posts" a 10 second time-to-live or whatever (which can make quite a difference once the numbers add up).
    Never thought of doing that! It always seemed like a waste. Will try

    Personally, I would definitely use eAccelerator (with its shm caching), but perhaps more importantly, memcache. On one of my sites I get around 300-400k views a day, and for me, eAccelerator seems very stable on it.
    I'll take a look through both of those! Thanks!

  22. #22
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Quote Originally Posted by Edman
    I'll give xdebug a shot, right now I'm a bit baffled as to what to do with it, but I'll get around to it
    Derick Rethans is the author of xdebug. You might try reviewing some of his performance talks for ideas on how to use xdebug as a profiler:

    http://www.derickrethans.nl/talks.php
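For reference, xdebug's profiler is switched on from php.ini; the directives look roughly like this (names as of xdebug 2.x, and the extension path is an assumption, so check the docs for your build):

```ini
; php.ini -- enable xdebug's profiler
zend_extension=/usr/lib/php/modules/xdebug.so
xdebug.profiler_enable=1
xdebug.profiler_output_dir=/tmp
; the resulting cachegrind.out.* files open in KCacheGrind or WinCacheGrind
```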

  23. #23
    SitePoint Wizard
    Join Date
    Jul 2004
    Location
    Minneapolis, MN
    Posts
    1,924
    Lol, you could try using single quotes. It is said to help and I find it looks neater.

    You seem to already cache pages, optimize your SQL, etc. I think the only way that you're going to get your system any faster is a) remain procedural, it is faster and b) find the most effective way to retrieve your content.

    You might also look into making a better search tool (optimize it) as it can be a bandwidth eater if you haven't programmed it correctly.

  24. #24
    SitePoint Enthusiast
    Join Date
    Oct 2003
    Location
    norway
    Posts
    92
    Quote Originally Posted by charmedlover
    Lol, you could try using single quotes. It is said to help and I find it looks neater.
    Yep, this one is sure to solve all your problems..
    Quote Originally Posted by charmedlover
    You seem to already cache pages, optimize your SQL, etc. I think the only way that you're going to get your system any faster is a) remain procedural, it is faster and b) find the most effective way to retrieve your content.
    I wouldn't worry about "remaining procedural". Things like these make minimal impact and would really be negligible anyway once the scripts are precompiled. (If you care this much about marginal benefits, you really shouldn't be using PHP in the first place...) Just avoid anything O(n^2) etc. (at least uncached!). Database interaction is usually a huge performance hit, and memcached can help you there.

  25. #25
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Hi...

    Quote Originally Posted by charmedlover
    a) remain procedural, it is faster and b) find the most effective way to retrieve your content.
    Part (b) is the crux of the problem. Part (a) is the single worst piece of advice you could hear right now.

    If the content is fairly simple, then about a third of the time will be the page loading and parsing by PHP. This can be pretty much eliminated with the Zend accelerator and you will cut down on the file system I/O as well. This is a no brainer.

    Of the remaining 70%, I bet 50%+ is the DB. I would consider a MySQL support contract if you don't have one already. You will have to shuffle your code around this way and that to meet the demands of the DB. This could range from replication up to a full-on cluster. Most likely you just need some query optimisation of some key parts. While you play around with configurations, you will be glad you have clean maintainable code.

    Next you need to cache any data. You want it in RAM (or RAM disk) and probably for each web server. This will be a massive saving, but you will have to rewrite parts of the app. to make the best of this.

    The reason that you want to spend your time here is that PHP bottlenecks do not exist. Your DB servers will be expensive fast-SCSI disk array multiprocessor dudes. Your web servers can be simple blade or 1U (twin CPU) servers with a bit of RAM. A decent load balancer/firewall will cost you more than the web server, so just add another web server. It's not worth fretting over the development cost when web servers are less than $200 per month.

    Make sure the backplane Ethernet is 100 Mbit per second full-duplex. It's amazing how rarely ISPs bother to set this up properly.

    If all of this is not enough then you will have to look at the structure of the pages. Cache everything (the Decorator pattern will be your friend). Use JavaScript to push some of the work onto the browser, especially form validation and tabular display changes. Writing a decent table widget is a good use of your time. Use PHP as the template engine if you are comfortable with that; that way the accelerator is also your template cache.

    Sessions are going to be...er...interesting. If the load balancer can screen on session ID then you can just use a RAM disk on the web server. If you must use MySQL then use a separate server from your business data unless you are already clustering.
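The RAM-disk option can be as simple as pointing PHP's file session handler at a tmpfs mount; a sketch, assuming tmpfs is available at /dev/shm and the directory exists with the right permissions:

```ini
; php.ini -- keep session files on a RAM disk
session.save_handler = files
session.save_path = /dev/shm/php_sessions
```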

    At all stages remember to profile the whole app. Set up a simulation script that hammers the server in a realistic way. There are tools to replay traffic from Apache boxes. Use the "ab" tool a lot.
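A typical "ab" run against a single hot page might look like this (the URL and the numbers are just placeholders; tune concurrency to your real traffic):

```shell
# 10,000 requests, 200 concurrent, with HTTP keep-alive enabled
ab -k -n 10000 -c 200 http://www.example.com/index.php
```

Watch the "requests per second" and the latency percentiles in the report, not just the average.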

    Never twiddle with code for the sake of a few microseconds. That's small-minded thinking from amateur sites running on shared hosts. These problems are big, so think big. If a change doesn't double the speed of the app. then don't make it. Buy another couple of gigs of RAM instead. Development costs dwarf hardware costs, and the opportunity cost of time to market dwarfs both.

    Finally, don't be afraid to ask the ISP. They have a lot of experience in these things.

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

