Results 1 to 25 of 98
  1. #1
    SitePoint Addict (joined Apr 2005; 274 posts)

    Planning/Optimising code for speed

    I have a request from a client to clone a website from one country to another; however, this website is a freaking monster. It has 380,000 users and averages some 11,000 users online within a 15-minute range during the daytime. Registration is invitation-only, and it still gets some 1,000 new users a day. This isn't a global site: almost all the users are located in the same country, at most some 500 km from each other.

    Basically, I need to know everything there is to know about making PHP code run as fast as possible. Hey, I know a lot about writing fast code, but this stuff has to go beyond fast, really.

    Can I even use classes in a project like this? What is the performance hit for using them? What are some of the most performance-draining functions that need avoiding? What is the best accelerator I can use? Can anyone point out some tutorials about this kind of thing?

  2. #2
    SitePoint Zealot (Venlo, the Netherlands; joined Jul 2005; 141 posts)
    Tough job.

    Some basic things, which you may already know:
    don't use more SQL resources than needed (no SELECT *; use SELECT username, id, etc. instead)

    no nested SQL (don't run a query inside the loop over another query's results; see the sketch at the end of this list)

    use proper indexes on your database tables

    in a loop, call count() once, up front (otherwise it is executed on every iteration):
    $iCnt = count($array);
    for ($i = 0; $i < $iCnt; $i++) {}

    include files only when needed

    etcetera
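
    To illustrate the nested-SQL point, here is a minimal sketch (the schema is made up for the example) that replaces a query-per-row loop with a single JOIN, using the mysql_* API:

    <?php
    // One round trip instead of one query per row of an outer result set.
    $result = mysql_query(
        "SELECT u.username, p.title
         FROM users u
         JOIN posts p ON p.user_id = u.id
         WHERE u.active = 1"
    );
    while ($row = mysql_fetch_assoc($result)) {
        echo $row['username'] . ': ' . $row['title'] . "\n";
    }
    ?>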

  3. #3
    SitePoint Addict (Ireland; joined Jan 2005; 349 posts)
    Maybe check out Zend Optimizer or an open source alternative (Turck MMCache). Also, don't eliminate the possibility that PHP may be the wrong language for the job. If it is inappropriate to use it, don't.

  4. #4
    SitePoint Addict (joined Apr 2005; 274 posts)
    Quote Originally Posted by Vennie
    Tough job.

    Some basic things, which you may already know:
    don't use more SQL resources than needed (no SELECT *; use SELECT username, id, etc. instead)

    no nested SQL (don't run a query inside the loop over another query's results)

    use proper indexes on your database tables

    in a loop, call count() once, up front (otherwise it is executed on every iteration):
    $iCnt = count($array);
    for ($i = 0; $i < $iCnt; $i++) {}

    include files only when needed

    etcetera
    Thanks, but as you said, I already knew that. It's just good programming practice.

    One interesting resource I found with tips like that was lanzer's thread on phpBB.com about how he optimised Gaia Online: http://www.phpbb.com/phpBB/viewtopic.php?t=135383 Interesting tips like getting the content IDs first with a LIMIT query and then fetching the content only for those IDs, etc. It's a long read I need to sit through.
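
    In other words, something like this hypothetical sketch (the table and column names are invented):

    <?php
    // Step 1: a cheap LIMIT query that touches only the indexed id column.
    $res = mysql_query("SELECT id FROM posts WHERE topic_id = 123
                        ORDER BY post_time DESC LIMIT 200, 25");
    $ids = array();
    while ($row = mysql_fetch_assoc($res)) {
        $ids[] = (int) $row['id'];
    }

    // Step 2: fetch the heavy text columns only for those 25 ids
    // (guard against an empty $ids list in real code).
    $res = mysql_query("SELECT id, body FROM posts
                        WHERE id IN (" . implode(',', $ids) . ")");
    ?>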

    Also, don't eliminate the possibility that PHP may be the wrong language for the job. If it is inappropriate to use it, don't.
    I have thought about this, but the original site does run on PHP, so the new one should be able to as well.

  5. #5
    SitePoint Addict (Melbourne; joined Apr 2004; 362 posts)
    Definitely make sure the joins in your SQL are in the correct order; sometimes you might be joining two very large tables together before joining on a small table. Check to make sure you're not using cross joins unless you absolutely must. If possible, cache some of the highly dynamic content.

  6. #6
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Wrong premise, if you don't mind me saying. Our poor brains are not so good at figuring out where the bottlenecks really are. You are better off writing your code for maintainability, running it through a code profiler to isolate the real performance bottlenecks, and then tackling them.

    Remember: "premature optimization is the root of all evil"
    Jason Sweat ZCE - jsweat_php@yahoo.com
    Book: PHP Patterns
    Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
    Detestable (adjective): software that isn't testable.

  7. #7
    SitePoint Enthusiast
    Join Date
    Oct 2003
    Location
    norway
    Posts
    92
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    On a high-traffic site such as this, caching is key to performance, no matter the design or language.

  8. #8
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by sweatje
    Wrong premise, if you don't mind me saying. Our poor brains are not so good at figuring out where the bottlenecks really are. You are better off writing your code for maintainability, running it through a code profiler to isolate the real performance bottlenecks, and then tackling them.

    Remember: "premature optimization is the root of all evil"
    Interesting thought. Could you recommend a good code profiler that would show bottlenecks like that? I use some of that stuff now, but it's turning up weird results.

    And secondly, plugging away at the bottlenecks is nice, but what do you do when literally all of your code is one big bottleneck? I'm looking at a million page views a day. I do believe every millisecond gained from a little tip will help. Then again, I've never tackled a project of this size.

    I'm not too concerned about maintainability. The application frontend, which is what needs optimising, is only planned to be around 100 KB of code (before stripping out whitespace and comments). I think that's pretty maintainable regardless of how I write it.

    On a high-traffic site such as this, caching is key to performance, no matter the design or language.
    Caching of pages, or caching of [semi-]compiled code? Caching of the pages themselves would be impossible on this sort of site. What sort of caching mechanisms could I use?

  9. #9
    SitePoint Guru dbevfat's Avatar
    Join Date
    Dec 2004
    Location
    ljubljana, slovenia
    Posts
    684
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you can't cache contents, you can at least cache the "half-baked" (= compiled into opcode) scripts. As suggested, Zend Optimizer does that; so do Turck MMCache and its younger brother (which I use quite a lot), eAccelerator.

  10. #10
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Edman
    Interesting thought. Could you recommend a good code profiler that would show bottlenecks like that? I use some of that stuff now, but it's turning up weird results.
    I have used xdebug, and I have seen presentations on using APC also.

    It nearly always fingers I/O (as you would expect), but sometimes there are surprising results. Particularly if you pay attention to call counts, you can see where perhaps you have something in a loop that should not be there, etc.
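
    Turning on xdebug's profiler is mostly a php.ini affair; a minimal sketch (xdebug 2 settings; the paths are just examples):

    zend_extension=/usr/local/lib/php/xdebug.so
    xdebug.profiler_enable=1
    xdebug.profiler_output_dir=/tmp/profiles

    The cachegrind files it writes can then be opened in KCachegrind (or WinCachegrind) to browse call counts and inclusive times.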

  11. #11
    SitePoint Enthusiast
    Join Date
    Oct 2003
    Location
    norway
    Posts
    92
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Edman
    Interesting thought. Could you recommend a good code profiler that would show bottlenecks like that? I use some of that stuff now, but it's turning up weird results.
    I use xdebug and am happy with it.

    Quote Originally Posted by Edman
    And secondly, plugging away at the bottlenecks is nice, but what do you do when literally all of your code is one big bottleneck? I'm looking at a million page views a day. I do believe every millisecond gained from a little tip will help. Then again, I've never tackled a project of this size.
    You start caching parts of the page, or all of it if you can. Almost all of my sites are highly dynamic, but usually most pages contain numerous "regions" which are cached. If it's not dynamic (menus etc), pregenerate it.

    Quote Originally Posted by Edman
    Caching of pages, or caching of [semi-]compiled code? Caching of the pages themselves would be impossible on this sort of site. What sort of caching mechanisms could I use?
    Both. I am sure you could cache smaller things on the pages, i.e. not everything, but perhaps give, say, your "5 latest posts" a 10-second time-to-live or whatever (which can make quite a difference once the numbers add up).

    Personally, I would definitely use eAccelerator (with its shm caching), but perhaps more importantly, memcache. One of my sites gets around 300-400k views a day, and eAccelerator seems very stable on it.
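
    As a rough sketch of that kind of fragment caching with the PECL Memcache extension (the host, port, key name, and build_latest_posts_html() helper are all hypothetical):

    <?php
    $cache = new Memcache();
    $cache->connect('localhost', 11211);

    $html = $cache->get('latest_posts');
    if ($html === false) {
        // Cache miss: rebuild the fragment and store it for 10 seconds.
        $html = build_latest_posts_html();
        $cache->set('latest_posts', $html, 0, 10);
    }
    echo $html;
    ?>

    Every request within the 10-second window is then served straight from RAM instead of hitting the database.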

  12. #12
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have used xdebug, and I have seen presentations on using APC also.

    It nearly always fingers I/O (as you would expect), but sometimes there are surprising results. Particularly if you pay attention to call counts, you can see where perhaps you have something in a loop that should not be there, etc.
    I'll give xdebug a shot. Right now I'm a bit baffled as to what to do with it, but I'll get around to it.

    You start caching parts of the page, or all of it if you can. Almost all of my sites are highly dynamic, but usually most pages contain numerous "regions" which are cached. If it's not dynamic (menus etc), pregenerate it.
    I already do this on all of my applications, that bit's quite common sense.

    Both. I am sure you could cache smaller things on the pages, i.e. not everything, but perhaps give, say, your "5 latest posts" a 10-second time-to-live or whatever (which can make quite a difference once the numbers add up).
    Never thought of doing that! It always seemed like a waste. Will try.

    Personally, I would definitely use eAccelerator (with its shm caching), but perhaps more importantly, memcache. One of my sites gets around 300-400k views a day, and eAccelerator seems very stable on it.
    I'll take a look through both of those! Thanks!

  13. #13
    SitePoint Wizard
    Join Date
    Jul 2004
    Location
    Minneapolis, MN
    Posts
    1,924
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Lol, you could try using single quotes. It is said to help and I find it looks neater.

    You seem to already cache pages, optimize your SQL, etc. I think the only way that you're going to get your system any faster is a) remain procedural, it is faster and b) find the most effective way to retrieve your content.

    You might also look into making a better search tool (optimizing it), as it can be a bandwidth eater if you haven't programmed it correctly.

  14. #14
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Edman
    I'll give xdebug a shot. Right now I'm a bit baffled as to what to do with it, but I'll get around to it.
    Derick Rethans is the author of xdebug. You might try reviewing some of his performance talks for ideas on how to use xdebug as a profiler:

    http://www.derickrethans.nl/talks.php

  15. #15
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Lol, you could try using single quotes. It is said to help and I find it looks neater.
    I already do. I hate, hate, hate double quotes. Hell, I even sometimes catch myself doing stuff like chr(10) instead of "\n"; it's automatic, I don't really think about it anymore.

    a) remain procedural, it is faster
    Yes, I was thinking about that.

    b) find the most effective way to retrieve your content
    Probably the most difficult bit; I guess I'll have to pair it with that 10-second caching idea.

    You might also look into making a better search tool (optimizing it), as it can be a bandwidth eater if you haven't programmed it correctly.
    I only need the search to work for usernames. Thank God for that; doing stuff like searching a forum with just 5 million posts is near impossible :S

    Derick Rethans is the author of xdebug. You might try reviewing some of his performance talks for ideas on how to use xdebug as a profiler:

    http://www.derickrethans.nl/talks.php
    Great, thanks. I'm also looking through his "Speed up PHP" talk!

  16. #16
    SitePoint Evangelist Daijoubu's Avatar
    Join Date
    Oct 2002
    Location
    Canada QC
    Posts
    454
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    PHPEdit uses DBG for profiling, but personally, I never got it working.

    You may also want to ditch Apache for a lightweight httpd such as LiteSpeed or lighttpd.

    Usually the database is the bottleneck, so don't waste 80% of your time on 20% of the code.
    Make sure MySQL doesn't do a filesort or use a temporary table, and that everything is indexed correctly (and that MySQL actually uses the indexes! If you create useless indexes, they will only slow down INSERTs and add to the DB size).
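
    For example (table, column, and index names here are hypothetical), prefix a suspect query with EXPLAIN:

    EXPLAIN SELECT id, username FROM users
    WHERE country_id = 42 ORDER BY last_login DESC LIMIT 25;

    If the Extra column reports "Using filesort" or "Using temporary", a composite index along these lines may help:

    ALTER TABLE users ADD INDEX idx_country_login (country_id, last_login);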

    eAccelerator is recommended over MMCache, since the latter is no longer developed.
    Speed & scalability in mind...
    If you find my reply helpful, feel free to give me a point

  17. #17
    SitePoint Enthusiast
    Join Date
    Oct 2003
    Location
    norway
    Posts
    92
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by charmedlover
    Lol, you could try using single quotes. It is said to help and I find it looks neater.
    Yep, this one is sure to solve all your problems...
    Quote Originally Posted by charmedlover
    You seem to already cache pages, optimize your SQL, etc. I think the only way that you're going to get your system any faster is a) remain procedural, it is faster and b) find the most effective way to retrieve your content.
    I wouldn't worry about "remaining procedural". Things like that have minimal impact and would be negligible anyway once you precompile the scripts. (If you care this much about marginal benefits, you really shouldn't be using PHP in the first place...) Just avoid anything O(n^2), etc. (at least uncached!). Database interaction is usually a huge performance hit, and memcached can help you there.

  18. #18
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi...

    Quote Originally Posted by charmedlover
    a) remain procedural, it is faster and b) find the most effective way to retrieve your content.
    Part (b) is the crux of the problem. Part (a) is the single worst piece of advice you could hear right now.

    If the content is fairly simple, then about a third of the time will be PHP loading and parsing the page. This can be pretty much eliminated with the Zend accelerator, and you will cut down on the file system I/O as well. This is a no-brainer.

    Of the remaining 70%, I bet 50%+ is the DB. I would consider a MySQL support contract if you don't have one already. You will have to shuffle your code around this way and that to meet the demands of the DB. This could range from replication up to a full-on cluster. Most likely you just need some query optimisation of some key parts. While you play around with configurations, you will be glad you have clean, maintainable code.

    Next you need to cache any data. You want it in RAM (or RAM disk) and probably for each web server. This will be a massive saving, but you will have to rewrite parts of the app. to make the best of this.

    The reason that you want to spend your time here is that PHP bottlenecks do not exist. Your DB servers will be expensive, fast SCSI disk array multiprocessor dudes. Your web servers can be simple blade or 1U (twin CPU) servers with a bit of RAM. A decent load balancer/firewall will cost you more than a web server, so just add another web server. It's not worth fretting over the development cost when web servers are less than $200 per month.

    Make sure the backplane Ethernet is 100 Mbit per second, full-duplex. It's amazing how rarely ISPs bother to set this up properly.

    If all of this is not enough, then you will have to look at the structure of the pages. Cache everything (the decorator pattern will be your friend). Use JavaScript to push some of the work onto the browser, especially form validation and tabular display changes. Writing a decent table widget is a good use of your time. Use PHP as the template engine if you are comfortable with that; that way the accelerator is also your template cache.

    Sessions are going to be...er...interesting. If the load balancer can screen on session ID then you can just use a RAM disk on the web server. If you must use Mysql then use a separate server from your business data unless you are already clustering.

    At all stages remember to profile the whole app. Set up a simulation script that hammers the server in a realistic way. There are tools to replay traffic from Apache boxes. Use the "ab" tool a lot.
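
    A typical ApacheBench run looks like this (the URL and numbers are placeholders):

    # 10,000 requests, 100 concurrent, against one representative page
    ab -n 10000 -c 100 http://www.example.com/index.php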

    Never twiddle with code for the sake of a few microseconds. That's small-minded thinking from amateur sites running on shared hosts. These problems are big, so think big. If a change doesn't double the speed of the app. then don't make it. Buy another couple of gigs of RAM instead. Development costs dwarf hardware costs, and the opportunity cost of time to market dwarfs both.

    Finally, don't be afraid to ask the ISP. They have a lot of experience in these things.

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  19. #19
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Excellent advice Marcus, thank you very much. Couple of questions:

    Next you need to cache any data. You want it in RAM (or RAM disk) and probably for each web server. This will be a massive saving, but you will have to rewrite parts of the app. to make the best of this.
    The question I'm going to ask here is... how? This is a bit over my head. Do I just use caching tools like the ones mentioned above?

    At all stages remember to profile the whole app. Set up a simulation script that hammers the server in a realistic way. There are tools to replay traffic from Apache boxes. Use the "ab" tool a lot.
    Any suggestions on which tools allow doing this most effectively?

    Use PHP as the template engine if you are comfortable with that; that way the accelerator is also your template cache.
    Already do. I built a template engine of my own that allows easy editing of templates but compiles them into PHP code, leaving all the complicated parsing to the Admin CP; the compiled files can then simply be included. Probably the best way to go about this, I think?
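
    For what it's worth, a bare-bones sketch of that compile-once idea (the {name} placeholder syntax and the file paths are hypothetical):

    <?php
    // Run once, when a template is saved in the Admin CP: turn {name}
    // placeholders into plain PHP and write the result to disk.
    function compile_template($source, $compiledFile)
    {
        $php = preg_replace('/\{(\w+)\}/',
                            '<?php echo htmlspecialchars($data[\'$1\']); ?>',
                            $source);
        file_put_contents($compiledFile, $php);
    }

    // Run on every request: just include the compiled file. An opcode
    // cache (eAccelerator, APC, ...) then caches it like any other script.
    function render_template($compiledFile, $data)
    {
        include $compiledFile;
    }
    ?>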

    Sessions are going to be...er...interesting. If the load balancer can screen on session ID then you can just use a RAM disk on the web server. If you must use Mysql then use a separate server from your business data unless you are already clustering.
    I killed sessions completely on the site, and I was happy I could do this. People will either have to use cookies or just suck it up; ninety-nine-something percent use cookies anyway. There is no real need for session data.

    Never twiddle with code for the sake of a few microseconds. That's small-minded thinking from amateur sites running on shared hosts. These problems are big, so think big. If a change doesn't double the speed of the app. then don't make it. Buy another couple of gigs of RAM instead. Development costs dwarf hardware costs, and the opportunity cost of time to market dwarfs both.
    The only real cost here is time. Not development cost, though; a few sticks with a couple of gigs of RAM will buy me a programmer for a full month in this part of the world.

    You see, right now that site, with its 10,000 users online, takes some odd 5 seconds to load (!). These people earned $2 million last year; I'm sure they can afford to stick new hardware on there again and again. Yet their load times still blow. There's got to be something wrong with their application, and I'm just trying not to crawl into that same mess myself from the start.

  20. #20
    SitePoint Guru
    Join Date
    Nov 2002
    Posts
    841
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by charmedlover
    I think the only way that you're going to get your system any faster is a) remain procedural, it is faster
    The performance difference between method calls and function calls is negligible. It is the Don't Repeat Yourself (DRY) principle that enables the caching and algorithmic optimizations that can really make a difference. If you do not have duplicate code, it is much easier to identify bottleneck code, and you only have to replace it with an improved version in one place. Procedural code is typically full of duplication and much less optimizable.

  21. #21
    SitePoint Wizard
    Join Date
    Aug 2004
    Location
    California
    Posts
    1,672
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Edman
    I have a request from a client to clone a website from one country to another, however, this website is a freaking monster. It has 380,000 users and averages out at some 11,000 users online within a 15 minute range during daytime. Registration is invitation-only, and it still gets some odd 1000 new users a day - this isn't a global site, almost all the users are located in the same country, and only maximum of some 500km from each other.

    Basically, I need to know everything there is to know about making PHP code run as fast as possible. Hey, I know a lot about writing fast code, but this stuff has to go beyond fast, really.

    Can I even use classes in a project like this? What is the performance hit for using them? What are some of the most perfomance-draining functions that need avoiding? What is the best accelerator I can use? Can anyone point out some tutorials about this kind of thing?
    I think lastcraft, sweatje, selkirk, et al. are giving you the best advice. There is lots of sage advice here that you should heed. But the bottom line is that you don't have any performance problems yet, so why are you worried about them?

    380,000 users, even with two or three times that many user records, is just not that big a database. If you said a couple of million, my ears might perk up. "11,000 users online within a 15-minute range" does not describe a load either, as the number of requests and the amount of data transferred could vary widely for those numbers.

    I would be more concerned about being out of my league design wise than about optimization.
    Christopher

  22. #22
    SitePoint Zealot
    Join Date
    Mar 2004
    Location
    Australia
    Posts
    101
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Getting a good hardware setup would be the quickest way to increase capacity. Separating static content (images, JS, CSS) onto a lightweight static httpd helps out the servers that run PHP. Optimize the httpd settings for the hardware.

    For sessions, memcached servers can be considered (the author wrote it for LiveJournal); they can also be used for caching other data.
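
    A minimal php.ini sketch for that, using the PECL memcache extension's session handler (the host and port are placeholders):

    session.save_handler = "memcache"
    session.save_path = "tcp://127.0.0.1:11211"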

    Optimize the database settings for the given hardware, and index carefully. Consider caching some queries for a short lifespan.

    If you don't need httpd access logs, turn them off, or maybe consider logging to separate servers. Get the OS running as efficiently as possible.

    Only touch the code as a last resort in the optimization process, because it will take a great deal of effort to obtain a marginal gain.

  23. #23
    SitePoint Zealot
    Join Date
    Apr 2003
    Location
    Connecticut
    Posts
    173
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Another thing you might overlook is the frontend code. I remember reading about someone who did a complete rewrite of Slashdot's frontend and claimed the result would use a tenth of the bandwidth of the original.

    It may not make the work easier for the servers, but if you can cut out half of your bandwidth costs, the same money can go into more servers.

  24. #24
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    274
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by arborint
    I think lastcraft, sweatje, selkirk, et al. are giving you the best advice. There is lots of sage advice here that you should heed. But the bottom line is that you don't have any performance problems yet, so why are you worried about them?

    380,000 users, even with two or three times that many user records, is just not that big a database. If you said a couple of million, my ears might perk up. "11,000 users online within a 15-minute range" does not describe a load either, as the number of requests and the amount of data transferred could vary widely for those numbers.

    I would be more concerned about being out of my league design wise than about optimization.
    I'm worried about it because I like to plan ahead to avoid the most obvious mistakes. A large part of the reason the current site is having such load difficulties is that they were caught unprepared.

    The design follows KISS principles: it only has a logo and a few coloured divs, with a white background and black text. This site is meant for people who have never used the Internet before; it's a national phenomenon.

    But yeah, I'm not up for wasting too much time thinking about this one until the problem is actually breathing down my neck.

    Thanks for all the replies from everyone. I'm sure every bit helps.


    EDIT: And oh my, I got a thread on the front page. I feel like a celebrity

  25. #25
    SitePoint Addict
    Join Date
    May 2005
    Posts
    255
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Ryan Wray
    Maybe check out Zend Optimizer or an open source alternative (Turck MMCache). Also, don't eliminate the possibility that PHP may be the wrong language for the job. If it is inappropriate to use it, don't.
    Screw both of those and use the PECL extension, APC. It's much more stable than Turck, and it's free. It's maintained by Rasmus himself.
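
    Getting APC going is just a PECL build plus a couple of ini lines; a minimal sketch (the 64 MB shared-memory size is an arbitrary example):

    ; php.ini, after running `pecl install apc`
    extension=apc.so
    apc.shm_size=64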

    Ultimately, premature optimization is the root of all evil.

    Here's what you do:

    1.) Build things logically. Build them modularly. Build them so that they can be refactored independently of one another.

    2.) Load test.

    3.) Optimize as needed.

    11,000 users is nothing. A single dual Xeon (3 GHz-ish, hyperthreaded) or dual Opteron (1.8 GHz) can easily handle roughly half that load without any serious optimization (from personal experience).

    Of course, hardware is cheap. I recently ordered 6 dual Opteron 248 servers with 4 SATA drives and 4 GB of RAM for about $3,200 each.

