1. #26 SitePoint Addict
    Quote Originally Posted by dotDan
    Another thing that you might overlook is the frontend coding. I remember reading about someone who did a complete rewrite of Slashdot's frontend, who said the result would use a tenth of the bandwidth of the original.

    It may not make the work easier for the servers, but if you can cut out half of your bandwidth costs, the same money can go into more servers.
    Requests finish faster = fewer threads being spawned = better performance.

    Keep your pages lightweight. If you aren't familiar with CSS already, become familiar with it.

  2. #27 SitePoint Zealot

    These links may be helpful:

    LiveJournal's Backend - A History of Scaling (PDF)

    phpBB tweaks for large forums

    Accelerating PHP Code Performance for Oracle

    As far as PHP tweaking is concerned: there's generally not much speed improvement to be gained by messing with the PHP code, unless the original code uses inefficient algorithms. If that's not the case, then an opcode cache and some sort of data caching mechanism are about as much optimization as you can hope to add. But on really large sites you're going to run into a number of non-PHP performance issues. Read the presentation about LiveJournal's servers to get some idea of what you may be facing in the future.
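
    To make "some sort of data caching mechanism" concrete, here is a minimal file-based sketch; the cache directory, the five-minute TTL and get_heavy_data() are assumptions for illustration only, not part of any particular package.

    PHP Code:
    function cache_get($key, $ttl = 300)
    {
        $file = '/tmp/cache_' . md5($key);
        if (is_file($file) && time() - filemtime($file) < $ttl) {
            return unserialize(file_get_contents($file));
        }
        return false;
    }

    function cache_set($key, $value)
    {
        file_put_contents('/tmp/cache_' . md5($key), serialize($value));
    }

    // Rebuild the expensive data at most once per TTL:
    $data = cache_get('front_page');
    if ($data === false) {
        $data = get_heavy_data();   // hypothetical expensive DB work
        cache_set('front_page', $data);
    }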

  3. #28 SitePoint Guru
    Quote Originally Posted by dotDan
    Another thing that you might overlook is the frontend coding. I remember reading about someone who did a complete rewrite of Slashdot's frontend, who said the result would use a tenth of the bandwidth of the original.
    That's a classic example. A List Apart: Retooling Slashdot with Web Standards

  4. #29 SitePoint Guru
    I already do. I built a template engine of my own that allows easy editing of templates but compiles them into PHP code, leaving all the complicated parsing to the Admin CP; those compiled files can then simply be included. Probably the best way to go about this, I think?
    Loops are the bottlenecks.

    You should use pull-templates instead of push-templates.

    With pull-templates you have only one loop (in the template), and with push, you have one in your database-fetch-code, AND one in the template.

    push:

    PHP Code:
    while ($dat = mysql_fetch_assoc(...)) {
        $messages[] = $dat['msg'];
    }
    $tpl->set('messages', $messages);
    HTML Code:
    <ul>
    <?php foreach($messages as $message){ ?>
    <li><?=$message?></li>
    <?php } ?>
    </ul>
    pull:

    PHP Code:
    $tpl->set('messages', new MessageRetriever());
    PHP Code:
    class MessageRetriever
    {
        function next()
        {
            $dat = mysql_fetch_assoc(...);
            return $dat['msg'];
        }
    }

    HTML Code:
    <ul>
    <?php while($msg = $messages->next()){ ?>
    <li><?=$msg?></li>
    <?php } ?>
    </ul>
    You could also do this without OOP. It simply halves the number of loops you run.

  5. #30 SitePoint Guru
    Quote Originally Posted by Fenrir2
    With pull-templates you have only one loop (in the template), and with push, you have one in your database-fetch-code, AND one in the template.
    I'd just like to note that with proper abstraction those two can look identical from the template's point of view. Gotta love PHP5.
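
    For example (a sketch assuming PHP5's built-in Iterator interface; the mysql_* calls and the 'msg' column are placeholders), a pull source can implement Iterator so the template iterates over it exactly like a plain array:

    PHP Code:
    class MessageIterator implements Iterator
    {
        private $result;
        private $row;
        private $key;

        function __construct($result)
        {
            $this->result = $result;
        }

        function rewind()
        {
            mysql_data_seek($this->result, 0); // restart the result set
            $this->key = 0;
            $this->row = mysql_fetch_assoc($this->result);
        }

        function next()
        {
            $this->key++;
            $this->row = mysql_fetch_assoc($this->result);
        }

        function current() { return $this->row['msg']; }
        function key()     { return $this->key; }
        function valid()   { return $this->row !== false; }
    }

    The template can then use a plain foreach ($messages as $msg), identical whether $messages is a pushed array or a pull source.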

  6. #31 Daijoubu (SitePoint Evangelist)
    Quote Originally Posted by Etnu
    Screw both of those, and use the PECL extension (APC). It's much more stable than Turck, and is free. It's maintained by Rasmus himself.

    Ultimately, premature optimization is the root of all evil.

    Here's what you do:

    1.) Build things logically. Build them modularly. Build them so that they can be refactored independently of one another.

    2.) Load test.

    3.) Optimize as needed.

    11,000 users is nothing. A single dual Xeon (3GHz-ish, hyperthreaded) or dual Opteron (1.8GHz) can easily handle roughly half that load without any serious optimization (from personal experience).

    Of course, hardware is cheap. I recently ordered 6 dual opteron 248 servers with 4 SATA drives and 4GB of ram for about $3200 each.
    I've never seen such a setup running vB or IPB.
    Just glance at the hardware page on big-boards...
    Unless it's all static HTML :P

    The funniest is Gaia...
    80+ web servers, 11 database servers (4 dedicated to forums), 3 session database servers, 1 memory cache server.

  7. #32 SitePoint Guru
    At the moment I am writing a replacement for my own system with scalability in mind. I've looked at the various forum boards for ideas and, to be honest, they all appear to be very inefficient. The main things I will be concentrating on are caching in files where possible and only using the database when it is absolutely required.

  8. #33 SitePoint Zealot
    Quote Originally Posted by Daijoubu
    I've never seen such a setup running vB or IPB. [...] The funniest is Gaia...
    80+ web servers, 11 database servers (4 dedicated to forums), 3 session database servers, 1 memory cache server.
    The Google approach: lots and lots and lots of cheap servers.

  9. #34 Daijoubu (SitePoint Evangelist)
    Google runs entirely in memory too :P

  10. #35 SitePoint Member
    I'm in the core technical team for one of Europe's largest bookmakers. I don't post to SitePoint much, but I can't resist this opportunity.

    I don't think enough developers consider the importance of good systems architecture. This goes hand-in-hand with good application design.

    I come from a development background, but I decided to get into systems administration for this very reason. Here we run several high demand/high availability systems. Some of the applications we run are not designed especially well, but we can make up some performance in the live environment.

    Moreover, I'm constantly complaining that the developers behind our software don't have enough knowledge of our environment - they are only capable of thinking in the "Java bubble", which results in imperfection.

    There's too much advice I could give you, frankly. But pick up some books on UNIX systems architecture and you'll immediately put yourself in the top 1% of developers.

    Hope this helps.

    Regards,
    Andy.

  11. #36 Non-Member
    Slightly off topic I know, but...

    The main things I will be concentrating on are caching in files where possible
    Consider for a moment implementing the Composite View pattern, in which case each Post that belongs to a specific thread could be a Composite; you could then cache individual Posts.

    You'd only ever have to re-cache a given Post if, for example, it has been edited (as opposed to caching the whole page and then having to re-cache it all because one Post was edited, which is a waste of resources). But that isn't the whole point, is it? The point is the flexibility that the Composite gives to you, me and every other developer trying to make our lives easier.
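
    A sketch of the idea (render_post() and the cache directory are hypothetical):

    PHP Code:
    // Return a post's HTML fragment, building it only on a cache miss.
    function post_html($postId)
    {
        $file = '/var/cache/posts/' . (int)$postId . '.html';
        if (!is_file($file)) {
            file_put_contents($file, render_post($postId)); // hypothetical renderer
        }
        return file_get_contents($file);
    }

    // On edit, invalidate just that one fragment; sibling posts stay cached.
    function post_edited($postId)
    {
        @unlink('/var/cache/posts/' . (int)$postId . '.html');
    }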

  12. #37 SitePoint Addict
    Quote Originally Posted by Daijoubu
    I've never seen such a setup running vB or IPB. [...] The funniest is Gaia...
    God himself couldn't run 11,000 users on vBulletin. The biggest problem, of course, is that you can't optimize ANYTHING with a cache: every page view calls at least 4 or 5 evals. vB has plenty of other problems, of course; this is just the worst offender.

    I personally think sites as big as Gaia should be writing custom software, though, as phpBB is really not optimized (or designed...) well enough for that kind of scale.

    The biggest problem you usually face is Slurp. Slurp is the most evil being on the planet. While googlebot will only hit you with a few dozen crawlers at any given moment, yahoo feels that it's perfectly acceptable to send upwards of 512 crawlers *simultaneously*. There's no web server that can serve requests that fast.
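
    For what it's worth, Slurp does honour the non-standard Crawl-delay directive in robots.txt, which is the first thing to try. As a last resort you can shed crawler requests in PHP when the box is drowning; a sketch only (the threshold of 10 is an arbitrary assumption, and sys_getloadavg() needs PHP 5.1+):

    PHP Code:
    // Send Yahoo's crawler a 503 when the 1-minute load average is too high.
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (strpos($ua, 'Slurp') !== false) {
        $load = sys_getloadavg();            // array(1min, 5min, 15min)
        if ($load[0] > 10) {
            header('HTTP/1.1 503 Service Temporarily Unavailable');
            header('Retry-After: 600');      // ask it to come back in 10 minutes
            exit;
        }
    }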

  13. #38 asmictech (SitePoint Zealot)
    Beautiful suggestions have been given. I want to suggest you go through book 2 of the PHP Anthology and check the information on development techniques. It is a summary of best practices in web or application coding.

  14. #39 Daijoubu (SitePoint Evangelist)
    Quote Originally Posted by Etnu
    God himself couldn't run 11,000 users on vBulletin. [...] I personally think sites as big as Gaia should be writing custom software, though, as phpBB is really not optimized (or designed...) well enough for that kind of scale.
    I doubt Gaia has much code left from phpBB.

    But you're right, vB really isn't as efficient as many fanboys think.

  15. #40 SitePoint Zealot
    Just finished reading the slides for "LiveJournal's Backend - A History of Scaling" listed above. Really interesting reading.

    What is most revealing about the discussion is that none of the advice or solutions had anything to do with the code per se - at least it seems they weren't trawling through the code doing anything beyond obvious optimisations (I would imagine things like DB query tuning etc. would be the first options when starting to scale a site).

    The ~architecture~ was the most crucial factor in scaling out ... clustering, segmenting data, caching, proxies.

  16. #41 SitePoint Addict
    Quote Originally Posted by tobyhede
    Just finished reading the slides for "LiveJournal's Backend - A History of Scaling" listed above. [...] The ~architecture~ was the most crucial factor in scaling out ... clustering, segmenting data, caching, proxies.
    Better code usually creates less need for massive solutions, though. Using vBulletin as an example (because it's a piece of software whose performance issues I'm forced to deal with daily), performance can be increased 4 or 5 times with some code changes (I run a patched version of vB that uses PHP files for the templates instead of eval -- a custom job, of course -- and throughput more than doubled for those web servers).

    Ultimately, the ability to scale should be built into every system that you ever expect to be used by the public. Internal tools can typically afford to be less performance conscious (focus on security and productivity there), but public pages require more thought. Don't, for example, cache static pages in the same folders your PHP scripts are served from (what happens if you start serving the PHP scripts from an NFS mount or something similar? Do you *really* want to be performing writes over NFS?).

  17. #42 John_Betong (SitePoint Mentor)

    Go green & recycle records

    Hi,

    A very long time ago, before Bill Gates and Windows came on the scene, I was programming in good old DOS, Clipper and dBase, and I had a lecture on speeding up network programs. This may apply to MySQL and should be considered.

    In a nutshell: when a record is deleted from or added to a table, the whole table has to be re-shuffled. To get round this, instead of actually deleting a record, just change the search key value to something like ZZZ_001, ZZZ_002, etc.

    When adding a record, first look to see if you have any ZZZ_??? records; if you do, just replace the old data and set the new search key.

    If there are no ZZZ_??? records when adding, then instead of adding a single record, add umpteen all at the same time.
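
    In MySQL terms the scheme would look something like this (just a sketch; the table and column names are hypothetical):

    PHP Code:
    // Try to recycle a 'deleted' row before inserting a new one.
    $recycled = mysql_query("SELECT id FROM records WHERE search_key LIKE 'ZZZ\\_%' LIMIT 1");
    if (mysql_num_rows($recycled) > 0) {
        $row = mysql_fetch_assoc($recycled);
        mysql_query("UPDATE records SET search_key = 'real_key', data = 'new data'
                     WHERE id = " . (int)$row['id']);   // overwrite the dead row
    } else {
        mysql_query("INSERT INTO records (search_key, data) VALUES ('real_key', 'new data')");
    }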

    I would be interested in some feedback from a MySQL guru.

    Cheers,


    John_Betong


  18. #43 SitePoint Zealot
    In my experience, the best way to optimise PHP is to do it in SQL. Very few web developers have any real background in database development, and most that I come across shy away from complex SQL queries, preferring to do it in code because they feel in control that way. They end up running queries within loops instead of extracting all the data in one query. I'm sure you've all seen examples of this.

    QUERIES INSIDE LOOPS ARE WRONG - ALWAYS AND EVERY TIME.

    So the best piece of advice I can give you to optimise the existing code is to replace PHP with SQL whenever and wherever you can. SQL is a beautiful and powerful language that few web programmers even begin to exploit properly. If need be, employ a database specialist to teach you how to do it.
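
    For instance, here is the classic queries-in-a-loop mistake and its single-query fix (a sketch; the posts/users schema is hypothetical):

    PHP Code:
    // Bad: one query per post to fetch its author inside the display loop.
    // Good: one JOIN fetches posts and authors in a single round trip.
    $result = mysql_query(
        'SELECT p.id, p.body, u.username
         FROM posts p
         INNER JOIN users u ON u.id = p.user_id
         WHERE p.thread_id = ' . (int)$threadId
    );
    while ($row = mysql_fetch_assoc($result)) {
        echo $row['username'] . ': ' . $row['body'] . "\n";
    }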

  19. #44 SitePoint Guru
    Good advice about the database; however, if you link too many huge tables (I mean millions of records per table) on a very frequent basis, the database will die, even with indexes. I had past experience of this at my previous job, and as such I no longer normalise a database to the level I was taught at college/university. I try to use ENUM and SET (which I find very powerful) instead of link tables where the data rarely changes, and I always try to preprocess counts rather than compute anything on demand. I consider the database to be the weak point of any website these days.
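
    For example, a preprocessed count can be maintained at write time, so a page view never needs a COUNT(*) over millions of rows (a sketch; table and column names are hypothetical):

    PHP Code:
    // When a reply is posted, bump the precomputed counter...
    mysql_query('UPDATE threads SET reply_count = reply_count + 1
                 WHERE id = ' . (int)$threadId);
    // ...so the thread listing just reads the column instead of counting:
    $result = mysql_query('SELECT title, reply_count FROM threads
                           ORDER BY last_post DESC LIMIT 30');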

  20. #45 SitePoint Zealot
    Quote Originally Posted by ticksoft
    Good advice about the database; however, if you link too many huge tables (I mean millions of records per table) on a very frequent basis, the database will die, even with indexes. [...]
    Quite right too, Ticksoft, it's called 'domain knowledge' and is a perfectly acceptable thing to do - de-normalising your data because you know when and how it is to be used. They should have taught you that at uni as well as the basics.

    The same applies to pre-processing: if the underlying data is static or only changes at known times, then storing summary data is the only intelligent thing to do.

    Tools like ENUM and SET are engine specific, and again, exploiting the tools available is only sensible.

    The 'weakness' of most backend databases comes down to my point: most web developers know little about database design and SQL. In any other production environment, a database specialist would automatically be included in the project team, but because it's the 'Web' it is not seen as necessary, so you get 'graphic designers' trying to cope with highly technical and specialised issues. It is no wonder they don't produce robust, reliable and efficient databases; they just do not have the training or experience to cope.

  21. #46 SitePoint Addict
    Big thanks goes out to everyone posting suggestions.

    Yes, it is clear most of the work will have to be done on the hardware management side. All I wanted to get at here is to not make any stupid programming errors, like vBulletin running several eval()s on every single page.

    I know that the bottleneck is often the database. Queries in loops are not something I'd be uneducated enough to do. I'm currently aiming at around 5 queries per page and no more; I think that should be low enough to handle a good load.

  22. #47 mx2k (SitePoint Addict)
    Since this is on optimization, does anyone know how much preg_match/preg_match_all will decrease the speed of your scripts in PHP5? I know it's way better than using eregi(). And how many times can you use preg_match or preg_match_all in a script before it really starts to drag down performance?

  23. #48 McGruff (simple tester)
    Xdebug will answer all these kinds of questions. (I bet you can barely measure the difference).
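
    If you don't want to set up Xdebug, a crude microtime() loop gives a ballpark answer (illustrative only; the pattern, subject and iteration count are arbitrary):

    PHP Code:
    $start = microtime(true);            // microtime(true) needs PHP5
    for ($i = 0; $i < 10000; $i++) {
        preg_match('/foo[a-z]+bar/', 'some foothing and a foosomethingbar');
    }
    printf("10,000 preg_match calls: %.4f seconds\n", microtime(true) - $start);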

  24. #49 SitePoint Addict
    Quote Originally Posted by John_Betong
    In a nutshell: when a record is deleted from or added to a table, the whole table has to be re-shuffled. To get round this, instead of actually deleting a record, just change the search key value to something like ZZZ_001, ZZZ_002, etc. [...]
    MySQL already does this by default (you can disable it). When you delete a key from an index, it leaves the node as null; running OPTIMIZE TABLE removes those null nodes.

    They end up running queries within loops instead of extracting all the data in one query. I'm sure you've all seen examples of this.
    Mostly true, but the opposite happens frequently as well. Example:

    SELECT * FROM table ORDER BY RAND() LIMIT 100;

    (try that on a table with a million entries, then come and tell me it was a good idea).
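
    A common workaround is to pick a random offset first so MySQL never has to shuffle the whole table (a sketch; it returns a random contiguous slice, which is usually good enough):

    PHP Code:
    $row = mysql_fetch_row(mysql_query('SELECT COUNT(*) FROM table'));
    $offset = mt_rand(0, max(0, $row[0] - 100));   // random starting point
    $result = mysql_query('SELECT * FROM table LIMIT ' . $offset . ', 100');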


    QUERIES INSIDE LOOPS ARE WRONG - ALWAYS AND EVERY TIME.
    Wrong. If I'm memory bound but not CPU bound, this:

    PHP Code:
    $Start = 0;
    $Inc = 1000;
    while (true) {
        $result = $db->query('SELECT * FROM table LIMIT ' . $Start . ',' . $Inc);
        if ($result->num_rows == 0) {
            break;
        }
        while ($tmp = $result->fetch()) {
            echo $tmp['whatever'];
        }
        $Start += $Inc;
    }
    is much more efficient than:

    PHP Code:
    $result = $db->query('SELECT * FROM table');
    while ($tmp = $result->fetch()) {
        echo $tmp['whatever'];
    }
    on large datasets.

    NOT everything is more efficient to do in the database. PHP is a very fast language, and in many areas outperforms SQL by a fair margin (most notably in areas like string manipulation). The suggestion to just move things to SQL is shortsighted and misguided.

    That being said, though, your heart's in the right place. I'm still baffled as to why I constantly see applications that require 8 web servers but only 1 database server.

  25. #50 Daijoubu (SitePoint Evangelist)
    That's not PHP's fault, it's Apache's.

    And sometimes it's better to do the processing in PHP. For example, to prevent a filesort: if you can't get the query optimizer to use your index for the sorting, you're better off sorting in PHP. Sure, you'll have to buffer the result set and use more memory, but it's still less expensive than doing disk I/O.
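
    A sketch of what that looks like (table and column names are hypothetical): buffer the rows, then sort with usort() instead of letting MySQL filesort.

    PHP Code:
    function by_score($a, $b)
    {
        return $b['score'] - $a['score'];   // highest score first
    }

    $rows = array();
    $result = mysql_query('SELECT id, title, score FROM items');
    while ($row = mysql_fetch_assoc($result)) {
        $rows[] = $row;                     // buffer the result set in memory
    }
    usort($rows, 'by_score');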

