  1. #76
    SitePoint Addict
    Join Date
    May 2005
    Posts
    255
    Quote Originally Posted by Daijoubu
    That's not PHP's fault, it's Apache's
    Not really, it's mostly the programmer's "fault". But anyway...

    >> (try that on a table with a million entries, then come and tell me it was a good idea).

    Try telling any data guru that having a table with a million entries in your OLTP is a good idea

    You should never really have to worry about extremely large data sets in your OLTP, because you should never really have extremely large data sets in your OLTP. Your OLTP should really only contain what is necessary for the online system. Everything else should be offloaded, and maybe even de-normalized, to a data warehouse.

    But you're really going about it the wrong way if you find yourself with millions of rows in your online system.

    Everybody seems to talk about "enterprise programming" but then completely ignore aspects of "enterprise databasing" (I know that's not a real word, but you know what I mean).
    Maybe 10 or 20 years ago that was true, but I have, right now, a single database with 26 tables (due to file size limitations more than anything else), each containing ~500,000 - 1 million records, serving content from a CMS. I can assure you that the DB handles this quite well, without any issue. Putting those in a separate system would make things quite slow, cumbersome, and difficult to manage.

    Our user database is fast approaching a million users who have been active in the last 30 days as well.

    The query that I gave as an example was from a real-world situation. We want to show 10 random articles from the database (the 26 content tables are managed with a MERGE table). The only solution that gave good, truly random results was to do this:

    1.) Count the number of records.

    2.) Use mt_rand(0, $RecordCount) to pick the random IDs.

    3.) Grab the records using the randomly chosen IDs.

    Like I said -- ORDER BY RAND() LIMIT 10 would never be more efficient (even if you were dealing with a small data set, such as a few thousand rows).
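    For illustration, here's a minimal sketch of that count-then-pick approach, assuming the articles sit behind the MERGE table with mostly contiguous 1-based IDs (gaps would need retry logic); the table name and connection details are made up:

    PHP Code:
    $db = new mysqli('localhost', 'username', 'password', 'Database');

    // 1.) Count the number of records.
    $result = $db->query('SELECT COUNT(*) FROM articles');
    $row = $result->fetch_row();
    $RecordCount = (int)$row[0];

    // 2.) Use mt_rand() to pick 10 random IDs.
    $ids = array();
    while (count($ids) < 10) {
        $ids[mt_rand(1, $RecordCount)] = true; // array keys keep the IDs unique
    }

    // 3.) Grab the records using the randomly chosen IDs.
    $in = implode(',', array_keys($ids));
    $articles = $db->query("SELECT * FROM articles WHERE id IN ($in)");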

    Of course, not every application is (or should be) designed the same. Some applications need the database as lean as possible; others perform better when all the data is accessible all the time. Blanket statements like "never" or "always" are the first signs of bad design.

    Oh, and I'd never use any of the existing open source CMS systems for the sites that I run. Have you ever tried to move 20,000 articles from one section of your site to another in those things? Yikes!

    Any PHP application is inherently scalable (unless it has been very very very poorly coded). It is not PHP's job to scale the system; it will naturally scale with your system
    Even a few simple mistakes can choke scalability, actually. The two most obvious examples are vBulletin and MediaWiki. Both are very popular, very well-regarded, and very poorly written pieces of software that scale extremely poorly. Yes, you can just throw more web servers at the problem (as most people do), but writing a more efficient code base from the ground up would have saved thousands upon thousands (or millions, in the case of MediaWiki) of dollars in hardware and maintenance costs.

    Obviously there's a certain point where limitations beyond PHP's control start to hit you (once you get over 10,000 simultaneous connections or so, it's cheaper to just buy a second web server most of the time, since Apache can't really do much better than that on any reasonably-priced current hardware; I use a dual Opteron w/ 4GB of RAM as my benchmark of "reasonable").

    However -- most people don't get those kinds of numbers. I'm still baffled as to why it's more or less impossible to run vBulletin on a single machine once you have more than 200-300 concurrent connections going on (note: vBulletin's "current users online" is not a measure of concurrency, it's a measurement of users online in the last 10 or 15 minutes, and there's a world of difference). Of course, most sites can fall back to using things like Tux, which can easily toss out 25k+ pages per second without batting an eye.

    Of course there are also situations where it is ridiculous to even think about running an application on only a handful of servers; it's clear you are going to need a whole park of servers
    Rarely, and it depends on what you're actually doing.

    After all, it doesn't really matter much whether you are going to use 60 or 150 machines.
    If it makes the difference between buying 100 servers and buying 110, it matters significantly to most every company out there. Servers are cheap, individually, but the numbers quickly add up, especially for the 95%+ of web companies out there who are not multi-billion dollar operations.

    an example: queries should follow the format db_name.table_name for replication of individual databases.
    That doesn't help you one bit for replication of individual databases on most platforms.

    I generally say it's best to leave the scaling to the db server (MySQL, PostgreSQL, Oracle, and Microsoft SQL Server all do this naturally. I'm quite positive that the majority of other db platforms out there do as well; those are just the ones I've used personally). Attempting to implement a custom clustering system in your code is, at best, messy, and, at worst, dangerous.

    And scaling databases is otherwise easy. A simple round-robin setup for a read-only database:

    PHP Code:
    class DB extends mysqli
    {
        // Pool of read-only slaves to spread queries across.
        private static $Connections = array(
            array('192.168.1.100', 'username', 'password', 'Database'),
            array('192.168.1.101', 'username', 'password', 'Database'),
            array('192.168.1.102', 'username', 'password', 'Database'),
            array('192.168.1.103', 'username', 'password', 'Database'),
        );

        function __construct()
        {
            // mt_rand() is inclusive on both ends, so subtract 1
            // to avoid indexing past the end of the array.
            $ConnectionData = self::$Connections[mt_rand(0, count(self::$Connections) - 1)];
            parent::__construct($ConnectionData[0], $ConnectionData[1],
                                $ConnectionData[2], $ConnectionData[3]);
        }
    }
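    Usage is then transparent -- each request picks a random slave when the object is constructed (the query here is just an example):

    PHP Code:
    $db = new DB();
    $result = $db->query('SELECT id, title FROM articles LIMIT 10');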


  2. #77
    SitePoint Wizard dreamscape's Avatar
    Join Date
    Aug 2005
    Posts
    1,080
    >> Obviously there's a certain point where limitations beyond php's control start to hit you (once you get over 10,000 simultaneous connections or so)

    Most limitations hit before PHP gives out. The network is probably the biggest bottleneck. If we take a page with a size of 20K, a server on a 10Mbit line will become saturated at around 50 requests/second, around 500/second on a 100Mbit line, and around 5,000/second on a 1Gbit line.

    Most servers probably are on 100Mbit lines, and would have one hell of a time serving more than 500 requests/second due to network limitations.
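    Those figures are easy to sanity-check with a quick sketch (raw line rate only; TCP/HTTP overhead pushes the real numbers down toward the ones above):

    PHP Code:
    $pageSizeBytes = 20 * 1024; // the 20K page from the example
    $lines = array('10Mbit' => 10e6, '100Mbit' => 100e6, '1Gbit' => 1e9);

    foreach ($lines as $name => $bitsPerSecond) {
        $requestsPerSecond = ($bitsPerSecond / 8) / $pageSizeBytes;
        printf("%s: ~%d requests/second\n", $name, $requestsPerSecond);
    }
    // prints roughly 61, 610, and 6103 -- the same ballpark as the
    // 50/500/5,000 figures once protocol overhead is accounted for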

  3. #78
    SitePoint Addict phptek's Avatar
    Join Date
    Jun 2002
    Location
    Wellington, NZ
    Posts
    363
    Edman, I hear you and am in a similar situation. I found this great resource (hudzilla.org) a few months back, and it mentions pretty much everything you'll need to know about optimising PHP code and database structure.

    Good luck, and apologies if someone has already mentioned this resource!

  4. #79
    SitePoint Evangelist Daijoubu's Avatar
    Join Date
    Oct 2002
    Location
    Canada QC
    Posts
    454
    Quote Originally Posted by dreamscape
    >> Obviously there's a certain point where limitations beyond php's control start to hit you (once you get over 10,000 simultaneous connections or so)

    Most limitations hit before PHP gives out. The network is probably the biggest bottleneck. If we take a page with a size of 20K, a server on a 10Mbit line will become saturated at around 50 requests/second, around 500/second on a 100Mbit line, and around 5,000/second on a 1Gbit line.

    Most servers probably are on 100Mbit lines, and would have one hell of a time serving more than 500 requests/second due to network limitations.
    20k, that's a pretty big example :P
    The average size of a gzipped page is more likely to be 5-10KB, unless the HTML is really bloated
    Speed & scalability in mind...
    If you find my reply helpful, feel free to give me a point

  5. #80
    SitePoint Wizard dreamscape's Avatar
    Join Date
    Aug 2005
    Posts
    1,080
    >> 20k, that's a pretty big example :P
    The average size of a gzipped page is more likely to be 5-10KB, unless the HTML is really bloated

    A page has more than just the HTML. It also can have images, CSS files, JS files, etc, etc, etc... When you consider everything that a "page" consists of aside from HTML, 20K is pretty small.

    Regardless, the network is still likely to be the biggest bottleneck in the system.

  6. #81
    SitePoint Zealot
    Join Date
    Feb 2003
    Posts
    156
    Quote Originally Posted by Etnu
    Rarely, and it depends on what you're actually doing.
    That was my point. Depending on what you are doing (and how far you have to scale), different things will end up mattering for "scale".


    If it makes the difference between buying 100 servers and buying 110, it matters significantly to most every company out there. Servers are cheap, individually, but the numbers quickly add up, especially for the 95%+ of web companies out there who are not multi-billion dollar operations.
    If you have software that's running on 50 servers, the cost of hardware will be small relative to all the other costs involved in the venture - most of the time. Hence why I said the number of servers (beyond a certain threshold) is not the most important thing. Decisions will be made on economics - sometimes it will be cheaper to have a few devs profile and improve the application (there's often some low-hanging fruit in the beginning), and sometimes (especially once you've picked all the low-hanging fruit) it will be cheaper to add hardware.


    Otherwise I agree with a lot of what you wrote, especially with respect to popular software. A lot of popular open source projects have stock sentences about "flexibility" and "performance" in their blurbs - but they are often just that: blurbs intended to make people feel good; rarely was there serious effort made in that respect (let alone any honest comparisons to alternatives).

  7. #82
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Franz Josef Land
    Posts
    28
    Apache native threading will kill you in seconds. Go asynchronous, unless you want to spend most of your time doing context switching
    I code therefore I am.

  8. #83
    SitePoint Addict
    Join Date
    May 2005
    Posts
    255
    Quote Originally Posted by dreamscape
    >> Obviously there's a certain point where limitations beyond php's control start to hit you (once you get over 10,000 simultaneous connections or so)

    Most limitations hit before PHP gives out. The network is probably the biggest bottleneck. If we take a page with a size of 20K, a server on a 10Mbit line will become saturated at around 50 requests/second, around 500/second on a 100Mbit line, and around 5,000/second on a 1Gbit line.

    Most servers probably are on 100Mbit lines, and would have one hell of a time serving more than 500 requests/second due to network limitations.
    Maybe we're looking at "limitations" as two different things. If I see a request not even being served in under 100ms, I see that as being a problem; it makes the site appear sluggish, and that eventually turns users away. No matter how many servers you throw at the problem, the only way to make the page get served faster is to write better code. Ultimately, that's the final word in application performance -- user experience. If the users get their pages quickly, the application is performing well. If they are not, it isn't. If you ignore the execution speed of the script itself and only focus on the overall performance on a massive scale, the site will appear sluggish to users, and that's always a bad thing. There's no reason why it should take me 5 seconds to download a 25k page when I'm on cable or DSL -- period.

    Writing better code also enables you to do much more complicated work in real time, which is going to become increasingly important as stuff like AJAX gets more and more popular. Sure, your site may be able to handle 10,000 requests per second (or whatever), but it's still performing extremely poorly if the user isn't getting near-instant responses from your application (they may as well just do things with traditional "click and wait for response" type of stuff).

  9. #84
    SitePoint Wizard dreamscape's Avatar
    Join Date
    Aug 2005
    Posts
    1,080
    >> Sure, your site may be able to handle 10,000 requests per second (or whatever)

    As I recall you were the one talking about some magical server you have that can do 10,000 PHP requests per second or something like that (which I don't buy for 1 second anyways)... I was just trying to say that unless your server has some kind of uber uplink to the net, you're not even going to be able to get close to that due to network limitations.

    You also seemed baffled how so many servers could not handle more than about 300 concurrent connections with vBulletin or something, but that doesn't really matter: for most single servers, around 300 is probably the limit due to network bottlenecks. Most people probably only have 10Mbit uplinks (which would limit them to far fewer) or 100Mbit, for which 300 is probably about right. Once you saturate the line, game over. Trying to pump out a few more cycles isn't going to solve anything if the problem is that the line is saturated.

    >> If you ignore the execution speed of the script itself and only focus on the overall performance on a massive scale

    If you focus on a massive scale, the network will be a far bigger factor than cpu cycles will be, assuming your script can handle what the network can (which I assume most can, since networks saturate easily).

    I'm not trying to say you shouldn't optimize your code to use fewer CPU cycles, but let's be realistic: in nearly any web app, the network will most likely be the biggest bottleneck.

  10. #85
    SitePoint Wizard dreamscape's Avatar
    Join Date
    Aug 2005
    Posts
    1,080
    >> If the users get their pages quickly, the application is performing well. If they are not, it isn't.

    I think most of us are talking about web apps, and there is far more to the puzzle than just the application. If a user is not getting pages quickly, it could be any one of the following (or more, as this is just off the top of my head):

    - network line is saturated (reached limit)
    - web server (Apache) not tuned/set up correctly, or reached its limit.
    - CPU has reached its limit
    - other processes on server causing too much overhead
    - SQL server has reached its limit

    If the users don't get their pages quickly, the server is not performing well. Not necessarily the application. There are a myriad of things that could be the cause or contributing to the cause.

  11. #86
    SitePoint Addict mgkimsal's Avatar
    Join Date
    Sep 1999
    Posts
    209
    Coming in a bit late to the discussion, but I wanted to jump in on the 'session' statements I saw.

    I didn't really understand someone's earlier comment about "I got rid of sessions - they'll just have to use cookies". Was that meant as storing information IN a cookie? Not a good idea from a performance standpoint, as generally that cookie information is sent back in every HTTP request (15 images on a page means that cookie information is sent 15 times over the network for that page request).

    I'm getting off-topic a bit, but I'll bring it back to sessions. Generally I don't use the PHP built-in session handling. My earlier experience with the LogiCreate framework was that the PHP session handling code wasn't all that hot. Now granted, these were the early days of PHP 4, and there was no $_SESSION or other improvements. BUT, the thing I've noticed is that it *always* writes out the full session to disk even if there have been no changes to the session data. On large sites that's wasteful - sometimes very much so.

    You can bypass this somewhat by using session_set_save_handler() to write your own save/write routine, but you still need a way to know if anything's been changed. It's probably worth it to simply write your own session system which would check for a 'dirty' flag if the session's been modified, and only write out when things have been changed.
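    A minimal sketch of that dirty-check idea, assuming plain file storage under /tmp (the paths and function names are made up for illustration): remember what was read, and skip the write when nothing changed.

    PHP Code:
    $GLOBALS['sess_read_data'] = '';

    function sess_open($savePath, $sessionName) { return true; }
    function sess_close() { return true; }

    function sess_read($id)
    {
        $file = '/tmp/sess_' . $id;
        $data = file_exists($file) ? file_get_contents($file) : '';
        $GLOBALS['sess_read_data'] = $data; // remember for the dirty check
        return $data;
    }

    function sess_write($id, $data)
    {
        if ($data === $GLOBALS['sess_read_data']) {
            return true; // session untouched: skip the disk write entirely
        }
        return file_put_contents('/tmp/sess_' . $id, $data) !== false;
    }

    function sess_destroy($id)
    {
        @unlink('/tmp/sess_' . $id);
        return true;
    }

    function sess_gc($maxLifetime) { return true; } // expiry pruning omitted

    session_set_save_handler('sess_open', 'sess_close', 'sess_read',
                             'sess_write', 'sess_destroy', 'sess_gc');
    session_start();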

    Just thought I'd throw that out. Excepting the LogiCreate system I'd started years ago, I don't think I've ever seen a PHP framework deal with this issue. Drupal doesn't (just mentioning it because it was mentioned as 'use Drupal if scalability is important' or something similar).
    Michael Kimsal
    =============================
    groovymag.com - for groovy/grails developers
    jsmag.com - for javascript developers

  12. #87
    SitePoint Addict
    Join Date
    May 2005
    Posts
    255
    Quote Originally Posted by dreamscape
    >> If the users get their pages quickly, the application is performing well. If they are not, it isn't.

    I think most of us are talking about web apps, and there is far more to the puzzle than just the application. If a user is not getting pages quickly, it could be any one of the following (or more, as this is just off the top of my head):

    - network line is saturated (reached limit)
    - web server (Apache) not tuned/set up correctly, or reached its limit.
    - CPU has reached its limit
    - other processes on server causing too much overhead
    - SQL server has reached its limit

    If the users don't get their pages quickly, the server is not performing well. Not necessarily the application. There are a myriad of things that could be the cause or contributing to the cause.
    We're talking about the same thing. I was using the term "application" as a generic term for the user's interaction with you. That includes server, network, whatever. It's everything. And if any element is slow -- it's bad. You can NOT let users wait 5 or 10 seconds to receive feedback and claim that your application performs well because you're able to handle thousands of simultaneous users. This is where code performance matters.

    As I recall you were the one talking about some magical server you have that can do 10,000 PHP requests per second or something like that (which I don't buy for 1 second anyways)...
    I don't believe in magic, nor was I the one who made that claim. I used that number because it was provided previously, although it is quite possible to serve large numbers of requests when you're dealing with web requests and not actually returning much (or, in many cases, not returning anything at all, just processing input). You assume that every request being made is going to produce a full page of content. That's simply not the case for modern web apps. I can show you plenty of examples of sites taking on thousands of hits, processing / updating databases, and then returning absolutely nothing (i.e. a 304). Amazon and NetFlix come to mind immediately here, as they both do this quite effectively, and I'm sure save a whole lot of bandwidth because of it.
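    As a sketch of that trick: if the client's cached copy is still current, answer with a 304 and send no body at all (the article file here is a stand-in for whatever freshness source you actually have):

    PHP Code:
    $file = 'article.html';           // stand-in for real content
    $lastModified = filemtime($file); // stand-in for real freshness data

    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
        strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
        header('HTTP/1.1 304 Not Modified');
        exit; // nothing but headers goes over the wire
    }

    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    readfile($file); // send the full page only when actually needed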

    However, like I said, there is no reason why a single, moderate piece of hardware should not be able to handle a few hundred requests per second (my original benchmark for vBulletin and MediaWiki). Of course, these applications (which I use as examples because they're typical of most PHP code) don't really perform all that well even when there is no load.

    You're correct that the average user is heavily bandwidth-limited. Typically, users who are on those types of connections (< 10Mb upstream) aren't encountering these types of scalability issues in the first place, though, so it's irrelevant.

  13. #88
    SitePoint Wizard
    Join Date
    Dec 2004
    Location
    USA
    Posts
    1,407
    Quote Originally Posted by sweatje
    code profiler
    What are some good profilers and how do you use them?

  14. #89
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    xdebug or apc. I think the respective authors (Derick Rethans and George Schlossnagle) each have presentations on optimizing code on their websites which include code profiling.
    Jason Sweat ZCE - jsweat_php@yahoo.com
    Book: PHP Patterns
    Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
    Detestable (adjective): software that isn't testable.

  15. #90
    SitePoint Wizard
    Join Date
    Dec 2004
    Location
    USA
    Posts
    1,407
    Quote Originally Posted by lastcraft
    Development costs dwarf hardware costs, and opportunity cost from time to market dwarfs both.
    You are so right from a macro sense - good thoughts!

  16. #91
    SitePoint Wizard
    Join Date
    Dec 2004
    Location
    USA
    Posts
    1,407
    Where are some good tutorials about optimizing MySQL DB queries?

    EDIT: Nevermind - am posting this in MySQL forum

  17. #92
    SitePoint Wizard
    Join Date
    Dec 2004
    Location
    USA
    Posts
    1,407
    Also, Apache AB is Apache's application benchmarking tool. I've had people post their findings on the forum and found them VERY useful for determining whether work needs to be done on the code or the SQL.

    My host will not allow me to install it, and I am running IIS locally, so I don't have the occasion to run it.

    Although, it might make sense to run it locally, since I should replicate my production environment closely.

  18. #93
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    Jason,

    I've downloaded the php_apc.dll and put it in my Apache root directory, where the other dlls exist for my installation. But, since APC is only for PHP 4.x, I'm having a problem with this extension.

    I downloaded phpts4.dll (from the PHP 4.4 package) and put that into C:/Apache2/bin/, but still no result, so can you (or someone else) tell me how to use this extension with PHP 5.0.x? I might also look at memcache, since I've downloaded that extension as well, if this isn't solved.

    Thanks

  19. #94
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    You might try http://www.schlossnagle.org/~george/talks/
    In particular, I attended this one, and I believe he was using APC for code profiling there (though it was two years ago, memory might not serve)
    Jason Sweat ZCE - jsweat_php@yahoo.com
    Book: PHP Patterns
    Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
    Detestable (adjective): software that isn't testable.

  20. #95
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    Thanks, I will do just that. I'll let this thread get back on topic, and PM you if I still have problems?

    I attended this one
    That one is 24Meg!! Just as well I'm on broadband...

  21. #96
    SitePoint Wizard dreamscape's Avatar
    Join Date
    Aug 2005
    Posts
    1,080
    Quote Originally Posted by sweatje
    xdebug or apc.
    Are you sure you don't mean APD?

    APC is an opcode cache.

  22. #97
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    Quote Originally Posted by dreamscape
    Are you sure you don't mean APD?

    APC is an opcode cache.
    Probably. I use xdebug myself, but I did think there was an ap* out there which fit the needs
    Jason Sweat ZCE - jsweat_php@yahoo.com
    Book: PHP Patterns
    Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
    Detestable (adjective): software that isn't testable.

  23. #98
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    It's the Opcode Cache I downloaded first without knowing any better, but I know a bit more now

    Won't be downloading that again, for sure

