-
Sep 3, 2005, 13:24 #76
- Joined May 2005 · 255 posts
Originally Posted by Daijoubu
>> (try that on a table with a million entries, then come and tell me it was a good idea).
Try telling any data guru that having a table with a million entries in your OLTP is a good idea.
You should never really have to worry about extremely large data sets in your OLTP, because you should never really have extremely large data sets in your OLTP. Your OLTP should really only contain what is necessary for the online system. Everything else should be offloaded, and maybe even de-normalized, to a data warehouse.
But you're really going about it the wrong way if you find yourself with millions of rows in your online system.
Everybody seems to talk about "enterprise programming" but then completely ignore aspects of "enterprise databasing" (I know that's not a real word, but you know what I mean).
Our user database is fast approaching a million users who have been active in the last 30 days as well.
The query that I gave as an example was from a real-world situation. We want to show 10 random articles from the database (the 26 content tables are managed with a MERGE table). The only solution that gives good, truly random results is to do this:
1.) Count the number of records.
2.) Use mt_rand(0, $RecordCount) to pick IDs at random.
3.) Grab the records using the randomly chosen ID.
Like I said -- ORDER BY RAND() LIMIT 10 would never be more efficient (even if you were dealing with a small data set, such as a few thousand rows).
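For illustration, here is a minimal sketch of that count-then-pick approach (the articles table and its id column are invented names, and MAX(id) standing in for the count assumes the IDs have few gaps):
PHP Code:
// 1.) "Count" the records -- MAX(id) works as a stand-in when ids are dense.
$db     = new mysqli('localhost', 'username', 'password', 'Database');
$result = $db->query('SELECT MAX(id) AS max_id FROM articles');
$row    = $result->fetch_assoc();

// 2.) Pick 10 ids with mt_rand().
$ids = array();
for ($i = 0; $i < 10; $i++) {
    $ids[] = mt_rand(1, (int) $row['max_id']);
}

// 3.) Grab the records by primary key. This walks the index instead of
// sorting the whole table the way ORDER BY RAND() does. (Gaps or duplicate
// picks can return fewer than 10 rows; re-pick as needed.)
$result = $db->query('SELECT * FROM articles WHERE id IN (' . implode(',', $ids) . ')');
while ($article = $result->fetch_assoc()) {
    // display $article
}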
Of course, not every application is (or should be) designed the same. Some applications need the database as lean as possible; others perform better when all the data is accessible all the time. Blanket statements like "never" or "always" are the first signs of bad design.
Oh, and I'd never use any of the existing open source CMS systems for the sites that I run. Have you ever tried to move 20,000 articles from one section of your site to another in those things? Yikes!
Any PHP application is inherently scalable (unless it has been very, very, very poorly coded). It is not PHP's job to scale the system; it will naturally scale with your system.
Obviously there's a certain point where limitations beyond PHP's control start to hit you (once you get over 10,000 simultaneous connections or so, it's usually cheaper to just buy a second web server, since Apache can't really do much better than that on any reasonably-priced current hardware; I use a dual Opteron with 4GB of RAM as my benchmark of "reasonable").
However -- most people don't get those kinds of numbers. I'm still baffled as to why it's more or less impossible to run vBulletin on a single machine once you have more than 200-300 concurrent connections going on (note: vBulletin's "current users online" is not a measure of concurrency, it's a measurement of users online in the last 10 or 15 minutes, and there's a world of difference). Of course, most sites can fall back to using things like Tux, which can easily toss out 25k+ pages per second without batting an eye.
Of course, there are also situations where it is ridiculous to even think about running an application on only a handful of servers; it's clear you are going to need a whole park of servers. After all, it doesn't really matter much whether you are going to use 60 or 150 machines.
An example: queries should follow the format db_name.table_name to allow replication of individual databases.
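As a hypothetical illustration (the Forum.users table and the $db handle are invented for the example), the idea is simply to fully qualify every table, so per-database replication filters can match the statement no matter which default database the connection has selected:
PHP Code:
// Fully qualified: unambiguous for per-database replication filters.
$db->query('UPDATE Forum.users SET last_login = NOW() WHERE user_id = 42');
// Unqualified: depends on the connection's current default database.
$db->query('UPDATE users SET last_login = NOW() WHERE user_id = 42');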
I generally say it's best to leave the scaling to the DB server (MySQL, PostgreSQL, Oracle, and Microsoft SQL Server all do this naturally, and I'm quite positive that the majority of other DB platforms out there do as well; those are just the ones I've used personally). Attempting to implement a custom clustering system in your code is, at best, messy and, at worst, dangerous.
Otherwise, scaling databases is easy. Here is a simple round-robin setup for a read-only database:
PHP Code:
class DB extends mysqli
{
    // Read-only replicas: each entry is (host, username, password, database).
    private static $Connections = array(
        array('192.168.1.100', 'username', 'password', 'Database'),
        array('192.168.1.101', 'username', 'password', 'Database'),
        array('192.168.1.102', 'username', 'password', 'Database'),
        array('192.168.1.103', 'username', 'password', 'Database'),
    );

    function __construct()
    {
        // Pick a replica at random. Note the -1: mt_rand() is inclusive on
        // both ends, so count() alone would overrun the array.
        $ConnectionData = self::$Connections[mt_rand(0, count(self::$Connections) - 1)];
        parent::__construct($ConnectionData[0], $ConnectionData[1],
                            $ConnectionData[2], $ConnectionData[3]);
    }
}
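Usage is then the same as any other mysqli object; a quick sketch (the query itself is illustrative, and error handling is omitted):
PHP Code:
$db     = new DB();  // transparently lands on one of the read-only replicas
$result = $db->query('SELECT * FROM articles LIMIT 10');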
-
Sep 3, 2005, 15:57 #77
>> Obviously there's a certain point where limitations beyond PHP's control start to hit you (once you get over 10,000 simultaneous connections or so)
Most limitations hit before PHP gives out. The network is probably the biggest bottleneck: if we take a page with a size of 20KB, a server on a 10Mbit line will become saturated at around 50 requests/second, around 500/second on a 100Mbit line, and around 5,000/second on a 1Gbit line.
Most servers are probably on 100Mbit lines and would have one hell of a time serving more than 500 requests/second due to network limitations.
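A quick back-of-the-envelope check of those numbers (the 80% usable-bandwidth factor is an assumption to cover TCP/HTTP overhead):
PHP Code:
$pageBytes = 20 * 1024;  // the 20KB page from the example above
$lines = array('10Mbit' => 10e6, '100Mbit' => 100e6, '1Gbit' => 1e9);
foreach ($lines as $name => $bitsPerSecond) {
    $usableBytesPerSecond = ($bitsPerSecond / 8) * 0.8;  // assumed 80% efficiency
    printf("%s: ~%d requests/second\n", $name, $usableBytesPerSecond / $pageBytes);
}
// Prints roughly 48, 488, and 4882 -- in line with the 50/500/5,000 estimates.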
-
Sep 3, 2005, 16:54 #78
- Joined Jun 2002 · Wellington, NZ · 363 posts
Edman, I hear you and am in a similar situation. I found this great resource (hudzilla.org) a few months back, and it covers pretty much everything you'll need to know about optimising PHP code and database structure.
Good luck, and apologies if someone has already mentioned this resource!
- UK2NZ: mY bLoG
-
Sep 4, 2005, 02:51 #79
- Joined Oct 2002 · Canada QC · 454 posts
Originally Posted by dreamscape
20k, that's pretty big of an example :P
The average size of a gzipped page is more likely to be 5-10KB, unless the HTML is really bloated.
Speed & scalability in mind...
If you find my reply helpful, feel free to give me a point
-
Sep 4, 2005, 08:38 #80
>> 20k, that's pretty big of an example :P
The average size of a gzipped page is more likely to be 5-10KB, unless the HTML is really bloated
A page has more than just the HTML. It can also have images, CSS files, JS files, etc, etc, etc... When you consider everything that a "page" consists of aside from the HTML, 20K is pretty small.
Regardless, the network is still likely to be the biggest bottleneck in the system.
-
Sep 4, 2005, 08:55 #81
- Joined Feb 2003 · 156 posts
Originally Posted by Etnu
If it makes the difference between buying 100 servers and buying 110, it matters significantly to almost every company out there. Servers are cheap individually, but the numbers quickly add up, especially for the 95%+ of web companies out there that are not multi-billion dollar operations.
Otherwise I agree with a lot of what you wrote, especially with respect to popular software. A lot of popular open source projects have stock sentences about "flexibility" and "performance" in their blurbs, but they are often just that: blurbs intended to make people feel good; rarely has serious effort been made in that respect (let alone any honest comparison to alternatives).
-
Sep 4, 2005, 17:38 #82
- Joined Jan 2005 · Franz Josef Land · 28 posts
Apache's native threading will kill you in seconds. Go asynchronous, unless you want to spend most of your time doing context switching.
I code therefore I am.
-
Sep 4, 2005, 21:10 #83
- Joined May 2005 · 255 posts
Originally Posted by dreamscape
Writing better code also enables you to do much more complicated work in real time, which is going to become increasingly important as stuff like AJAX gets more and more popular. Sure, your site may be able to handle 10,000 requests per second (or whatever), but it's still performing extremely poorly if the user isn't getting near-instant responses from your application (they may as well just do things with traditional "click and wait for response" type of stuff).
-
Sep 4, 2005, 23:20 #84
>> Sure, your site may be able to handle 10,000 requests per second (or whatever)
As I recall, you were the one talking about some magical server you have that can do 10,000 PHP requests per second or something like that (which I don't buy for one second anyway)... I was just trying to say that unless your server has some kind of uber uplink to the net, you're not even going to be able to get close to that, due to network limitations.
You also seemed baffled that so many servers could not handle more than about 300 concurrent connections with vBulletin, but for most single servers, around 300 probably is the limit, due to network bottlenecks. Most people probably only have 10Mbit uplinks (which would limit them to far fewer) or 100Mbit, for which 300 is probably about right. Once you saturate the line, game over. Squeezing out a few more CPU cycles isn't going to solve anything if the problem is that the line is saturated.
>> If you ignore the execution speed of the script itself and only focus on the overall performance on a massive scale
If you focus on a massive scale, the network will be a far bigger factor than CPU cycles, assuming your script can keep up with what the network can deliver (which I assume most can, since networks saturate easily).
I'm not trying to say you shouldn't optimize your code to use fewer CPU cycles, but let's be realistic: in nearly any web app, the network will most likely be the biggest bottleneck.
-
Sep 4, 2005, 23:34 #85
>> If the users get their pages quickly, the application is performing well. If they are not, it isn't.
I think most of us are talking about web apps, and there is far more to the puzzle than just the application. If a user is not getting pages quickly, it could be any one of the following (or more; this is just off the top of my head):
- network line is saturated (reached limit)
- web server (Apache) not tuned/set up correctly, or has reached its limit
- CPU has reached its limit
- other processes on server causing too much overhead
- SQL server has reached its limit
If the users don't get their pages quickly, the server is not performing well; that is not necessarily the application's fault. There are a myriad of things that could be the cause, or contributing to it.
-
Sep 9, 2005, 15:51 #86
Coming in a bit late to the discussion, but I wanted to jump in on the 'session' statements I saw.
I didn't really understand someone's earlier comment about "I got rid of sessions - they'll just have to use cookies". Was that meant as storing information IN a cookie? Not a good idea from a performance standpoint, as generally that cookie information is sent back in every HTTP request (15 images on a page means that cookie information is sent 15 times over the network for that page request).
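To put rough numbers on that (the 1KB cookie size below is an assumption for illustration):
PHP Code:
$cookieBytes     = 1024;    // assume 1KB of data stored in the cookie
$requestsPerPage = 1 + 15;  // the HTML document plus 15 same-domain images
// The browser attaches the cookie to every one of those requests:
echo ($cookieBytes * $requestsPerPage) . " bytes of cookie data uploaded per page view\n";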
I'm getting a bit offtopic, but I'll bring it back to sessions. Generally I don't use PHP's built-in session handling. My earlier experience with the LogiCreate framework was that the PHP session handling code wasn't all that hot. Now granted, this was in the early days of PHP4, when there was no $_SESSION or the other improvements. BUT, the thing I've noticed is that it *always* writes the full session out to disk, even if there have been no changes to the session data. On large sites that's wasteful - sometimes very much so.
You can bypass this somewhat by using session_set_save_handler() to write your own save/write routines, but you still need a way to know whether anything has changed. It's probably worth it to simply write your own session system that checks a 'dirty' flag to see if the session has been modified, and only writes out when things have actually changed.
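A minimal sketch of that dirty-flag idea (the handler class, file paths, and md5-based change check here are all illustrative, not from LogiCreate or any framework mentioned):
PHP Code:
class DirtySessionHandler
{
    private $hash = '';
    private $dir  = '/tmp';          // assumed session directory

    function open($path, $name) { return true; }
    function close()            { return true; }

    function read($id)
    {
        $data = (string) @file_get_contents($this->dir . "/sess_$id");
        $this->hash = md5($data);    // remember what was handed out
        return $data;
    }

    function write($id, $data)
    {
        // Only hit the disk when the session data actually changed.
        if (md5($data) !== $this->hash) {
            file_put_contents($this->dir . "/sess_$id", $data);
        }
        return true;
    }

    function destroy($id)     { @unlink($this->dir . "/sess_$id"); return true; }
    function gc($maxlifetime) { return true; }  // expiry left out for brevity
}

$h = new DirtySessionHandler();
session_set_save_handler(array($h, 'open'), array($h, 'close'),
                         array($h, 'read'), array($h, 'write'),
                         array($h, 'destroy'), array($h, 'gc'));
session_start();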
Just thought I'd throw that out. Excepting the LogiCreate system I'd started years ago, I don't think I've ever seen a PHP framework deal with this issue. Drupal doesn't (just mentioning it because it was mentioned as "use Drupal if scalability is important" or something similar).
Michael Kimsal
=============================
groovymag.com - for groovy/grails developers
jsmag.com - for javascript developers
-
Sep 12, 2005, 23:40 #87
- Joined May 2005 · 255 posts
Originally Posted by dreamscape
As I recall, you were the one talking about some magical server you have that can do 10,000 PHP requests per second or something like that (which I don't buy for one second anyway)...
However, like I said, there is no reason why a single, moderate piece of hardware should not be able to handle a few hundred requests per second (my original benchmark for vBulletin and Wikipedia). Of course, these applications (which I use as examples because they're typical of most PHP code) don't really perform all that well even when there is no load.
You're correct that the average user is heavily bandwidth-limited. Typically, though, the people on those types of connections (< 10Mbit upstream) aren't encountering these kinds of scalability issues in the first place, so it's irrelevant.
-
Sep 19, 2005, 04:33 #88
- Joined Dec 2004 · USA · 1,407 posts
Originally Posted by sweatje
-
Sep 19, 2005, 04:56 #89
- Joined Jun 2003 · Iowa, USA · 3,749 posts
Jason Sweat ZCE - jsweat_php@yahoo.com
Book: PHP Patterns
Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
Detestable (adjective): software that isn't testable.
-
Sep 19, 2005, 06:06 #90
- Joined Dec 2004 · USA · 1,407 posts
Originally Posted by lastcraft
-
Sep 19, 2005, 06:07 #91
- Joined Dec 2004 · USA · 1,407 posts
Where are some good tutorials about optimizing MySQL DB queries?
EDIT: Never mind - I'm posting this in the MySQL forum.
-
Sep 19, 2005, 06:08 #92
- Joined Dec 2004 · USA · 1,407 posts
Also, ab is Apache's application benchmarking tool. I've had people post their findings on the forum and found them VERY useful for determining whether work needs to be done on the code or the SQL.
My host will not allow me to install it, and I am running IIS locally, so I don't have the occasion to run it.
Although, it might make sense to run it locally, since I should replicate my production environment closely.
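For reference, a typical ab run looks something like this (the URL and the numbers are placeholders; -n is the total number of requests, and -c is how many to run concurrently):
Code:
ab -n 1000 -c 50 http://localhost/index.php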
-
Sep 19, 2005, 06:32 #93
- Joined Jan 2003 · 5,748 posts
Jason,
I've downloaded php_apc.dll and put it in my Apache root directory, where the other DLLs for my installation live. But since that APC build is only for PHP 4.x, I'm having a problem with this extension.
I downloaded phpts4.dll (from the PHP 4.4 package) and put that into C:/Apache2/bin/, but still no result, so can you (or someone else) tell me how to use this extension with PHP 5.0.x? I might also look at memcache, since I've downloaded that extension as well, if this isn't solved.
Thanks
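For what it's worth, Windows PHP extensions are normally enabled through php.ini rather than by copying DLLs next to Apache's binaries; a minimal sketch, with illustrative paths (the APC build must also match your exact PHP version and thread-safety mode, which is why a PHP 4.x build won't load under PHP 5.0.x):
Code:
; php.ini
extension_dir = "C:\php\ext"
extension = php_apc.dll
After changing php.ini, restart Apache and look for an APC section in phpinfo().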
-
Sep 19, 2005, 06:41 #94
- Joined Jun 2003 · Iowa, USA · 3,749 posts
You might try http://www.schlossnagle.org/~george/talks/
In particular, I attended this one, and I believe he was using APC for code profiling there (though it was two years ago, and memory might not serve).
-
Sep 19, 2005, 06:46 #95
- Joined Jan 2003 · 5,748 posts
Thanks, I will do just that. I'll let this thread get back on topic, and PM you if I still have problems.
-
Sep 19, 2005, 10:36 #96
Originally Posted by sweatje
APC is an opcode cache: it keeps the compiled bytecode of your PHP scripts in shared memory, so they don't have to be re-parsed and re-compiled on every request.
-
Sep 19, 2005, 10:57 #97
- Joined Jun 2003 · Iowa, USA · 3,749 posts
Originally Posted by dreamscape
-
Sep 19, 2005, 11:07 #98
- Joined Jan 2003 · 5,748 posts
It's the opcode cache I downloaded first, without knowing any better, but I know a bit more now.
Won't be downloading that again, for sure.