Too many httpd processes depleting server resources

All,

I’m having a problem with my dedicated web server (3.0 GHz P4 processor with 2 GB of RAM). It seems as though, during a certain time of day, my server slips into a vegetative state because there are too many httpd processes running at once. I’m unsure if this is due to my MySQL database or if there are just too many requests coming in at once for my server to handle.

I’ve been going through countless forums looking for ways to tweak my Apache server variables, along with my MySQL variables, and I can’t seem to find a resolution to my problem.

I run a site similar to statcounter and sitemeter, where subscribers of my tool insert a small script into their blog or site, and my site tracks and analyzes their visitor statistics for them. I have around 60,000 subscribers.

Here’s a brief background of my site:

  1. Received close to 2 million hits per day (according to Webalizer) before the problems started occurring; now it’s down to 900K per day.

  2. The peak times for my site are 9 AM - 12 PM (ranging from 47K to 54K hits per hour) and 9 PM - 1 AM (same range). I believe the morning traffic comes heavily from Hong Kong, and the night traffic mainly from the US.

  3. The oddity in my problem is that my server only spawns these massive httpd processes during the morning traffic (9 AM - 12 PM), not during the night traffic (9 PM - 1 AM), even though the hit traffic during these two periods is almost the same in number.

Here’s the relevant Apache configuration:

KeepAlive Off

<IfModule prefork.c>
StartServers 15
MinSpareServers 8
MaxSpareServers 25
MaxClients 150
MaxRequestsPerChild 10000
</IfModule>

<IfModule worker.c>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 10000
</IfModule>

Here’s the relevant MySQL information:

  • 1 database, with 7.12 GB of data
  • 700+ tables

SHOW VARIABLES:

| back_log | 50 |
| basedir | /usr/ |
| bdb_cache_size | 8388600 |
| bdb_log_buffer_size | 8388608 |
| bdb_home | /var/lib/mysql/ |
| bdb_max_lock | 10000 |
| bdb_logdir | |
| bdb_shared_data | OFF |
| bdb_tmpdir | /tmp/ |
| binlog_cache_size | 32768 |
| character_set | latin1 |
| concurrent_insert | ON |
| connect_timeout | 5 |
| datadir | /var/lib/mysql/ |
| delay_key_write | ON |
| delayed_insert_limit | 100 |
| delayed_insert_timeout | 300 |
| delayed_queue_size | 1000 |
| flush | OFF |
| flush_time | 0 |
| have_bdb | YES |
| have_gemini | NO |
| have_innodb | YES |
| have_isam | YES |
| have_raid | NO |
| have_openssl | NO |
| init_file | |
| innodb_additional_mem_pool_size | 12249088 |
| innodb_buffer_pool_size | 489684992 |
| innodb_data_home_dir | |
| innodb_file_io_threads | 4 |
| innodb_force_recovery | 0 |
| innodb_thread_concurrency | 8 |
| innodb_flush_log_at_trx_commit | 1 |
| innodb_fast_shutdown | ON |
| innodb_flush_method | |
| innodb_lock_wait_timeout | 50 |
| innodb_log_arch_dir | |
| innodb_log_archive | OFF |
| innodb_log_buffer_size | 1048576 |
| innodb_log_file_size | 5242880 |
| innodb_log_files_in_group | 2 |
| innodb_mirrored_log_groups | 1 |
| interactive_timeout | 28800 |
| join_buffer_size | 131072 |
| key_buffer_size | 1202647040 |
| large_files_support | ON |
| locked_in_memory | OFF |
| log | OFF |
| log_update | OFF |
| log_bin | OFF |
| log_slave_updates | OFF |
| log_long_queries | ON |
| long_query_time | 10 |
| low_priority_updates | OFF |
| lower_case_table_names | 0 |
| max_allowed_packet | 1048576 |
| max_binlog_cache_size | 4294967295 |
| max_binlog_size | 1073741824 |
| max_connections | 16384 |
| max_connect_errors | 10 |
| max_delayed_threads | 20 |
| max_heap_table_size | 16777216 |
| max_join_size | 4294967295 |
| max_sort_length | 1024 |
| max_user_connections | 16000 |
| max_tmp_tables | 32 |
| max_write_lock_count | 4294967295 |
| myisam_max_extra_sort_file_size | 256 |
| myisam_max_sort_file_size | 2047 |
| myisam_recover_options | 0 |
| myisam_sort_buffer_size | 8388608 |
| net_buffer_length | 16384 |
| net_read_timeout | 30 |
| net_retry_count | 10 |
| net_write_timeout | 60 |
| open_files_limit | 0 |
| port | 3306 |
| protocol_version | 10 |
| record_buffer | 131072 |
| record_rnd_buffer | 1998848 |
| query_buffer_size | 0 |
| safe_show_database | ON |
| server_id | 0 |
| slave_net_timeout | 3600 |
| skip_locking | ON |
| skip_networking | OFF |
| skip_show_database | OFF |
| slow_launch_time | 2 |
| sort_buffer | 6999992 |
| sql_mode | 0 |
| table_cache | 16384 |
| table_type | MYISAM |
| thread_cache_size | 0 |
| thread_stack | 65536 |
| transaction_isolation | READ-COMMITTED |
| timezone | EST |
| tmp_table_size | 33554432 |
| version | 3.23.58-log |
| wait_timeout | 28800 |

I have the ionCube PHP Accelerator installed on my server as well, which I read would optimize my PHP scripts. I believe it did speed up the compilation of my scripts, but it didn’t resolve the problem of my server spawning too many httpd processes.

I’m a web programmer and have no prior experience with maintaining and configuring web servers.

I would appreciate anyone’s help with my problem.

Hi

It sounds like your problems started before you installed PHPA, but it’s worth checking without PHPA, as it’s not compatible with all Apache 2 setups. Has anything else changed on your system that you’re aware of, e.g. changes to the database, httpd configuration, or PHP version?

We’re due to replace PHPA with a new performance system later in Q1 or in Q2. It works with Apache 1 and 2 (we use it ourselves with PHP 5 on Apache 2), 32/64 bit, Unix/Windows etc., and we’re making it available to some people in pre-beta, where it’s showing great results. If you’re interested in seeing whether it’ll help, just get in touch via the ionCube helpdesk and we’ll see what we can do.

Other things: check the Apache error log for signs of any problems there. Have you made any significant change to the site content, such as the amount of data being transferred on each request? More content per request takes longer to dispose of, potentially increasing the number of active requests and the demand for more worker processes. Are you sending compressed or uncompressed page output, i.e. are you using a module such as mod_gzip or the gz output handler with PHP? Sending compressed output requires extra processing to do the compression, but the page size can be reduced dramatically, leading to much quicker disposal of the page and an overall reduced time for each request.
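
As a quick check on whether compression is already active, you could compare the transfer size of a page with and without gzip negotiation; a minimal sketch, where the URL is just a placeholder for one of your own pages:

# download size without compression negotiation
curl -s -o /dev/null -w "%{size_download} bytes\n" http://www.example.com/somepage.php
# download size when the client offers gzip
curl -s -o /dev/null -w "%{size_download} bytes\n" -H "Accept-Encoding: gzip" http://www.example.com/somepage.php

If the second figure comes back much smaller, compressed output is being served.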

In terms of other solutions, how dynamic are your pages? Do you have content that always changes on each request, or do you have any pages where the final content is partly or even completely static? If there is static content, then content caching is feasible and can give a dramatic increase in performance. As an example, on one site where we have been testing IPS (http://www.sailingnavies.com), the implementation is entirely database driven and dynamic, but unless new entries have been made to the underlying data, the results are static for a given URI. This is a perfect candidate for content caching.

To give an example with the new performance system: pages that achieve perhaps only 3 or 4 requests per second without content caching can reach nearer 150 to 200 pages per second with content caching. Even better for some systems would be the Apache mod_cache module. This also offers content caching, but will give even better performance, as a cache hit is picked up by Apache rather than in PHP, and content not created by PHP will be cached too.
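
If you want to put your own numbers on that, ApacheBench (ab), which ships with Apache, gives a quick requests-per-second figure; run it against a representative page before and after any caching change (the URL and request counts here are only placeholders):

# 200 requests, 10 concurrent; watch the "Requests per second" line in the output
ab -n 200 -c 10 http://www.example.com/somepage.php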

By the look of it, you need a separate database server. What’s the load like on the server?

Whoa… update to MySQL 4.x, you’re running an ancient version there :eek:

Or even MySQL 5, though it’s quite new :slight_smile:

As for Apache - your MaxClients is probably maxing out.

As this isn’t your area, I strongly recommend getting a professional to fix it up for you, since the performance gains from a tightly optimized HTTP and database setup can be very dramatic :slight_smile:

Yeah, MySQL5 when used with the mysqli extension can provide some massive performance gain, up to 40 times in some cases, if the MySQL manual can be trusted. :stuck_out_tongue: I just upgraded and will be testing it. It seems compatible enough from initial test runs. Haven’t checked performance yet though.

Do you know of any websites that list “professional admins” who can be contracted for a few hours to tune up my web server?

I’ve thought of this route as well, but my main concern is: how would I know this person is trustworthy? I’d basically have to give a total stranger root access to my server, correct?

Yup. I’d hire a company, not an individual. There might be some information in stickies on these forums (I haven’t read them all), or someone here might be able to recommend someone :slight_smile:

One thing I would say is that your max_connections is pretty high - there’s no way a server can sustain 16384 concurrent connections :eek:

If you post in the Looking To Hire forum section here then you may find someone to help - we can’t pimp for business / make offers etc outside of that section :slight_smile:

One thing I would say is that your max_connections is pretty high - there’s no way a server can sustain 16384 concurrent connections

That’s for the database, and the web server config limits to 150 clients in both process and thread based models. Even if using persistent connections in PHP, the connection count will never reach anything like that, so that’s a red herring.
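
If you want to confirm that for yourself, mysqladmin will report the live connection count; a quick check, assuming local root access to MySQL:

# the "Threads" figure in the output is the current number of connections
mysqladmin -u root -p status
# or list each connection individually
mysqladmin -u root -p processlist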

Barring a local fault impairing behaviour, which is why checking the error log is one step to take, load is related to the number of concurrent requests coming in and the length of time to dispose of the requests. Increasing either of those factors can have a disproportionate impact on how the server behaves.

Do your customers access their web stats via the same server? If so then you’ve probably a mixture of many web hits with what should be very quick requests just to log the request information, combined with longer running requests for stats retrieval. See if the stats retrieval is randomly distributed, or concentrated more during the problem period as it could be that there’s an issue there.

The database schema and design should also be checked. Two common design errors are having no indices at all and having the wrong indices. This may not show up for a while, but with no indices, database writes will be quick while data retrieval is likely to become slower over time. With unnecessary indices, writes will be increasingly slow for no benefit on retrieval, and retrieval may slow too if database locks are held for longer during the write process. The correct indices can speed up database access by several orders of magnitude in some cases. Other fairly common errors are not exploiting the database and the power of SQL correctly: retrieving too much data, making unnecessary queries, using the scripts to perform processing that should be done by the database, etc. All of this should be reviewed.

Database performance can also be improved by periodically running an “optimize” query to optimise the key distribution. Finally, logging queries and running them manually with an “explain” can expose long-running queries and give clues as to why. Sometimes the MySQL query optimiser will also make the wrong decision about join order, and forcing a join order with “straight_join” rather than “,” can lead to big speedups. Doing this recently on a table reduced query time from about 0.20 seconds to 0.03 to 0.04.
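
To make that concrete, here’s roughly what those checks look like from the shell; the database, table and column names below are invented, so substitute your own schema:

# see whether (and which) indices would be used for a query
mysql -u root -p mydb -e "EXPLAIN SELECT * FROM hits WHERE subscriber_id = 12345"
# add an index that EXPLAIN shows to be missing
mysql -u root -p mydb -e "ALTER TABLE hits ADD INDEX idx_subscriber (subscriber_id)"
# periodically re-optimise the key distribution
mysql -u root -p mydb -e "OPTIMIZE TABLE hits"
# force the join order if the optimiser chooses badly
mysql -u root -p mydb -e "SELECT s.name, COUNT(*) FROM subscribers s STRAIGHT_JOIN hits h ON s.id = h.subscriber_id GROUP BY s.name"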

As people have suggested, posting in the hire section is a good idea, and look for someone experienced not only in basic web server tuning, but in database and application architecture performance tuning as well. Many people’s system designs are far from optimal and could run much faster once corrected, and it may be that your overall architecture needs to be overhauled.

Good luck, and let us know how you get on. It may very well be none of the above, and something totally unexpected. It’s always interesting to find out :slight_smile:

I’m not exactly sure what’s going on, but yesterday my site was running totally fine. No 20 httpd processes running at once, and my site was running smoothly.

But just as I dreaded, the site is back to where it was before yesterday. I’m at work at the moment, and I constantly need to restart Apache to get my server running again. I’ve been looking at the processes running via top, and I noticed some httpd processes are chewing up a hell of a lot of CPU:

check it out:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4386 apache 25 0 27496 11m 7916 R 83.0 0.6 0:52.79 httpd
4362 apache 25 0 27536 12m 7920 R 49.8 0.6 0:20.66 httpd
4365 apache 25 0 27496 11m 7900 R 49.8 0.6 0:29.91 httpd
2193 mysql 15 0 1779m 538m 3084 S 11.3 26.7 1:07.20 mysqld
4323 apache 15 0 27476 11m 7860 S 1.0 0.6 0:00.35 httpd
4318 apache 15 0 27480 11m 7868 S 0.7 0.6 0:00.22 httpd
4312 apache 15 0 27476 11m 7860 S 0.3 0.6 0:00.28 httpd
4316 apache 15 0 27488 11m 7868 S 0.3 0.6 0:00.24 httpd
4319 apache 15 0 27476 11m 7864 S 0.3 0.6 0:00.29 httpd
4355 apache 15 0 27500 11m 7872 S 0.3 0.6 0:00.27 httpd
4361 apache 15 0 27464 11m 7868 S 0.3 0.6 0:00.33 httpd
4366 apache 15 0 27480 11m 7872 S 0.3 0.6 0:00.29 httpd
4367 apache 15 0 27476 11m 7868 S 0.3 0.6 0:00.28 httpd
4368 apache 15 0 27476 11m 7860 S 0.3 0.6 0:00.32 httpd
4373 apache 15 0 27476 11m 7868 S 0.3 0.6 0:00.31 httpd
4375 apache 15 0 27480 11m 7864 S 0.3 0.6 0:00.22 httpd
4376 apache 16 0 27488 11m 7860 S 0.3 0.6 0:00.27 httpd
4378 apache 16 0 27476 11m 7864 S 0.3 0.6 0:00.24 httpd
4385 apache 15 0 27476 11m 7860 S 0.3 0.6 0:00.29 httpd
4795 apache 16 0 27508 11m 7868 S 0.3 0.6 0:00.16 httpd
4817 root 16 0 2188 924 724 R 0.3 0.0 0:00.28 top
4889 apache 16 0 27472 11m 7856 S 0.3 0.6 0:00.14 httpd
4895 apache 15 0 27436 11m 7860 S 0.3 0.6 0:00.12 httpd

I’ve checked my error logs, and no errors are occurring out of the ordinary. Does anyone have a clue about what’s going on?

IonCube, I appreciate your replies. Yes, all the customers access their stats via the same server that records the stats. The point I can’t get past is why it’s happening in the morning and not at night. The night time sees the same amount of traffic as the morning, and the server and MySQL config are set up exactly the same at both times. If it’s running smoothly at night under the same amount of traffic, why is it dying in the morning? Would anyone suspect foul play? If so, how can I tell?

One other thing: the page that contains the stats for my users is the main one causing the hold-up, I believe. If you navigate through my site, all the other pages seem to load lightning fast, but when you try to access the page that contains the user stats, that’s when the slowdown begins. Obviously, this has some connection with the MySQL db, but what could it be? I’m pretty sure I have my indices on the correct columns, and I also have a cron job set up to optimize the tables each night.

You need to look at a full Apache status to see exactly what those processes are doing, in particular these columns:

CPU: CPU usage, number of seconds
SS: Seconds since beginning of most recent request
Req: Milliseconds required to process most recent request

:slight_smile:

How would I do that?

If you have WHM you can click the ‘Apache status’ button, if not then this should guide you :slight_smile:
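
If WHM isn’t available, you can also query mod_status straight from the shell; this assumes mod_status is loaded, access to /server-status is allowed from localhost, and ExtendedStatus On is set (needed for the per-request CPU/SS/Req columns):

# machine-readable summary
curl "http://localhost/server-status?auto"
# or the full HTML page in a terminal browser, if lynx is installed
lynx http://localhost/server-status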

Your process listing looks fine. Are you dual CPU?

High CPU utilisation is actually what you want, and is good: if there’s work to be done, you want as much of the CPU time as possible spent getting the work done, not spent waiting around for I/O and failing to achieve maximum utilisation. However, if an httpd process is constantly running, taking a long time to complete a request while spare processes sit idle, then that suggests a problem. Your listing shows that there are 3 runnable httpd processes plus mysqld. The other httpds are spare and some may die off. Your config specified a minimum pool of 8, which should be sized to handle a quick burst of requests without waiting while more processes are forked, and a maximum spare of 25. After heavy demand of more than 25 concurrent requests, the httpd count should fall back quickly to 25, and will then gradually drop off over time if there’s no heavy demand.

The apache module mod_status may give some useful information. (edit: just saw Beansprout’s suggestion on that one too)

One other thing: the page that contains the stats for my users is the main one causing the hold-up, I believe. If you navigate through my site, all the other pages seem to load lightning fast, but when you try to access the page that contains the user stats, that’s when the slowdown begins. Obviously, this has some connection with the MySQL db, but what could it be?

Ok. This is what I’d expect, and it may be related. If there’s a peak of customers checking their stats at a certain time, then this could have an impact. Log accesses to the stats, or just grep the access_log file to see when stats accesses are happening, and see whether there’s a correlation with when you’re having the problem.
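
For example, a quick one-liner to bucket stats-page hits by hour of day; the URI and log path are guesses, so adjust them for your setup:

# count stats-page requests per hour from a common-format access log
grep "GET /stats" /var/log/httpd/access_log | awk '{print substr($4, 14, 2)}' | sort | uniq -c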

Also, you could turn on query logging with MySQL, or check the code if you have access to it to see what queries are run when performing the stats. Time is going to be spent between the database, processing the stats and page generation, and the database is of particular interest. Try the queries by hand with the mysql command line to see how long they take, and run an explain to see what use of indices will be made. Also see how much data is retrieved with each query. Your application may be fine, but some systems will perform simple SELECT queries and analyse all the data within PHP as opposed to making full use of database functions, groups, exploiting left joins where relevant, and using the conditional functions.
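
Note that on your build, slow query logging appears to be on already (log_long_queries is ON with long_query_time 10), so anything taking over 10 seconds should be landing in the slow log in the data directory. To inspect it, and to time a suspect query by hand (the file name and query below are placeholders):

# the slow log is usually <hostname>-slow.log in the datadir
less /var/lib/mysql/yourhostname-slow.log
# wall-clock a query from the shell
time mysql -u root -p mydb -e "SELECT COUNT(*) FROM hits WHERE subscriber_id = 12345"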

In essence, try to find out more about how your application is behaving as it’s not always obvious, and behaviour can change over time.

You could also cron frequent process listings and calls to vmstat or free to get some extra info; that may reveal something in the run-up to the machine becoming very hosed.
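
Something like these crontab entries would do it; the log paths are only suggestions (create the directory first):

*/5 * * * * /usr/bin/vmstat 1 5 >> /var/log/perfsnap/vmstat.log 2>&1
*/5 * * * * /usr/bin/free >> /var/log/perfsnap/free.log 2>&1
*/5 * * * * /bin/ps auxww >> /var/log/perfsnap/ps.log 2>&1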

Also, ensure that your drives have DMA turned on. In older Linuxes, DMA wasn’t activated at boot time, although it is on newer ones. You can check this as root as follows. The output will differ of course, but look for the DMA or UDMA setting. Your drives may be sda rather than hda.

/sbin/hdparm -i /dev/hda

/dev/hda:

Model=WDC WD800JB-00FMA0, FwRev=13.03G13, SerialNo=WD-WMAJ97099785
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=58
BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156301488
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: device does not report version:

The * signifies the current active mode.

You can also check drive performance with -t -T

e.g.

/sbin/hdparm -t -T /dev/hda

/dev/hda:
Timing cached reads: 2468 MB in 2.00 seconds = 1232.34 MB/sec
Timing buffered disk reads: 176 MB in 3.01 seconds = 58.38 MB/sec

/sbin/hdparm -t -T /dev/sda

/dev/sda:
Timing cached reads: 3312 MB in 2.00 seconds = 1655.90 MB/sec
Timing buffered disk reads: 170 MB in 3.01 seconds = 56.51 MB/sec

A final thought, check /var/log/messages for any signs of problems being reported there.

A final thought, check /var/log/messages for any signs of problems being reported there.

I just checked your reply. Although my server isn’t loaded at the moment, I went through the logs. It looks as though there have been several “hack” attempts to log in as root.

authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=61-63-6-host253.kbtelecom.net.tw

I checked the past messages logs, and it seems this has been going on since January. It happens at sporadic times, in bursts a few seconds apart lasting about 20 minutes. This wouldn’t be enough to take my server down, would it?

I googled the error, and it seems this is a common “hack” attempt, but it shouldn’t be bringing down my server. Upon careful examination, these hack attempts normally do not occur during the times when my server is lagging…

Can anyone recommend any “Traffic analyzers”? Any free utilities that can accurately log and detail my traffic?

Someone suggested MRTG and CACTI, but I haven’t the slightest clue how to install them.

Webalizer came installed on my dedicated server, but I’m assuming it’s not as detailed as some of the other utilities around?

With the root login attempts, you should disable remote root login in your sshd_config. I would also recommend installing an IDS to detect and dynamically blacklist the attackers.
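
For reference, the sshd_config directive itself is PermitRootLogin (control panels sometimes label it differently); a quick way to check and apply it:

# see the current setting
grep -i "^PermitRootLogin" /etc/ssh/sshd_config
# set it to "PermitRootLogin no" in that file, then restart sshd:
/etc/init.d/sshd restart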

If you just want webstats, I recommend awstats. Google Analytics is pretty cool as well but it is slower to update as you need to wait for Google to process your logs.

I have disabled remote root login, thank god. I’ll look into an IDS, thanks!

Awstats is pretty similar to Webalizer, isn’t it?

Webalizer is stuff of last century. :stuck_out_tongue: Awstats is newer and has a lot more features.

Btw, as for an IDS, I recommend snort (snort.org), plus snort-sam, mod_security and httpd-guardian (which can talk to snort-sam to dynamically block attackers).

  • There’s nothing wrong with MySQL 3.23.58; you don’t HAVE to update it.
  • Don’t overlook the fact that you may have a bug in your PHP code that eats up resources, especially if the site suddenly stops working.
  • Traffic from East Asia will cause you to have more httpd processes, because those connections have higher latency than US traffic (if your server is in the US), so expect to need more httpd processes for it.
  • Are you using worker or prefork Apache? Your MaxRequestsPerChild is pretty high for Apache running PHP; you should consider lowering it. Find the difference in process size between the larger and smaller processes; if there is a big difference, lower MaxRequestsPerChild to save RAM, since you need RAM for MySQL and to support more httpd processes.
  • Don’t set MaxClients to 150 unless your server can actually handle 150 Apache processes without swapping.
    If you let it shoot itself in the foot, it will, so why let it? If you don’t have enough Apache processes, your requests will be queued and some may time out or get dropped, but if your server is dead and swapping, it won’t serve anything. (A rough sizing sketch follows this list.)
  • Add indexes to heavily selected, not-so-heavily-updated tables; learn about indexes and EXPLAIN on SELECT queries to see where you need them.
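
As a rough sizing sketch for that MaxClients point: measure the average resident size of your httpd processes and divide the RAM you can spare by it. The process name and the allowance for mysqld are assumptions, so adjust for your box:

# average resident size (KB) of the current httpd processes
ps -C httpd -o rss= | awk '{sum+=$1; n++} END {if (n) printf "avg httpd RSS: %d KB over %d procs\n", sum/n, n}'
# total RAM in KB
free | awk '/^Mem:/ {print "total RAM:", $2, "KB"}'
# a safe MaxClients is roughly (total RAM - room for mysqld and the OS) / avg httpd RSS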

Plenty, and especially when you’re looking at improving performance.