The sysadmin view on “Why PHP”

A funny from the Python crowd: phpfilter – PHP “support” under CherryPy. There is a serious side to it though – it’s spitting out something that looks like a PHP parse error, i.e. this is a developer problem (e.g. someone ftp’d a PHP script straight onto their live web server for “testing”), not a runtime error.

More to the point, when was the last time you saw a PHP runtime error take down an entire application or web server? And no – “MySQL Connection Failed: Can’t connect to local MySQL server” doesn’t count – PHP and the web server are still running; the MySQL server (or whatever else) is to blame.

With PHP it’s very hard for a script to take down the runtime environment – the web server – I’d argue that you’d have to be deliberately trying to do so, perhaps by filling up disk space or the like. Innocent mistakes, specific instances of runtime problems (e.g. script execution too long) and bugs remain local to specific requests and the PHP script handling them. On the next request, we begin again from scratch.

It may now be reasonable to claim that Apache + mod_php has served more HTTP requests for dynamic pages than any other comparable environment. Warts and all, this is well-tested software simply by weight of numbers. That translates into a platform which costs little to keep running and a lower chance of a wakeup call at 2am.

Anyway – I ran into an excellent blog post recently: FastCGI, SCGI, and Apache: Background and Future, discussing the options, given the new demand for FastCGI from frameworks like Rails, seen through the eyes of a sysadmin. To a great extent it also explains why we’ve ended up with PHP.

To really grasp Mark Mayo’s discussion it’s worth having a rough idea of the most common technical approaches used to implement servers that can handle multiple web page requests concurrently and pass each request through to a program (e.g. a PHP script) for processing. Note this isn’t meant to be an in-depth guide to multitasking – it’s more my layman’s understanding / view. A good place to start if you want something meatier is on Wikipedia here.

  • Forking: an HTTP server process spawns a child process to handle each incoming request; when the request is finished, the child either expires (exits) or returns to a “pool” for reuse (Apache 1.3.x does the latter).

    With Apache + CGI scripts, the Apache child processes must, in turn, fork further child processes within which the CGI program runs, so it gets pretty slow. FastCGI eliminates that by keeping the CGI process running for further requests (but needs a fair bit more complexity to do so).

    With mod_php, the script is run inside the Apache child process itself. This reduces the overhead of a further fork and means the PHP “runtime” only needs to be loaded when an Apache child is created.

    Forking is nice in that it’s relatively easy to implement and (for the most part) multitasking issues are not pushed onto application developers.

    Another thing that makes this model popular with sysadmins is that child processes can “crash” (e.g. that infinite loop in your PHP script) without taking out the main server process – this is probably the number one reason why shared hosts are willing to install mod_php: they don’t have to keep restarting the server as a result of what their customers did to it.

    Also, particular to CGI, it’s easier to push security issues off to the operating system, allowing user scripts to be run with the script owner’s permissions rather than the permissions of the web server user.

    This is not the case with mod_php, which violates normal UNIX filesystem security. PHP scripts only have to be readable on the filesystem for mod_php to execute them.

    The downside of forking is that it’s (relatively) slow / expensive to fork a new process, and each child gobbles up memory and resources while it’s running, where it might be more efficient to share. The mod_php approach is the simplest way to keep this cost to a minimum.

    Also, Windows doesn’t really support UNIX-style forking, placing greater emphasis on threading, which may be a problem if you want your server to run well under Windows.
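
The forking model can be sketched in a few lines of Python (a toy illustration of the pattern, not of Apache’s actual implementation – `accept_one` and the echo behaviour are my own invention, and it assumes a UNIX-like OS, since `os.fork` doesn’t exist on Windows):

```python
import os
import socket

def accept_one(server_sock):
    """Accept a single connection and fork a child to handle it."""
    conn, _ = server_sock.accept()
    pid = os.fork()
    if pid == 0:                 # child: handle the request, then exit
        server_sock.close()      # the child doesn't need the listener
        data = conn.recv(1024)
        conn.sendall(b"echo: " + data)
        conn.close()
        os._exit(0)              # a crash here only kills this child
    conn.close()                 # parent: the socket now belongs to the child
    os.waitpid(pid, 0)           # reap the child (a real server keeps a
                                 # pool instead of waiting synchronously)
```

The property sysadmins care about is visible at `os._exit`: whatever the child does, only the child dies; the parent simply goes back to accepting connections.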

  • Threads: threads run inside a single process and work on the basis of time-sharing: each thread gets a certain amount of time to do its work. Threads are now used in Apache 2.x and are also common in Java application servers (which are themselves HTTP servers).

    Threads have the advantage of a lower cost to “create” (i.e. faster) than forked processes, and it’s easier to “share” between threads (e.g. sharing a variable). Side note: when the Java guys say they’ve got a web server which performs better than PHP, they’re probably telling the truth (but remember performance != scaling).

    On the downside some argue that threads are very tricky to code, with hard-to-debug problems like deadlocks and race conditions being too easy to create. This may only be an issue for the developers of the web server – you don’t need to push threads onto people writing apps to run under your web server – but the more complexity, the more bugs etc.

    Also (more of an implementation detail), if each thread in the server is given its own I/O stream for an incoming request, this is likely to gobble memory / resources, plus most operating systems only support a limited number of threads running concurrently – for a serious discussion see The C10K problem (an excellent read in general, in fact).

    The other issue with threads and web servers is there’s a better chance of a given thread taking down the whole server, although that’s probably more of an implementation detail.

  • Asynchronous I/O: it’s common in programming to use synchronous (blocking) I/O – you read from a “stream” and your code (process) stops execution until the read is complete.

    Asynchronous I/O uses non-blocking system calls to allow your code (process) to continue doing other things (e.g. more I/O) in parallel. Callbacks (or similar) are then executed only when a specific event happens (e.g. end of file). And these days we’re all familiar with this way of doing things thanks to AJAX, right? ;)

    Perhaps the foremost example of async I/O is Python’s Twisted framework, which I’d guess we’ll hear more and more of in the next couple of years.

    Async I/O is nice in that it doesn’t hit the limits threading does and probably results in more efficient use of resources. It may (depending on your API – at lower levels, it’s harder) also be easier to write code this way, although it’s still not as easy as forking – much of what Twisted does is about providing a nice API for async I/O, solving most concurrency issues for you so you can focus on higher-level problems.

    I guess you also run the risk that “user code” takes down the whole server with async I/O – I haven’t looked at how Twisted deals with this – perhaps this is just an implementation detail.

    BTW you may also be surprised to note that more recent PHP versions also have some support for async I/O. See here (PDF) for more info.
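
For completeness, here’s a sketch of the event-loop / callback style using Python’s standard `selectors` module – Twisted wraps the same underlying idea in a much richer API, and the handler names here are mine:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def on_readable(conn):
    data = conn.recv(1024)           # won't block: the selector said it's ready
    if data:
        conn.sendall(b"echo: " + data)
    else:                            # empty read: the peer closed
        sel.unregister(conn)
        conn.close()

def on_accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)

def run_once(timeout=1):
    """One turn of the event loop: dispatch whatever is ready."""
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)        # invoke the registered callback
```

One process, no threads: the loop only calls a handler when the OS reports its socket is ready, so nothing ever blocks waiting on a single connection.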

Of course it’s definitely not as clear cut as I’m suggesting. For starters, the kind of developer you are will influence your world view: Linux kernel developers see different problems and boundaries to language and library designers, who in turn see things in a different light to application developers consuming the available APIs. And a given web server may well use more than one approach – perhaps all three.

What does seem to be the case is that async I/O is only now coming of age in web servers. Meanwhile FastCGI is back in demand and development, thanks to Rails, web.py and similar. Despite that, mod_php still (today) represents the lesser of all evils for sysadmins – not the perfect solution (e.g. security headaches) but the best compromise all round – at least for the foreseeable future.

BTW: if you’re feeling like another angle on PHP’s past see Adam Trachtenberg’s: The battle for middleware: PHP versus the world (PDF).

  • R. U. Serious

    With mod_php, the script is run inside the Apache child process itself. This reduces the overhead of a further fork and means the PHP “runtime” only needs to be loaded when an Apache child is created.

    It also means that all of your images, style-sheets and static files are served by those “fat” child processes that include an instance of the PHP interpreter as well, which is why you often hear the advice to move static files/images to a different web server. Hence why many people are so happy with [lighttpd+]PHP as FCGI (it uses fewer system resources).

  • http://www.dvdverdict.com/ mjackson42

    It also means that all of your images, style-sheets and static files are served by those “fat” child processes that include an instance of the PHP interpreter as well, which is why you often hear the advice to move static files/images to a different web server. Hence why many people are so happy with [lighttpd+]PHP as FCGI (it uses fewer system resources).

    But doesn’t keepalive mitigate the impact? Most browsers are going to make requests serially, not in parallel, so they’ll simply reuse the already open connection (which would mean, on the server end, only one fat process per concurrent user).

    I wonder if it would be practical for Apache (or other web servers) to implement some structure like this internally: Some child processes would load the full operating environment, some would run pretty lean, and incoming requests would be sent to the appropriate child based on…extension?

  • http://timvw.madoka.be timvw

    Feel free to correct me:

    The advantage of multiple client processes (or threads) is that it’s relatively easier to distribute them across multiple CPUs than the case where there is only one process doing async I/O.

    The advantage of a single process with async I/O is that you can’t run into race conditions…

  • http://timvw.madoka.be timvw

    sorry for the useless comment (should have read the article first) but can’t find the delete button…

  • andreask2

    But doesn’t keepalive mitigate the impact? Most browsers are going to make requests serially, not in parallel, so they’ll simply reuse the already open connection (which would mean, on the server end, only one fat process per concurrent user).

    No, keepalive makes it even worse. A process only handles one user at a time, so every fat process (perhaps with a persistent Oracle connection) has to wait until the keepalive timeout is over before it can handle the next request from another user (doing nothing while waiting). So most people recommend switching off keepalive when you have a lot of heavy PHP requests, because the TCP handshake is the lesser evil.

    I wonder if it would be practical for Apache (or other web servers) to implement some structure like this internally: Some child processes would load the full operating environment, some would run pretty lean, and incoming requests would be sent to the appropriate child based on…extension?

    You can simply use two apache instances, perhaps with a different hostname for static requests, or by using mod_proxy or squid as transparent reverse proxy.

    But the better idea is IMO to use Lighttpd with FastCGI, because it uses very fast asynchronous (non-blocking) I/O, can handle static requests very quickly and resource-efficiently, and passes PHP requests to a number of dedicated, persistent FastCGI/PHP processes (load-balanced). Lighttpd can also control the FastCGI/PHP processes: you can configure MAX and MIN processes and Lighttpd takes care of the rest. If a PHP process crashes, it will not take down Lighttpd, which can spawn a new PHP process in that case.

    From the security point of view, it’s also possible to start a PHP process as a different user. Lighttpd + FastCGI is the first real Apache + mod_php replacement I’ve found so far (for my needs): it’s stable, a little bit faster and by far more resource efficient.

    And it’s very simple to set up with FastCGI:
    http://www.lighttpd.net/documentation/fastcgi.html
    http://trac.lighttpd.net/trac/wiki/MigratingFromApache

  • http://www.phpism.net Maarten Manders

    It also means that all of your images, style-sheets and static files are served by those “fat” child processes that include an instance of the PHP interpreter as well

    Is this correct? I don’t think that Apache fires a whole PHP process for a GET /img/image.jpeg.

    But doesn’t keepalive mitigate the impact?

    Keepalive minimizes connection overhead. It is, however, evil when your server is running under heavy load, as users keep hogging server resources. During that time, Apache’s performance depends mostly on memory, which is mostly wasted on (KeepAlive-)waiting processes.

  • http://www.oscarm.org/ omerida

    With PHP it’s very hard for a script to take down the runtime environment—the web server—I’d argue that you’d have to be deliberately trying to do so, perhaps filling up disk space or otherwise.

    I’ve seen it done unintentionally as well as maliciously, and of course you can do these things in any language. One example: logging error or other messages to the filesystem but forgetting to rotate the log file. That’s particularly irritating because the file might quietly be growing while your server is happily serving pages, until you get the page that the web server isn’t responding and won’t restart. A second is having the server send you an email after every request that encounters a fatal error. Encounter a fatal error during very heavy traffic and you’ve just generated tons of mail for your server to deliver.

  • http://www.phppatterns.com HarryF

    It also means that all of your images, style-sheets and static files are served by those “fat” child processes that include an instance of the PHP interpreter as well, which is why you often hear the advice to move static files/images to a different web server. Hence why many people are so happy with [lighttpd+]PHP as FCGI (it uses fewer system resources).

    That’s true, although PHP’s approach of refreshing the interpreter on each request is, I believe, what makes it more appealing vs. mod_perl and similar – no lingering globals etc. Again it’s aiming for the best compromise, I guess.

    And here PHP’s persistent resources are an issue – George Schlossnagle also suggests a lightweight HTTP server on a subdomain for static content here.

    But you’re right, I underplayed the memory issue. I’m also probably underplaying the security issue for shared hosts – an interesting discussion here:

    > On Dec 22, 2005, at 11:44 PM, pbdgny wrote:

    shared hosts were all on mod_php for a while because that’s what they thought they should do. But a funny thing happened – they realized that mod_php lets user_a access all of user_b’s files – because everything runs as the Apache instance user and is read/writeable by it. So most hosts started migrating to PHP/CGI via FastCGI, so account holders can more easily run their scripts as a shell user.

    There’s also some interesting research thanks to Ben Ramsey – Peruser MPM for Apache – basically still no ideal solution for shared hosts.

  • http://www.phppatterns.com HarryF

    While I’m here, here’s a useful link: Scalable Apache for Beginners.

  • LinhGB

    But you’re right, I underplayed the memory issue. Also probably underplaying the security issue for shared hosts—an interesting discussion here

    What about open_basedir?

  • http://www.phppatterns.com HarryF

    What about open_basedir?

    In short, it turns shared hosting into a cat-and-mouse game.

    open_basedir only affects PHP, which is still running as the same user as Apache. Via exec you might be able to run standard UNIX programs to which the restriction wouldn’t apply. Or if a host has locked that down but still supports Perl, you could have PHP execute Perl scripts for you. Of course you could try to block that with safe_mode but, in reality, I think safe_mode (and open_basedir) only discourage those who probably aren’t that dangerous anyway. Well explained here: http://ilia.ws/archives/18-PHPs-safe_mode-or-how-not-to-implement-security.html

  • http://www.calcResult.co.uk omnicity

    Is this correct? I don’t think that Apache fires a whole PHP process for a GET /img/image.jpeg.

    I don’t think so either – HTTP is supposed to be stateless, so that each resource requires an individual request. My understanding was that Apache evaluated each request on its own merits, regardless of source – the TCP/IP handshake that was mentioned is carried out at a lower layer than any HTTP processing.

    If this is indeed an issue, can anyone provide a link to the evidence etc.?

  • Pingback: Natalian » Blog Archive » Deploying is hard

  • Pingback: Internet Alchemy Infinite Scalability
