Web page compression is not a new technology, but it has recently gained higher recognition in the minds of IT administrators and managers because of the rapid ROI it generates. Compression extensions exist for most of the major Web server platforms, but in this article I’ll focus on the Open Source Apache and
The idea behind GZIP-encoding documents is very straightforward. Take a file that is to be transmitted to a Web client, and send a compressed version of the data, rather than the raw file. Depending on the size of the file, the compressed version can run anywhere from 50% to 20% of the original file size.
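To make the idea concrete, here is a minimal Python sketch of GZIP-encoding a page and comparing sizes. Python is used only for illustration and is not part of Apache. The sample markup and the helper name `compressed_ratio` are hypothetical, and because the sample is far more repetitive than a typical page, it will compress better than real content usually does.

```python
import gzip


def compressed_ratio(data: bytes) -> float:
    """Return the compressed size as a fraction of the original size."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical sample page: repetitive markup, loosely resembling real HTML.
sample_html = (
    "<html><body>"
    + "".join(
        f"<p class='item'>Row number {i} of the listing</p>" for i in range(200)
    )
    + "</body></html>"
).encode("utf-8")

compressed = gzip.compress(sample_html)

# GZIP is lossless: decompressing returns exactly the original bytes.
assert gzip.decompress(compressed) == sample_html

print(f"original: {len(sample_html)} bytes, compressed: {len(compressed)} bytes")
print(f"compressed size is {compressed_ratio(sample_html):.0%} of the original")
```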
In Apache, this can be achieved using Content Negotiation, which requires that two separate sets of HTML files be generated: one for clients who can handle GZIP-encoding, and one for those who can’t. This solution sends gzip-encoded files to clients who understand them, but does not allow for the compression of dynamically-generated pages.
A More Graceful Solution
A more graceful solution is the use of mod_gzip, one of the many additional modules available for Apache. I consider it one of the overlooked gems for designing a high-performance Web server. Using this module, configured file types will be compressed using GZIP-encoding after they’ve been processed by all of Apache’s other modules, and before they’re sent to the client. The compressed data that’s generated reduces the number of bytes transferred to the client, without any loss in the structure or content of the original, uncompressed document.
mod_gzip can be compiled into Apache as either a static or dynamic module; I've chosen to compile it as a dynamic module in my own server. The advantage of using mod_gzip is that this method doesn't require anything to be done on the client side in order to make it work. As for the server side, all the server or site administrator has to do is:
- compile the module,
- edit the appropriate configuration directives that were added to the httpd.conf file,
- enable the module in the httpd.conf file, and
- restart the server.
In less than 10 minutes, you can be serving HTML files using GZIP-encoding.
How it Works
When a request is received from a client, Apache determines whether mod_gzip should be invoked by checking whether the client sent the "Accept-Encoding" HTTP request header. If the client sends this header with a value that includes gzip (for example, "Accept-Encoding: gzip"), mod_gzip will compress the output of all configured file types when they're sent to the client.
This client header announces to Apache that the client will understand files that have been GZIP-encoded. mod_gzip then processes the outgoing content and adds a "Content-Encoding: gzip" response header alongside the usual "Content-Type" header, such as "Content-Type: text/html".
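The request and response exchange described here can be sketched in a few lines of Python. This is an illustration of the negotiation only, not mod_gzip's actual implementation: the function names `client_accepts_gzip` and `build_response` are hypothetical, headers are modeled as a plain dictionary with exact-case keys, and quality values are handled only to the extent of treating `q=0` as a refusal.

```python
import gzip


def client_accepts_gzip(request_headers: dict) -> bool:
    """Return True if the Accept-Encoding header lists gzip with q > 0."""
    value = request_headers.get("Accept-Encoding", "")
    for item in value.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() in ("gzip", "x-gzip"):
            q = 1.0
            params = params.strip().lower()
            if params.startswith("q="):
                try:
                    q = float(params[2:])
                except ValueError:
                    q = 0.0  # unparseable quality value: treat as refused
            return q > 0
    return False


def build_response(request_headers: dict, body: bytes,
                   content_type: str = "text/html"):
    """Compress the body only when the client announced gzip support."""
    headers = {"Content-Type": content_type}
    if client_accepts_gzip(request_headers):
        body = gzip.compress(body)
        # Tells the client to gunzip first, then treat the result as HTML.
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


page = b"<html><body>Hello, world</body></html>"
headers, body = build_response({"Accept-Encoding": "gzip, deflate"}, page)
print(headers)
```

A client that sends no "Accept-Encoding" header gets the original bytes back with no "Content-Encoding" header, which is why nothing needs to change on the client side.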
These server response headers announce that the content returned from the server is GZIP-encoded, but that when the content is expanded by the client application, it should be treated as a standard HTML file. This works not only for static HTML files, but also for pages that contain dynamic elements, such as those produced by Server-Side Includes (SSI), PHP, and other dynamic page generation methods. You can also use it to compress your Cascading Style Sheets (CSS) and plain text files.
My httpd.conf file sets the following configuration for mod_gzip:
mod_gzip_item_exclude file \.js$
mod_gzip_item_exclude mime ^text/css$
mod_gzip_item_include file \.html$
mod_gzip_item_include file \.shtml$
mod_gzip_item_include file \.php$
mod_gzip_item_include mime ^text/html$
mod_gzip_item_include file \.txt$
mod_gzip_item_include mime ^text/plain$
mod_gzip_item_include file \.css$
mod_gzip_item_include mime ^text/css$
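Each of these directives pairs a match type (file or mime) with a regular expression. The Python sketch below shows how such rules could select what gets compressed. It is a simplified model and not mod_gzip's real evaluation logic: the rule list is a hypothetical subset, the function name `should_compress` is invented for illustration, and the sketch assumes that any matching exclude rule overrides the include rules.

```python
import re

# Hypothetical subset of rules in (action, match type, pattern) form.
RULES = [
    ("exclude", "file", r"\.js$"),
    ("include", "file", r"\.html$"),
    ("include", "mime", r"^text/html$"),
    ("include", "mime", r"^text/plain$"),
]


def should_compress(filename: str, mime_type: str, rules=RULES) -> bool:
    """Return True if some include rule matches and no exclude rule does.

    Assumption: a matching exclude rule always wins over include rules.
    """
    values = {"file": filename, "mime": mime_type}
    included = False
    for action, kind, pattern in rules:
        if re.search(pattern, values[kind]):
            if action == "exclude":
                return False
            included = True
    return included


for name, mime in [
    ("index.html", "text/html"),
    ("menu.js", "application/x-javascript"),
    ("logo.png", "image/png"),
]:
    print(name, mime, should_compress(name, mime))
```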
I've had limited success compressing other file formats, mainly because Microsoft's Internet Explorer appears to act on the "Content-Type" header before it acts on the "Content-Encoding" header. So, say you configure your server to GZIP-encode PDF files using the following directives:
mod_gzip_item_include file \.pdf$
mod_gzip_item_include mime ^application/pdf$
This will work perfectly in both Mozilla and Opera, as these applications decode the GZIP-encoded content before they pass it along to the PDF reader (most people use Adobe Acrobat Reader).
However, Internet Explorer simply passes the GZIP-encoded content directly to the PDF reader. Once this issue is rectified in the MSIE code, you are likely to see a lot more Web servers serving a broader range of GZIP-encoded content.
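The difference comes down to the order in which the client handles the two headers. A minimal Python sketch of the ordering that works, with hypothetical function names `decode_body` and `hand_to_viewer` standing in for the browser's internals, looks like this:

```python
import gzip


def decode_body(headers: dict, body: bytes) -> bytes:
    """Undo the Content-Encoding, if any, before anything else sees the body."""
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body


def hand_to_viewer(headers: dict, body: bytes):
    """Decode first, then dispatch on Content-Type.

    Returns the content type and the bytes a helper application would receive.
    """
    payload = decode_body(headers, body)
    return headers.get("Content-Type"), payload


# Hypothetical stand-in for a real PDF file.
pdf = b"%PDF-1.4 hypothetical document bytes"
response_headers = {
    "Content-Type": "application/pdf",
    "Content-Encoding": "gzip",
}
wire_body = gzip.compress(pdf)

content_type, payload = hand_to_viewer(response_headers, wire_body)
print(content_type, payload[:8])
```

A client that skips the decoding step hands `wire_body` to the PDF reader as-is, and since those bytes begin with the GZIP signature rather than `%PDF`, the reader cannot open them.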
As you can see, GZIP-encoded documents can produce substantial savings in bandwidth usage:
Uncompressed File Size: 3122 bytes
Compressed File Size: 1578 bytes
Uncompressed File Size: 56279 bytes
Compressed File Size: 16286 bytes
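Expressed as percentages, these two measurements work out to savings of roughly 49% and 71%. The short Python calculation below shows the arithmetic; the helper name `percent_saved` is hypothetical.

```python
def percent_saved(original_bytes: int, compressed_bytes: int) -> float:
    """Return the bandwidth saved as a percentage of the original size."""
    return 100.0 * (original_bytes - compressed_bytes) / original_bytes


# (uncompressed, compressed) sizes in bytes, taken from the figures above.
samples = [(3122, 1578), (56279, 16286)]

for original, compressed in samples:
    saved = percent_saved(original, compressed)
    print(f"{original} -> {compressed} bytes: {saved:.1f}% saved")
```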
As a server administrator, you may be concerned that mod_gzip will place a heavy burden on your systems as they compress files on the fly. I'd like to point out that this does not seem to concern the administrators of Slashdot, one of the busiest Web servers on the Internet, who use mod_gzip in their very high-traffic environment.
The mod_gzip project page is located at SourceForge. Try it out for yourself.