I’ve been running my site for a couple of years, and some of my older sites for about five years. Now I’m shocked to see that I’ve overlooked the ability to use GZIP or mod_deflate.
Now, what are the catches to using these? Any drawbacks? I’ve tested the GZIP function and it actually seems to slow down page response. My server is pretty busy, with about 180K page requests per day and about 1 million additional requests to our database thanks to our API and embeds.
With a site that busy, should I avoid using these? And if I do decide to use some sort of compression, should I go the PHP GZIP route or the .htaccess mod_deflate option?
Any further information on this would be most helpful.
If all you send is a Last-Modified or ETag, the client doesn’t know how long the item will stay fresh, so it will generally keep asking the server on each page load, or at least every so often. The server can respond with a 304 Not Modified, so at least the item doesn’t need to be downloaded again, but you still incur the HTTP request.
If you send an Expires or max-age, the client can simply assume it’s fresh and doesn’t need to keep checking with the server for freshness confirmation. This is way better from a performance point of view.
So yeah, always send an Expires or max-age if it’s something that should be cached for a while. You should still send a Last-Modified or ETag if you can, though.
You’ll still see clients re-asking for stuff they were told is cacheable under certain circumstances, but it will generally be far less often.
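To make that concrete, here’s a minimal .htaccess sketch of the kind of thing I mean, assuming mod_expires is available; the six-month window and the content types are just examples, not a one-size-fits-all recommendation.

```apache
# Sketch only, assumes mod_expires is enabled.
<IfModule mod_expires.c>
  ExpiresActive On
  # mod_expires sets both the Expires header and Cache-Control: max-age,
  # so the browser can treat these types as fresh without re-checking.
  ExpiresByType text/css               "access plus 6 months"
  ExpiresByType application/javascript "access plus 6 months"
  ExpiresByType image/png              "access plus 6 months"
  ExpiresByType image/jpeg             "access plus 6 months"
</IfModule>
```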
Don’t use it from PHP; instead use Apache’s mod_deflate, or a similar module if it’s not an Apache server.
It’s faster to use this as an Apache module than as an output filter in PHP.
Generally you should be using it; there are no real drawbacks. Apache lets you configure the minimum file size at which gzip kicks in and which file types to gzip. For example, you should only gzip text-based files: HTML, CSS, JS, etc.
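Roughly speaking, the configuration looks something like the snippet below. Treat it as a sketch rather than a drop-in config; it assumes mod_deflate and mod_headers are loaded, and the browser workarounds are the ones shown in the Apache documentation.

```apache
# Sketch only, assumes mod_deflate and mod_headers are enabled.
<IfModule mod_deflate.c>
  # Compress only text-based content types; images and other binaries
  # are already compressed and aren't worth gzipping again.
  AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css
  AddOutputFilterByType DEFLATE application/javascript application/x-javascript

  # Workarounds for some very old browsers that mishandle gzip.
  BrowserMatch ^Mozilla/4 gzip-only-text/html
  BrowserMatch ^Mozilla/4\.0[678] no-gzip
  BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
</IfModule>
<IfModule mod_headers.c>
  # Let caches store compressed and uncompressed copies separately.
  Header append Vary Accept-Encoding
</IfModule>
```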
In my experience the CPU overhead involved with compressing documents on the fly negates any reduced-bandwidth benefits, at least as far as requests per second are concerned.
I’ve had some success with using a combination of static HTML and GZ files. I have the server check for an Accept-Encoding HTTP header; if it finds one, it checks my static cache directory for a GZ copy to return, and if it doesn’t find the header it looks in that cache directory for a static HTML file to return. Failing both of those, it rewrites the pretty URL into normal GET-variable form and lets PHP generate everything on the fly, returning whatever it can once everything has been generated.
Joebert never said GZIP has anything to do with pretty URLs. What he said is:
His website uses pretty URLs
Upon request of a page, he looks in a cache directory (using said pretty URL) to see if there is a cached version
If the browser accepts GZIP, he sends a gzipped version of the page from the cache; if not, he sends plain HTML from the cache
If there is no cached version, he rewrites the pretty URL to GET parameters, generates the page, stores it in cache, and then serves it.
He does this so his web server does not have to GZIP pages on the fly (that’s his whole point). This idea is actually pretty brilliant, so by all means do not ignore his post!
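For anyone who wants to try the same trick, the Apache side of it might look roughly like the sketch below, assuming mod_rewrite and mod_headers. To be clear, this is my guess at an implementation, not joebert’s actual rules: the /cache/ directory, the .html.gz naming and the index.php fallback are all assumptions.

```apache
# Rough sketch; the /cache/ path, .html.gz naming and index.php
# fallback are assumptions, not joebert's actual setup.
RewriteEngine On

# 1. Client accepts gzip and a pre-compressed copy exists: serve that.
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{DOCUMENT_ROOT}/cache/$1.html.gz -f
RewriteRule ^(.*)$ /cache/$1.html.gz [L]

# 2. Otherwise serve the plain static HTML copy if it exists.
RewriteCond %{DOCUMENT_ROOT}/cache/$1.html -f
RewriteRule ^(.*)$ /cache/$1.html [L]

# 3. Nothing cached: rewrite the pretty URL into GET variables and let
#    PHP generate (and re-cache) the page on the fly.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?path=$1 [L,QSA]

# Pre-compressed files need the right headers when served directly.
<FilesMatch "\.html\.gz$">
  ForceType text/html
  Header set Content-Encoding gzip
  Header append Vary Accept-Encoding
</FilesMatch>
```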
I’m not 100% sure, but I think that might be because the server has to collect all the output before compressing and sending it, and then the browser has to spend a little time inflating the content again.
But if it saves a bunch of bandwidth, then I think it’s a good idea to use compression.
Not sure about other hosting providers, but BlueHost/HostMonster/FastDomains have configured their servers to use it when there is enough spare CPU for it to run efficiently, and to turn it off automatically when the CPU load is high enough that it would make downloads slower rather than faster.
Buffering is the main culprit when it comes to a delay; the compression itself is very fast. zlib is capable of streaming, but chunks are still buffered to some extent. That’s not to say there aren’t other sources of potential buffering, but I wouldn’t be surprised if you get unlucky with the way the buffer windows in the various layers line up and some data ends up delayed a tiny bit, and that delayed chunk might be exactly the one that would have let the browser start asynchronously fetching an external resource, like a CSS file, while it waits for the next chunk.
This can be overcome, but it takes some serious tuning.
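For what it’s worth, if mod_deflate is doing the compressing, one of the knobs involved in that tuning is DeflateBufferSize, which controls how big a fragment zlib compresses at a time. Note that this particular directive only goes in the main server config or a virtual host, not .htaccess, and the value below is purely illustrative.

```apache
# Illustrative only; DeflateBufferSize belongs in the server config or
# a <VirtualHost>, not .htaccess. 8096 bytes is the default; smaller
# values push compressed chunks out to the client sooner.
<IfModule mod_deflate.c>
  DeflateBufferSize 4096
</IfModule>
```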
It’s far more likely that the entire thing is being buffered if you’ve actually noticed a delay. If a Content-Length HTTP header got sent, well, the only way the length could have been measured is if something buffered the whole thing.
Anyway, you can make it not buffer the whole thing.
Take a look at the mod_deflate manual. As is common with the Apache manuals, it’s quite extensive and tells you everything you need to know in a clear and precise manner.
Pretty much. The pretty URL part is actually really important, because I use those URIs as the file paths in my cache.
Realistically, I have a static site that’s pre-compressed and has the ability to regenerate itself automatically if anything in the layout changes.
There is a small drawback in storing both static HTML and static GZ versions of files: the GZ versions force me to use roughly 30% more disk space. However, that’s a small price to pay considering I get nearly double the number of requests per second compared to static HTML alone or static HTML with mod_deflate turned on.
Of course you need to sort the GET variables (alphabetically) if you do this, to avoid redundant storage of a-1-b-2-c-3, b-2-a-1-c-3, and the four other combinations.
And yes, you need the dashes (or some other separator character), because otherwise you can’t tell ?a=22&col=1 apart from ?a=2&2col=1.
The ETag removal and the expiry information help improve the caching, but the part you are after is the mod_deflate stuff; if you look at my example, I target all HTML, plain text, XML, CSS and JavaScript (the stuff which can be easily compressed). You can of course add more file types if you like. Hope it’s useful.
casbboy, you can add ANY content type to mod_deflate that you like! However, it should be pointed out that textual content (like the HTML that your PHP outputs) is what gzip was designed to compress, so you should probably stick to that style of content. For example, you could add SVG (as it’s textual code), but PNG wouldn’t be appropriate. As for the Expires header, I chose that purely as a default from the Apache documentation on expiry; I personally recommend a 3-6 month expiry date (which ensures that identical content has a long enough shelf life).
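To make the SVG point concrete, adding a type is just one more line in the mod_deflate block; this is only an illustration, assuming an AddOutputFilterByType setup like the one shown earlier in the thread.

```apache
# Illustration only, assumes mod_deflate is enabled.
<IfModule mod_deflate.c>
  # SVG is textual markup, so it gzips well; PNG and JPEG are already
  # compressed formats, so they aren't worth adding.
  AddOutputFilterByType DEFLATE image/svg+xml
</IfModule>
```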
Doesn’t that mean you have to update the .htaccess quite often?
When April 15 is in the past, caching would effectively be disabled (because the content has expired).
How do you deal with this?
When should I remind myself to go into .htaccess and push that date forward (another three months) again? On June 3rd? June 4th? Or should I do it on May 31st, a few days before the date?
Thanks
Ryan
*Also curious: does gzipping pages affect SEO/SERPs at all?
You would need to update the .htaccess file, but if (say) you set the expiry dates to six months away (which is considered acceptable for a long expiration), it’s not exactly a hard task to quickly change a date, is it? It’s not a daily task; it’s once every six months! Setting header expiry dates requires a date to be added, and you don’t want to make it years into the future. And no, gzip doesn’t affect SEO; all it affects is the page size sent from the server to the end client’s machine.