Diving Deeper into HTML5 Offline Browsing

Recently, I published an article on one of the new features in HTML5, called Offline Browsing in HTML5 with ApplicationCache.

The response to that article was good, and I was asked to expand on some further points including:

  • how to decide on what files to cache
  • the implications of caching those files
  • debugging the ApplicationCache

So, that’s where this article will start: where the last one finished. If you haven’t read the previous article, you should probably do so before reading this one.

Let’s dive into which resources you should and should not add to the ApplicationCache.

What Should You Cache?

Technically, adding and removing resources from the ApplicationCache is not difficult.  You specify what resources you want cached in the CACHE: section and that’s it.

Sometimes the difficult decision is which resources you should and should not add to the ApplicationCache.

For me, obvious resources to cache are the following:

  • CSS files
  • JavaScript files
  • Images
  • Videos

These are perfect candidates for offline caching. There’s nothing more frustrating when you’re working offline than seeing missing images or, worse yet, missing CSS files that cause the page to render incorrectly.

Now what about remote files? How should they be handled? Well, when you’re working with remote files, there are two sides to the story.

If the website is not running under SSL, remote resources can be cached.  In the following scenario, local resources as well as the remote jQuery library are all added to the ApplicationCache.

CACHE MANIFEST

# Created on 20 October 2011
CACHE:
clock.css
clock.js

# Caching the remote file

http://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js

However, if the website is running over SSL, the resources listed in the ApplicationCache must be local resources. Note that Google Chrome is the exception to this rule: Chrome will still cache the remote resources as long as they are served over SSL too.

Confused?  I wish all the browsers would play by the same rules.
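To make the rule a little more concrete, here is a minimal sketch of a check you could run over your manifest entries before deploying. The function names and the example.com address are my own inventions for illustration, and the check only encodes the rule as described above (remote entries are a risk when the site runs over SSL). It is not how any browser actually decides.

```javascript
// Hypothetical helper: split manifest entries into same-origin and
// cross-origin, relative to the URL of the manifest file itself.
function splitByOrigin(manifestUrl, entries) {
  const base = new URL(manifestUrl);
  const result = { sameOrigin: [], crossOrigin: [] };
  for (const entry of entries) {
    // Relative entries resolve against the manifest URL,
    // so they always end up same-origin.
    const resolved = new URL(entry, base);
    const bucket = resolved.origin === base.origin ? 'sameOrigin' : 'crossOrigin';
    result[bucket].push(entry);
  }
  return result;
}

// True when the manifest is served over SSL and lists remote resources,
// which (per the rule described above) some browsers will refuse to cache.
function hasRiskyEntries(manifestUrl, entries) {
  return new URL(manifestUrl).protocol === 'https:' &&
    splitByOrigin(manifestUrl, entries).crossOrigin.length > 0;
}
```

Calling hasRiskyEntries with an https manifest address and the three entries from the manifest above would flag the jQuery line, while the same manifest served over plain http would pass.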

Let’s get back to deciding which resources should or should not be cached.

It’s important to develop a plan of what features you want available to the user while they’re offline. If your website interacts with a database, for example (and the majority of websites do have some sort of database connectivity these days), pages that interact with the database are not good candidates for offline caching, because as soon as they try to connect to the database, they’ll fail.

This is where developing a plan comes in.  If you do cache those pages and the user is offline, you’d need to store the user’s data in another location.  That location could be something like a cookie, or you could store it in localStorage.  That’s another area of HTML5 that is really cool!
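As a rough sketch of the localStorage option, the following queues the user’s data while they’re offline and hands it over once a connection is back. The key name and both function names are hypothetical, and the storage object is passed in as a parameter (anything with getItem, setItem and removeItem will do), so treat this as an outline under those assumptions rather than a finished sync mechanism.

```javascript
// Hypothetical key under which pending records are kept.
const QUEUE_KEY = 'pendingSubmissions';

// Append a record to the pending queue held in a Storage-like object.
// Returns the number of records now waiting.
function queueSubmission(storage, record) {
  const queue = JSON.parse(storage.getItem(QUEUE_KEY) || '[]');
  queue.push(record);
  storage.setItem(QUEUE_KEY, JSON.stringify(queue));
  return queue.length;
}

// Pass every queued record to `send`, then clear the queue.
// Returns the number of records that were handed over.
function flushQueue(storage, send) {
  const queue = JSON.parse(storage.getItem(QUEUE_KEY) || '[]');
  queue.forEach(function (record) { send(record); });
  storage.removeItem(QUEUE_KEY);
  return queue.length;
}
```

In the browser you would pass window.localStorage as the storage argument and call flushQueue from a listener on the window’s online event. A real implementation would also need to cope with a send that fails halfway through, which this sketch ignores.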

Once you’ve decided which pages you want cached, you need to ensure you also cache every resource those pages need to run: any referenced CSS, JavaScript, images, video or Flash widgets that the page renders.

If you don’t, then when the user goes offline, they’ll see a broken page … and nobody wants that.

Then What Shouldn’t You Cache?

The obvious choices for not caching are:

  • pages that interact with database storage
  • pages that interact with web services
  • pages that require authentication

As good as the ApplicationCache is, in reality you need to communicate with external systems to continue working.  This is especially true in the enterprise space.  An offline website is great until something’s not working and the business stops making money.

Debugging the Cache Manifest

Now that you’ve got resources in the cache, how do you find out what’s in there if you ever need to debug it?

Thankfully, Google Chrome has an address you can navigate to in order to view the cache.  Navigating to chrome://appcache-internals in Chrome opens the AppCache Internals page.

[Figure 1: The AppCache Internals page in Chrome]

As you can see, this page lists the current size of the cache manifest, when it was created, when it was updated and best of all it lists the resources inside the cache.  This is invaluable when you need to see what exactly you have stored in your cache.

I’ve found that clearing your temporary internet files also removes the resources in the cache, although some websites suggest otherwise. One sure way of clearing the cache through Chrome is to click Remove on the AppCache Internals page. This guarantees all the resources will be removed.

Things I Don’t Like About ApplicationCache

As good as the ApplicationCache is, there are things I don’t like about it.

At the top of my list is that it requires a special MIME type (text/cache-manifest) for the manifest file. This is fine if you have access to your web server, but on shared servers, sometimes this isn’t possible. If the manifest isn’t served with that MIME type, you’re not going to get anywhere.
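If you can’t touch the server configuration but can run a server-side script, one possible workaround is to have the script serve the manifest and set the header itself. Here is a small sketch of the header logic. The two file extensions and the function name are assumptions of mine, so swap in whatever your manifest file is actually called.

```javascript
// Assumed manifest file extensions; adjust to match your own file name.
const MANIFEST_EXTENSIONS = ['.appcache', '.manifest'];

// Returns the response headers a manifest file needs,
// or null for any other kind of file.
function manifestHeaders(fileName) {
  const lower = fileName.toLowerCase();
  const isManifest = MANIFEST_EXTENSIONS.some(function (ext) {
    return lower.endsWith(ext);
  });
  return isManifest ? { 'Content-Type': 'text/cache-manifest' } : null;
}
```

In a Node.js request handler, for example, the returned object could be passed straight to res.writeHead(200, headers) before the manifest text is written to the response.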

Another side effect of using the ApplicationCache is when the cached files are used, compared to when they’re not. Take, for example, a page called default.html. If this page is cached, the browser will use the cached copy even when the user is online. So what happens when the cache is updated? You need to notify the user, and the page needs to be refreshed before the new files are shown. We live in a world where Ajax is the norm. There has to be a better way.
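For what it’s worth, the notify-and-refresh dance can be sketched with the updateready event and the swapCache() method on window.applicationCache. The wrapper function below is my own naming, and it takes the cache object as a parameter so it can be tried outside a browser; it is a sketch of the pattern, not the only way to do it.

```javascript
// `appCache` is expected to look like window.applicationCache.
// `onUpdateReady` is called once a newer cache has been swapped in.
function notifyOnUpdate(appCache, onUpdateReady) {
  appCache.addEventListener('updateready', function () {
    // Only act when a complete new version has been downloaded.
    if (appCache.status === appCache.UPDATEREADY) {
      appCache.swapCache(); // switch to the newly downloaded cache
      onUpdateReady();
    }
  });
}
```

In a page you might call notifyOnUpdate(window.applicationCache, ...) with a callback that asks the user whether to reload and, if they agree, calls window.location.reload(). Even after swapCache(), the page the user is already looking at keeps the old files until that reload happens.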

Caching CSS files is fine, but if you reference any images from the CSS file, they aren’t cached automatically.  They must be explicitly referenced in the manifest.
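One way to catch these before your users do is to scan the stylesheet for url(...) references and compare them with the manifest. The sketch below is deliberately simple and rests on a few assumptions: both function names are made up, the regular expression only handles plain url() values, and the comparison is a straight string match, so it only works when the paths in the CSS and in the manifest are written the same way.

```javascript
// Collect every url(...) reference found in a block of CSS text.
function findCssUrls(cssText) {
  const pattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  const urls = [];
  let match;
  while ((match = pattern.exec(cssText)) !== null) {
    urls.push(match[2].trim());
  }
  return urls;
}

// List the CSS references that do not appear in the manifest entries.
// Inline data: URIs are skipped because there is nothing to cache.
function missingFromManifest(cssText, manifestEntries) {
  const listed = new Set(manifestEntries);
  return findCssUrls(cssText).filter(function (url) {
    return !url.startsWith('data:') && !listed.has(url);
  });
}
```

Anything the second function returns is a file the page would lose offline unless you add it to the CACHE: section.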

Limits for ApplicationCache sizes also vary.  While the specification places no limits on the size an ApplicationCache can be, different browsers and different devices do have different limits. Currently, the limits are:

  • Safari desktop browser (Mac and Windows) has no limit
  • Mobile Safari has a 10MB limit
  • Chrome has a 5MB limit
  • Android browser has no limit to ApplicationCache size
  • Firefox desktop has unlimited ApplicationCache size
  • Opera’s ApplicationCache limit can be managed by the user, but has a default size of 50MB

Manifest Validation

The manifest file is easy to create, and it’s even easier to get wrong.

Incorrectly referencing files will cause you a headache.  Luckily there’s a Cache Manifest Validator that can help ease the pain of debugging errors in your manifest file.  It’s a great tool so please bookmark it and use it.
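To give a feel for the kind of mistakes that are easy to make, here is a tiny sketch of two such checks: that the first line is CACHE MANIFEST, and that any section header is one of CACHE:, NETWORK: or FALLBACK:. The function name is made up and the checks are far from complete (it won’t tell you whether a referenced file exists, for instance), so it is no substitute for a proper validator.

```javascript
// Section headers this sketch recognises.
const SECTION_HEADERS = ['CACHE:', 'NETWORK:', 'FALLBACK:'];

// Returns a list of problem descriptions; an empty list means
// neither of the two checks found anything wrong.
function lintManifest(text) {
  const problems = [];
  const lines = text.split(/\r?\n/);
  if (lines[0].trim() !== 'CACHE MANIFEST') {
    problems.push('Line 1 must be "CACHE MANIFEST"');
  }
  lines.slice(1).forEach(function (raw, index) {
    const line = raw.trim();
    const lineNumber = index + 2; // +1 for the skipped first line, +1 for 1-based numbering
    // Blank lines, comments and known section headers are fine.
    if (line === '' || line.startsWith('#') || SECTION_HEADERS.includes(line)) {
      return;
    }
    // Something shaped like a header but not recognised, e.g. "CASHE:".
    if (/^[A-Z]+:$/.test(line)) {
      problems.push('Line ' + lineNumber + ': unknown section "' + line + '"');
    }
  });
  return problems;
}
```

Run against the manifest from earlier in this article, it reports nothing. Misspell the header as CASHE: and it points at the offending line.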

Bringing It All Together

A final thought. A lot of my work focuses on working with large enterprise customers.  As soon as an application is offline, alarm bells start ringing! There must be a problem.

The offline capabilities in HTML5 through the ApplicationCache certainly do have great potential, but they don’t yet cover all the bases. And anyone contemplating making offline browsing available would be well advised to keep in mind the ingrained habits some users might have.

  • Gaurav Chandra

    Hey, nice article. I am facing a problem in google chrome. I have a codeigniter based php application and the browser is caching even the php pages and don’t update the data from the server. How can I exclude those pages? Since, CI is based on MVC pattern, the manifest needs to have the .php extension for the pages.

  • http://www.logicking.com Yuri

    Malcolm, thank you very much for the great article!

    Maybe you will be able to answer my question: is it possible to load different versions of appcache for different clients without performing server side logic. I have the HTML5 game and want to have two different sets of graphics cached for mobiles and for tablets devices.

    Full version of the question is here:
    http://stackoverflow.com/questions/8125224/variable-html5-offline-appcache-for-different-devices

  • Malcolm Sheridan

    @Yuri
    Check out Stack Overflow. Your question has been answered. The simple answer is no, you can’t.

    @Gaurav
    If the browser is caching all of the pages, can you set an expires header?