Open Source Image Archiving: Exif, IPTC, XMP and all that

We teamed up with SiteGround
To bring you up to 65% off web hosting, plus free access to the entire SitePoint Premium library (worth $99). Get SiteGround + SitePoint Premium Now

Related to this hack started taking a serious look at the available standards and Open Source tools for adding meta data to images, in the context of building archives of digital photography. With further prompting from reading this (Hey! MS are adopting someone else’s standard!), dumping some notes…

Photo Archiving: The Golden Rule

Store metadata in the image.

Actually should be more precise about the word metadata here – there are essentially three significant types of metadata when in comes to digital images, when it comes to who created it and where it’s stored.

First; the stuff your digital camera attaches to the image when you take a picture (e.g. camera make, time the picture was taken, exposure etc.), second; meta data you manually add to an image, typically at the time you download images from the camera to your PC (e.g. where the picture was taken, a description, keywords, name of the photographer etc.) and third; “grouping metadata” – how a collection of images relate to each other (e.g. they were all part of a single photo shoot or they are all family pictures). For the purposes of this discussion, the first two types of metadata can be regarded as a the same thing and it’s this information that you want stored in the image.

The third type – the “grouping” information is “relative” and, practically, can only work well if it’s stored externally from images. This may be as “lightweight” as a tree of directories on your filesystem with timestamp-based names, under which you store your images, but image archive tools may offer additional facilities in this area. Important is if you want to group a collection of images under a heading like “family”, you also want to have a keyword “family” stored in the image. In such case it may be the image archiving tool can build the groups automatically from your keywords.

But you want the first two types of metadata in the image, so that if you switch archiving tools, transfer your images to another system or forget to backup the archive database but remember the images, it won’t be lost. A decent archiving tool will probably generate it’s own database from this inline metadata, to make searches etc. efficient, but the final authority should be the image itself.

Inline Image Metadata Standards

The next question is how is data stored in an image. File formats like JPEG, RAW and TIFF, commonly used in digital photography / processing, allow additional metadata to be stored in the image file.

So here’s a “avoid too much detail” view: there are three significant inline metadata formats here – Exif, IPTC and XMP. Exif is used by most digital cameras to attach information like exposure at the point you take a picture and tools are available for adding information manually (or automatically) such as geocoding. IPTC “extends” Exif, providing a whole load more named “tags” for metadata. Both are fragile from a technical point of view (easy to destroy / corrupt) and problematic from the point of view of extending (if there’s no named “tag” for the metadata you want to store, you’re out of luck). Meanwhile, XMP, the work of Adobe, is the “holy grail”, providing XML / RDF / Dublin core goodness for storing metadata. It’s just not well supported yet, but starting to gain significant acceptance. And of course our friend IE has it’s own opinion about XMP

In practice, you need to be able to work with all three formats for effective photo archiving.

Image Archiving Apps

There are a bunch of image archiving tools out there, for any and all operating systems. A good one will allow you to manually supply additional meta data about an image (e.g. keywords like “family”, “dog” etc.) but, from a straw poll, the majority store this data separately from the image, in some kind of database. Where Ubuntu is concerned GThumb does just this – stores any user supplied meta data externally from the image (if I remember right, it’s under ~/.gnome2/gthumb).

There’s a bunch of reasons why I consider this fundamentally wrong, which become obvious the moment an image leaves your filesystem (or perhaps even changes directory / filename) plus the fact that there’s no “clear winner” in the image archiving space – there’s a good chance you’ll switch tool more than once. Bottom line: storing metadata separately from the image it refers to is just not the way to do it.

In the Open Source / Linux space, there are a couple of contenders which are getting it “right”, namely F-spot (C# / Mono) and jbrout (python / wxpython). Both are capable of storing metadata in the image but there are limitations, e.g. jbrout has no awareness of XMP. Some worthwhile blogs (and comments) to read are How to manage EXIF, XMP metadata on linux gnome kde, Managing Photographs and Photo organizing in Linux.

Image Archiving Done Right

By contrast, the model application, IMO, for how to do it right (sadly Win32 only) but free to use is PixVue.

For starters, rather than being an monolithic application, it adds a bunch of Windows Shell Namespace Extensions meaning you can work with your images in Windows Explorer – e.g. right click and “annotate” with your metadata.

As I see it “Digital workflow”, in general, is all about procedures – you need “process oriented” applications that fit into your procedure, rather than monolithic windowed views of your images, which dictate how you work. PixVue gets this right by getting out of the way and hence my desire to extend Nautilus.

PixVue also stores almost all metadata in the image, understanding EXIF, IPTC and XMP. The exception is the “grouping” metadata, which it provides something called “galleries”, which are a virtual directory hierarchy – in Explorer you can drag and drop images into a gallery (or galleries) to organise them but these are only “symbolic links” to the original files on your filesystem. You can also export your gallery structures as XML, to allow at least the chance of portability.

Also the “templates” functionality it provides helps make attaching metadata to a large collection of images as fast and painless as possible.

Finally it has excellent search facilities (via the gallery view) plus it “integrates” with (populates the indexes of) Windows search and Google desktop.

The only downside of PixVue is the source isn’t available and there doesn’t seem to be an API (it does register some COM components but, from a quick scan, these don’t allow you to drive it with, say, a Python script), so if you want to extend it to integrate with some geocoding tool, you’re out of luck.

Open Source Image Metadata Tools

So, given a less-than-ideal situation for Open Source image archiving applications, what’s available in the form of libraries, so you can hack your own?

The long and short of it is more bad news. There’s a whole bunch of libraries that do a little bit, perhaps only (partially) understanding one format (typically Exif), read only – not write or lacking support for manufacturer Exif extensions.

The situation is particularly acute when it comes to Python, which is badly lacking – there’s a few “readers” out there, and the occasional writer (no XMP writer though) but nothing that’s really complete. That’s bad news for jbrout and even more bad news for Python on Series 60 phones, where being able to tag your pictures, while you sit on a train, would be a killr app.

There are exceptions though, the #1 being Phil Harvey’s exiftool (Perl – available via CPAN). ExifTool isn’t just good – it’s excellent. It’s basically got it all – EXIF, IPTC and XMP support (read/write), support for manufacturer extensions and a common dictionary of tag names that are, to an extent, metadata-format-independent. The only downside is there doesn’t seem to be (correct me if wrong) a GUI front-end for Linux that exposes all of Exiftool’s functionality – you’re either talking command line or writing your own Perl scripts.

Another noteworthy exception Evan Hunter’s PHP JPEG Metadata Toolkit, which has EXIF, IPTC and XMP support and awareness of some manufacturer quirks. The API is also pretty simple.

The is of course plenty of other stuff around, which offer part of the picture e.g. jempbox (Java) but YMMV.

Anyway – hopefully that saves someone some time in research.