Open Source Image Archiving: Exif, IPTC, XMP and all that

Related to this hack, I started taking a serious look at the available standards and Open Source tools for adding metadata to images, in the context of building archives of digital photography. With further prompting from reading this (Hey! MS are adopting someone else’s standard!), I’m dumping some notes…

Photo Archiving: The Golden Rule

Store metadata in the image.

Actually, I should be more precise about the word metadata here. There are essentially three significant types of metadata when it comes to digital images, distinguished by who created it and where it’s stored.

First, the stuff your digital camera attaches to the image when you take a picture (e.g. camera make, time the picture was taken, exposure etc.); second, metadata you manually add to an image, typically at the time you download images from the camera to your PC (e.g. where the picture was taken, a description, keywords, name of the photographer etc.); and third, “grouping metadata”: how a collection of images relate to each other (e.g. they were all part of a single photo shoot or they are all family pictures). For the purposes of this discussion, the first two types of metadata can be regarded as the same thing, and it’s this information that you want stored in the image.

The third type, the “grouping” information, is “relative” and, practically, can only work well if it’s stored externally from images. This may be as “lightweight” as a tree of directories on your filesystem with timestamp-based names, under which you store your images, but image archive tools may offer additional facilities in this area. The important point is that if you want to group a collection of images under a heading like “family”, you also want the keyword “family” stored in the image. In that case the image archiving tool may be able to build the groups automatically from your keywords.
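
As a rough sketch of that last point, groups can be derived from keywords rather than maintained by hand. The dictionary below stands in for keywords that some reader has already pulled out of each image; the file names and the `build_groups` function are made up for illustration.

```python
from collections import defaultdict


def build_groups(keywords_by_image):
    """Map each (lower-cased) keyword to the sorted images that carry it."""
    groups = defaultdict(set)
    for image, keywords in keywords_by_image.items():
        for keyword in keywords:
            groups[keyword.lower()].add(image)
    return {keyword: sorted(images) for keyword, images in groups.items()}


# Hypothetical keywords, as if already read from each image's inline metadata.
keywords_by_image = {
    "2006/03/12/img_0001.jpg": ["family", "dog"],
    "2006/03/12/img_0002.jpg": ["family"],
    "2006/04/02/img_0107.jpg": ["Dog", "beach"],
}

groups = build_groups(keywords_by_image)
print(groups["family"])
```

Because the groups are computed, they can be thrown away and rebuilt at any time; only the keywords in the images need to survive.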

But you want the first two types of metadata in the image, so that if you switch archiving tools, transfer your images to another system or forget to back up the archive database but remember the images, it won’t be lost. A decent archiving tool will probably generate its own database from this inline metadata, to make searches etc. efficient, but the final authority should be the image itself.
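
To make the “database as a disposable cache” idea concrete, here is a minimal sketch using SQLite. It assumes a separate metadata reader (not shown) has already produced `(path, keywords)` pairs from the image files; the table layout and function names are my own invention, not taken from any particular archiving tool.

```python
import sqlite3


def rebuild_index(records):
    """Build a throwaway in-memory search index from inline metadata.

    `records` is an iterable of (path, keywords) pairs that a metadata
    reader (not shown) has pulled out of the image files themselves.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE keyword (path TEXT, word TEXT)")
    db.executemany(
        "INSERT INTO keyword (path, word) VALUES (?, ?)",
        [(path, word) for path, words in records for word in words],
    )
    db.commit()
    return db


def find(db, word):
    """Return the paths of all images tagged with `word`."""
    rows = db.execute(
        "SELECT path FROM keyword WHERE word = ? ORDER BY path", (word,)
    )
    return [path for (path,) in rows]


# Hypothetical records standing in for metadata read from real files.
records = [("a.jpg", ["family", "dog"]), ("b.jpg", ["family"])]
db = rebuild_index(records)
print(find(db, "family"))
```

If the index is lost or goes stale, running `rebuild_index` over the images again recreates it, which is the whole point of keeping the images as the final authority.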

Inline Image Metadata Standards

The next question is how the data is stored in an image. File formats like JPEG, RAW and TIFF, commonly used in digital photography / processing, allow additional metadata to be stored in the image file.
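
For JPEG specifically, the metadata lives in “application segments” near the start of the file, ahead of the compressed image data. The sketch below walks those segments and reports which metadata blocks are present, based on the usual identifying prefixes (Exif and XMP in APP1, the Photoshop resource block that normally carries IPTC in APP13). It is a simplified scanner for illustration, tested here only against a synthetic byte string, and it ignores edge cases such as padding bytes between segments.

```python
import struct

# (marker, payload prefix, label). The APP13 "Photoshop 3.0" block is where
# IPTC data is normally stored, so it is labelled IPTC here for simplicity.
SIGNATURES = [
    (0xE1, b"Exif\x00\x00", "Exif"),
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00", "XMP"),
    (0xED, b"Photoshop 3.0\x00", "IPTC"),
]


def find_metadata_segments(data):
    """Return labels of the metadata blocks found in JPEG bytes, in file order."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        for sig_marker, prefix, label in SIGNATURES:
            if marker == sig_marker and payload.startswith(prefix):
                found.append(label)
        pos += 2 + length
    return found


def segment(marker, payload):
    """Build one JPEG segment; the length field counts its own 2 bytes."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# A synthetic, not-displayable file: just enough structure to exercise the scan.
fake_jpeg = (
    b"\xff\xd8"
    + segment(0xE1, b"Exif\x00\x00" + b"\x00" * 8)
    + segment(0xED, b"Photoshop 3.0\x00" + b"\x00" * 4)
    + segment(0xE1, b"http://ns.adobe.com/xap/1.0/\x00<x:xmpmeta/>")
    + b"\xff\xda"
)
print(find_metadata_segments(fake_jpeg))
```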

So here’s an “avoid too much detail” view. There are three significant inline metadata formats: Exif, IPTC and XMP. Exif is used by most digital cameras to attach information like exposure at the point you take a picture, and tools are available for adding information manually (or automatically), such as geocoding. IPTC complements Exif, providing a whole load more named “tags” for descriptive metadata such as captions and keywords. Both are fragile from a technical point of view (easy to destroy / corrupt) and problematic from the point of view of extending (if there’s no named “tag” for the metadata you want to store, you’re out of luck). Meanwhile XMP, the work of Adobe, is the “holy grail”, providing XML / RDF / Dublin Core goodness for storing metadata. It’s not well supported yet, but it is starting to gain significant acceptance. And of course our friend IE has its own opinion about XMP.
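
Since XMP is plain XML / RDF, even a standard-library parser can get at it once the packet has been extracted from the file. A minimal sketch follows: the packet is a hand-written example using the Dublin Core `dc:subject` property, which is where XMP keywords are conventionally stored, and `xmp_keywords` is an illustrative helper rather than part of any library.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hand-written example packet; real packets carry many more properties.
SAMPLE_PACKET = """
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:subject>
        <rdf:Bag>
          <rdf:li>family</rdf:li>
          <rdf:li>dog</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""


def xmp_keywords(packet):
    """Return the dc:subject keywords from an XMP packet given as a string."""
    root = ET.fromstring(packet.strip())
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return [item.text for item in items]


keywords = xmp_keywords(SAMPLE_PACKET)
print(keywords)
```

This also shows why XMP is easier to extend than Exif or IPTC: a new kind of metadata only needs a new namespace, not a new slot in a fixed tag table.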

In practice, you need to be able to work with all three formats for effective photo archiving.

Image Archiving Apps

There are a bunch of image archiving tools out there, for any and all operating systems. A good one will allow you to manually supply additional metadata about an image (e.g. keywords like “family”, “dog” etc.) but, from a straw poll, the majority store this data separately from the image, in some kind of database. Where Ubuntu is concerned, gThumb does just this: it stores any user-supplied metadata externally from the image (if I remember right, it’s under ~/.gnome2/gthumb).

There’s a bunch of reasons why I consider this fundamentally wrong. They become obvious the moment an image leaves your filesystem (or perhaps even changes directory / filename). Add the fact that there’s no “clear winner” in the image archiving space, so there’s a good chance you’ll switch tool more than once. Bottom line: storing metadata separately from the image it refers to is just not the way to do it.

In the Open Source / Linux space, there are a couple of contenders which are getting it “right”, namely F-Spot (C# / Mono) and jbrout (Python / PyGTK). Both are capable of storing metadata in the image but there are limitations, e.g. jbrout has no awareness of XMP. Some worthwhile blogs (and comments) to read are How to manage EXIF, XMP metadata on linux gnome kde, Managing Photographs and Photo organizing in Linux.

Image Archiving Done Right

By contrast, the model application for how to do it right, IMO, is PixVue: sadly Win32 only, but free to use.

For starters, rather than being a monolithic application, it adds a bunch of Windows Shell Namespace Extensions, meaning you can work with your images in Windows Explorer, e.g. right click and “annotate” with your metadata.

As I see it, “digital workflow” in general is all about procedures: you need “process oriented” applications that fit into your procedure, rather than monolithic windowed views of your images which dictate how you work. PixVue gets this right by getting out of the way, hence my desire to extend Nautilus.

PixVue also stores almost all metadata in the image, understanding EXIF, IPTC and XMP. The exception is the “grouping” metadata, for which it provides something called “galleries”: a virtual directory hierarchy. In Explorer you can drag and drop images into a gallery (or galleries) to organise them, but these are only “symbolic links” to the original files on your filesystem. You can also export your gallery structures as XML, to allow at least the chance of portability.

Also the “templates” functionality it provides helps make attaching metadata to a large collection of images as fast and painless as possible.

Finally it has excellent search facilities (via the gallery view) plus it “integrates” with (populates the indexes of) Windows search and Google desktop.

The only downside of PixVue is the source isn’t available and there doesn’t seem to be an API (it does register some COM components but, from a quick scan, these don’t allow you to drive it with, say, a Python script), so if you want to extend it to integrate with some geocoding tool, you’re out of luck.

Open Source Image Metadata Tools

So, given a less-than-ideal situation for Open Source image archiving applications, what’s available in the form of libraries, so you can hack your own?

The long and short of it is more bad news. There’s a whole bunch of libraries that each do a little bit: perhaps (partially) understanding only one format (typically Exif), reading but not writing, or lacking support for manufacturer Exif extensions.

The situation is particularly acute when it comes to Python, which is badly lacking. There are a few “readers” out there and the occasional writer (no XMP writer though), but nothing that’s really complete. That’s bad news for jbrout and even more bad news for Python on Series 60 phones, where being able to tag your pictures while you sit on a train would be a killer app.

There are exceptions though, the #1 being Phil Harvey’s ExifTool (Perl, available via CPAN). ExifTool isn’t just good, it’s excellent. It’s basically got it all: EXIF, IPTC and XMP support (read/write), support for manufacturer extensions and a common dictionary of tag names that are, to an extent, metadata-format-independent. The only downside is there doesn’t seem to be (correct me if I’m wrong) a GUI front-end for Linux that exposes all of ExifTool’s functionality. You’re either talking command line or writing your own Perl scripts.
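
Given the gap on the Python side, one workable route is to drive the ExifTool command line from a script. The sketch below only builds the argument list and parses ExifTool’s JSON output, so it runs without ExifTool installed; actually executing the command (e.g. with `subprocess.run`) is left out. The `+=` syntax and the `-json` option are ExifTool’s own, while the helper names and the sample output string are made up for illustration.

```python
import json


def keyword_command(path, keywords):
    """Build an exiftool command line that appends keywords to one image.

    Each keyword is written to both IPTC:Keywords and XMP-dc:Subject so
    that IPTC-only and XMP-aware tools see the same values.
    """
    args = ["exiftool"]
    for word in keywords:
        args.append("-IPTC:Keywords+=%s" % word)
        args.append("-XMP-dc:Subject+=%s" % word)
    args.append(path)
    return args


def parse_json_output(text):
    """Turn the output of `exiftool -json FILE...` into {path: tags}."""
    return {entry["SourceFile"]: entry for entry in json.loads(text)}


command = keyword_command("a.jpg", ["family"])
print(command)

# Made-up sample of the JSON array that `exiftool -json a.jpg` prints.
sample_output = '[{"SourceFile": "a.jpg", "Keywords": ["family", "dog"]}]'
tags = parse_json_output(sample_output)
print(tags["a.jpg"]["Keywords"])
```

Passing the arguments as a list rather than a shell string avoids quoting problems with keywords that contain spaces. Note that by default ExifTool keeps a backup of each file it modifies, with `_original` appended to the name.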

Another noteworthy exception is Evan Hunter’s PHP JPEG Metadata Toolkit, which has EXIF, IPTC and XMP support and awareness of some manufacturer quirks. The API is also pretty simple.

There is of course plenty of other stuff around that offers part of the picture, e.g. jempbox (Java), but YMMV.

Anyway – hopefully that saves someone some time in research.

  • PaulPr1

    Have you seen imgSeek ? It has nice IPTC/EXIF support and allows querying of images by similarity.

  • Angelo

    exiv2 is a very powerful open source library written in C++ for EXIF/IPTC manipulation.
    It is cross platform.

    Also, there is iTag which is unfortunately Windows only but a quite nice bulk IPTC editor.

  • http://www.phppatterns.com HarryF

    Have you seen imgSeek ? It has nice IPTC/EXIF support and allows querying of images by similarity.

    Had forgotten about imgSeek. I looked at it a long time ago; the idea of “content based” searching is very cool, although I hadn’t got as far as figuring out how well it works in practice. Would be interested to hear experiences.

    Unfortunately it’s not updating the image metadata, only extracting existing image metadata then storing a copy / updates elsewhere (from a quick glance at the source, it seems to be a DB).

  • Mark A

    >>>So here’s a “avoid too much detail” view:

  • John

    I’m new to organising photos (exif and xmp are new to me)

    Thanks for a great intro and a link to PixVue.
    I searched yesterday and found nothing as helpful as this.

  • lsolesen

    I enjoy PixVue. However, there is a thing I do not understand. If you hit Properties on your pictures in WinXP without any program installed, you are able to add keywords, titles, author and stuff like that.

    Then I installed PixVue and used it to add a couple of keywords to some of my pictures. However, the information PixVue stores doesn’t appear on the initial list (which was native in WinXP). Is that because the information entered in Windows isn’t e.g. EXIF?

  • http://www.phppatterns.com HarryF

    However, the information PixVue stores doesn’t appear on the initial list (which was native in WinXP). Is that because the information entered in Windows isn’t e.g. EXIF?

    Can only guess here, but probably WinXP only has EXIF support while most of what PixVue writes is IPTC or XMP. So when viewing in standard Windows Explorer, your metadata is in the image but Explorer can’t see it.

    Again a dumb view of things but EXIF supports the “minimum” information about the picture (e.g. exposure) while IPTC and XMP provide space to store things like “Creator’s Jobtitle”

  • me

    I enjoy PixVue. However, there is a thing I do not understand. If you hit Properties on your pictures in WinXP without any program installed, you are able to add keywords, titles, author and stuff like that.

    The properties you see in Windows XP when you right click are not stored in the file itself in most cases.

    Search for NTFS “file properties” in your favorite search engine.

    I don’t think it’s a particularly good place to store data. Not a lot of software reads these properties. Also, if you move the file to a non-NTFS file system, that info will be lost. For archiving the info to burn to CD, you can save those properties in rar files using WinRAR, but I still think there are many better solutions for metadata.

    • SilversleevesX

      ExifTool reads them.

      Check the documentation under ‘XPTags’.

      BZT

  • lsolesen

    > but I still think there are many better solutions for metadata.

    What solutions? I am very interested, because I am just about to sort and store a huge bunch of images :)

  • salguod

    I use Mapivi. It’s the only thing I have found so far which lets me edit the IPTC. I just wish digikam or gthumb would simply add the option to edit IPTC info.

    • SilversleevesX

      digikam is supposed to do it with the kipi metadata plugins, but I’ve yet to see it work 100%. Gwenview is much more in sync with kipi and Nepomuk add-ons, through either or both of which it does a good job at reading, if not editing, EXIF and IPTC.

      BZT

  • Jet

    IrfanView is freeware that allows you to edit metadata.

  • Martin

    I also use (and develop :) Mapivi http://mapivi.de.vu
    Mapivi is based on another Perl module Image::MetaData::JPEG.
    But future versions of Mapivi may be based on Image::ExifTool and with that support XMP and more picture formats.
    But using IPTC does no harm; it’s still the standard for photographers, and a later migration from IPTC to XMP is straightforward.

    Martin

  • Freek

    Was looking for an XMP-tool to tag my photographs. Found nothing serious using Google, until I stumbled on this page. Thanks guys..!

    Question: why doesn’t Copernic offer XMP support..?

    Freek, Amsterdam

  • Numan

    Is there a solution for .NET? I am using ASP.NET with C#, so is there any solution for reading XMP headers in .NET?

  • Stefan

    I cannot find PixVue anywhere. It seems to have been erased from the web.
    Is there anyone that can give me a copy?

    Please send me an email to pixvue.20.wildeast@spamgourmet.com

    Thanks
    Stefan

  • Marco

    link for pixvue 2.01 since pixvue.com is no longer with us.

  • Timmy Jones

    Hey guys, here’s a handy tip. If you want to insert exif, iptc or xmp data into the names of files you can use Quick File Rename, not free though.

  • manatlan

    Hi, I’m the author of jBrout … Just a post to correct some things.
    jBrout is written with Python/PyGTK, and is available for GNU/Linux and Windows. jBrout stores “tags” in “IPTC keywords”, stores the “comment” in the “JPEG comment”, and is aware of EXIF when doing lossless rotation.
    AFAIK jBrout is the only one which uses the “Exif internal thumbnail”, and is able to rebuild it or rotate it (no need to regenerate thumbs).
    jBrout will be able to store information in XMP soon (when exiv2 and pyexiv2 are ready), and jBrout will hopefully be able to handle other image formats like TIFF and perhaps RAW …

    And like Martin (hi Martin!) said: IPTC is more widely used than XMP …

  • Knut

    Hi. Tags are genius, especially if one also wants to share photos (or any media)
    to several screens; I mean the main screen, the TV.
    Unfortunately it’s all immature. TVersity media server doesn’t support tags (EXIF/IPTC).
    TVs with a network connection and DLNA or UPnP don’t support them either.

    I am no fan of Linux, nor of Perl or Python.
    Does anyone know of Windows programs that can query whole folders and extract photo files based on tag info?