Open Source Image Archiving: Exif, IPTC, XMP and all that

Related to this hack, I started taking a serious look at the available standards and Open Source tools for adding metadata to images, in the context of building archives of digital photography. With further prompting from reading this (Hey! MS are adopting someone else’s standard!), I’m dumping some notes…

Key Takeaways

Digital images contain three significant types of metadata: camera-generated metadata (e.g. camera make, time the picture was taken, exposure), manually added metadata (e.g. description, keywords, name of the photographer), and grouping metadata (how a collection of images relate to each other). The first two should be stored in the image itself for data preservation, while the third type is best stored externally.
Three significant inline metadata formats exist for storing data in an image: Exif, IPTC, and XMP. Exif is used by most digital cameras to attach information at the point of picture-taking; IPTC complements Exif with many more named tags for descriptive metadata; and XMP, developed by Adobe, provides XML/RDF/Dublin Core goodness for storing metadata. Effective photo archiving means being able to work with all three formats.
Image archiving tools vary in their capabilities, but the best ones allow the user to manually supply additional metadata about an image and store this data in the image itself, not separately. Open source contenders doing this right include F-Spot and jbrout.
While the open source image archiving application landscape is less than ideal, some libraries are available for hacking your own. Phil Harvey’s ExifTool, a Perl-based library, is particularly noteworthy for its comprehensive support of EXIF, IPTC, and XMP, as well as manufacturer extensions.

Photo Archiving: The Golden Rule

Store metadata in the image.

Actually, I should be more precise about the word “metadata” here: there are essentially three significant types of metadata when it comes to digital images, distinguished by who created the data and where it’s stored.

First, the stuff your digital camera attaches to the image when you take a picture (e.g. camera make, time the picture was taken, exposure etc.). Second, metadata you manually add to an image, typically at the time you download images from the camera to your PC (e.g. where the picture was taken, a description, keywords, name of the photographer etc.). Third, “grouping metadata”: how a collection of images relate to each other (e.g. they were all part of a single photo shoot or they are all family pictures). For the purposes of this discussion, the first two types of metadata can be regarded as the same thing, and it’s this information that you want stored in the image.

The third type, the “grouping” information, is “relative” and, practically, can only work well if it’s stored externally from the images. This may be as “lightweight” as a tree of directories on your filesystem with timestamp-based names, under which you store your images, but image archive tools may offer additional facilities in this area. The important point is that if you want to group a collection of images under a heading like “family”, you also want a keyword “family” stored in each image. In that case the image archiving tool may be able to build the groups automatically from your keywords.
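As a rough illustration of building groups from keywords, here is a minimal Python sketch. It assumes the keywords have already been read out of each image by some other tool; the function name group_by_keyword and the file paths are made up for the example.

```python
from collections import defaultdict


def group_by_keyword(keywords_by_image):
    """Map each keyword to the sorted list of images that carry it."""
    groups = defaultdict(set)
    for path, keywords in keywords_by_image.items():
        for keyword in keywords:
            # Lower-case so that "Family" and "family" land in one group.
            groups[keyword.lower()].add(path)
    return {keyword: sorted(paths) for keyword, paths in groups.items()}


# Hypothetical keywords, as they might be read back from each image file.
images = {
    "2007/01/IMG_0001.jpg": ["family", "dog"],
    "2007/01/IMG_0002.jpg": ["Family"],
    "2007/02/IMG_0107.jpg": ["holiday"],
}
groups = group_by_keyword(images)
```

The groups here are derived data: throw them away and they can be rebuilt from the images, which is the whole point of keeping the keyword inline.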

But you want the first two types of metadata in the image, so that if you switch archiving tools, transfer your images to another system or forget to back up the archive database but remember the images, it won’t be lost. A decent archiving tool will probably generate its own database from this inline metadata, to make searches etc. efficient, but the final authority should be the image itself.
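To make the “final authority” point concrete, here is a minimal sketch of a throwaway search index built with Python’s sqlite3 module. It assumes some other tool has already extracted (path, keywords) pairs from the images; the table layout and the names build_index and find are invented for illustration and not taken from any particular archiving tool.

```python
import sqlite3


def build_index(records):
    """Create an in-memory search index from (path, keywords) pairs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE keyword (path TEXT, word TEXT)")
    db.executemany(
        "INSERT INTO keyword (path, word) VALUES (?, ?)",
        [(path, word) for path, words in records for word in words],
    )
    db.commit()
    return db


def find(db, word):
    """Return the paths of all images tagged with the given keyword."""
    rows = db.execute(
        "SELECT path FROM keyword WHERE word = ? ORDER BY path", (word,)
    )
    return [row[0] for row in rows]


# Hypothetical records, standing in for metadata read from the image files.
db = build_index([("a.jpg", ["family", "dog"]), ("b.jpg", ["family"])])
matches = find(db, "family")
```

Because the database is only a cache, losing it costs a re-scan of the images, not the metadata.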

Inline Image Metadata Standards

The next question is how data is stored in an image. File formats like JPEG, TIFF and the various RAW formats, commonly used in digital photography / processing, allow additional metadata to be stored in the image file.
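For the curious, here is a rough sketch of where such metadata lives in a JPEG file. A JPEG is a sequence of marker segments; by convention Exif and XMP travel in APP1 segments, and IPTC usually sits inside a Photoshop resource block in an APP13 segment. The function name find_metadata_segments is my own, and the parsing is deliberately simplified (it stops at the start of the image data and ignores padding bytes), so treat it as a sketch rather than a full parser.

```python
import struct

# (marker byte, payload signature) -> name of the metadata block.
SIGNATURES = {
    (0xE1, b"Exif\x00\x00"): "Exif",
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00"): "XMP",
    # APP13 Photoshop resources are where IPTC data usually lives.
    (0xED, b"Photoshop 3.0\x00"): "IPTC",
}


def find_metadata_segments(data):
    """Return the names of metadata blocks found before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the segment structure; give up
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        for (seg_marker, signature), name in SIGNATURES.items():
            if marker == seg_marker and payload.startswith(signature):
                found.append(name)
        pos += 2 + length
    return found
```

Typical use would be something like find_metadata_segments(open(path, "rb").read()), where path is whichever JPEG you want to inspect.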

So here’s an “avoid too much detail” view: there are three significant inline metadata formats here – Exif, IPTC and XMP. Exif is used by most digital cameras to attach information like exposure at the point you take a picture, and tools are available for adding information manually (or automatically), such as geocoding. IPTC “extends” Exif in the loose sense of providing a whole load more named “tags” for descriptive metadata (strictly speaking, it’s a separate standard that sits alongside Exif in the file). Both are fragile from a technical point of view (easy to destroy / corrupt) and problematic from the point of view of extending (if there’s no named “tag” for the metadata you want to store, you’re out of luck). Meanwhile, XMP, the work of Adobe, is the “holy grail”, providing XML / RDF / Dublin Core goodness for storing metadata. It’s not well supported yet, but it’s starting to gain significant acceptance. And of course our friend IE has its own opinion about XMP…
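To give a feel for what the “XML / RDF / Dublin Core goodness” looks like, here is a minimal sketch that builds the core of an XMP document holding keywords as a Dublin Core dc:subject bag. The namespace URIs are the standard ones; the helper names q and build_xmp are mine, and the sketch leaves out the xpacket wrapper and the step of embedding the result in an image file.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    # Ask ElementTree to serialise with the conventional prefixes.
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the fully qualified tag name ElementTree expects."""
    return "{%s}%s" % (NS[prefix], name)


def build_xmp(keywords):
    """Return XMP markup storing the keywords as an unordered dc:subject bag."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    description = ET.SubElement(
        rdf, q("rdf", "Description"), {q("rdf", "about"): ""}
    )
    subject = ET.SubElement(description, q("dc", "subject"))
    bag = ET.SubElement(subject, q("rdf", "Bag"))
    for keyword in keywords:
        ET.SubElement(bag, q("rdf", "li")).text = keyword
    return ET.tostring(xmpmeta, encoding="unicode")


packet = build_xmp(["family", "dog"])
```

Because this is plain RDF/XML, storing a property nobody defined a tag for is a matter of adding another namespace, which is exactly the extensibility Exif and IPTC lack.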

In practice, you need to be able to work with all three formats for effective photo archiving.

Image Archiving Apps

There are a bunch of image archiving tools out there, for any and all operating systems. A good one will allow you to manually supply additional metadata about an image (e.g. keywords like “family”, “dog” etc.) but, from a straw poll, the majority store this data separately from the image, in some kind of database. Where Ubuntu is concerned, gThumb does just this: it stores any user-supplied metadata externally from the image (if I remember right, it’s under ~/.gnome2/gthumb).

There’s a bunch of reasons why I consider this fundamentally wrong. They become obvious the moment an image leaves your filesystem (or perhaps even changes directory / filename). On top of that, there’s no “clear winner” in the image archiving space, so there’s a good chance you’ll switch tools more than once. Bottom line: storing metadata separately from the image it refers to is just not the way to do it.

In the Open Source / Linux space, there are a couple of contenders which are getting it “right”, namely F-Spot (C# / Mono) and jbrout (Python / PyGTK). Both are capable of storing metadata in the image but there are limitations, e.g. jbrout has no awareness of XMP. Some worthwhile blogs (and comments) to read are “How to manage EXIF, XMP metadata on linux gnome kde”, “Managing Photographs” and “Photo organizing in Linux”.

Image Archiving Done Right

By contrast, the model application, IMO, for how to do it right is PixVue (sadly Win32 only, but free to use).

For starters, rather than being a monolithic application, it adds a bunch of Windows Shell Namespace Extensions, meaning you can work with your images in Windows Explorer – e.g. right click and “annotate” with your metadata.

As I see it “Digital workflow”, in general, is all about procedures – you need “process oriented” applications that fit into your procedure, rather than monolithic windowed views of your images, which dictate how you work. PixVue gets this right by getting out of the way and hence my desire to extend Nautilus.

PixVue also stores almost all metadata in the image, understanding EXIF, IPTC and XMP. The exception is the “grouping” metadata, for which it provides something called “galleries”. These are a virtual directory hierarchy: in Explorer you can drag and drop images into a gallery (or galleries) to organise them, but these are only “symbolic links” to the original files on your filesystem. You can also export your gallery structures as XML, to allow at least the chance of portability.

Also the “templates” functionality it provides helps make attaching metadata to a large collection of images as fast and painless as possible.

Finally it has excellent search facilities (via the gallery view) plus it “integrates” with (populates the indexes of) Windows search and Google desktop.

The only downside of PixVue is the source isn’t available and there doesn’t seem to be an API (it does register some COM components but, from a quick scan, these don’t allow you to drive it with, say, a Python script), so if you want to extend it to integrate with some geocoding tool, you’re out of luck.

Open Source Image Metadata Tools

So, given a less-than-ideal situation for Open Source image archiving applications, what’s available in the form of libraries, so you can hack your own?

The long and short of it is more bad news. There’s a whole bunch of libraries that do a little bit: perhaps only (partially) understanding one format (typically Exif), being read-only rather than read/write, or lacking support for manufacturer Exif extensions.

The situation is particularly acute when it comes to Python, which is badly lacking: there are a few “readers” out there, and the occasional writer (no XMP writer though), but nothing that’s really complete. That’s bad news for jbrout and even worse news for Python on Series 60 phones, where being able to tag your pictures while you sit on a train would be a killer app.

There are exceptions though, the #1 being Phil Harvey’s ExifTool (Perl, available via CPAN). ExifTool isn’t just good, it’s excellent. It’s basically got it all: EXIF, IPTC and XMP support (read/write), support for manufacturer extensions and a common dictionary of tag names that are, to an extent, metadata-format-independent. The only downside is that there doesn’t seem to be (correct me if wrong) a GUI front-end for Linux that exposes all of ExifTool’s functionality; you’re either talking command line or writing your own Perl scripts.
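If Perl isn’t your thing, one workaround is to drive the exiftool command line from another language. The sketch below does this from Python and assumes the exiftool executable is installed and on your PATH. The function names are my own; the options used (-json, -G, and the += operator for adding to list-type tags such as IPTC:Keywords and XMP-dc:Subject) are standard ExifTool options.

```python
import json
import subprocess


def write_keyword_command(path, keyword):
    """Build an exiftool command that adds one keyword to IPTC and XMP."""
    return [
        "exiftool",
        "-IPTC:Keywords+=" + keyword,   # += appends to a list-type tag
        "-XMP-dc:Subject+=" + keyword,  # keep the XMP copy in step
        path,
    ]


def add_keyword(path, keyword):
    """Run the command; by default exiftool keeps a backup of the original file."""
    subprocess.run(write_keyword_command(path, keyword), check=True)


def read_metadata(path):
    """Return exiftool's JSON view of one file as a dict of tag names to values."""
    result = subprocess.run(
        ["exiftool", "-json", "-G", path],  # -G prefixes tags with their group
        capture_output=True,
        text=True,
        check=True,
    )
    # exiftool prints a JSON array with one object per input file.
    return json.loads(result.stdout)[0]
```

Passing the command as a list rather than a shell string means filenames and keywords containing spaces need no extra quoting.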

Another noteworthy exception is Evan Hunter’s PHP JPEG Metadata Toolkit, which has EXIF, IPTC and XMP support and awareness of some manufacturer quirks. The API is also pretty simple.

Anyway – hopefully that saves someone some time in research.

Frequently Asked Questions (FAQs) about Open Source Image Archiving: EXIF, IPTC, XMP, and More

What is the difference between EXIF, IPTC, and XMP metadata?

EXIF, IPTC, and XMP are all types of metadata that can be embedded in an image file. EXIF (Exchangeable Image File Format) is a standard that specifies the formats for images, sound, and ancillary tags used by digital cameras (including smartphones), scanners and other systems. IPTC (International Press Telecommunications Council) metadata is used by news organizations for information like captions, headlines, categories, credits, and copyrights. XMP (Extensible Metadata Platform) is a standard created by Adobe for storing metadata in digital assets. It’s more flexible and can include EXIF and IPTC data, as well as other types of metadata.

How can I read and write IPTC metadata in C#?

To read and write IPTC metadata in C#, you can use libraries like MetadataExtractor and Magick.NET. MetadataExtractor is a straightforward library for reading metadata from image and video files. Magick.NET, on the other hand, is a .NET wrapper for the ImageMagick library, which allows you to read, write and manipulate images in various formats, including the ability to handle metadata.

How can I export IPTC as XMP in Photos for Mac?

In Photos for Mac, you can export IPTC as XMP by selecting the photos you want to export, then choosing File > Export > Export Unmodified Original. In the dialog that appears, check the box for “Export IPTC as XMP”. This will include the IPTC metadata in an XMP sidecar file when the photos are exported.

What does exporting IPTC as XMP do?

Exporting IPTC as XMP creates a separate file (known as a sidecar file) that contains the IPTC metadata in XMP format. This can be useful when you’re transferring photos between different systems or software that may not support IPTC directly, but do support XMP.

How can I handle IPTC metadata with ImageSharp in C#?

ImageSharp is a fully managed, cross-platform 2D graphics API. Early releases did not natively support reading or writing IPTC metadata, while more recent releases expose an IPTC profile on the image’s metadata, so check the documentation for the version you are using. If your version lacks IPTC support, you can use it in conjunction with other libraries that have it, such as MetadataExtractor for reading metadata and Magick.NET for writing metadata.

How can I export photos, videos, slideshows, and memories in Photos for Mac?

In Photos for Mac, you can export your photos, videos, slideshows, and memories by selecting the items you want to export, then choosing File > Export. You can choose to export the items as they are, or export a version that’s been optimized for a specific size or format.

What is the importance of metadata in digital assets?

Metadata in digital assets is crucial as it provides information about other data. It can include details like the author, creation date, last modified date, and so on. In the context of digital images, metadata can include details like the camera model, exposure settings, GPS location, and even copyright information. This can be extremely useful for organizing, categorizing, and searching through your digital assets.

How can I view the metadata of an image?

Most image viewers and editors allow you to view the metadata of an image. In Windows, you can right-click on the image file, select Properties, and then click on the Details tab. On a Mac, you can use the “Get Info” command in Finder. There are also many online tools available that can display image metadata.

Can metadata be removed from an image?

Yes, metadata can be removed from an image. This is often done to protect privacy, as metadata can include potentially sensitive information like the location where a photo was taken. Most image editing software, including Adobe Photoshop and Lightroom, have tools for removing metadata. There are also dedicated metadata removal tools available online.

Is it possible to add or modify the metadata of an image?

Yes, it’s possible to add or modify the metadata of an image. This can be done using image editing software like Adobe Photoshop or Lightroom, or with a dedicated metadata editor. Adding or modifying metadata can be useful for organizing your images, adding copyright information, or correcting inaccurate information.