Programming - - By Wyatt Barnett

Binaries Belong in the Database Too

One often repeated piece of advice one sees across the internet is “Don’t store binary files in the database. Save a pointer to the file and keep it on the disk.” The ostensible reasons for this is that storing binary files is that file systems are much more efficient at storing and retrieving files than databases. Which is a valid argument, especially if one is dealing with very large files. But I think there are a number of advantages to storing files in the database which can easily outweigh this issue, especially for dynamic, data-driven sites.

The biggest advantage is that databases generally support transactions and are ACID compliant. Which means if there is a problem uploading the file, one can easily cancel the entire update and roll back to the previous state. Unlike dealing with a file system, where one must manually clean up any mess one makes. Furthermore, when using a database as the backing store for your files, you need not worry about file name collisions.

Backup is another huge advantage. If your application is already storing the content in a database, storing binary attachments in there as well makes a lot of sense. Your backup routine becomes much simpler, as you need just backup a database rather than worrying about some arbitrary folders on the disk. And your restore routine also becomes greatly simplified as you need just restore a database rather than a database as well as some bit of the file system. This comes in especially handy when you, as a developer, need to pull down a “current” copy of the site in order to do some maintenance work.

From a security standpoint there are huge advantages as well. If your application need not write files, you can easily ban it from writing to disk at all using file system permissions. The inability to write malicious files to disk makes many exploits impossible. Furthermore, since you are using a server-side language to handle files, keeping unauthorized users from raiding your files becomes much easier as they can easily be patched through your normal security. In addition, because no one needs to upload files, one can oftentimes disable FTP access entirely. And FTP is about as insecure a service as exists today.

Finally, depending on the database platform, one can do some very neat trick with database-based file storage. In Sql Server, one can do full-text searches on Html or Word documents out of the box.  Installing the PDF IFilter allows for doing the same sort of searches on PDFs. And there are other IFilters available for many sorts of files. Users love full-text search.

As I stated above, there is clearly some overhead associated with using a database to store your binary files. The most significant one is that database connections are expensive, and pulling hundreds of KB out of a database does require it be open a lot longer. But this is easily overcome using caching, either to a temporary folder on the disk or-better yet-into memory.

So, next time someone tells you are crazy for storing files in a RDBMS, tell them they are crazy to store the files on a disk.