<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Binaries Belong in the Database Too</title>
	<atom:link href="http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/</link>
	<description>News, opinion, and fresh thinking for web developers and designers. The official podcast of sitepoint.com.</description>
	<lastBuildDate>Sat, 07 Nov 2009 23:35:20 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: thrashtad</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-109969</link>
		<dc:creator>thrashtad</dc:creator>
		<pubDate>Tue, 28 Nov 2006 20:38:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-109969</guid>
		<description>@LinhGB:

&lt;blockquote&gt;File caching at web server level or kernel level is far far better than doing it with the overheads associated with a database.&lt;/blockquote&gt;

Who says you can&#039;t do both? Do you really think that because it&#039;s stored in a DBMS, you can&#039;t serve it like any other file? Your web app would output something like:

http://blah.com/imgs/img.jpg?ID=123

where img.jpg is really a PHP/ASP/Whateva file that pulls the image from the DB, fakes some headers, and outputs the binary data.

Apache, IIS, etc. can cache this output, as well they should. You take a single performance hit when the cache doesn&#039;t exist yet or has expired, and the requests that follow don&#039;t touch the DB at all.

You were talking about performance issues why?

@pete:

&lt;blockquote&gt;1. I often want other applications to have access to the same image store without using an API into the database&lt;/blockquote&gt;

See above. Brain-dead simple there.

IMHO, the security benefit of not having binary data splattered directly onto your platters (having random web users actually altering my filesystem is NOT COOL!!), plus the ease of searching, serving and updating the files (and of writing secure code for those tasks), far outweighs any perceived performance drawbacks.

I would also like to point to this study:
http://www.cis.ksu.edu/~rpalani/MSReport/ComparisonDbVsServer.htm

As you can see, both methods (done correctly) scale perfectly well. Yes, uploading huge 100 MB files takes about 50 seconds longer, but that is a non-issue, as the bottleneck there will almost always be the network and not the DB or the FS. Uploading is also not going to be your main operation, since it isn&#039;t done very often. Notice that the DB leads on downloads until the FS catches up at 100 MB.

100 MB is an awfully large file for the types of data we&#039;re talking about here (images, documents, etc.). At 10 MB and below, the performance hits are negligible.

I personally am a fan of a metadata table + blob table for situations that warrant that functionality. Granted, if you&#039;re serving billions of files to millions of users a day, then storing them in a DBMS is stupidity beyond belief. But even in that case, you&#039;re going to be rolling your own custom filesystem architecture, akin to Google&#039;s, which is way beyond the scope of this discussion.

In 80% of other use cases, both methods work equally well.</description>
		<content:encoded><![CDATA[<p>@LinhGB:</p>
<blockquote><p>File caching at web server level or kernel level is far far better than doing it with the overheads associated with a database.</p></blockquote>
<p>Who says you can&#8217;t do both? Do you really think that because it&#8217;s stored in a DBMS, you can&#8217;t serve it like any other file? Your web app would output something like:</p>
<p><a href="http://blah.com/imgs/img.jpg?ID=123" rel="nofollow">http://blah.com/imgs/img.jpg?ID=123</a></p>
<p>where img.jpg is really a PHP/ASP/Whateva file that pulls the image from the DB, fakes some headers, and outputs the binary data.</p>
<p>Apache, IIS, etc. can cache this output, as well they should. You take a single performance hit when the cache doesn&#8217;t exist yet or has expired, and the requests that follow don&#8217;t touch the DB at all.</p>
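<p>A minimal sketch of such a script, written in Python rather than PHP/ASP and assuming a hypothetical SQLite table <code>images(id, mime, data)</code>: it looks the image up by its ID and returns it with <code>Content-Type</code>, <code>ETag</code> and <code>Cache-Control</code> headers, which is what lets the web server or a caching proxy keep serving it without touching the database. The function names and the one-day max-age are illustrative choices, not part of the original setup.</p>

```python
import hashlib
import sqlite3


def make_db():
    # Hypothetical schema: one row per image, raw bytes in a BLOB column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, mime TEXT, data BLOB)")
    return conn


def serve_image(conn, image_id, if_none_match=None):
    """Handle a request like img.jpg?ID=123; return (status, headers, body)."""
    row = conn.execute(
        "SELECT mime, data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        return 404, {}, b""
    mime, data = row
    etag = '"%s"' % hashlib.sha1(data).hexdigest()
    cache_headers = {
        "ETag": etag,
        # Allows Apache/IIS/proxy caches to reuse the response for a day.
        "Cache-Control": "public, max-age=86400",
    }
    if if_none_match == etag:
        # The client already holds this version, so no body is sent.
        return 304, cache_headers, b""
    headers = {**cache_headers, "Content-Type": mime, "Content-Length": str(len(data))}
    return 200, headers, data
```

<p>After inserting a row with id 123, <code>serve_image(conn, 123)</code> returns status 200 with the image bytes, and repeating the call with the returned ETag yields a bodiless 304.</p>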
<p>You were talking about performance issues why?</p>
<p>@pete:</p>
<blockquote><p>1. I often want other applications to have access to the same image store without using an API into the database</p></blockquote>
<p>See above. Brain-dead simple there.</p>
<p>IMHO, the security benefit of not having binary data splattered directly onto your platters (having random web users actually altering my filesystem is NOT COOL!!), plus the ease of searching, serving and updating the files (and of writing secure code for those tasks), far outweighs any perceived performance drawbacks.</p>
<p>I would also like to point to this study:<br />
<a href="http://www.cis.ksu.edu/~rpalani/MSReport/ComparisonDbVsServer.htm" rel="nofollow">http://www.cis.ksu.edu/~rpalani/MSReport/ComparisonDbVsServer.htm</a></p>
<p>As you can see, both methods (done correctly) scale perfectly well. Yes, uploading huge 100 MB files takes about 50 seconds longer, but that is a non-issue, as the bottleneck there will almost always be the network and not the DB or the FS. Uploading is also not going to be your main operation, since it isn&#8217;t done very often. Notice that the DB leads on downloads until the FS catches up at 100 MB.</p>
<p>100 MB is an awfully large file for the types of data we&#8217;re talking about here (images, documents, etc.). At 10 MB and below, the performance hits are negligible.</p>
<p>I personally am a fan of a metadata table + blob table for situations that warrant that functionality. Granted, if you&#8217;re serving billions of files to millions of users a day, then storing them in a DBMS is stupidity beyond belief. But even in that case, you&#8217;re going to be rolling your own custom filesystem architecture, akin to Google&#8217;s, which is way beyond the scope of this discussion.</p>
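<p>A minimal sketch of the metadata table + blob table layout, assuming SQLite and hypothetical table names <code>files</code> and <code>file_blobs</code>. Listings and searches read only the small metadata table, the blob is fetched only when a file is actually requested, and both rows are written in one transaction so a failed upload leaves nothing behind.</p>

```python
import sqlite3

# Hypothetical schema: small, frequently queried metadata kept apart
# from the large binary payload.
SCHEMA = """
CREATE TABLE files (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    mime TEXT NOT NULL,
    size INTEGER NOT NULL
);
CREATE TABLE file_blobs (
    file_id INTEGER PRIMARY KEY REFERENCES files(id),
    data    BLOB NOT NULL
);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    return conn


def add_file(conn, name, mime, data):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the metadata row and the blob row are stored together or not at all.
    with conn:
        cur = conn.execute(
            "INSERT INTO files (name, mime, size) VALUES (?, ?, ?)",
            (name, mime, len(data)),
        )
        file_id = cur.lastrowid
        conn.execute(
            "INSERT INTO file_blobs (file_id, data) VALUES (?, ?)", (file_id, data)
        )
    return file_id


def list_files(conn):
    # Listing never reads the blob table, which keeps it cheap.
    return conn.execute("SELECT id, name, mime, size FROM files ORDER BY id").fetchall()


def read_file(conn, file_id):
    row = conn.execute(
        "SELECT data FROM file_blobs WHERE file_id = ?", (file_id,)
    ).fetchone()
    return None if row is None else row[0]
```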
<p>In 80% of other use cases, both methods work equally well.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: wwb_99</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-103821</link>
		<dc:creator>wwb_99</dc:creator>
		<pubDate>Thu, 23 Nov 2006 13:17:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-103821</guid>
		<description>@Andrew-J2000

I think a big part of your issues revolved around MySQL. SQL Server handles blobs a lot better.

In any case, memcaching is one method I use a lot to take load off the DB.</description>
		<content:encoded><![CDATA[<p>@Andrew-J2000</p>
<p>I think a big part of your issues revolved around MySQL. SQL Server handles blobs a lot better.</p>
<p>In any case, memcaching is one method I use a lot to take load off the DB.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew-J2000</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-71118</link>
		<dc:creator>Andrew-J2000</dc:creator>
		<pubDate>Fri, 20 Oct 2006 19:17:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-71118</guid>
		<description>Hi,

After reading this blog entry I thought I would shed some light on my experience of storing files in databases. I’m sure many of you know this to be a taboo practice, and I would certainly agree, depending on the database. A project I worked on for MTV Networks Europe/International required a completely shared-nothing architecture. This meant that MTV’s hosting &amp; operations team required me to store files in the database, despite the hesitation I expressed. To read more about some of the issues that have not been discussed here, see below...

&lt;a href=&quot;http://blog.ajohnstone.com/archives/large-binary-data-and-blobs-2/&quot; rel=&quot;nofollow&quot;&gt;Large Binary Data and Blob’s&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>After reading this blog entry I thought I would shed some light on my experience of storing files in databases. I’m sure many of you know this to be a taboo practice, and I would certainly agree, depending on the database. A project I worked on for MTV Networks Europe/International required a completely shared-nothing architecture. This meant that MTV’s hosting &amp; operations team required me to store files in the database, despite the hesitation I expressed. To read more about some of the issues that have not been discussed here, see below&#8230;</p>
<p><a href="http://blog.ajohnstone.com/archives/large-binary-data-and-blobs-2/" rel="nofollow">Large Binary Data and Blob’s</a></p>]]></content:encoded>
	</item>
	<item>
		<title>By: Ian Qvist</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-69436</link>
		<dc:creator>Ian Qvist</dc:creator>
		<pubDate>Wed, 18 Oct 2006 13:37:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-69436</guid>
		<description>I would like to see some tests of the performance issues with database binaries.
I&#039;ve run into the file identifier problem a couple of times. I created a forum where people can upload their profile pictures, and I had to group the pictures and rename the files using an ID counter. This would have been a lot easier with a database storing all the profile pictures.</description>
		<content:encoded><![CDATA[<p>I would like to see some tests of the performance issues with database binaries.<br />
I&#8217;ve run into the file identifier problem a couple of times. I created a forum where people can upload their profile pictures, and I had to group the pictures and rename the files using an ID counter. This would have been a lot easier with a database storing all the profile pictures.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Tim</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-68770</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Tue, 17 Oct 2006 13:49:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-68770</guid>
		<description>I had a situation once where the website I built had to be replicated out to several servers. Tossing everything into a database, then using MSDE on the outside servers, allowed me to set up replication which copied new content out to the servers automatically - including images, PDFs, etc., all within a few seconds of updates. No need for file copying or worrying about filesystem permissions. Admittedly it wasn&#039;t a big site (MSDE will do up to about 2GB of database), but it worked great for me and kept me as a single developer working on task instead of doing daily chores.</description>
		<content:encoded><![CDATA[<p>I had a situation once where the website I built had to be replicated out to several servers. Tossing everything into a database, then using MSDE on the outside servers, allowed me to set up replication which copied new content out to the servers automatically &#8211; including images, PDFs, etc., all within a few seconds of updates. No need for file copying or worrying about filesystem permissions. Admittedly it wasn&#8217;t a big site (MSDE will do up to about 2GB of database), but it worked great for me and kept me as a single developer working on task instead of doing daily chores.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: wwb_99</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-68730</link>
		<dc:creator>wwb_99</dc:creator>
		<pubDate>Tue, 17 Oct 2006 11:59:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-68730</guid>
		<description>@LinhGB

Valid points, but I think you (and many folks) overstate the performance issues of pulling files out of a database. I have had a server that pulls files out of a database get slashdotted, and it handled it like a champ. I have another application that gets slammed hard for three days a year. It also stores a lot of images in the database, and it takes it like a champ. They definitely don&#039;t get &quot;hammered to death&quot; with a couple of hundred people accessing the sites.

&lt;blockquote&gt;Dealing with a file system is the same. I think it’s only because you’re thinking from a MS Windows point of view. This is purely server administration stuff and applications aren’t supposed to know or care about it.&lt;/blockquote&gt;



If your app requires a file system that supports transactions, then I would bet it does care about this. I have seen a couple of people wave the Linux flag and claim that transactions &amp; full-text search can be handled with file-system functions, but I have yet to see anyone post a reasonably elegant example of how to handle such things.</description>
		<content:encoded><![CDATA[<p>@LinhGB</p>
<p>Valid points, but I think you (and many folks) overstate the performance issues of pulling files out of a database. I have had a server that pulls files out of a database get slashdotted, and it handled it like a champ. I have another application that gets slammed hard for three days a year. It also stores a lot of images in the database, and it takes it like a champ. They definitely don&#8217;t get &#8220;hammered to death&#8221; with a couple of hundred people accessing the sites.</p>
<blockquote><p>Dealing with a file system is the same. I think it’s only because you’re thinking from a MS Windows point of view. This is purely server administration stuff and applications aren’t supposed to know or care about it.</p></blockquote>
<p>If your app requires a file system that supports transactions, then I would bet it does care about this. I have seen a couple of people wave the Linux flag and claim that transactions &amp; full-text search can be handled with file-system functions, but I have yet to see anyone post a reasonably elegant example of how to handle such things.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: pete</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-68290</link>
		<dc:creator>pete</dc:creator>
		<pubDate>Sun, 15 Oct 2006 23:30:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-68290</guid>
		<description>There &lt;em&gt;are&lt;/em&gt; advantages to storing binaries in the database, and in some situations it can work well. Specifically, it is valuable in creating a dedicated asset management system with some kind of version control - where your database for the binaries is highly tuned, or even separated from the database managing the metadata. In this way, we can start to see binaries as an attribute of an Asset object rather than metadata+binary. But we are all used to achieving this effect by now with a combination of metadata in the database and binary on the filesystem.

Some of the reasons and scenarios where I avoid databasing binaries are:

1. I often want other applications to have access to the same image store without using an API into the database
2. I&#039;m often building on top of an existing collection of images from another system (like asset management)
3. Using the filesystem, you can easily export a static/published version of your site with scp/rsync in scenarios where you don&#039;t want to use the database dynamically. Specifically, one of the sites I run gets millions of hits per day and would be significantly more expensive to run if I fetched images from the database or needed enough front-end servers with enough memory to make the caching effective.
4. On an n-tier setup, transferring large binaries between servers across a database connection is inefficient. As developers, we usually try to reduce the weight of data transferred from the database (think of all those times you don&#039;t &#039;select *&#039; but select only the fields you require).
5. When I want to distribute my images in front of my application, thinking here of edge-caching etc., where I may distribute my static content closer to the user (reverse proxy on my own network or geographically distributed edge-caches). Doing this saves me money, as bandwidth that only needs to serve static content can be bought very cheaply in bulk. Of course, this could still be supported by exporting from the database to the static locations.

In summary, I think if you are in a non-specialized setup and you are happy that the connection between your front-end/app servers and your database containing the binaries will scale to support the increased load, then designing your application in such a way as to take advantage of your database in the way the article advocated is worth serious consideration.</description>
		<content:encoded><![CDATA[<p>There <em>are</em> advantages to storing binaries in the database, and in some situations it can work well. Specifically, it is valuable in creating a dedicated asset management system with some kind of version control &#8211; where your database for the binaries is highly tuned, or even separated from the database managing the metadata. In this way, we can start to see binaries as an attribute of an Asset object rather than metadata+binary. But we are all used to achieving this effect by now with a combination of metadata in the database and binary on the filesystem.</p>
<p>Some of the reasons and scenarios where I avoid databasing binaries are:</p>
<p>1. I often want other applications to have access to the same image store without using an API into the database<br />
2. I&#8217;m often building on top of an existing collection of images from another system (like asset management)<br />
3. Using the filesystem, you can easily export a static/published version of your site with scp/rsync in scenarios where you don&#8217;t want to use the database dynamically. Specifically, one of the sites I run gets millions of hits per day and would be significantly more expensive to run if I fetched images from the database or needed enough front-end servers with enough memory to make the caching effective.<br />
4. On an n-tier setup, transferring large binaries between servers across a database connection is inefficient. As developers, we usually try to reduce the weight of data transferred from the database (think of all those times you don&#8217;t &#8217;select *&#8217; but select only the fields you require).<br />
5. When I want to distribute my images in front of my application, thinking here of edge-caching etc., where I may distribute my static content closer to the user (reverse proxy on my own network or geographically distributed edge-caches). Doing this saves me money, as bandwidth that only needs to serve static content can be bought very cheaply in bulk. Of course, this could still be supported by exporting from the database to the static locations.</p>
<p>In summary, I think if you are in a non-specialized setup and you are happy that the connection between your front-end/app servers and your database containing the binaries will scale to support the increased load, then designing your application in such a way as to take advantage of your database in the way the article advocated is worth serious consideration.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: LinhGB</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-68289</link>
		<dc:creator>LinhGB</dc:creator>
		<pubDate>Sun, 15 Oct 2006 23:29:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-68289</guid>
		<description>&lt;blockquote&gt;The biggest advantage is that databases generally support transactions and are ACID compliant. Which means if there is a problem uploading the file, one can easily cancel the entire update and roll back to the previous state. Unlike dealing with a file system, where one must manually clean up any mess one makes.&lt;/blockquote&gt;

Dealing with a file system is the same. I think it&#039;s only because you&#039;re thinking from an MS Windows point of view. This is purely server administration stuff, and applications aren&#039;t supposed to know or care about it.

&lt;blockquote&gt;Furthermore, when using a database as the backing store for your files, you need not worry about file name collisions.&lt;/blockquote&gt;

Filenames stored on a filesystem by a web app are usually hashed. Hashed filenames and original filenames are both stored in the database. Problem solved. You can also store such file metadata in the filesystem itself (some support this).

Mark has addressed the other points. I&#039;ll question this:

&lt;blockquote&gt;As I stated above, there is clearly some overhead associated with using a database to store your binary files. The most significant one is that database connections are expensive, and pulling hundreds of KB out of a database does require it be open a lot longer. But this is easily overcome using caching, either to a temporary folder on the disk or-better yet-into memory.&lt;/blockquote&gt;

Only hundreds of KBs? Think bigger, please. Even for web apps, it is not unreasonable to expect that people will upload, or will demand the ability to upload, multi-MB files. This is a big problem that you can&#039;t solve on web servers, not even with the MS Longhorn attitude of &quot;let&#039;s throw GBs of RAM and multi-core CPUs at the performance problem&quot;. Your server will get hammered to death if even a couple of hundred people access the file.

As for caching, you might as well use the filesystem to store the files then. Caches can expire, and your application will have to &quot;clean up the mess&quot;, which you mentioned earlier as one of the supposed cons of using a filesystem. File caching at the web server or kernel level is far, far better than doing it with the overheads associated with a database.

For me, it&#039;s about using the right tool for the right job. Filesystems are better designed to deal with files. It&#039;s as simple as that.

&lt;blockquote&gt;I expect that we’re evolving to a situation where for common situations we’ll have a filesystem that can be queries/backed up/… as if it were a sql dbms.&lt;/blockquote&gt;

Linux already has this in a KDE wrapper. You can &quot;mount&quot; an SQL DB on your system. I forget the exact details, as I don&#039;t care much about it, but it&#039;s in a recent Linux Journal issue. Google will find more info.</description>
		<content:encoded><![CDATA[<blockquote><p>The biggest advantage is that databases generally support transactions and are ACID compliant. Which means if there is a problem uploading the file, one can easily cancel the entire update and roll back to the previous state. Unlike dealing with a file system, where one must manually clean up any mess one makes.</p></blockquote>
<p>Dealing with a file system is the same. I think it&#8217;s only because you&#8217;re thinking from an MS Windows point of view. This is purely server administration stuff, and applications aren&#8217;t supposed to know or care about it.</p>
<blockquote><p>Furthermore, when using a database as the backing store for your files, you need not worry about file name collisions.</p></blockquote>
<p>Filenames stored on a filesystem by a web app are usually hashed. Hashed filenames and original filenames are both stored in the database. Problem solved. You can also store such file metadata in the filesystem itself (some support this).</p>
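<p>A minimal sketch of the hashed-filename approach, assuming the SHA-256 of the file contents is used as the stored name and its first two hex digits as a subdirectory to keep folders small (both are illustrative choices, not prescribed above); <code>store_upload</code> and its <code>root</code> directory are hypothetical. The returned record is what would go into the database next to the original filename. One side effect of hashing the contents is that identical uploads share a single file on disk.</p>

```python
import hashlib
import os
import tempfile


def store_upload(root, original_name, data):
    """Save data under a hashed name; return the row to store in the DB."""
    digest = hashlib.sha256(data).hexdigest()
    # Shard by the first two hex digits (root/ab/ab12...) so no folder gets huge.
    folder = os.path.join(root, digest[:2])
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, digest)
    if not os.path.exists(path):
        # Write to a temporary file, then rename it into place; on POSIX the
        # rename is atomic, so readers never see a half-written file.
        fd, tmp = tempfile.mkstemp(dir=folder)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)
        except BaseException:
            os.unlink(tmp)  # remove the partial temp file, then re-raise
            raise
    # The user-supplied name is only stored in the database, never used
    # to build a path, so uploads cannot collide or escape root.
    return {"original_name": original_name, "stored_name": digest,
            "path": path, "size": len(data)}
```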
<p>Mark has addressed the other points. I&#8217;ll question this:</p>
<blockquote><p>As I stated above, there is clearly some overhead associated with using a database to store your binary files. The most significant one is that database connections are expensive, and pulling hundreds of KB out of a database does require it be open a lot longer. But this is easily overcome using caching, either to a temporary folder on the disk or-better yet-into memory.</p></blockquote>
<p>Only hundreds of KBs? Think bigger, please. Even for web apps, it is not unreasonable to expect that people will upload, or will demand the ability to upload, multi-MB files. This is a big problem that you can&#8217;t solve on web servers, not even with the MS Longhorn attitude of &#8220;let&#8217;s throw GBs of RAM and multi-core CPUs at the performance problem&#8221;. Your server will get hammered to death if even a couple of hundred people access the file.</p>
<p>As for caching, you might as well use the filesystem to store the files then. Caches can expire, and your application will have to &#8220;clean up the mess&#8221;, which you mentioned earlier as one of the supposed cons of using a filesystem. File caching at the web server or kernel level is far, far better than doing it with the overheads associated with a database.</p>
<p>For me, it&#8217;s about using the right tool for the right job. Filesystems are better designed to deal with files. It&#8217;s as simple as that.</p>
<blockquote><p>I expect that we’re evolving to a situation where for common situations we’ll have a filesystem that can be queries/backed up/… as if it were a sql dbms.</p></blockquote>
<p>Linux already has this in a KDE wrapper. You can &#8220;mount&#8221; an SQL DB on your system. I forget the exact details, as I don&#8217;t care much about it, but it&#8217;s in a recent Linux Journal issue. Google will find more info.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: chrisb</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-68234</link>
		<dc:creator>chrisb</dc:creator>
		<pubDate>Sun, 15 Oct 2006 19:52:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-68234</guid>
		<description>It depends on what you consider a good test of performance. The easiest that springs to mind is dynamic images served from a database vs. served directly by IIS.

Another would be to enable kernel caching for other static files; you&#039;ll likely see a fairly sizable performance gain even against a cached ASP.NET-served version.</description>
		<content:encoded><![CDATA[<p>It depends on what you consider a good test of performance. The easiest that springs to mind is dynamic images served from a database vs. served directly by IIS.</p>
<p>Another would be to enable kernel caching for other static files; you&#8217;ll likely see a fairly sizable performance gain even against a cached ASP.NET-served version.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: wwb_99</title>
		<link>http://www.sitepoint.com/blogs/2006/10/15/binaries-belong-in-the-database-too/comment-page-1/#comment-68182</link>
		<dc:creator>wwb_99</dc:creator>
		<pubDate>Sun, 15 Oct 2006 15:24:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1739#comment-68182</guid>
		<description>@timvw


&lt;blockquote&gt;In both systems you have to worry about the ‘identification’ of entities. In a filesystem this is done via the filename and in a sql dbms this is done via the primary key.&lt;/blockquote&gt;

True, but in most cases the database or application logic will end up handling this issue for the developer. 99% of the time the file is either going into a table with its own key, or going into an attachments table referencing keys from somewhere else, whereas with the filesystem one often has to gin up one&#039;s own naming scheme.

&lt;blockquote&gt;It makes me wonder why most sql dbms vendors are proud to announce that the process of making a backup is as simple as copying some files… (To me it seems as if they’re saying that backing up files is easier than backup a dbms :P)&lt;/blockquote&gt;

But that backup is still one file rather than multiple files in multiple folders, never mind properly transferring the permissions on those files and folders. There is a lot more that can go wrong when restoring multiple entities.

&lt;blockquote&gt;I find that restoring files is much easier than restoring a database schema. Just do a websearch and see how many people have problems restoring a simply mysql dumpfile…&lt;/blockquote&gt;

A lot of people have a lot of trouble moving a lot of applications, mostly because making apps portable is tricky if you do not keep it in mind from day one of design. These same folks would probably also have issues copying folders of randomly named files, fixing configurations and allowing write permissions on those folders.

&lt;blockquote&gt;But you would have to open up whatever your dbms accepts for input..&lt;/blockquote&gt;

Not really. One usually uploads files using the application itself, which is already accessible from the internet (or intranet) and which already must have access to appropriate parts of the database.

&lt;blockquote&gt;So you would give up the ’security’ advantages (not having to write to files) of the dbms for performance??&lt;/blockquote&gt;

I, myself, generally use memory rather than disk. But disk is a viable option: caching can be managed so that the app still never writes to itself, maintaining the security advantages.
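
A minimal sketch of such a memory cache, assuming a single web process and a hypothetical load_from_db callback: entries expire after a TTL so updated files are eventually re-read, and nothing is ever written to disk. There is no size limit here, which a real deployment would need; a shared cache such as memcached plays the same role across several web servers.

```python
import time


class BlobCache:
    """Minimal in-process cache mapping key -> (expires_at, data)."""

    def __init__(self, load_from_db, ttl_seconds=300.0, clock=time.monotonic):
        self._load = load_from_db  # hypothetical callback: key -> bytes from the DB
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the database is not touched
        # Miss or expired entry: reload from the DB and restart the TTL.
        data = self._load(key)
        self._entries[key] = (now + self._ttl, data)
        return data

    def invalidate(self, key):
        # Call after an update so the next get() re-reads the new bytes.
        self._entries.pop(key, None)
```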

&lt;blockquote&gt;There are situations where i find it acceptable to store binary data in a dbms too. I expect that we’re evolving to a situation where for common situations we’ll have a filesystem that can be queries/backed up/… as if it were a sql dbms. &lt;/blockquote&gt;

This is definitely coming; see the next-generation filesystems from Apple &amp; MS.

@Mark

&lt;blockquote&gt;It’s crazy to store binary files in a database. I’ve seen a number of commercial M$ applications do this where there is no check whatsoever on file sizes, hence DB becomes full and sssslllloooowwww.&lt;/blockquote&gt;

Actually, the size of a database should not really hurt query speed unless the database is improperly designed. SELECTing based on indices, or better yet primary keys, should be a sub-second operation no matter how many rows are in the database. Now, a huge database will definitely slow down full-text queries, but they still scale very well.

&lt;blockquote&gt;There are plenty of Open Source tools available which can index &amp; search M$ Word &amp; PDF files. Oh and why not run it on a secure system like Linux ? Writing to the file system is much safer :-) &lt;/blockquote&gt;



Last time I checked, Windows 2003/IIS6 had far fewer security vulnerabilities reported than *nix/Apache. On the file-system level, NTFS has never been &quot;cracked.&quot; Not that Linux is a file system, but that is another debate for another time.

Here is a more drawn out explanation of why writing to the file system is not safer:

1) malicious user finds image upload form, and forces it to accept a php script by faking some headers.
2) malicious user then executes the php script, which happens to be running in the same context as the application, revealing all kinds of goodies (like the unencrypted database connection information most PHP apps keep as a global variable).
3) you are pwned.

If those uploads were going into a database, then the worst the user could do would be to serve other users a corrupt jpg, as the data in the database is never executed by the host application.
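
The weak spot in step 1 is an upload handler that trusts the client-supplied headers. A minimal sketch, assuming only JPEG, PNG and GIF uploads are allowed and using a hypothetical accept_upload helper, of deciding from the file's leading bytes instead and letting the server choose the stored name and extension. Checking signatures alone does not stop a file that is both a valid image and a script; it is the server-chosen name that keeps such a file from being stored under an extension the web server would execute.

```python
# File signatures ("magic bytes") for the only image types accepted here.
SIGNATURES = {
    b"\xff\xd8\xff": ("image/jpeg", ".jpg"),
    b"\x89PNG\r\n\x1a\n": ("image/png", ".png"),
    b"GIF87a": ("image/gif", ".gif"),
    b"GIF89a": ("image/gif", ".gif"),
}


def sniff_image(data):
    """Return (mime, ext) judged from the bytes themselves, or None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def accept_upload(data, claimed_type, new_id):
    """Hypothetical upload check: decide on content, never on claimed_type."""
    kind = sniff_image(data)
    if kind is None:
        raise ValueError("rejected upload (client claimed %s)" % claimed_type)
    mime, ext = kind
    # The server picks the stored name and extension, so an uploaded
    # script cannot end up saved as something like shell.php.
    return mime, "%d%s" % (new_id, ext)
```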

@timvw &amp; mark:

What tools, exactly, can you all use to index PDFs and Word documents? How does one interface it into the website and the rest of the data it contains? What kind of setup or plumbing do these tools require?

@chrisb

Good point on shifting some file-size and performance requirements to the database server, and on the cost increases that can follow from that.

@all

One thing just hit me: we all say storing files in the file system is more efficient, but are there any good tests showing just how big the performance penalty of the database approach is?</description>
		<content:encoded><![CDATA[<p>@timvw</p>
<blockquote><p>In both systems you have to worry about the ‘identification’ of entities. In a filesystem this is done via the filename and in a sql dbms this is done via the primary key.</p></blockquote>
<p>True, but in most cases the database or application logic will end up handling this issue for the developer. 99% of the time the file is either going into a table with its own key or into an attachments table that references keys from somewhere else, whereas with the filesystem one often has to gin up one's own naming scheme.</p>
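<p>A minimal sketch of the "table with its own key" idea, using Python's built-in <code>sqlite3</code> purely as a stand-in for whatever language and DBMS a site actually runs. The <code>attachments</code> table and the helper names are hypothetical. The database hands back the key, so no filename scheme has to be invented:</p>

```python
import sqlite3


def open_db(path=":memory:"):
    """Create a hypothetical attachments table keyed by the database itself."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attachments ("
        " id INTEGER PRIMARY KEY,"       # assigned by the DBMS, no filename needed
        " post_id INTEGER NOT NULL,"     # key of the row this file belongs to
        " mime_type TEXT NOT NULL,"
        " data BLOB NOT NULL)"
    )
    return conn


def save_attachment(conn, post_id, mime_type, data):
    """Store the bytes and return the key the database generated for them."""
    cur = conn.execute(
        "INSERT INTO attachments (post_id, mime_type, data) VALUES (?, ?, ?)",
        (post_id, mime_type, data),
    )
    conn.commit()
    return cur.lastrowid


def load_attachment(conn, attachment_id):
    """Return (mime_type, bytes) for a key, or None if there is no such row."""
    row = conn.execute(
        "SELECT mime_type, data FROM attachments WHERE id = ?",
        (attachment_id,),
    ).fetchone()
    return None if row is None else (row[0], bytes(row[1]))
```

<p>The key returned by <code>save_attachment</code> is what a URL such as <code>img.jpg?ID=123</code> would carry. Two uploads with the same original filename can never collide.</p>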
<blockquote><p>It makes me wonder why most sql dbms vendors are proud to announce that the process of making a backup is as simple as copying some files… (To me it seems as if they’re saying that backing up files is easier than backup a dbms :P)</p></blockquote>
<p>But that backup is still one file rather than multiple files in multiple folders, never mind properly carrying over the permissions on said files and folders. There is a lot more that can go wrong when restoring multiple entities.</p>
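<p>One rough way to see the single-file point, again using SQLite only as a convenient stand-in. The comment names no particular DBMS, and MySQL or SQL Server would use their own dump or backup tools. Python's <code>sqlite3.Connection.backup</code> (Python 3.7+) copies rows and BLOBs alike into one destination file. The <code>files</code> table and the backup path are hypothetical:</p>

```python
import os
import sqlite3
import tempfile


def backup_to_file(conn, dest_path):
    """Copy the whole database, rows and BLOBs alike, into a single file."""
    dest = sqlite3.connect(dest_path)
    try:
        conn.backup(dest)  # sqlite3 online-backup API (Python 3.7+)
    finally:
        dest.close()


# Illustration with a hypothetical "files" table held in memory.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, data BLOB)")
src.execute("INSERT INTO files (name, data) VALUES (?, ?)", ("logo.png", b"\x89PNG..."))
src.commit()

backup_path = os.path.join(tempfile.mkdtemp(), "site-backup.db")
backup_to_file(src, backup_path)
```

<p>Restoring means putting that one file back. There are no per-upload permissions to reapply, although the database file's own permissions still matter.</p>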
<blockquote><p>I find that restoring files is much easier than restoring a database schema. Just do a websearch and see how many people have problems restoring a simply mysql dumpfile…</p></blockquote>
<p>A lot of people have a lot of trouble moving a lot of applications, mostly because making apps portable is tricky if you do not keep it in mind from day one of design. These same folks would probably also have trouble copying folders of randomly named files, fixing configurations and granting write permissions on said folders.</p>
<blockquote><p>But you would have to open up whatever your dbms accepts for input..</p></blockquote>
<p>Not really. One usually uploads files using the application itself, which is already accessible from the internet (or intranet) and which already must have access to appropriate parts of the database.</p>
<blockquote><p>So you would give up the ’security’ advantages (not having to write to files) of the dbms for performance??</p></blockquote>
<p>I generally cache in memory rather than on disk. But disk is a viable option too: caching can be managed so that the application still never writes to its own files, which preserves the security advantages.</p>
<blockquote><p>There are situations where i find it acceptable to store binary data in a dbms too. I expect that we’re evolving to a situation where for common situations we’ll have a filesystem that can be queries/backed up/… as if it were a sql dbms. </p></blockquote>
<p>This is definitely coming; see the next-generation filesystems from Apple &#038; MS.</p>
<p>@Mark</p>
<blockquote><p>It’s crazy to store binary files in a database. I’ve seen a number of commercial M$ applications do this where there is no check whatsoever on file sizes, hence DB becomes full and sssslllloooowwww.</p></blockquote>
<p>Actually, the size of a database should not really hurt query speed unless the database is improperly designed. SELECTing based on indices, or better yet primary keys, should be a sub-second operation no matter how many rows are in the database. A huge database will definitely slow down full-text queries, but even those still scale very well.</p>
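<p>The key-lookup claim can be checked with SQLite's <code>EXPLAIN QUERY PLAN</code>, with SQLite standing in for the site's real DBMS and the table being hypothetical. A primary-key lookup is a B-tree search whose cost grows roughly with the logarithm of the row count. A filter on the BLOB contents has to scan every row. The exact plan wording differs between SQLite versions:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attachments (id INTEGER PRIMARY KEY, data BLOB)")
conn.executemany(
    "INSERT INTO attachments (data) VALUES (?)",
    ((b"x" * 512,) for _ in range(20000)),
)
conn.commit()


def query_plan(sql, params=()):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


by_key = query_plan("SELECT data FROM attachments WHERE id = ?", (15000,))
by_content = query_plan("SELECT id FROM attachments WHERE length(data) = ?", (512,))
print(by_key)      # a SEARCH via the INTEGER PRIMARY KEY (B-tree lookup)
print(by_content)  # a SCAN that visits every row
```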
<blockquote><p>There are plenty of Open Source tools available which can index &amp; search M$ Word &amp; PDF files. Oh and why not run it on a secure system like Linux ? Writing to the file system is much safer :-) </p></blockquote>
<p>Last time I checked, Windows 2003/IIS6 had far fewer security vulnerabilities reported than *nix/Apache. On the file-system level, NTFS has never been &#8220;cracked.&#8221; Not that Linux is a file system, but that is another debate for another time.</p>
<p>Here is a more detailed explanation of why writing to the file system is not safer:</p>
<p>1) A malicious user finds an image upload form and forces it to accept a PHP script by faking some headers.<br />
2) The malicious user then executes the PHP script, which runs in the same context as the application, revealing all kinds of goodies (like the unencrypted database connection information most PHP apps keep in a global variable).<br />
3) You are pwned.</p>
<p>If those uploads were going into a database, the worst the user could do would be to serve other users a corrupt JPEG, since the data in the database is never executed by the host application.</p>
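<p>A sketch of the upload path described above, assuming Python and SQLite in place of the PHP app. The file's leading "magic bytes" are checked instead of the client-supplied Content-Type header (the header that step 1 fakes), and the bytes are stored as an inert BLOB. The <code>uploads</code> table and function names are hypothetical:</p>

```python
import sqlite3

# Leading "magic bytes" of a few common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_image_type(data):
    """Return a MIME type based on the upload's own bytes, or None."""
    for signature, mime_type in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return mime_type
    return None


def open_uploads_db():
    """Hypothetical uploads table; BLOB data stored here is never executed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE uploads (id INTEGER PRIMARY KEY, mime_type TEXT, data BLOB)"
    )
    return conn


def store_upload(conn, data, claimed_type):
    """Store an upload, ignoring the client-supplied Content-Type header."""
    actual_type = sniff_image_type(data)
    if actual_type is None:
        raise ValueError("rejected upload claiming to be " + repr(claimed_type))
    cur = conn.execute(
        "INSERT INTO uploads (mime_type, data) VALUES (?, ?)", (actual_type, data)
    )
    conn.commit()
    return cur.lastrowid
```

<p>Signature checks are only a first line of defence, because a file can be a valid image and still carry other content. Keeping uploads out of anything the server will execute is what closes the hole described above.</p>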
<p>@timvw &amp; mark:</p>
<p>What tools, exactly, can you all use to index PDFs and Word documents? How does one integrate them with the website and the rest of the data it contains? What kind of setup or plumbing do these tools require?</p>
<p>@chrisb</p>
<p>Good point on shifting some file-size and performance requirements to the database server, and on the cost increases that can follow from that.</p>
<p>@all</p>
<p>One thing just hit me: we all say storing files in the file system is more efficient, but are there any good tests showing just how big the performance penalty of the database approach is?</p>]]></content:encoded>
	</item>
</channel>
</rss>
