Pros and cons of storing XML in a blob

Hi there,

This is a question that I had originally posed to Rudy but I’d like to open up the discussion to the rest of the community. We’re currently running Microsoft SQL Server (a newer version - heh). One of our developers keeps pulling for a particularly unique and, as far as I know, unconventional storage method.

Basically he’s pushing to have all data stored within the database in XML format in a blob in each table. As far as I understand this has a lot of cons and not many pros. His argument is that it allows flexibility in the data storage. My counter argument is, if the data you’re storing requires flexibility then yes, if it’s typically static then what’s the point?

Also some other negatives as far as I know are that: you can’t perform typical relational db queries (joins and other functions). As far as I know I’m not even sure if this is an effective storage solution - might as well just use XML files. My point here would be that a db engine is designed for efficiency and querying capabilities…by storing data as XML it adds not only overhead but complexity to retrieve data.

What do you guys think about this overall? At the moment I think it’s a ridiculous method. :-/ Rudy what are your initial thoughts?

Thanks,

the biggest problem i see with it is that you cannot define foreign keys, and so there is no relational integrity possible between entities

also, searches on specific columns would be inefficient as heck – imagine doing a table scan of a bazillion rows looking inside a blob column for the contents of a specific xml tag

sure, retrieval of individual rows is easy – get the blob where id=42 – but all the smarts have to be in the application

I spent some more time researching it and I found this:

http://www.devx.com/xml/Article/26766

[LIST]
[]Query performance—The relational model is much simpler than XML, which makes searching easier and faster.
[
]Storage space—XML is notoriously space-inefficient—maybe not a problem for a config file, but size can be an issue for large databases.
[]Locking—The XML type is lockable like any other type. So, if many users are updating the same set of documents, they can run into decreasing performance and even deadlocks.
[
]Legacy—A lot of technological infrastructure supports the relational model, such as reporting software and business intelligence systems.
[]Repeated shredding or publishing—On-demand transformations carry a performance penalty. If you have to shred or publish the same document over and over again, consider storing it natively as XML. You can always expose it to relational consumers with an XML view.
[
]Rapidly changing data structures—When modeled correctly, XML lives up to its name: It’s extensible. Developers can add new pieces of data—even new hierarchies—to a schema without compromising existing software. Extensibility is an extra advantage when prototyping, or when working with rapidly changing problem domains such as bioinformatics.
[]Atomic data—Sometimes, you’ll have XML data that’s never consumed except as a whole. Think of this as logical atomicity—if you never access the parts individually, you might as well store it in one big chunk.
[
]Debugging—Especially for new releases, it can be a good idea to tuck away a copy of your XML imports. The data may be redundant, but keeping the original makes tracking down problems a whole lot easier.[/LIST]http://www.devx.com/codemag/Article/31628

[LIST]
[]XML types cannot convert to text or ntext data types.
[
]No data type other than one of the string types can be cast to XML.
[]XML columns cannot be used in GROUP BY statements.
[
]Distributed partitioned views or materialized views cannot contain XML data types.
[]Use of the sql_variant instances cannot include XML as a subtype.
[
]XML columns cannot be part of a primary or foreign key.
[]XML columns cannot be designated as unique.
[
]Collation (COLLATE clause) cannot be used on XML columns.
[]XML columns cannot participate in rules.
[
]The only built-in scalar functions that apply to XML columns are ISNULL and COALESCE. No other scalar built-in functions are supported for use against XML types.
[]Tables can have only 32 XML columns.
[
]Tables with XML columns cannot have a primary key with more than 15 columns.
[]Tables with XML columns cannot have a timestamp data type as part of their primary key.
[
]Only 128 levels of hierarchy are supported within XML stored in the database.[/LIST]What would you agree/disagree with? Seems like a pretty good response in both articles…

i’ve used this before in mysql. see http://www.sitepoint.com/forums/showthread.php?t=549636

yes, but you specifically included searchable columns in the table design

imagine the table with only two columns: id, blob

what are the advantages/disavantages?

Actually, with Sql 2005 and better, XML, especially if schema-bound, is pretty powerful–it can be indexed for example. It can also be queried using XPath statements.

That said, don’t do it. I lost the argument on this exact sort of thing on one of our larger applications, and it has alot of downsides. First, database is comparatively ginormous and resource usage is a bit high for my liking. I should say nothing is beyond spec, but it could be alot more streamlined. The bigger disadvantage is what you lose from not having a standard SQL interface to handle back-office tasks or quash the occasional data glitch. Because everything is locked into big blobs of ugly XML, you generally need to write a script to access stuff via a complex object interface rather than good old-fashioned SQL. And not having referential integrety and/or cascading updates has required a little voodoo here and there to work around.

For flexibility and ease of use, you might want to have your developer look at tools like Subsonic or LinqToSql. They should ease the pain for him by making the database a rather transparent resource and significantly reduce the amount of SQL they would have to sling–lots of developers fear the SQL. While keeping the schema human intelligible and human managable.

That is not to say XML, like binaries, does not have a place in the database. It can be very, very handy in some scenarios–such as when one has some sort of object queue table, or for serialized record history. But it is best treated as a payload field rather than the main metadata.

wonderfully descriptive, thanks