I have about 6 years worth of XML ecommerce order backups that I want to convert into an archive database.
Normally most ecommerce backups are CSV. Those are easy to map into a relational DB like MySQL because I can essentially just say all the DB columns map to all the CSV columns 1 to 1. I’ve done this a lot.
XML is a different beast because it’s not two dimensional and it can have unique data and attributes on any given node. And it can have many levels in the subtree.
There is no 1-to-1 mapping here. If I use MySQL, an XML file would have to become multiple tables and I’d have to build in the relationships manually between them.
My question is, are there any systems that can directly convert an XML into a DB format of some kind? Either by creating multiple tables, or by using a different DB entirely, such as a graph-based or document-based DB.
The end goal is to get dozens of XMLs into a single DB so I can build a frontend around it for doing lookups and searches and even reports.
You need to learn about XML schemas as in XML Schema Tutorial. There are also the older DTD things. You are right that XML can be unpredictable but there is probably a schema for the XML files and then the data is likely to be organized in a manner that can be imported. Many databases can process XML files like that. I think there are utilities that can be used to create schemas if the data is in a consistent format that can be recognized.
There are many alternative possibilities. There is something called XML transformations and it is possible to convert XML to CSV format but you need to convert the equivalent of each table to a separate CSV file, as I am sure you already understand.
I can understand wanting to avoid analyzing the data but you probably must do that. It might seem to be overwhelming to do but be stubborn (presistent) and you will succeed.
The most elegant solution would be to create a structure of tables and relationships for your xml data in a database and move it there. But it would be difficult to do if the xml structure is very complex or unpredictable.
An alternative might be a database with some xml support. For example, Postgres has an xml type and a bunch of xml functions to work with it. So you might have a table where you store an xml file per record in a single xml field and use xml functions for searches, reports, etc. You might also index certain search types for speed using function indexes.
Then you can simply move the xml files to the database without any conversions - you will only need to think about general organizational structure, for example, I wouldn’t put xml files of completely unrelated data into the same field of a table - like orders, clients, etc. - if they are separate xml files then I’d put them into separate tables.