Just to be clear: I'm reading the XML file, each node has 105 sub-nodes, and each of those sub-nodes corresponds to a column in the MySQL table. I developed a "mapper" class that figures out (1) which columns the table needs and (2) their data types.
That class gets passed to the data provider, which is in charge of building its data source (be it a database, a flat file system, or whatever), processing the feed items into that source, and then doing any cleanup.
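To make the mapper/provider split concrete, here is a minimal sketch in Python (my actual code isn't Python, and the class and method names here are assumptions for illustration, not my real API):

```python
# Hypothetical sketch of the mapper/provider split described above.
class Mapper:
    """Derives column names and SQL types from a definition file's contents."""
    def __init__(self, definition):
        # definition: {sub-node name -> SQL type}, supplied by the definition file
        self.definition = definition

    def columns(self):
        return list(self.definition.keys())

    def column_type(self, name):
        # fall back to TEXT when the definition doesn't specify a type
        return self.definition.get(name, "TEXT")


class DataProvider:
    """Builds its data source from whatever the mapper reports."""
    def __init__(self, mapper):
        self.mapper = mapper

    def build(self):
        cols = ", ".join(
            f"{c} {self.mapper.column_type(c)}" for c in self.mapper.columns()
        )
        return f"CREATE TABLE IF NOT EXISTS feed ({cols})"


mapper = Mapper({"sku": "VARCHAR(64)", "price": "DECIMAL(10,2)"})
provider = DataProvider(mapper)
```

The point is that the provider never hard-codes the 105 columns; it only asks the mapper.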
In the end, a 105-column table gets generated and the data is inserted appropriately.
Just to give you an idea, I actually have two mappers for this project, one for 99% of the nodes in the XML file, the other for a single node (images) that is repeatable per record.
Since storing images in the database doesn't make sense, I also have a second data provider whose sole job is building a flat file system and keeping it updated when the process runs. So the two mappers and the two data providers handle updating the system as a whole. The data providers are based on an interface, so they can be called without the caller knowing which type each one is or what it's going to do. Same with the mappers: they figure out the process without being told much beyond the schema of the XML file and a definition file that dictates what to look for.
Overall, I'm extremely happy with the end result: what could have been a spaghetti-code mess is about 20 lines. Set up the mappers, load them, load the feed, process it, send the mappers and the feed to each data provider, run them, do cleanup, and exit. I could make a few pieces smarter, but in the end I decided that would be overreaching; it's better to tell the system explicitly what to do in the places I wanted to "make smarter" (I was starting to overthink it with no real measurable benefit).
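The ~20-line driver flow above can be sketched roughly like this (again in Python for illustration; the provider's `process`/`cleanup` method names are assumptions, and the stand-in provider here just records what it receives):

```python
# Hypothetical driver mirroring the flow described above.
import xml.etree.ElementTree as ET


class RecordingProvider:
    """Stand-in data provider that records what it was given."""
    def __init__(self):
        self.items = []
        self.cleaned = False

    def process(self, mappers, items):
        self.items.extend(items)

    def cleanup(self):
        self.cleaned = True


def run(mappers, providers, feed_xml):
    feed = ET.fromstring(feed_xml)       # load the feed
    items = feed.findall("item")         # process it into records
    for provider in providers:           # send mappers and feed to each provider
        provider.process(mappers, items)
        provider.cleanup()               # do cleanup


provider = RecordingProvider()
run([], [provider], "<feed><item/><item/></feed>")
```

Because everything goes through the provider interface, adding a third data source later would just mean appending another provider to the list.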
Actually, since the number of records I'm dealing with isn't large, at this point I've just gone with INSERT ... ON DUPLICATE KEY UPDATE. I may regret that in the future, but for something that runs once a day and currently takes less than a minute, I'm not too worried.
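For anyone wanting the shape of that upsert, here's a small helper that builds the statement from the mapper's column list (a sketch; the table and column names are placeholders, and parameter binding would be handled by whatever DB layer you use):

```python
# Build a MySQL upsert for the daily run described above.
def upsert_sql(table, columns):
    placeholders = ", ".join(["%s"] * len(columns))
    # VALUES(col) refers to the value that would have been inserted
    updates = ", ".join(f"{c} = VALUES({c})" for c in columns)
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
```

Note that this relies on the table having a primary or unique key for MySQL to detect the duplicate; without one, every run would just insert new rows.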
There are a variety of ways I could find differences fairly quickly with my setup; loading two feed sources into the system and performing an array_diff would likely get me there. I could also run diff (from the command line) to gauge how much has changed, and use that to decide whether it makes sense to compare the two files or just replace the whole table outright.
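The keyed comparison idea is straightforward; a Python analogue of the array_diff approach, assuming each feed has been reduced to a primary-key-to-row mapping, might look like this:

```python
# Compare two feed snapshots keyed by primary key (names are illustrative).
def changed_keys(old_rows, new_rows):
    added = new_rows.keys() - old_rows.keys()
    removed = old_rows.keys() - new_rows.keys()
    # keys present in both, but whose column values differ
    modified = {
        k for k in new_rows.keys() & old_rows.keys()
        if new_rows[k] != old_rows[k]
    }
    return added, removed, modified
```

If `added` plus `modified` is a large fraction of the feed, a full table replace is probably cheaper than row-by-row updates.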