How to make a table-scan search faster?

I have a table where I store UPS shipments, one row per shipment. The table has many columns, including a text column that stores the whole UPS response in XML format. Each XML document is over 30 KB, and with 20-30 shipments per day the table grows pretty quickly. This makes searches slower and slower. An index will not help in this case because I need to search, for example, by a shipment number that the user enters only partially, so I have to use LIKE '%…%', which results in a full table scan. At the moment the performance is not that bad, but in a few months it could become a problem. What can I do to make the search faster?

So far the only solution I can think of is to create another table in a 1:1 relation to the main table and move just the XML text column there. Removing all the XML data from the main table makes the table-scan search much faster, but it looks like a bit of an ugly workaround because there's really no other reason to have a separate table. Are there any other solutions?
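For reference, the split I have in mind would look something like this (table and column names are invented for the example):

```sql
-- main table keeps only the small, searchable columns
CREATE TABLE shipments (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    tracking_no  CHAR(18) NOT NULL,
    created_at   DATETIME NOT NULL
    -- ... other small columns ...
);

-- 1:1 side table holds the bulky XML, keyed by the same id
CREATE TABLE shipment_xml (
    shipment_id  INT UNSIGNED NOT NULL PRIMARY KEY,
    response_xml MEDIUMTEXT NOT NULL
);

-- fetch the archived XML only when it is actually needed
SELECT s.*, x.response_xml
  FROM shipments AS s
  JOIN shipment_xml AS x ON x.shipment_id = s.id
 WHERE s.id = 123;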

You can use Sphinx and delegate all search queries to it. It will, however, build its own indexes and so on, so the whole setup won't be storage-efficient.

The other solution, and this is an uneducated guess on my part, is to pull the data of interest (such as the shipment number you mentioned) out of the XML, store it in another 1:M table, and perform searches on that.
Also, if the shipment number is entered partially, and I'm guessing again that your users will enter only the beginning or the end of the string, you can use methods such as indexing pieces of the string and reverse-indexing pieces of the string.

Example: the shipment number is 123-456. Your 1:M table would contain an index entry on '123' and on the reversed '654', so you can avoid LIKE '%…%' searches.
However, it's a hassle to do this for just one field of interest, and with a 30 KB XML there's plenty that could be of interest, so I'd think about delegating searches to an engine such as Sphinx and sparing myself the coding hassle.
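To make the reverse-indexing idea concrete, here is a sketch in MySQL (column and index names are made up): keep a reversed copy of the shipment number in its own indexed column, and then both "starts with" and "ends with" searches become index-friendly prefix LIKEs. Note that on MySQL 5.0 there are no generated columns, so the application (or a trigger) has to keep the reversed column in sync.

```sql
ALTER TABLE shipments
  ADD COLUMN tracking_rev CHAR(18) NOT NULL DEFAULT '';

UPDATE shipments SET tracking_rev = REVERSE(tracking_no);

CREATE INDEX idx_tracking     ON shipments (tracking_no);
CREATE INDEX idx_tracking_rev ON shipments (tracking_rev);

-- user typed the beginning of the number:
-- a prefix LIKE can use the index on tracking_no
SELECT * FROM shipments
 WHERE tracking_no LIKE CONCAT('123', '%');

-- user typed the end of the number:
-- reverse the search term and use the index on tracking_rev
SELECT * FROM shipments
 WHERE tracking_rev LIKE CONCAT(REVERSE('456'), '%');
```

A term typed from the middle of the number still needs LIKE '%…%', so this only covers the begins-with and ends-with cases.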

Maybe someone more experienced has more elegant solutions, though. I wish you good luck with the project :slight_smile:

separate table actually sounds like a good idea, especially since you’ve already tested it and found that it works well

what datatype is the xml column? and what version of mysql are you on?

I think I’ll have to do it if there’s no other way. It’s just that I don’t like the idea of changing db structure just for the sake of optimization because I’ll have to change a few things in the application, too.

what datatype is the xml column? and what version of mysql are you on?

It is MEDIUMTEXT. MySQL 5.0.91.

TEXT columns are apparently not stored in the row, but in their own separate storage area, so i’m puzzled why your search improved after you split the xml column off into a separate table – did you clear your buffers between your tests?

is the shipment number part of the xml column?

Interesting idea but unfortunately I cannot install Sphinx on this server. It also sounds like a serious solution for such a small thing…


I already pull out important data from the XML to other columns for searches. The shipment number is an 18-character string and cannot be divided into any logical parts. Interesting idea, but still the 1:1 table looks like a simpler solution.


Thanks for your input!

I also thought the TEXT columns should be stored elsewhere and not affect a table scan, but they do. The table is now 20 MB; when I remove the TEXT columns (actually there are two, but that's not important), the table shrinks to 224 KB and a LIKE '%…%' search runs about 12 times faster.

is the shipment number part of the xml column?

It is, but it is also repeated in its own separate CHAR(18) column, and I do the LIKE search on that column. The XML text is stored mainly for archival purposes.