SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)

    How to make table scan search faster?

    I have a table where I store UPS shipments, each shipment is one row. There are many columns in this table and there is also a text column which stores the whole UPS response in XML format. Each XML is above 30KB and as each day there are 20-30 shipments the table size grows pretty quickly. This causes searches to be slower and slower - in this case an index will not help because I need to search, for example, by the shipment number which is entered partially by the user so I have to use LIKE '%...%', which results in full table scan. At the moment the performance is not that bad but in a few months it can become a problem. What can I do to make the search faster?

    So far there is only one solution I can think of: create another table in 1:1 relation to the main table and move only the XML text field there. Removing all the XML data from the table makes the table scan search much faster but this looks like a bit ugly workaround because there's really no other reason to have a separate table. Are there any other solutions?

  2. #2
    SitePoint Zealot
    Join Date
    Dec 2010
    Posts
    187
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You can use Sphinx and delegate all search queries to Sphinx. It will, however, create its own indexes and what not so the whole thing won't be storage effective.

    The other solution, and this is by making an uneducated guess, is that you could pull out data of interest out of the XML (such as this shipment number that you mentioned) and store it in another 1:M table and perform searches based on that.
    Also, if shipment number is entered partially, and I'm guessing again that your users will enter only the beginning or the end of the string - you can use methods such as indexing pieces of the string and reverse-indexing pieces of the string.

    Example: shipment number is 123-456. Your 1:M table would contain index on '123' and '654' so you can avoid LIKE '%%' searches.
    However, seeing it's a hassle to do this only for one thing that might be of interest and we're talking about 30kb XML here so there's plenty that could be of interest - I'd think about delegating searches to engines such as Sphinx and ease up the coding hassle I'd be looking at.

    Maybe someone more experienced has more elegant solutions tho, I wish you good luck with the project

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    separate table actually sounds like a good idea, especially since you've already tested it and found that it works well

    what datatype is the xml column? and what version of mysql are you on?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    separate table actually sounds like a good idea, especially since you've already tested it and found that it works well
    I think I'll have to do it if there's no other way. It's just that I don't like the idea of changing db structure just for the sake of optimization because I'll have to change a few things in the application, too.

    what datatype is the xml column? and what version of mysql are you on?
    It is MEDIUMTEXT. Mysql 5.0.91.

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    TEXT columns are apparently not stored in the row, but in their own separate storage area, so i'm puzzled why your search improved after you split the xml column off into a separate table -- did you clear your buffers between your tests?

    is the shipment number part of the xml column?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by furicane View Post
    You can use Sphinx and delegate all search queries to Sphinx. It will, however, create its own indexes and what not so the whole thing won't be storage effective.
    Interesting idea but unfortunately I cannot install Sphinx on this server. It also sounds like a serious solution for such a small thing...
    The other solution, and this is by making an uneducated guess, is that you could pull out data of interest out of the XML (such as this shipment number that you mentioned) and store it in another 1:M table and perform searches based on that.
    Also, if shipment number is entered partially, and I'm guessing again that your users will enter only the beginning or the end of the string - you can use methods such as indexing pieces of the string and reverse-indexing pieces of the string.

    Example: shipment number is 123-456. Your 1:M table would contain index on '123' and '654' so you can avoid LIKE '%%' searches.
    However, seeing it's a hassle to do this only for one thing that might be of interest and we're talking about 30kb XML here so there's plenty that could be of interest - I'd think about delegating searches to engines such as Sphinx and ease up the coding hassle I'd be looking at.
    I already pull out important data from the XML to other columns for searches. The shipment number is an 18-character string and cannot be divided into any logical parts. Interesting idea, but still the 1:1 table looks like a simpler solution.
    Maybe someone more experienced has more elegant solutions tho, I wish you good luck with the project
    Thanks for your input!

  7. #7
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    TEXT columns are apparently not stored in the row, but in their own separate storage area, so i'm puzzled why your search improved after you split the xml column off into a separate table -- did you clear your buffers between your tests?
    I also thought TEXT column should be stored elsewhere and not affect table scan but it does. The table is now 20MB large, when I remove the TEXT column (actually there are 2 TEXT columns but it's not important), the table is 224KB and a LIKE '%...%' search takes about 12x as fast.

    is the shipment number part of the xml column?
    It is but it is also repeated in its separate char(18) column. I do a LIKE search by this char column. The XML text is stored mainly for archival purposes.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •