SitePoint Sponsor

User Tag List

Results 1 to 9 of 9
  1. #1
    SitePoint Member pixelsnap's Avatar
    Join Date
    Jul 2007
    Location
    Bowling Green KY USA
    Posts
    6
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    815,000+ records, DB table query too slow

    Hi Everyone,
    I have a database table with about 815,000 records in it, each record containing the source of individual webpages (with all the tags and multiple spacing stripped out leaving just the words). At first my match boolean queries were fine but once I got to around 100,000+ records it became too slow to use for a web-based search tool. I was able to separate the data into 10 tables and then use AJAX to run the match query on all 10 tables at the same time and that has increased performance.

    My question is, is it possible for me to increase the performance for the 1 table some how instead of resorting to separating the data into multiple tables? Is MySQL the right DB for this kind of setup, would MSSQL or PostgreSQL perform better for larger sets of data? I'm not sure how MySQL 5 clustering works, could that be an option? Any help would be appreciated.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,241
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    how have you indexed the table? what does your typical search query look like?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Member pixelsnap's Avatar
    Join Date
    Jul 2007
    Location
    Bowling Green KY USA
    Posts
    6
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The table has 4 fields: id (int), website_id (int), url (varchar), content (text) and id is the Primary Key and there is an index for content which is a FullText index.

    Query Example:
    SELECT COUNT(id) AS id, COUNT(DISTINCT website_id) AS website_id FROM webpages WHERE MATCH(content) AGAINST('+"search marketing"' IN BOOLEAN MODE);

    Query Stats:
    - The query above took 113.7 seconds to execute on the single table containing 815,000+ records like below
    - The query above took 13.9 seconds to execute on one of the segmented tables that contained 59,000+ records like below

    Data Record Example:
    id - 1
    website_id - 13
    url - http://www.test.com
    content (can sometimes be 3-5x this size) -
    So I recently launched an online candy store called eSnackscom I know you're wondering who the hell buys candy online The short answer is anyone who can't find the candy they want at a local store For example old classic candies like Big Hunk novelty stuff such as Pez Dispensers and popular candy brands with unusual flavors such as the Jolley Rancher Cinnamon Fire The store has had a very good start and has netted about 192010 in less than a month of being open for business Sure it's not much money but the site is brand new requires little time in the day for maintenance and has no overhead It's pretty impressive to see it generate that much in such a short time span I'm looking forward to improving it over time and watching the sales pick up After reading the book The 4 hour work week I am also setting up outsourcing for many of the non-sensitive daily tasks such as product updates to make the business as self-sufficient as possible Some cool things about this project is that it has allowed me to get some experience handling transactions for both Google Checkout and PayPal I do have to say the Google Checkout system is much more organized and easier to use in my opinion plus it currently has no transaction fees which is a HUGE deal when you process an order over 1000

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,241
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    thanks, that is excellent information, and answers the question nicely

    unfortunately i'm not a fulltext indexing expert -- not by a long shot -- and so you may have to wait for someone else to come along...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    try it without boolean mode first to see if mysql thinks your fulltext index is valid.

  6. #6
    SitePoint Member pixelsnap's Avatar
    Join Date
    Jul 2007
    Location
    Bowling Green KY USA
    Posts
    6
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I removed boolean in the query and received the same results although it took longer to execute.

    Query
    SELECT COUNT(id) AS id, COUNT(DISTINCT website_id) AS website_id FROM webpages WHERE MATCH(content) AGAINST('+"search marketing"');

    Query Stats
    - The query above took 840+ seconds to execute on the single table containing 815,000+ records
    - The query above took 135.8 seconds to execute on one of the segmented tables that contained 59,000+ records

  7. #7
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    is this a shared server, or your own dedicated server?

  8. #8
    SitePoint Member pixelsnap's Avatar
    Join Date
    Jul 2007
    Location
    Bowling Green KY USA
    Posts
    6
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have a "dedicated virtual" account at mediatemple.net for the "extreme" setup. I'm not sure what that would classify under.

  9. #9
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    specifically, i wonder if you have access to change mysql server variables. it sounds like you have a very large fulltext index but not enough memory configured to keep the index in memory.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •