SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Enthusiast
    Join Date
    Apr 2013
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Query for searching in multiple tables – UNION or not?

    I'm trying to create a query in order to search for a specific value in several different tables. I have a table of photos, another one with keywords and a third with people (both photographers and people in the pictures), and finally one where I keep the image captions with a full text index. The fastest solution I've come up with so far is this one:

    Code:
    SELECT various_columns_from_table_"photos"_and_some_columns_from_table_"people"
    FROM captions
    INNER JOIN photos
    ON captions.id = photos.captionid
    LEFT JOIN people ON photos.photografph_id = people.id
    WHERE MATCH(caption) AGAINST(+'hometown')
    
    UNION
    
    SELECT various_columns_from_table_"photos"_and_some_columns_from_table_"people"
    FROM photos
    LEFT JOIN people AS photographers ON photos.photografph_id = photographers.id
    LEFT JOIN keywords_photos as kp
    ON kp.photoid = photos.id
    LEFT JOIN keywords AS k
    ON k.id = kp.wordid 
    LEFT JOIN people_photos AS pp
    ON photos.id = pp.photoid
    LEFT JOIN people as people
    ON pp.personid = people.id  
    WHERE 'hometown'
    IN(various_columns_in_the_tables_"photos"_"people"_and_"keywords")
    GROUP BY photos.id
    Does anyone see any obvious room for improvement or any errors I should correct? I'm not sure about all the LEFT JOINS, but it seems to be working as intended.
    I've tried to join all the tables together in a single SELECT statement, but all my attempts so far has resulted in searches that take several seconds.

  2. #2
    SitePoint Enthusiast
    Join Date
    Apr 2013
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    No comments on this, which may be good or not.

    I just got a new idea myself. What if I copy the data from all the columns in the photos, people and keywords tables into a new column in the captions table (excluding words/phrases already present in the caption column) and use both columns for this kind of general search? That way the query could be heavily simplified, much more flexible and hopefully faster:

    Code:
    SELECT various_columns_from_table_"photos"_and_some_columns_from_table_"people"
    FROM captions
    LEFT JOIN photos
    ON captions.id = photos.captionid
    LEFT JOIN people
    ON photos.photografph_id = people.id
    WHERE MATCH(caption,newColumn) AGAINST('blah blah blah' IN BOOLEAN MODE)
    Well, I guess there's only one way to find out ... But it's also nice to be supervised by the more experienced coders who are able to spot obvious mistakes.

  3. #3
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,900
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Copying data across several tables (known as denormalization) is generally not recommended unless the alternative is too slow. Mostly because it's easy to lose sight over which data is the "original" data. I.e., if the values in the original table and the copy are different, which one is the "real" value?

    Anyway, I suppose in this case it depends on how many entries you have in the tables. If it's not that many I'd go with option 1. If option 1 is too slow, go with option 2, or get a dedicated search service like sphinx search -- see http://www.ibm.com/developerworks/li...-sphinxsearch/
    Rιmon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  4. #4
    SitePoint Enthusiast
    Join Date
    Apr 2013
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks. Yes, I've had Sphinx recommended to me a couple of times now, and I guess I'll have to check it out eventually. But using Sphinx would require switching web hotels, and that's not up to me to decide. So for now I'm stuck with the indexing and search options provided by MySQL itself.

    Option 1 takes about 4-5 seconds with a fairly simple search, so yes, I think some other solution is called for. And as long as I make sure that the extra column is updated automatically every time any of its source columns is changed, there shouldn't be a problem? Or should there?

  5. #5
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,900
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by atoroqo View Post
    Option 1 takes about 4-5 seconds with a fairly simple search, so yes, I think some other solution is called for. And as long as I make sure that the extra column is updated automatically every time any of its source columns is changed, there shouldn't be a problem? Or should there?
    No it shouldn't be a problem. As long you remember that that data is a copy and should never be used as canonical data
    Rιmon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  6. #6
    SitePoint Enthusiast
    Join Date
    Apr 2013
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK. I'll make sure to comment the columns and document everything in case I get hit by a train and someone else takes over the project.

  7. #7
    SitePoint Author silver trophybronze trophy
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,576
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    Copying data across several tables (known as denormalization) is generally not recommended unless the alternative is too slow. Mostly because it's easy to lose sight over which data is the "original" data. I.e., if the values in the original table and the copy are different, which one is the "real" value?
    I wouldn't be so down on denormalization -- it really is how one makes modern apps go in lots of ways and is really powerful when you get your head around it. We are starting to make things that are fundamentally denormalized and then normalize via code when necessary.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •