SitePoint Sponsor

User Tag List

Results 1 to 23 of 23
  1. #1
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    ordering within IN clause

    Hi,

    It seems that ordering within an IN clause does not work.

    Code MySQL:
    select user_id
    from table1
    where user_id in (select user_id from table2 order by created_date desc) #desc so that I get the newest entries only
    limit 25

    Running this query will always give me the same results. It seems that the user_id is not being sorted correctly.

    I have been googling and it seems that the field() function may work for me.
    But based on the documentation and examples I've seen so far, it seems that I have to run the select sub query twice? I'm not sure if I'm understanding it correctly.


    Code MySQL:
    select user_id
    from table1
    where user_id in (select user_id from table2)
    order by field(user_id, select user_id from table2) #I need desc here
    limit 25


    But I'm afraid this would also be slow (I'm moving the query from a join to IN instead) because the indexed column would be inside the function.

    I may result in using a stored procedure in order to sort the query and then pass the result set within the IN clause.

    If anyone can shed some light please and thanks.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    here you go
    Code:
    SELECT table1.user_id
      FROM table1
    INNER
      JOIN table2
        ON table2.user_id = table1.user_id 
    ORDER 
        BY table1.created_date DESC LIMIT 25
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,031
    Mentioned
    152 Post(s)
    Tagged
    2 Thread(s)
    If you want to order the output of the query you should put ORDER BY on the outer query, not on the inner query.

    Putting ORDER BY on the inner query does nothing but make the query slower.
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  4. #4
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    r937 that was the original query I had. Unfortunately it takes about 3 seconds for the whole operation. There are over 100k rows in each table. That is why I was avoiding a join to begin with.

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    rewriting a join as a IN (subquery) construction is often worse performance

    are your join columns properly indexed? and is all you really want to return just the userid?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,031
    Mentioned
    152 Post(s)
    Tagged
    2 Thread(s)
    Wait, are you saying IN is faster than an INNER JOIN? It really shouldn't be!

    Do you have indexes on table1.user_id and table2.user_id ? If not, create them and try again with the INNER JOIN.

    Edit:


    You beat me to it again
    That's twice in one thread
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  7. #7
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I actually got better performance out of using the IN clause. I brought it down from 3 seconds to ~ .01 seconds. The only issue I had was getting the sorting right.

    All I need is the user id and both columns are indexed.

    I think the problem my be how mysql was compiled. It seem's that the cache is not working right neither. Running the same query multiple times from my machine works fine (although slow, the second+ times are much faster due to caching) and terrible on production.

    EDIT:

    Oops, it seems that I made a mistake when posting the query. I actually need their profile_link column as it contains important information.

    It should be like this:

    Code:
    SELECT t1.profile_link
    FROM table t1
    WHERE t1.user_id IN (SELECT t2.user_id FROM table2 t2 ORDER BY t2.created_date desc) #desc so that I get the newest entries only
    LIMIT 25
    table2 contains data that is inserted all the time and I would like the newest entries only.

  8. #8
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,031
    Mentioned
    152 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by proxi View Post
    table2 contains data that is inserted all the time and I would like the newest entries only.
    So why did you put the LIMIT on the outer query and not on the inner query?
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  9. #9
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Because you can't put a limit on the sub query using the IN clause.

    I also forgot to mention that whatever the limit is, it's also the maximum amount of IDs within the IN clause.

    I may have to solve this programmatically if I can't find a valid solution. Maybe query it through the application code and pass the results into this query. Or somewhere around those lines.

    I'm nearly home and will be testing out the same tables but under inno db instead (currently running under myisam).

  10. #10
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I just joined 500,000 from each table and it ran extremely fast (at home). I think the slow speeds are due to the way mysql was configured. I'll have to do more research on the production server.

    Do you guys have any advice on any flags I should be looking for? I'm going to be checking the cnf file and see if there is anything there weird.

    Edit:

    Maybe it's just hardware issue. Hm.

  11. #11
    SitePoint Guru
    Join Date
    Sep 2008
    Posts
    977
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    maybe it's just your database structure being wrong

    Seeing you mention that you were going to try setting it up with innodb instead of myIsam, I had a feeling it would be quicker.

    Joining Innodb tables is more efficient than with MyISAM tables and, you have the added benefit of being able to enforce *referential integrity*.

  12. #12
    SitePoint Guru
    Join Date
    Sep 2008
    Posts
    977
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    more efficient as in, afaik, you can't join MyIsam tables.

  13. #13
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ok, so I figured out why it was slow.

    It's using a group by clause.

    Code:
    SELECT table1.user_id
      FROM table1
    INNER JOIN table2
        ON table2.user_id = table1.user_id
    GROUP BY table2.user_id 
    ORDER 
        BY table1.created_date DESC LIMIT 25
    Looks like when I issue a group by it creates a temporary table. I need to filter out duplicates from this query.

  14. #14
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,707
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by IBazz View Post
    more efficient as in, afaik, you can't join MyIsam tables.
    No problems with JOINs on MyISAM table types, you can mix them with INNODB tables as well.

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by proxi View Post
    Looks like when I issue a group by it creates a temporary table.
    it could also be caused by your ORDER BY clause

    please note, whenever you use a GROUP BY clause, then the ORDER BY clause can name only columns that are in the SELECT clause

    you're trying to sort by a column that isn't included in the SELECT clause

    not only is this problematic (you can't sort by a column that isn't included in the rows being sorted), but it also doesn't make sense, since if you don't display it, you can't really discern the sequence of the results properly

    what's your actual query, i.e. with real not fake column and table names?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Code:
    SELECT p.profile_link
      FROM profile p
    INNER JOIN news n
        ON n.user_id = p.user_id
    GROUP BY n.user_id 
    ORDER 
        BY n.created_date DESC LIMIT 25

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    yeah, it is as i thought

    you can't sort by created_date because it isn't in the SELECT clause

    and the GROUP BY clause makes no sense to me at all, you will get unique profile_link values for each separate user, but no way to know which user each profile_link belongs to
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  18. #18
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    in the news table the same user can insert new rows. In fact, new rows are inserted by the same user around 5 times a day on a average. If the latest rows are by the same user I would like to avoid repeating the same user multiple times.

  19. #19
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    could you please do a SHOW CREATE TABLE for both tables

    and perhaps some sample rows that illustrate your problem
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  20. #20
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,707
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    could you please do a SHOW CREATE TABLE for both tables

    and perhaps some sample rows that illustrate your problem
    Kinda like pulling teeth isn't it?

  21. #21
    SitePoint Guru
    Join Date
    Sep 2008
    Posts
    977
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by guelphdad View Post
    No problems with JOINs on MyISAM table types, you can mix them with INNODB tables as well.
    oops, I stand corrected. apologies proxi. Guess what i am going to be re-reading for the next while.
    Last edited by IBazz; Jan 20, 2011 at 16:01. Reason: spelling/grammar

  22. #22
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Code:
     create table profile(
     id int unsigned not null primary key auto_increment,
     fname varchar(100) not null,
     lname varchar(100) not null,
     email varchar(200) not null,
     p_link varchar(200) not null,
     constraint `uq_profile_email` unique(email)
     )engine = innodb;
      
     
     create table news(
     id int unsigned not null primary key auto_increment,
     profile_id int unsigned not null,
     title varchar(200) not null,
     message text not null,
     created_date timestamp not null default current_timestamp,
     constraint `fk_news_profile_id` foreign key (profile_id) references profile (id) 
    ) engine = innodb;
    The data for news table is for internal users to post about anything. Like a blog. There is no limit in the amount of posts they can make. So if a user posts 5 "news" posts, I would not like the user to repeat from the query. Only distinct user posts.

    This also leads me to my next question. i've been trying to write up a query in order to show a count of how many news posts have been posted. Ideally I would like to count based on an interval.

    The table would look something like this:

    id - profile_id - title - message - created_date

    1:1340:"I love this beer!":"...etc....":"2011-01-20 01:25:09"

    2:936:"New inc":"...etc....":"2011-01-20 01:25:12"

    3:38:"School restarts":"...etc....":"2011-01-20 01:25:20"

    4:7352:"Advancing":"...etc....":"2011-01-20 01:25:30"

    5:1340:"I love this beer!":"...etc....":"2011-01-20 01:26:01"

    6:1340:"Oh... here we go again ":"...etc....":"2011-01-20 01:26:16"

    If you notice the pattern in the time stamp. There are on an average 5 posts per minute. I would like to group the count of those posts within that minute.

    The query in this case should return 2 rows. First row with a count of 4 and the second row of count 2.

    If anyone could point me to the right direction I would appreciate it

  23. #23
    SitePoint Zealot
    Join Date
    Apr 2010
    Posts
    106
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You can scratch my second question. I got it working properly.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •