SitePoint Sponsor

User Tag List

Page 1 of 3 123 LastLast
Results 1 to 25 of 57
  1. #1
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question Group similar results, listing users

    Hi guys!

    I'd like to group similar results, but list all of the users who've added the same data.

    Just to explain, I have two tables:
    1. a table for the bookmarks, containing the URI by which I'd be grouping, and;
    2. a table for the users and bookmark IDs, linking users to bookmarks.

    I know how to group by the URI, but not how to pull through the names of all of the users who added the same bookmark.

    Maybe this would have to happen within PHP? Right now, I don't know what would be quicker — which is why I'm here, asking you guys!

  2. #2
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,034
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    Can you post the output of a SHOW CREATE TABLE query for both of the tables?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Code:
    SELECT bookmarks.uri
         , COUNT(DISTINCT bookmarks.id) AS dupes
         , GROUP_CONCAT(userbookmarks.user_id) AS users
      FROM bookmarks
    LEFT OUTER
      JOIN userbookmarks
        ON userbookmarks.bookmark_id = bookmarks.id
    GROUP
        BY bookmarks.uri
    the count of distinct ids for each bookmark should always be 1... i just threw that in there in case you might have duplicates (which you can avoid by defining a UNIQUE index on the uri, and then you won't need this count)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by SpacePhoenix View Post
    Can you post the output of a SHOW CREATE TABLE query for both of the tables?
    Hi, and thanks for the reply.

    Code:
    CREATE TABLE IF NOT EXISTS `bookmarks` (
      `id` int(11) NOT NULL auto_increment,
      `url` text,
      `title` text,
      `snippet` text,
      `datetime` datetime default NULL,
      PRIMARY KEY  (`id`),
      KEY `datetime` (`datetime`),
      FULLTEXT KEY `title` (`title`,`snippet`)
    ) ENGINE=MyISAM
    Code:
    CREATE TABLE IF NOT EXISTS `links` (
      `id` mediumint(11) NOT NULL auto_increment,
      `user_id` mediumint(11) NOT NULL,
      `bookmark_id` int(11) NOT NULL,
      `status` enum('public','private') NOT NULL default 'public',
      UNIQUE KEY `id` (`id`),
      KEY `bookmark_id` (`bookmark_id`)
    ) ENGINE=MyISAM

  5. #5
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    Code:
    SELECT bookmarks.uri
         , COUNT(DISTINCT bookmarks.id) AS dupes
         , GROUP_CONCAT(userbookmarks.user_id) AS users
      FROM bookmarks
    LEFT OUTER
      JOIN userbookmarks
        ON userbookmarks.bookmark_id = bookmarks.id
    GROUP
        BY bookmarks.uri
    the count of distinct ids for each bookmark should always be 1... i just threw that in there in case you might have duplicates (which you can avoid by defining a UNIQUE index on the uri, and then you won't need this count)
    Hi and thanks for the reply.

    Yes, there are duplicate URIs; that's an integral function of the application.

    I'll get back later today, once I've managed to try out the code.

  6. #6
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    Code:
    SELECT bookmarks.uri
         , COUNT(DISTINCT bookmarks.id) AS dupes
         , GROUP_CONCAT(userbookmarks.user_id) AS users
      FROM bookmarks
    LEFT OUTER
      JOIN userbookmarks
        ON userbookmarks.bookmark_id = bookmarks.id
    GROUP
        BY bookmarks.uri
    the count of distinct ids for each bookmark should always be 1... i just threw that in there in case you might have duplicates (which you can avoid by defining a UNIQUE index on the uri, and then you won't need this count)
    I got an error relating to a non unique alias for the links table, which I removed from the FROM. But then I got another error:
    Code:
    Unknown column 'bookmarks.id' in 'on clause'
    Here's the sequel statement in full:
    Code:
    SELECT bookmarks.id, COUNT(DISTINCT bookmarks.id) AS duplicates, GROUP_CONCAT(links.user_id) AS users, links.user_id, bookmarks.url, bookmarks.title, SUBSTR(bookmarks.snippet, 1, 100) as snippet, GROUP_CONCAT(tags.tag) AS tags, bookmarks.datetime, links.status FROM bookmarks, tags LEFT OUTER JOIN links ON (links.bookmark_id = bookmarks.id) WHERE (links.status = 'public') AND (links.bookmark_id = bookmarks.id) GROUP BY bookmarks.url ORDER BY bookmarks.datetime DESC LIMIT 0, 9
    Needless to say, there most certainly is a column called 'id' for the bookmarks table.

    You'll perhaps notice that there's a duplicate (links.bookmark_id = bookmarks.id) in the statement; that's a hold over from the previous statement. Even if I remove it, I still get the error.

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    okay, we'll have to deconstruct your latest query to focus on what you're really after

    what i gave you in post #3 was based on your original requirements, but your latest query adds all kinds of extra stuff, much of which is quite incompatible with the original GROUP BY clause

    let me give you an analogy to illustrate the conceptual problem you're having

    suppose you have a school consisting of multiple classrooms, where each classroom has multiple students

    now i will ask you to write a query which returns the number of students in each classroom, along with the student's last name

    your first and immediate response should be "which student?"

    it is the same issue -- when you have a GROUP BY clause, every column in the SELECT clause must either be mentioned in the GROUP BY clause or be the argument of an aggregate function, like COUNT() or MAX()

    do you see the issue?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    okay, we'll have to deconstruct your latest query to focus on what you're really after...
    Before I do that, I looked up the error and, apparently, the cause is the usage of the LEFT OUTER JOIN not coming after an originating JOIN. Would that be correct?

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    yeah, it is down to mixing the (deprecated) comma-list join syntax together with explicit JOIN syntax

    but we will have to tear all that apart anyway...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    Do you see the issue?
    Yes, sort of.

    Why would I be asking you which student? The whole purpose of this query is to just return the names of those in the same classrooms, not to return a specific student by name.

    That aside, what do you want me to do? I' assuming the statement will need carving into two, or something like that.

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Forbes View Post
    Why would I be asking you which student?
    because a GROUP BY on the classroom returns a single aggregate row for the classroom, and since there are multiple ~different~ student names in a classroom, asking for one of them, without stating which one, is nonsensical

    similarly, you have both
    Code:
         , GROUP_CONCAT(links.user_id) AS users
         , links.user_id
    in the SELECT clause -- the first one is okay, because it aggregates all the users into a single string, but the second one isn't, because it's not an aggregate function

    let's get back to your fixing your query, though

    we'll begin with simple retrieval from the bookmarks table --
    Code:
    SELECT bookmarks.id
         , bookmarks.url
         , bookmarks.title
         , SUBSTR(bookmarks.snippet, 1, 100) as snippet
         , bookmarks.datetime
      FROM bookmarks
    you mentioned that there could be duplicate urls in this table -- may i ask why?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    We'll begin with simple retrieval from the bookmarks table --
    Code:
    SELECT bookmarks.id
         , bookmarks.url
         , bookmarks.title
         , SUBSTR(bookmarks.snippet, 1, 100) as snippet
         , bookmarks.datetime
      FROM bookmarks
    you mentioned that there could be duplicate urls in this table -- may i ask why?
    Sorry about the late reply — life!

    I'm working on a bookmarking service, so it's a function of that service to allow people to add the same bookmark.

    Over time, the number of times a bookmark has been added will build towards a ranking algorithm.

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    okay, so given that a particular url can be in the table more than once, will ~all~ rows for the same url have identical titles and snippets and datetimes?

    presumably the answer is no

    so let me ask you then, if you were to aggregate/collapse all the rows for a particular url into one result row, which title or snippet or datetime would you like to see? keep in mind the student name analogy for the classroom count
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  14. #14
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    okay, so given that a particular url can be in the table more than once, will ~all~ rows for the same url have identical titles and snippets and datetimes?

    presumably the answer is no
    Correct. And here's an example of a bookmark that's been added by the same person.

    So far, Under Cloud can handle duplicate bookmarks on an individual basis, like the aforementioned.

    Quote Originally Posted by r937 View Post
    so let me ask you then, if you were to aggregate/collapse all the rows for a particular url into one result row, which title or snippet or datetime would you like to see? keep in mind the student name analogy for the classroom count
    As with the example I provided, the first instance would be the proper bookmark to use, as all others are more recent.

    So in the index view, when a duplication is discovered, the very first bookmark would be used, along with the date and time by which it was added.

    I suppose the next logical question to you would be, are the duplicates detected within the rows retrieved within the remit of the LIMIT clause, or across the whole of the table?

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Forbes View Post
    So in the index view, when a duplication is discovered, the very first bookmark would be used, along with the date and time by which it was added.
    not sure what you mean by index view

    here's the query which counts the bookmarks, augmented with the "earliest" data...
    Code:
    SELECT bookmarks.url
         , m.dupes
         , bookmarks.title
         , SUBSTR(bookmarks.snippet, 1, 100) as snippet
         , bookmarks.datetime
      FROM ( SELECT url
                  , COUNT(*) AS dupes
                  , MIN(datetime) AS earliest
               FROM bookmarks
             GROUP
                 BY url ) AS m
    INNER
      JOIN bookmarks
        ON bookmarks.url = m.url
       AND bookmarks.datetime = m.earliest
    make sense?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    not sure what you mean by index view
    Index is simply the listing of all bookmarks in date order.

    Quote Originally Posted by r937 View Post
    here's the query which counts the bookmarks, augmented with the "earliest" data...
    Code:
    SELECT bookmarks.url
         , m.dupes
         , bookmarks.title
         , SUBSTR(bookmarks.snippet, 1, 100) as snippet
         , bookmarks.datetime
      FROM ( SELECT url
                  , COUNT(*) AS dupes
                  , MIN(datetime) AS earliest
               FROM bookmarks
             GROUP
                 BY url ) AS m
    INNER
      JOIN bookmarks
        ON bookmarks.url = m.url
       AND bookmarks.datetime = m.earliest
    make sense?
    I just tried that on the live data and it didn't combine the two most recent duplicates "Alan Turing’s Patterns in Nature, and Beyond", which was added by another user and myself.

  17. #17
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,034
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    Check the URLs for the duplicates of "Alan Turing’s Patterns in Nature, and Beyond", are the URLs identical?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  18. #18
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by SpacePhoenix View Post
    Check the URLs for the duplicates of "Alan Turing’s Patterns in Nature, and Beyond", are the URLs identical?
    They were, but the I've since edited mine (removing extraneous name-value parameters).

    Having run the query again, the "dupes" column is now showing two and there's only one bookmark being listed, and the date and time relates to the first bookmark added by Anna. So that's now working.

    By way of proof, here's the actual bookmark listing, with me as the other user having bookmarked the same article.

    I've just tried adding back in the options for retrieving user information, but to no avail.

  19. #19
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Forbes View Post
    I've just tried adding back in the options for retrieving user information, but to no avail.
    my crystal ball is broken, and i can't see your attempt from here

    mind sharing?

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  20. #20
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    my crystal ball is broken, and i can't see your attempt from here

    mind sharing?

    Well it was such an appalling mess, I just reverted back!

    I was trying to merge the parts of the original query (posted previously) with your own.

    I'll be honest, all of this way beyond me — the queries I write are, for the most part, very basic.

    Code:
    SELECT
    bookmarks.id, source.duplicates, GROUP_CONCAT(links.user_id) AS users, links.user_id, bookmarks.url, bookmarks.title, SUBSTR(bookmarks.snippet, 1, 100) as snippet, GROUP_CONCAT(tags.tag) AS tags, bookmarks.datetime, links.status
    FROM (
    	SELECT url, COUNT(*) AS duplicates, MIN(datetime) AS earliest
    	FROM bookmarks GROUP BY url) AS source, links, tags
    INNER JOIN bookmarks ON (bookmarks.url = source.url)
    AND (bookmarks.datetime = source.earliest)
    AND (links.status = 'public')
    AND (links.bookmark_id = bookmarks.id) GROUP BY bookmarks.url
    ORDER BY bookmarks.datetime DESC LIMIT 0, 9
    All of which gives me an error:

    #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ON (bookmarks.url = source.url) AND (bookmarks.datetime = source.earliest) AND (' at line 1
    From what I can grasp, MySQL doesn't know to handle the various references to columns belonging to the links and tags tables, and I don't know where to put the table names.

  21. #21
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    yikes (no offence)

    okay, try this --
    Code:
    SELECT bookmarks.url
         , m.dupes
         , bookmarks.title
         , SUBSTR(bookmarks.snippet, 1, 100) as snippet
         , bookmarks.datetime
         , t.tags
         , l.users
      FROM ( SELECT url
                  , COUNT(*) AS dupes
                  , MIN(datetime) AS earliest
               FROM bookmarks
             GROUP
                 BY url ) AS m
    INNER
      JOIN bookmarks
        ON bookmarks.url = m.url
       AND bookmarks.datetime = m.earliest
    LEFT OUTER
      JOIN ( SELECT bookmark_id
                  , GROUP_CONCAT(tag) AS tags
               FROM tags
             GROUP
                 BY bookmark_id ) AS t
        ON t.bookmark_id = bookmarks.id
    LEFT OUTER
      JOIN ( SELECT bookmark_id
                  , GROUP_CONCAT(user_id) AS users
               FROM links
              WHERE status = 'public'
             GROUP
                 BY bookmark_id ) AS l
        ON l.bookmark_id = bookmarks.id
    ORDER 
        BY bookmarks.datetime DESC LIMIT 0, 9
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  22. #22
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    Okay, try this...
    OK, I tried that and it hung phpMyAdmin as well as my web browser, when I mirrored your code into mine.

  23. #23
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    make (bookmark_id,tag) the PK in tabs table

    make (bookmark_id,user_id) the PK in links table
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  24. #24
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    make (bookmark_id,tag) the PK in tabs table

    make (bookmark_id,user_id) the PK in links table
    Just so I know what you're asking, you want me to make the bookmark_id and tag columns in the tags table primary keys, and similarly, the bookmark_id and user_id columns in the links table primary keys, too?

  25. #25
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    not primary keys (plural), because a table can have only one primary key -- but it can be a composite key, consisting of more than one column
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •