SitePoint Sponsor

User Tag List

Results 1 to 13 of 13
  1. #1
    SitePoint Evangelist N9ne's Avatar
    Join Date
    Aug 2002
    Location
    UK
    Posts
    596
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Duplicate Fields (not whole row) in DB...

    This question was put to me a while back but I hadn't the time to think about it back then. If I were to have a database full of users, (let's say there are 500 users), and each row has a field called 'ipaddress'. Now what I need to do is find out which rows have the same ipaddress. There could be loads of duplicates, and different duplicates. Doing a SELECT * FROM ... WHERE ipaddress='1.1.1.1' is not an option because there are millions of combinations . Is there some sort of functionality that MySQL has that can do this with a single query? I just need to select rows that have identical ipaddresses, i.e. select every row where the ipaddress is not unique.

    Can DISTINCT be used in anyway with this? Like SELECT NOT DISTINCT FROM... ?

  2. #2
    SitePoint Guru SteveO's Avatar
    Join Date
    Jan 2003
    Posts
    625
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    don't know if this would work but....
    Code:
    SELECT DISTINCT * FROM users AS a
      JOIN users AS b
    	ON a.ip_address = b.ip_address

  3. #3
    SitePoint Evangelist N9ne's Avatar
    Join Date
    Aug 2002
    Location
    UK
    Posts
    596
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Didn't work, it did something quite odd instead . It returned every row and showed each row twice, side by side.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Code:
    select ipaddress
      from yourtable
    group by ipaddress
    having count(*)>1
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Evangelist N9ne's Avatar
    Join Date
    Aug 2002
    Location
    UK
    Posts
    596
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    Code:
    select ipaddress
      from yourtable
    group by ipaddress
    having count(*)>1
    Excellent! Thanks very much, works perfectly.

  6. #6
    SitePoint Addict mr tinkles's Avatar
    Join Date
    Jan 2003
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    select ipaddress
    from yourtable
    group by ipaddress
    having count(*)>1;

    will return the col(ipaddress)from tbl (yourtable) where col exists in more than 1 row.

    how does one get to row?

    select *
    from tbl
    group by ipaddress
    having count(ipaddress)>1;

    [imho]is better[/imho]

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    sorry, mrtinkles, your query does not work the way you think it does

    it relies first of all on very sloppy mysql syntax rules

    in any other database, you must group by all non-aggregate columns in the SELECT list

    mysql, however, lets you have "hidden" columns:
    12.7.3 GROUP BY with Hidden Fields

    MySQL has extended the use of GROUP BY. You can use columns or calculations in the SELECT expressions that don't appear in the GROUP BY part. This stands for any possible value for this group. You can use this to get better performance by avoiding sorting and grouping on unnecessary items. For example, you don't need to group on customer.name in the following query:

    mysql> SELECT order.custid,customer.name,MAX(payments)
    -> FROM order,customer
    -> WHERE order.custid = customer.custid
    -> GROUP BY order.custid;

    In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant if you don't run in ANSI mode.

    Don't use this feature if the columns you omit from the GROUP BY part aren't unique in the group! You will get unpredictable results.
    notice the bold phrase: Don't use this feature

    that's mysql telling you, not me

    did you actually test your query, mrtinkles? no?

    i did --

    Code:
    create table mrtinkles
    ( id tinyint not null primary key auto_increment
    , ipaddress varchar(15)
    , foo varchar(6)
    )
    ;
    insert into mrtinkles (ipaddress,foo)
    values
     ('100.200.300.400','foo')
    ,('123.456.789.000','curly')
    ,('123.456.789.000','larry')
    ,('123.456.789.000','moe')
    ,('001.002.003.004','bar')
    ,('111.222.333.444','tom')
    ,('111.222.333.444','dick')
    ,('111.222.333.444','harry')
    ;
    select * from mrtinkles;
    
    id  ipaddress  foo  
    1 100.200.300.400 foo 
    2 123.456.789.000 curly 
    3 123.456.789.000 larry 
    4 123.456.789.000 moe 
    5 001.002.003.004 bar 
    6 111.222.333.444 tom 
    7 111.222.333.444 dick 
    8 111.222.333.444 harry 
    
    select *
    from mrtinkles
    group by ipaddress
    having count(ipaddress)>1;
    
    id  ipaddress  foo  
    6 111.222.333.444 tom 
    2 123.456.789.000 curly
    you see? it doesn't work the way you thought
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think the idea is that, knowing the IP addresses that are non-unique is probably not terribly helpful because you've got to then go run another query for each IP in the result set to clean up the data (if that's the issue).

    So provided you've got a DBMS that support subqueries, you could do something like:
    Code:
    SELECT *
    FROM user
    WHERE ipaddress IN( SELECT ipaddress
    					 FROM user
    					 GROUP 
    						 BY ipaddress
    					 HAVING COUNT( * ) > 1 )
    edit: I can't get the formatting correct -- for some reason it is changing spaces to tabs?!?!

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by MattR
    I think the idea is that, knowing the IP addresses that are non-unique is probably not terribly helpful because you've got to then go run another query for each IP in the result set to clean up the data (if that's the issue).
    i agree wholeheartedly -- if that's the issue

    and the original poster did say "which rows have the same ipaddress"

    the reason i did not go to that level yet is because i've seen
    plenty of these questions where the ultimate requirement turned out
    to be the need for a DELETE statement (which doesn't look anything
    like the SELECT statement to list the rows containing duplicate column values)

    and plenty more where the requirements were satisfied by listing
    the duplicated values only (e.g. joe runs the query, sees three dupes,
    deletes them by hand, and goes on to his next task, without taking
    the time to code a complex query that will delete the right rows automatically)

    and of course some people need to absorb the HAVING count(*)>1
    concept first before going anywhere else like a subquery
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Addict mr tinkles's Avatar
    Join Date
    Jan 2003
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    sorry, mrtinkles, your query does not work the way you think it does.

    what?

    how do you think my 'SQL' statement works?

    it relies first of all on very sloppy mysql syntax rules

    did you try:

    select ipaddress
    from yourtable
    group by ipaddress
    having count(*)>1

    from your example?

    you get some sloppy SQL error about tbl not existing...
    else, the extra special hidden secret "level"
    the original poster did say "which rows have the same ipaddress"

    the reason i did not go to that level ...

    post the SQL that answers the question?

    [goto level *]
    duplicate rows...chapter 1
    [/goto level *]

    thanks,
    mattR

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    incoherent as usual, mr tinkles

    did you try MattR's solution?

    it only works in mysql 4.1

    i am waiting to hear back from N9ne
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Addict mr tinkles's Avatar
    Join Date
    Jan 2003
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    [coherent]
    so, how does one find duplicate rows mySQL?
    [/coherent]

    huh???

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Code:
    select t1.id
         , t1.ipaddress  
         , t1.foo 
      from mrtinkles t1
    inner
      join mrtinkles t2
        on t1.ipaddress = t2.ipaddress
    group
        by t1.id
         , t1.ipaddress  
         , t1.foo  
    having count(*)>1
    
    id  ipaddress  foo
    2 123.456.789.000 curly 
    3 123.456.789.000 larry 
    4 123.456.789.000 moe 
    6 111.222.333.444 tom 
    7 111.222.333.444 dick 
    8 111.222.333.444 harry
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •