SitePoint Sponsor

User Tag List

Results 1 to 15 of 15

Thread: Cleanup Query?

  1. #1
    SitePoint Enthusiast britton's Avatar
    Join Date
    Jul 2005
    Location
    Tennessee
    Posts
    83
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Cleanup Query?

    Recently I had an issue where friend requests were only half deleted.

    I have a friend table which stores 3 records... user_id1, user_id2, and status of the request. For each friend request there are two records created.

    So for instance, the stored data would look like:

    1, 2, accepted
    2, 1, accepted

    The problem is that the remove friend script was recently corrupted and only removed 1 of the 2 friend records so to many many it appears they still have a relationship whereas to their counterpart it does not.

    What is the best way to clean up records which do not have a counterpart?

    Thanks

  2. #2
    Just Blow It bronze trophy
    DaveMaxwell's Avatar
    Join Date
    Nov 1999
    Location
    Mechanicsburg, PA
    Posts
    7,262
    Mentioned
    115 Post(s)
    Tagged
    1 Thread(s)
    Quick and dirty, something like this would work (WARNING - backup your database before running!!!)

    Code:
    DELETE
      FROM Friends
     WHERE CAST(user_id1 AS CHAR(10)) + 
        CAST(user_id2 AS CHAR(10)) NOT IN (SELECT CAST(user_id2 AS CHAR(10)) + 
                 CAST(user_id1 AS CHAR(10)) 
               FROM Friends)
    Dave Maxwell - Manage Your Site Team Leader
    My favorite YouTube Video! | Star Wars, Dr Suess Style
    Learn how to be ready for The Forums' Move to Discourse

  3. #3
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    I'm very doubtful about what Dave's suggesting.

    If I understood correctly, the "main" filter is:

    Code:
    WHERE request_status = "accepted"
    and to retrieve the left overs, a self join is required.

  4. #4
    Just Blow It bronze trophy
    DaveMaxwell's Avatar
    Join Date
    Nov 1999
    Location
    Mechanicsburg, PA
    Posts
    7,262
    Mentioned
    115 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by itmitică View Post
    I'm very doubtful about what Dave's suggesting.

    If I understood correctly, the "main" filter is:

    Code:
    WHERE request_status = "accepted"
    and to retrieve the left overs, a self join is required.
    Not to do what he was asking, which was to cleanup the mess left behind from a faulty query eariler. According to the OP, there are two records for each request, one where user_id1 = friendA and user_id2 = friendB, and one where user_id1 = friendB and user_id2 = friendA. I was just looking for the records where there is no reciprocal record.

    The OP was looking for an one-off query, not to correct the process.
    Dave Maxwell - Manage Your Site Team Leader
    My favorite YouTube Video! | Star Wars, Dr Suess Style
    Learn how to be ready for The Forums' Move to Discourse

  5. #5
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    1. The table has multiple distinct values for the request status field. Hence, the first filter is about those accepted requests.

    2. After CAST()+CAST(), '6789': could mean (6, 789), (67, 89), (678, 9) while '9876' could mean (9, 876), (98, 76), (987, 6)

    3. Huge performance penalties with the use of CAST()+CAST().

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,241
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by itmitică View Post
    2. After CAST()+CAST(), '6789': could mean (6, 789), (67, 89), (678, 9) while '9876' could mean (9, 876), (98, 76), (987, 6)
    not if CHAR(10) works as intended, i.e. with trailing blanks

    did you test it?

    Quote Originally Posted by itmitică View Post
    3. Huge performance penalties with the use of CAST()+CAST().
    define "huge"
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    did you test it?
    Not this one, but I'd never do something like that because...

    Quote Originally Posted by r937 View Post
    define "huge"
    ...it's big big big...

    ...wrong.

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,241
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by itmitică View Post
    ...it's big big big...

    ...wrong.
    no, it's not big, and neither is it wrong

    please don't be unnecessarily alarmist
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    no, it's not big,
    Yeah, it's very well known that size it's a matter of opinion!


    Quote Originally Posted by r937 View Post
    and neither is it wrong
    Yeah, it is, since the days of surrogate fixed size fields db structures, before FoxPro 1.0, in the days of Pascal pointers.


    Quote Originally Posted by r937 View Post
    please don't be unnecessarily alarmist
    I guess the word would be informative.

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,241
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by itmitică View Post
    Yeah, [ CAST ] is, since the days of surrogate fixed size fields db structures, before FoxPro 1.0, in the days of Pascal pointers.
    perhaps <snip/> you forgot that we were talking about using the CAST function

    which is, i repeat, ~not~ wrong



    <snip/>
    Last edited by Mittineague; Jul 11, 2012 at 22:33.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    <snip/>

    The use of CAST(), like the use of any other aggregate function in the WHERE clause, brings huge performance penalties with it. <snip/>

    At the same time, regarding the fixed size fields and values, Dave and you are advertising flat file databases techniques over the normal relational databases techniques.
    Last edited by Mittineague; Jul 11, 2012 at 15:12.

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,241
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by itmitică View Post
    The use of CAST(), like the use of any other aggregate function in the WHERE clause, brings huge performance penalties with it.
    first of all, CAST is not an aggregate function

    and once again, i request you to define "huge" in specific terms

    Quote Originally Posted by itmitică View Post
    At the same time, regarding the fixed size fields and values, Dave and you are advertising flat file databases techniques over the normal relational databases techniques.
    nonsense

    <snip/>
    Last edited by Mittineague; Jul 11, 2012 at 22:35.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    The assumption: based on OP data so far, a request for friendship is always expressed as records couples: (1|2, 2|1).

    The solution: look for records that are singletons, no matter the request status: pending, denied, accepted.

    The given: the friendship initiator always starts with the accepted value for status. However, this may change for the initiator, since, later on, it may choose to deny the friendship.

    Possible actions:

    1.1 When user 1 initiates a friendship request, two records are inserted:
    Code:
        1|2|accepted
        2|1|pending
    1.2 When user 1 denies the friendship it has initiated, the corresponding records are updated:
    Code:
        1|2|denied
        2|1|pending
    2.1 When the user 2 accepts the request, the corresponding record is updated:
    Code:
        1|2|accepted
        2|1|accepted
    2.2 When the user 2 denies the requests, the corresponding record is updated:
    Code:
        1|2|accepted
        2|1|denied
    along with the rest of the variations.


    The study case table:
    Code:
    FRIENDS_REQUESTS
    ================
    user_id1 | user_id2 | request_status | id
    -----------------------------------------
           1 |        2 | accepted       |  1
           2 |        1 | accepted       |  2
           1 |        3 | accepted       |  3
           3 |        1 | accepted       |  4
           2 |        3 | accepted       |  5
           3 |        2 | pending        |  6
    The cleanup scenario:

    The record having id = 2 has been deleted.
    The record having id = 1 is now unmatched.
    It needs to be deleted.

    The delete process will take id (PK) for the delete filter.
    The id set to be deleted is determined like this:
    Code:
        SELECT fr.id
          FROM friends_requests fr
         LEFT 
    OUTER JOIN friends_requests AS fr2
            ON ( fr.user_id1 = fr2.user_id2
           AND   fr.user_id2 = fr2.user_id1 )
         WHERE fr2.request_status IS NULL
    The query has an average execution time of about 1.5ms for the test table above.


    <hr>


    To address the penalties issues.

    The query proposed by Dave:
    Code:
     SELECT friends_requests.id
       FROM friends_requests
      WHERE CAST(user_id1 AS CHAR(10)) 
         || CAST(user_id2 AS CHAR(10)) 
     NOT IN ( SELECT CAST( user_id2 AS CHAR(10) ) 
                  || CAST( user_id1 AS CHAR(10) ) 
                FROM friends_requests )
    has an average execution time of about 2.0ms for the same test table above.

    The larger the data, I personally believe that this difference will become much much bigger.


    Another thing that's plain wrong, to me.

    Code:
    ( 1 = 1 AND 2 = 2 )
    is far better comparison test, performance wise (or otherwise), then
    Code:
    ( '1         2         ' = '1         2         ' )
    All this while the WHERE...NOT IN, for me, it's a JOIN gone wrong.

  14. #14
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,241
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    very nicely done

    carefully laid out, rationally explained, non-confrontationally

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  15. #15
    SitePoint Enthusiast britton's Avatar
    Join Date
    Jul 2005
    Location
    Tennessee
    Posts
    83
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks to both of you guys for your help!

    I went with Imitica's solution since the friend table has over 5,000,000 records.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •