Friends of Friends

Hi guys I am developing open source social network and my skills dont really lend themselves to mysql.

Trying to implement a feature to find the friends of a friend who I am not already friends with. Its not going well !

My DB looks like this:

TABLE: friends_list

user_1 user_2
2 1
3 2
1 4
1 5

So in theory if my userid is ‘1’ then finding the friends of friends who I am not friends with should return ‘3’.

My current query is the following, although I dont think its even close !

SELECT * FROM friends_list WHERE
user_2 != ‘1’ AND
user_1 IN(SELECT user_2 FROM friends_list WHERE user_1 = ‘1’) AND
user_1 NOT IN(SELECT user_1 FROM friends_list WHERE user_2 = ‘1’);

This might work - and if it does, there might very well be an easier way to do it :wink:

SELECT
 user_2
FROM
 friends_list
WHERE
 user_2 <> 1
AND
 user_1
IN (
 SELECT
  user_2
 FROM
  friends_list
 WHERE
  user_1 = 1
 UNION SELECT
  user_1
 FROM
  friends_list
 WHERE
  user_2 = 1
)
UNION SELECT
 user_1
FROM
 friends_list
WHERE
 user_1 <> 1
AND
 user_2
IN (
 SELECT
  user_2
 FROM
  friends_list
 WHERE
  user_1 = 1
 UNION SELECT
  user_1
 FROM
  friends_list
 WHERE
  user_2 = 1
)

so your relationships are directional? 2 is a friend of 1, but 1 is not a friend of 2?

most people store two rows for a bi-directional relationship, simply because it makes the SQL a lot more efficient

most people store two rows for a bi-directional relationship, simply because it makes the SQL a lot more efficient

That, plus it’s just sound database design. You should never have a many-to-many relation between two tables.

this is a most intriguing statement

could you elaborate a bit?

Because it’s highly impractical, if you use database tables properly. I can’t think of any example in which it is possible to properly design a database with a many-to-many relation between two tables, without a table in between to sort out the who’s-which.

Example:

[list][]A person may own many companies.
[
]A company may be owned by many persons.[/list]

If you were to build this relation using only two tables, you would have two options:

[list][]The person table has as many fields as there are companies, registering potential ownership of each company under each person. This will result in absurdly large tables.
[
]The person table has a single field with a comma-seperated list of companies owned. This will make it extremely inefficient and complicated to extract data using normal database queries, making the point of using a database in the first place moot.[/list]

(Note: I’m aware that this is technically four different options, as the person and company tables can be interchanged in the example, but either solution is equally impractical).

On the subject of relations, one-to-one relations should not be used either, as this is analogous to using a single table data model-wise, but much more complicated and slow when making queries.

Edit: Hmmm… Seing as you are an SQL consultant, I assume this is some sort of trick question? :wink:

okay, if you had said “without a table in between” then i would’ve understood immediately

please allow me to disagree :slight_smile:

table1
id (PK)
bunch o columns

table2
id (PK/FK)
a very large column

this one-to-one relationship is actually better than if you were to stuff table2’s very large column into table1

longer rows = fewer rows per physical block = slower queries

especially if there are queries which need table1 data but not table2 data

make sense?

okay, if you had said “without a table in between” then i would’ve understood immediately

Fair enough - my terminology is that of a hobbyist, and quite possibly a bit crude :slight_smile:

make sense?

It does, yes. I agree that performance issues can call for a one-to-one relation. I will maintain that it’s still analogous data model-wise, though. I doubt that this will affect very many users here, however, compared to the numbers of users who would be negatively affected by using multiple tables where one is sufficient. Those who have to build such large tables are likely to know a lot more about databases than me anyway, and wouldn’t listen to anything I have to say :stuck_out_tongue:

(I would appreciate your input on my SQL, by the way - I’m still learning :))

no problem

for starters, it’s too tall and narrow

:smiley: :smiley:

So you prefer one-liners :stuck_out_tongue:

(I always use the styling conventions which suits me best. Unfortunately, they never seem to be shared by anyone else ;)).

no, that would be way too short and wide :smiley:

i would reformat your query like this –

SELECT user_2
  FROM friends_list
 WHERE user_2 <> 1
   AND user_1 IN 
       ( SELECT user_2
           FROM friends_list
          WHERE user_1 = 1
         UNION 
         SELECT user_1
           FROM friends_list
          WHERE user_2 = 1 )
UNION 
SELECT user_1
  FROM friends_list
 WHERE user_1 <> 1
   AND user_2 IN 
       ( SELECT user_2
           FROM friends_list
          WHERE user_1 = 1
         UNION 
         SELECT user_1
           FROM friends_list
          WHERE user_2 = 1 )

i’m not sure if that’s actually going to return what “justme” is looking for, but it sure looks like it may have some redundancy in it

let’s wait and find out if relationships are stored as two rows or one …

Hope you’re not as picky with the ladies, as you are with your SQL styling :wink:

Your suggestion will take some getting used to (been using my current style for some years now, and coding habits die hard), but I’ll definitely give it some though. Thanks for the suggestion :slight_smile:

Thanks for all the replies. It certainly makes for an interesting read !

The relationships are stored as 1 row only. I saw most DB designs for friends to be something like User_ID, Friend_ID. However friendships are bidirectional, ie when user_2 logs in he is friends with user_1 and vice versa, so either user should be able to appear in the user_1 or user_2 column.

But i’m still not sure this is the best way to go about things. Kudos for the sql statement, so far works just as expected, thanks !

true, but most of the ones i’ve seen also store the relationship in two complementary rows

the SQL is easier, trust me

:slight_smile:

Agreed, the SQL queries have become increasingly complex, but so far ok !

Does having two rows affect performance at all ? Baring in mind I am using this for a open source social network, there is a possiblity of a large number of relationships.

yes

it improves it

:slight_smile:

Thanks for the heads up, I naturally assumed more records = longer query times. However thinking about all the bloated SQL queries I am having, you might just be on the money !

With regards to:


SELECT user_2
  FROM friends_list
 WHERE user_2 <> 1
   AND user_1 IN 
       ( SELECT user_2
           FROM friends_list
          WHERE user_1 = 1
         UNION 
         SELECT user_1
           FROM friends_list
          WHERE user_2 = 1 )
UNION 
SELECT user_1
  FROM friends_list
 WHERE user_1 <> 1
   AND user_2 IN 
       ( SELECT user_2
           FROM friends_list
          WHERE user_1 = 1
         UNION 
         SELECT user_1
           FROM friends_list
          WHERE user_2 = 1 )

How can I tell which is the friend and which is the friend of the friend. I dont really want to make another query call to check who is the friend. Or would this complicated things even more ?

Thanks again :slight_smile:

i’m not sure, that isn’t my query :rolleyes:

what was the original question again? oh yeah, “friends of a friend who I am not already friends with”

do you have a test base? i mean, a number of relationships that you can inspect visually to confirm the correctness of the query?

opps, so it isn’t, i saw the code that you styled and assumed.

I have a codebase of around 100 relationships for testing. Further down the road what I think I will need to do is sort the results by weighting the count of mutual friends. ie the more friends a user has in common with us who is not a friend already will appear as more relevant.