Why is this sql showing the same school twice? New to JOINS

This sql:

SELECT DISTINCT
schools.id, schools.school, mascots.id, mascots.mascot
FROM
schools,
mascots
LEFT JOIN
schools s ON s.school_mascot = mascots.id
WHERE
schools.school_year = ‘2020’;

Shows this output. Why am I getting the same schools twice?

id school id mascot
|1| Wyoming |1| Cowboys|
|2| Memphis |1| Cowboys|
|1| Wyoming |2| Tigers|
|2| Memphis |2| Tigers|

When it should be:

id school id mascot
|1| Wyoming |1| Cowboys|
|2| Memphis |2| Tigers|

Any help would be appreciated.

The problem is the above. This is joining every row in the schools table (2) with every row in the mascots table (2), resulting in 4 rows, i.e. every combination. This is referred to as a cartesian product, since there’s nothing in the query to give only the rows that have a relationship.

The LEFT JOIN you have added to the result of the above cartesian product part of the query, is just joining the schools table to that result again.

Going back a step, the naming and organization of your tables/columns is in need of help. A mascot is related to a school, not the other way around. A school can have none, one, or more than one mascot at any time and this can change over time (which is why I suspect you have the year comparison in the query.) The mascot table should have - id, school_id, name, start (school) year, end (school) year (for the current mascot, this should be set to a maximum permitted year value, 9999), and any other columns related to the existence of each mascot. If your schools.school and mascots.mascot columns are the school name and mascot name, the columns should be called name.

To get a list of schools and any mascots for any particular school year, the query would look like -

SELECT s.id, s.name, m.id, m.name
FROM schools s
LEFT JOIN mascots m ON s.id = m.school_id
WHERE 2020 BETWEEN m.start_year AND m.end_year
ORDER BY s.name, m.name
2 Likes

excellent analysis, @mabismad

one small point – in the query you posted for the redesigned mascot table, the WHERE condition should be moved to the ON clause, otherwise the join functions as an inner join

1 Like

Thanks for all the great advice. I am curious though if I am designing it right though. For example, Memphis was once called Memphis State. How would I be able to make the schools table pull up for example:

1995 Memphis State Tigers
1996 Memphis State Tigers
1997 Memphis Tigers

See how the name of the school changed from one year to the next?

Also, mascots also change:

2000 Arkansas State Indians
2001 Arkansas State Red Wolves

Would I be better having a linking table?

There are other factors too.
Coaches change all the time for each team and can coach for many different teams.
Players can play a couple years at one team then transfer to a new school.

Would linking tables be better suited for information handling such as this?

Thanks again for the help.

The school table would hold only the unique/one-time information about the school. This would establish a school id. You would have a school_name table, with - id, school_id, name, start date, and end date columns. (The mascot table above should probably use a full date, rather than just a year, then you would need to use a full date as the input value being compared against. To find information based on a school year, you use the full date that corresponds to the start date of the school year.)

The primary table for a coach or student, contains only the unique/one-time information. This establishes a coach id or a student id. You would have a coach_school table, with - id, coach_id, school_id, start date (a full date), end date (a full date), and any other columns needed to hold the unique values about the instance of a coach at a school. This establishes a coach school id, used to relate any other multiple same meaning data back to each instance of a coach at a school.

Repeat for any other information. All of this falls under a subject called data normalization.

1 Like

So, would the tables you mentioned be similar to the ones below?

tbl_schools
id
name

tbl_school_names
id
school_id
name
start_year
end_year

tbl_mascots
id
school_id
name

tbl_coaches
id
name

tbl_coach_school
id
coach_id
school_id
start_year
end_year

tbl_players
id
name

tbl_player_school
id
player_id
school_id
start_year
end_year

I haven’t heard of Normalization. I didn’t know you could have so many tables!
Would JOINS on this many tables for one school or many be really extensive?
Thanks again for the help.

Yes.

Yes.

An additional point about the sql I posted above, if fetching the data as an associative array, you would need to use alias name(s) for the same name columns being selected, so that you can reference each column in the fetched data.

I have to be honest that I am a little confused about the two school tables…tbl_school and tbl_school_names.

tbl_school.id and tbl_school.name change, don’t they?

tbl_school

id name
1 Memphis State ( for year 1996 )
2 Memphis ( for year 1997 )

Wouldn’t I just put a school_id that is the same for both of them in tbl_school?

That way, I can just say:
SELECT DISTINCT tbl_school.name
FROM tbl_school
WHERE tbl_school.school_id = 1
ORDER BY tbl_school.school_year;

(( I am not sure this sql works or not. Just TRYING to give you an idea of where my mind is. ))

And if Memphis State and Memphis both have 1 as their school_id, it would list all the names the school has been called in the past?

Since the name is a ‘variable’ it wouldn’t exist in the school table. The school known as Memphis State/Memphis is the same entity, it only has one row/id in the school table. If your confusion is due to my blanket Yes given to that list of tables, sorry.

I really appreciate you working with me.
Would it be too much to ask how you would create the two tables that are associated with schools, please?

For example, If in tbl_school id 1 is Memphis State 1996 and previous years, would id 1 change to Memphis for 1997-onward?

The school table would have - id, state, city, established date, and any other unique value columns, to identify the school -

id state city    est_date
1  TN    Memphis 1912-09-10

The school_name table, for the entries you are mentioning (I’m not sure we are talking about the same school), would be -

id school_id name          start_year  end_year
1  1         Memphis State 1957        1996
2  1         Memphis       1997        9999

To find all the names when you know the school id, you would use it in a WHERE clause to query the school_name table to find all matching name values.

SELECT name
 FROM school_name
 WHERE school_id = 1
 ORDER BY start_year

If don’t know the id, you would write a query to match what pieces of information you do have, such as all schools in TN, all schools in Memphis TN, all schools that start with the letter ‘m’ or have the word Memphis in their current name, all schools that start with the letter ‘m’ or have the word Memphis in any name, … This would give you a result containing the matching school id(s), that you can then use to query to get all the names for each found school id.