Why is this sql showing the same school twice? New to JOINS

This sql:

schools.id, schools.school, mascots.id, mascots.mascot
schools s ON s.school_mascot = mascots.id
schools.school_year = ‘2020’;

Shows this output. Why am I getting the same schools twice?

id school id mascot
|1| Wyoming |1| Cowboys|
|2| Memphis |1| Cowboys|
|1| Wyoming |2| Tigers|
|2| Memphis |2| Tigers|

When it should be:

id school id mascot
|1| Wyoming |1| Cowboys|
|2| Memphis |2| Tigers|

Any help would be appreciated.

The problem is the above. This is joining every row in the schools table (2) with every row in the mascots table (2), resulting in 4 rows, i.e. every combination. This is referred to as a cartesian product, since there’s nothing in the query to give only the rows that have a relationship.

The LEFT JOIN you have added to the result of the above cartesian product part of the query, is just joining the schools table to that result again.

Going back a step, the naming and organization of your tables/columns is in need of help. A mascot is related to a school, not the other way around. A school can have none, one, or more than one mascot at any time and this can change over time (which is why I suspect you have the year comparison in the query.) The mascot table should have - id, school_id, name, start (school) year, end (school) year (for the current mascot, this should be set to a maximum permitted year value, 9999), and any other columns related to the existence of each mascot. If your schools.school and mascots.mascot columns are the school name and mascot name, the columns should be called name.

To get a list of schools and any mascots for any particular school year, the query would look like -

SELECT s.id, s.name, m.id, m.name
FROM schools s
LEFT JOIN mascots m ON s.id = m.school_id
WHERE 2020 BETWEEN m.start_year AND m.end_year
ORDER BY s.name, m.name

excellent analysis, @mabismad

one small point – in the query you posted for the redesigned mascot table, the WHERE condition should be moved to the ON clause, otherwise the join functions as an inner join

1 Like

Thanks for all the great advice. I am curious though if I am designing it right though. For example, Memphis was once called Memphis State. How would I be able to make the schools table pull up for example:

1995 Memphis State Tigers
1996 Memphis State Tigers
1997 Memphis Tigers

See how the name of the school changed from one year to the next?

Also, mascots also change:

2000 Arkansas State Indians
2001 Arkansas State Red Wolves

Would I be better having a linking table?

There are other factors too.
Coaches change all the time for each team and can coach for many different teams.
Players can play a couple years at one team then transfer to a new school.

Would linking tables be better suited for information handling such as this?

Thanks again for the help.

The school table would hold only the unique/one-time information about the school. This would establish a school id. You would have a school_name table, with - id, school_id, name, start date, and end date columns. (The mascot table above should probably use a full date, rather than just a year, then you would need to use a full date as the input value being compared against. To find information based on a school year, you use the full date that corresponds to the start date of the school year.)

The primary table for a coach or student, contains only the unique/one-time information. This establishes a coach id or a student id. You would have a coach_school table, with - id, coach_id, school_id, start date (a full date), end date (a full date), and any other columns needed to hold the unique values about the instance of a coach at a school. This establishes a coach school id, used to relate any other multiple same meaning data back to each instance of a coach at a school.

Repeat for any other information. All of this falls under a subject called data normalization.

1 Like

So, would the tables you mentioned be similar to the ones below?








I haven’t heard of Normalization. I didn’t know you could have so many tables!
Would JOINS on this many tables for one school or many be really extensive?
Thanks again for the help.



An additional point about the sql I posted above, if fetching the data as an associative array, you would need to use alias name(s) for the same name columns being selected, so that you can reference each column in the fetched data.

I have to be honest that I am a little confused about the two school tables…tbl_school and tbl_school_names.

tbl_school.id and tbl_school.name change, don’t they?


id name
1 Memphis State ( for year 1996 )
2 Memphis ( for year 1997 )

Wouldn’t I just put a school_id that is the same for both of them in tbl_school?

That way, I can just say:
SELECT DISTINCT tbl_school.name
FROM tbl_school
WHERE tbl_school.school_id = 1
ORDER BY tbl_school.school_year;

(( I am not sure this sql works or not. Just TRYING to give you an idea of where my mind is. ))

And if Memphis State and Memphis both have 1 as their school_id, it would list all the names the school has been called in the past?

Since the name is a ‘variable’ it wouldn’t exist in the school table. The school known as Memphis State/Memphis is the same entity, it only has one row/id in the school table. If your confusion is due to my blanket Yes given to that list of tables, sorry.

I really appreciate you working with me.
Would it be too much to ask how you would create the two tables that are associated with schools, please?

For example, If in tbl_school id 1 is Memphis State 1996 and previous years, would id 1 change to Memphis for 1997-onward?

The school table would have - id, state, city, established date, and any other unique value columns, to identify the school -

id state city    est_date
1  TN    Memphis 1912-09-10

The school_name table, for the entries you are mentioning (I’m not sure we are talking about the same school), would be -

id school_id name          start_year  end_year
1  1         Memphis State 1957        1996
2  1         Memphis       1997        9999

To find all the names when you know the school id, you would use it in a WHERE clause to query the school_name table to find all matching name values.

 FROM school_name
 WHERE school_id = 1
 ORDER BY start_year

If don’t know the id, you would write a query to match what pieces of information you do have, such as all schools in TN, all schools in Memphis TN, all schools that start with the letter ‘m’ or have the word Memphis in their current name, all schools that start with the letter ‘m’ or have the word Memphis in any name, … This would give you a result containing the matching school id(s), that you can then use to query to get all the names for each found school id.

1 Like

I didn’t know if I should make a NEW post or not, seeing as this is related to the above questions.

Page Output

New Coach: [Select Coach]

Coach: Tom Smith

Year Team W-L-T CW-CL PF PA
2020 - Memphis Tigers 8-0-0 5-0 250 67
2019 - Memphis Tigers 7-2-1 4-3 224 182
2018 - Arkansas State Red Wolves 11-2-0 8-0 359 24

Database Tables

ID 1
Coach Tom Smith

ID 1
SchoolID 1
School Memphis
StartYear 1901
EndYear 2020

ID 1
CoachID 1 (Tom Smith)
TeamID 1 (Memphis)
Year 2020

ID 1
HomeTeam 1 (Memphis)
HomeCoach 1 (Tom Smith)
HomeScore 35
Year 2020
ConfGame 0 (Off for No: Boolean)
IsPlayed 1 (On for Yes: Boolean)
Playoff 0 (Off for No: Boolean)
GuestTeam 2 (Arkansas State)
GuestCoach 2 (Bob Wilson)
GuestScore 7

Would this take MANY Selects and Joins, Counting each game and scores for each coach?
To me this seems a daunting task and I may be making the table structures wrong.
It would need to be able to:
Count each game for each year and who the team was an the coach
It would need to read if it is played or not in order to COUNT the HomeScore and add them
along with the GuestScore and add them for a total
And several other things.

Would there be many queries on a single page to get all these results?

Personally I would store this sort of historical data in a fixed part of the linking table for ease of retrieval.

Why? Because the data is historical, the number of recalculations of the data are going to be minimal - you don’t really need the database to recalculate the sum of Tom’s team scores from 2018 over and over again, because the data isn’t going to change. Additionally, as your page is only showing the summations here, it makes for a much shorter query.

The CURRENT year’s stats, will certainly ‘change’ more frequently (most likely in a linearly ascending fashion as each row is additive), but does that additional calculation/update query outweigh the calculation required when retrieving the data, if, from the display, you don’t actually need the individual data?.

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.