SELECT DISTINCT
schools.id, schools.school, mascots.id, mascots.mascot
FROM
schools,
mascots
LEFT JOIN
schools s ON s.school_mascot = mascots.id
WHERE
schools.school_year = ‘2020’;
Shows this output. Why am I getting the same schools twice?
id school id mascot
|1| Wyoming |1| Cowboys|
|2| Memphis |1| Cowboys|
|1| Wyoming |2| Tigers|
|2| Memphis |2| Tigers|
When it should be:
id school id mascot
|1| Wyoming |1| Cowboys|
|2| Memphis |2| Tigers|
The problem is the above. This is joining every row in the schools table (2) with every row in the mascots table (2), resulting in 4 rows, i.e. every combination. This is referred to as a cartesian product, since there’s nothing in the query to give only the rows that have a relationship.
The LEFT JOIN you have added to the result of the above cartesian product part of the query, is just joining the schools table to that result again.
Going back a step, the naming and organization of your tables/columns is in need of help. A mascot is related to a school, not the other way around. A school can have none, one, or more than one mascot at any time and this can change over time (which is why I suspect you have the year comparison in the query.) The mascot table should have - id, school_id, name, start (school) year, end (school) year (for the current mascot, this should be set to a maximum permitted year value, 9999), and any other columns related to the existence of each mascot. If your schools.school and mascots.mascot columns are the school name and mascot name, the columns should be called name.
To get a list of schools and any mascots for any particular school year, the query would look like -
SELECT s.id, s.name, m.id, m.name
FROM schools s
LEFT JOIN mascots m ON s.id = m.school_id
WHERE 2020 BETWEEN m.start_year AND m.end_year
ORDER BY s.name, m.name
one small point – in the query you posted for the redesigned mascot table, the WHERE condition should be moved to the ON clause, otherwise the join functions as an inner join
Thanks for all the great advice. I am curious though if I am designing it right though. For example, Memphis was once called Memphis State. How would I be able to make the schools table pull up for example:
1995 Memphis State Tigers
1996 Memphis State Tigers
1997 Memphis Tigers
See how the name of the school changed from one year to the next?
Also, mascots also change:
2000 Arkansas State Indians
2001 Arkansas State Red Wolves
Would I be better having a linking table?
There are other factors too.
Coaches change all the time for each team and can coach for many different teams.
Players can play a couple years at one team then transfer to a new school.
Would linking tables be better suited for information handling such as this?
The school table would hold only the unique/one-time information about the school. This would establish a school id. You would have a school_name table, with - id, school_id, name, start date, and end date columns. (The mascot table above should probably use a full date, rather than just a year, then you would need to use a full date as the input value being compared against. To find information based on a school year, you use the full date that corresponds to the start date of the school year.)
The primary table for a coach or student, contains only the unique/one-time information. This establishes a coach id or a student id. You would have a coach_school table, with - id, coach_id, school_id, start date (a full date), end date (a full date), and any other columns needed to hold the unique values about the instance of a coach at a school. This establishes a coach school id, used to relate any other multiple same meaning data back to each instance of a coach at a school.
Repeat for any other information. All of this falls under a subject called data normalization.
So, would the tables you mentioned be similar to the ones below?
tbl_schools
id
name
tbl_school_names
id
school_id
name
start_year
end_year
tbl_mascots
id
school_id
name
tbl_coaches
id
name
tbl_coach_school
id
coach_id
school_id
start_year
end_year
tbl_players
id
name
tbl_player_school
id
player_id
school_id
start_year
end_year
I haven’t heard of Normalization. I didn’t know you could have so many tables!
Would JOINS on this many tables for one school or many be really extensive?
Thanks again for the help.
An additional point about the sql I posted above, if fetching the data as an associative array, you would need to use alias name(s) for the same name columns being selected, so that you can reference each column in the fetched data.
Since the name is a ‘variable’ it wouldn’t exist in the school table. The school known as Memphis State/Memphis is the same entity, it only has one row/id in the school table. If your confusion is due to my blanket Yes given to that list of tables, sorry.
The school table would have - id, state, city, established date, and any other unique value columns, to identify the school -
id state city est_date
1 TN Memphis 1912-09-10
The school_name table, for the entries you are mentioning (I’m not sure we are talking about the same school), would be -
id school_id name start_year end_year
1 1 Memphis State 1957 1996
2 1 Memphis 1997 9999
To find all the names when you know the school id, you would use it in a WHERE clause to query the school_name table to find all matching name values.
SELECT name
FROM school_name
WHERE school_id = 1
ORDER BY start_year
If don’t know the id, you would write a query to match what pieces of information you do have, such as all schools in TN, all schools in Memphis TN, all schools that start with the letter ‘m’ or have the word Memphis in their current name, all schools that start with the letter ‘m’ or have the word Memphis in any name, … This would give you a result containing the matching school id(s), that you can then use to query to get all the names for each found school id.
I didn’t know if I should make a NEW post or not, seeing as this is related to the above questions.
Page Output
New Coach: [Select Coach]
Coach: Tom Smith
Year Team W-L-T CW-CL PF PA
2020 - Memphis Tigers 8-0-0 5-0 250 67
2019 - Memphis Tigers 7-2-1 4-3 224 182
2018 - Arkansas State Red Wolves 11-2-0 8-0 359 24
Database Tables
Tbl_Coaches
ID 1
Coach Tom Smith
Tbl_Schools
ID 1
SchoolID 1
School Memphis
StartYear 1901
EndYear 2020
Linking_Tbl_Coaching
ID 1
CoachID 1 (Tom Smith)
TeamID 1 (Memphis)
Year 2020
Tbl_Games
ID 1
HomeTeam 1 (Memphis)
HomeCoach 1 (Tom Smith)
HomeScore 35
Year 2020
ConfGame 0 (Off for No: Boolean)
IsPlayed 1 (On for Yes: Boolean)
Playoff 0 (Off for No: Boolean)
GuestTeam 2 (Arkansas State)
GuestCoach 2 (Bob Wilson)
GuestScore 7
Would this take MANY Selects and Joins, Counting each game and scores for each coach?
To me this seems a daunting task and I may be making the table structures wrong.
It would need to be able to:
Count each game for each year and who the team was an the coach
It would need to read if it is played or not in order to COUNT the HomeScore and add them
along with the GuestScore and add them for a total
And several other things.
Would there be many queries on a single page to get all these results?
Personally I would store this sort of historical data in a fixed part of the linking table for ease of retrieval.
Why? Because the data is historical, the number of recalculations of the data are going to be minimal - you don’t really need the database to recalculate the sum of Tom’s team scores from 2018 over and over again, because the data isn’t going to change. Additionally, as your page is only showing the summations here, it makes for a much shorter query.
The CURRENT year’s stats, will certainly ‘change’ more frequently (most likely in a linearly ascending fashion as each row is additive), but does that additional calculation/update query outweigh the calculation required when retrieving the data, if, from the display, you don’t actually need the individual data?.