Giving tables aliases

Hi there, I just wanted to work on some of sql query skills and I just needed help to better understand this query. Why is the table “world” from the query below being given an alias?

The query below works, it finds the largest country by area in each continent.

SELECT continent, name, area
FROM world x
WHERE area >= ALL (SELECT area 
                  FROM world y
                  WHERE x.continent = y.continent 
                  AND y.area > 0)

Thanks for any help, highly appreciated.

1 Like

Because you’re selecting from multiple instances of the same table, so it needs a way to differentiate the two tables in the where clause of the sub query.

That being said, that query looks VERY inefficient, especially if there are a large number of records for each continent. This seems like it should be more efficient as it will build a list of the largest areas for each continent before it returns the appropriate record, but @r937 will probably school me as to why I’m an idiot

SELECT w.continent
     , name
     , maxArea
  FROM world w
  JOIN (SELECT continent
             , MAX(area) AS maxArea
          FROM world
         WHERE area > 0
         GROUP by continent) SQ ON w.continent = SQ.continent AND w.area = SQ.maxArea
1 Like

i’m not an sql performance guy, i’m an sql language nuance guy

that said, i have to admit it wouldn’t have occurred to me to use the >= ALL ( subquery ) approach… it’s just too weird for me

anyhow, it might be interesting to see @jlisec01’s EXPLAINs and time trials…

2 Likes

I’m just going by sqlzoo questions for now. The query is still a bit confusing for me, especially the part x.continent = y.continent

1 Like

All that is is a hook for the SQL engine to know how to combine the two queries.

So basically what you have are two queries running, and you’re trying to match the results up,

  1. The sub-query is pulling a list of all continents and their areas which are greater than 0. This is pulling the information from the world table.
  2. In the main query, it’s searching for the name of the area which is the largest for each continent - it does this by taking each row and comparing it to ALL of the rows returned in the query/recordset from step 1. This is also looking at the world table since it’s where the information is.

Since #1 and #2 are pulling information from the same table (world), it needs to have a way to know how to compare them, and so needs to have a way to tell which recordset instance of world to look at. So the person who built that query chose to use y for the sub-query, and x to represent the instance in the main query.

It looks a little cludgy because you’re drawing from the same table both times, but it’s valid SQL. Just not the approach I personally would have taken.

I don’t know if this will help or not. This sql statement is looking to find the name of the largest area in each continent. The approach the developer took is just a little unorthodox.

Say you have data which looks like this:

Continent,Name,Area
Alpha,Beta,10
Alpha,Gamma,5
Alpha,Delta,15
Omega,Epsilon,12
Omega,Zeta,23
Omega,Upsilon,10

  1. First, the main query (X) looks at the first record (Beta area of Alpha continent). It then fires the sub query (Y) to find all entries for the Alpha continent that have an area > 0 (in this case three records) It then looks at the area for the Beta area of Alpha and compares it to each of the areas in the sub query. If it is greater than or equal to that value, it keeps it to return it to the final recordset. In this case, it will NOT keep it because there is an area which is larger.
  2. Rinse and Repeat for each record in the continent table.

Another goofy example is you and your buddy are trying to find the car with the best gas mileage for each car manufacturer. So you sit on one side of the table with one stack of brochures, and your buddy sits on the other side with the exact same stack of brochures, but for some reason his stack only has the names of the manufacturers and the gas mileages. So you take the top brochure off your stack (say the Ford Focus) and you ask your buddy “Hey Bob. Is 30MPG the best gas mileage for Ford?” He looks through his stack and looks at each Ford Brochure, makes a list of the gas mileages for Ford and tells you yes or no. If he says yes, you put the brochure on your left, otherwise you put it on your right. He then throws that list away.

You then take the next brochure from your stack and ask your buddy “Hey Bob, is 28 MPG the best gas mileage for Ford?” He once again looks through his stack and looks at each Ford Brochure, makes a list of the gas mileages for Ford and tells you yes or no then throws his list away again (guess he likes making lists). You repeat this process for each brochure in your stack.

Doesn’t make sense to do it that way, does it?

In the approach I gave you, Bob will take his stack and sort each manufacturer and put the brochure with the highest gas mileage for each manufacturer off to the side and throws the rest out. He then will take that list and tell you “Find the brochure for the Ford car which gets 32MPG”. You sort through your Ford Brochures and pull that one out and throw the rest away. Rinse and repeat per manufacturer.

not exactly

it’s a correlated subquery, and it doesn’t pull up all continents, just the one continent which happens to match the continent in the outer query

mind you, by the time the outer query goes through all its continents, the subquery will have returned all its continents too, but that’s not the point – a correlated subquery works by restricting the results of the subquery to only those rows which correlate to the outer query

this is also why a correlated subquery can often perform poorly – for each row of the outer query, you don’t want the correlated subquery executing a pass of the entire table, but sometimes that’s exactly the way it runs

in the given example, the “x” inside the subquery is the correlation variable, and the “y” alias isn’t strictly needed – as can be evidenced from the fact that it’s actually missing on the “area” in the subquery’s SELECT list

Grr. Thanks for catching that. Was trying to explain it as clearly as I could, but missed that obvious point :angry:

At least I can always count on you catching my stupidity :smiley:

there is no such thing as stupidity, other than an unwillingness to learn, which obviously is neither your problem nor mine

:purple_heart: :yellow_heart: :green_heart: :blue_heart: :heart:

3 Likes

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.