SitePoint Sponsor

User Tag List

Results 1 to 6 of 6
  1. #1
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,718
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)

    Adding a related table count to results

    I was going to create this post to get some help, but I might not need to.

    I want to get some people from my database, and to include with their information a separate field that sums the number of publications each person appears in.

    For example:

    Code:
    ID First Last PubCount
     1 Tom   Jones       6
     2 Joe   Bloggs      8
    So I think that it might be as easy as creating the two tables separately, then combining the two together. Thoughts of sugar plums, and inner joins with sub queries go running through my head.

    So the basic query is going to be:

    Code SQL:
    SELECT ID, FirstNames, Surname, PubCount
    FROM Person
      INNER JOIN (
        ...
      ) AS Publications
        ON Person.ID = Publications.PersonID

    with some magic happening in the subquery part.

    I know that I want to select the count of publications for each person, so I am reminded of the GROUP command. Could that work?

    Code SQL:
    SELECT PersonID, COUNT(PersonID) AS PubCount
    FROM PublicationPerson
    GROUP BY PersonID

    There are going to be duplicates within that, but they should be weeded out by the inner join.

    Code SQL:
    SELECT ID, FirstNames, Surname, PubCount
    FROM Person
      INNER JOIN (
        SELECT PersonID, COUNT(PersonID) AS PubCount
        FROM PublicationPerson
        GROUP BY PersonID
      ) AS Publications
        ON Person.ID = Publications.PersonID

    It seems to run, and not too badly too. But I'm not happy about the inner join of the subquery - it might be an expensive operation.

    With nearly 100,000 people and 250,000 publications, it takes 0.13 seconds to get 10,000 people.

    What if the the count was pulled up to the main select, so that the subquery isn't needed?

    Code SQL:
    SELECT Person.ID, FirstNames, Surname, COUNT(PersonID) AS PubCount
    FROM Person
      INNER JOIN PublicationPerson
        ON Person.ID = PublicationPerson.PersonID
    GROUP BY PersonID

    For 10,000 results it's 5 times faster, taking only 0.028 seconds.

    So ultimately what I'm asking is: whether the second technique is the better one. Is plain speed a better determining factor than other factors?
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    i actually like the first one, where you do the count in a subquery

    it's more flexible, as it allows future joins to other related tables without compromising the count -- for example, suppose you also wanted to count the number of subscriptions people have along with their publications

    in the second method, the simple join with the GROUP BY in the outer query, you'd get cross join effects

    mind you, if there is a difference in execution speed (assuming, of course, that there are the appropriate indexes on all tables), it's always practical to go with the faster one until such time as further enhancement to the query is actually required
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,718
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by r937 View Post
    i actually like the first one, where you do the count in a subquery

    it's more flexible, as it allows future joins to other related tables without compromising the count -- for example, suppose you also wanted to count the number of subscriptions people have along with their publications
    Thanks for that advice, for I am (engage shudders of horror) taking over from a departed developer.

    This is some of the code that I've been left with:

    Code SQL:
    (
      SELECT Person.Id, FirstNames, Surname, NeeName, AlsoKnownAs, NeeName, FormerName, DeathDate, BirthDate, Age, AgeMeasure, Country.Name AS Country, CityTown.Name AS CityTown, c2.Name AS FormerCityTown, FuneralHome.Name AS FuneralHome, Charity.Name AS Charity
      FROM Person
        LEFT JOIN Country ON Country.Id = Person.CountryId
        LEFT JOIN CityTown ON CityTown.Id = Person.CityTownId
        LEFT JOIN CityTown AS c2 ON Person.FormerCityTownId = c2.Id
        LEFT JOIN Charity ON Charity.Id = Person.CharityId
        LEFT JOIN FuneralHome ON FuneralHome.Id = Person.FuneralHomeId
      WHERE Person.AddedOn > DATE_SUB( NOW( ) , INTERVAL ' . $period . ' )
    )
    UNION DISTINCT
    (
      SELECT Person.Id, FirstNames, Surname, NeeName, AlsoKnownAs, NeeName, FormerName, DeathDate, BirthDate, Age, AgeMeasure, Country.Name AS Country, CityTown.Name AS CityTown, c2.Name AS FormerCityTown, FuneralHome.Name AS FuneralHome, Charity.Name AS Charity
      FROM (
        SELECT PersonId
        FROM PublicationPerson
        WHERE AddedOn > DATE_SUB( NOW( ) , INTERVAL ' . $period . ' )
      ) AS RecentPublications
      INNER JOIN Person ON Person.Id = RecentPublications.PersonId
      LEFT JOIN Country ON Country.Id = Person.CountryId
      LEFT JOIN CityTown ON CityTown.Id = Person.CityTownId
      LEFT JOIN CityTown AS c2 ON Person.FormerCityTownId = c2.Id
      LEFT JOIN Charity ON Charity.Id = Person.CharityId
      LEFT JOIN FuneralHome ON FuneralHome.Id = Person.FuneralHomeId
    )
    ORDER BY Surname ASC , FirstNames ASC

    I'm hopeful that the following is a better rendition of the above, where the publication people are instead joined on, and included by the use of NOT NULL

    Code SQL:
    SELECT Person.Id, FirstNames, Surname, NeeName, AlsoKnownAs, NeeName, FormerName, DeathDate, BirthDate, Age, AgeMeasure, Country.Name AS Country, CityTown.Name AS CityTown, FormerCityTown.Name AS FormerCityTown, FuneralHome.Name AS FuneralHome, Charity.Name AS Charity
    FROM Person
      LEFT OUTER JOIN Country ON Country.Id = Person.CountryId
      LEFT OUTER JOIN CityTown ON CityTown.Id = Person.CityTownId
      LEFT OUTER JOIN CityTown AS FormerCityTown ON Person.FormerCityTownId = FormerCityTown.Id
      LEFT OUTER JOIN Charity ON Charity.Id = Person.CharityId
      LEFT OUTER JOIN FuneralHome ON FuneralHome.Id = Person.FuneralHomeId
      LEFT OUTER JOIN (
        SELECT DISTINCT PersonId
        FROM PublicationPerson
        WHERE AddedOn > DATE_SUB( NOW( ) , INTERVAL 7 DAY )
      ) AS RecentPublications ON RecentPublications.PersonId = Person.Id
    WHERE
      Person.AddedOn > DATE_SUB( NOW( ) , INTERVAL 7 DAY ) OR
      RecentPublications.PersonId IS NOT NULL
    ORDER BY Surname ASC , FirstNames ASC

    To which, the inclusion of publication counts should be more usefully achievable.

    When it comes to more expansive queries such as this, what sort of performance information would it be useful for me to learn about?
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    regarding performance, you should learn how to read and interpret the results of EXPLAINs

    by the way, those LEFT OUTER JOINs should all be INNER JOINs (except for the one to publications, as a person might not have any)

    LEFT OUTER JOIN allows, of course, for the possibility that the row from the left table has no matching row in the right table, and you want the left table row returned anyway

    consider this --
    Code:
    Person LEFT OUTER JOIN Country 
    ON Country.Id = Person.CountryId
    the LEFT OUTER JOIN here says that you have a person with a countryid, and you want that person returned even if that person's countryid doesn't exist in the country table

    that's a sure sign of a problem, wouldn't you say?



    Off Topic:

    i just noticed quirksmode in your sig... do you know ppk? if so, say hello for me
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,718
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by r937 View Post
    consider this --
    Code:
    Person LEFT OUTER JOIN Country 
    ON Country.Id = Person.CountryId
    the LEFT OUTER JOIN here says that you have a person with a countryid, and you want that person returned even if that person's countryid doesn't exist in the country table

    that's a sure sign of a problem, wouldn't you say?
    How much of a problem is it where the country is not known?

    Which leads me to another question. Is it considered better to have null values for unknown or unavailable data, or to have a placeholder value (0), in the table being joined to the right?
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,276
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by pmw57 View Post
    How much of a problem is it where the country is not known?
    i guess that would depend on the business rules for the application, whether it's okay to accept a person into the database without a country

    and it is by far better to use NULL instead of a dummy placeholder

    for one thing, you will never be able to actually declare a true foreign key with a dummy value
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •