Help with join query?

Hello all, I have a query I am using to retrieve all posts of a specific type, status and date. The query is dynamic in that one of these things doesn’t need to be present and the query will omit them.

I would like to extend this query to also allow the search to be refined by the category to which the post belongs but this will involve a join as there are 2-3 tables involved.

POSTS table that contains the post, it’s status and type.
RELATIONSHIPS which contains the id of the post and the id of the category
CATEGORIES which contains category details.

I’m not 100% that I need to join it to categories since I have the category_id from $_GET and the category_id in RELATIONSHIPS, but either way I am struggling to think how to create an appropriate join where I can include all of the other search specifics and the join. At it’s most specific, my query is something like this:

SELECT * FROM posts WHERE post_type = 'post' AND post_status = 'draft' AND YEAR(FROM_UNIXTIME(date_created))=2011 AND MONTH(FROM_UNIXTIME(date_created))=12 LIMIT 10

I think the query I want would be something like:

SELECT * FROM p.posts 
	
			LEFT OUTER JOIN relationships AS r 
							ON r.category_id = ".$_GET['category_id']. "
						LEFT OUTER JOIN categories AS c
							ON c.category_id = r.category_id 
							
							
			WHERE p.post_type = 'post' AND p.post_status = 'draft' 
				AND YEAR(FROM_UNIXTIME(p.date_created))=2011 AND MONTH(FROM_UNIXTIME(p.date_created))=12 LIMIT 10"

Which does seem to be working, but it’s also returning 5000 results when I only have a few hundred posts, it duplicates them all in massive quantities, so I’m doing something wrong. I tried the following, adding a “p.” identifier for the post table:

SELECT COUNT( * ) AS num_rows
FROM posts AS p
LEFT OUTER JOIN relationships AS r ON r.category_id =1
LEFT OUTER JOIN categories AS c ON c.category_id = r.category_id
WHERE p.post_type =  'post'
AND p.post_status =  'draft'
AND YEAR( FROM_UNIXTIME( date_created ) ) =2011
AND MONTH( FROM_UNIXTIME( date_created ) ) =11

But this just fails because of a syntax error.

SELECT * FROM posts AS p LEFT OUTER JOIN relationships AS r ON r.category_id = 1 AND r.post_id = p.post_id LEFT OUTER JOIN categories AS c ON c.category_id = r.category_id WHERE p.post_type = 'post' AND p.post_status = 'draft' AND YEAR(FROM_UNIXTIME(date_created))=2011 AND MONTH(FROM_UNIXTIME(date_created))=11 LIMIT 10

Tried the above and it seems to fix it a little, only 22 results, but it is returning posts where it says the category_id is NULL even though it has a correctly assigned number in RELATIONSHIPS, so I’m still doing something wrong.

before we get started, a minor pont – in any query that involves more than one table, it is imperative (for your sanity) that you qualify every single column mentioned in the query with its table name or table alias

i think what you’re “doing wrong” is counting rows, rather than column values
compare the two counts from this query results of this –

SELECT COUNT(*)             AS row_count
     , COUNT(c.category_id) AS category_count
  FROM posts AS p 
LEFT OUTER 
  JOIN relationships AS r 
    ON r.post_id = p.post_id 
   AND r.category_id = 1       -- looking for only one category
LEFT OUTER 
  JOIN categories AS c 
    ON c.category_id = r.category_id 
 WHERE p.post_type = 'post' 
   AND p.post_status = 'draft' 
   AND YEAR(FROM_UNIXTIME([COLOR="#FF0000"]p.[/COLOR]date_created))=2011 
   AND MONTH(FROM_UNIXTIME([COLOR="#FF0000"]p.[/COLOR]date_created))=11

>< I missed the two date_created when I was adding “p.” to the fields. Your query does seem to solve the problem, it returns:

row_count 22
category_count 10

But how would I do this for just a general query where I want the information returned instead of a count?


SELECT * 
FROM posts AS p
LEFT OUTER JOIN relationships AS r 
ON r.post_id = p.post_id
AND r.category_id =1
LEFT OUTER JOIN categories AS c 
ON c.category_id = r.category_id
WHERE p.post_type =  'post'
AND p.post_status =  'draft'
AND YEAR( FROM_UNIXTIME( p.date_created ) ) =2011
AND MONTH( FROM_UNIXTIME( p.date_created ) ) =11

As this still returns 22 rows, including 12 with NULL info. I’m not very clear what part of the SQL is returning the NULL rows. I tried replacing the OUTER with INNER joins but that just throws an SQL Syntax error.

I instead tried this, replacing the OUTER JOIN between posts and relationships with an INNER one to force it to abandon those NULL rows (hopefully):

SELECT * 
FROM posts AS p
INNER JOIN relationships AS r 
ON r.post_id = p.post_id
AND r.category_id =1
LEFT OUTER JOIN categories AS c 
ON c.category_id = r.category_id
WHERE p.post_type =  'post'
AND p.post_status =  'draft'
AND YEAR( FROM_UNIXTIME( p.date_created ) ) =2011
AND MONTH( FROM_UNIXTIME( p.date_created ) ) =11

And that returns 10 rows and seems to be returning the correct rows, but I’d appreciate input on whether this would be an acceptable SQL entry. I get lost pretty easily with JOINs. I assume it might also be more efficient to specify what fields I want from POSTS so it only checks CATEGORIES and doesn’t return those fields?

first of all, for an inner join, every row in the result set corresponds to a match between rows of the tables being joined, whereas for an outer join, all the rows in the result set correspond to rows of one of the tables, with or without matching data from the other

so that’s the main difference between inner and outer

as for LEFT OUTER JOIN and RIGHT OUTER JOIN, these are similar in function, with the only difference being which table is the one that all rows are returned for

often, an outer join is used for an optional relationship

let’s illustrate with an example

compare these two queries –

SELECT * 
  FROM posts AS p
[COLOR="#0000FF"]LEFT OUTER[/COLOR]
  JOIN relationships AS r 
    ON r.post_id = p.post_id
   AND r.category_id =1
 WHERE p.post_type = 'post'
   AND p.post_status = 'draft'
   AND p.date_created >= UNIX_TIMESTAMP('2011-11-01')
   AND p.date_created  < UNIX_TIMESTAMP('2011-12-01')
SELECT * 
  FROM posts AS p
[COLOR="#0000FF"]INNER[/COLOR] 
  JOIN relationships AS r 
    ON r.post_id = p.post_id
   AND r.category_id =1
 WHERE p.post_type = 'post'
   AND p.post_status = 'draft'
   AND p.date_created >= UNIX_TIMESTAMP('2011-11-01')
   AND p.date_created  < UNIX_TIMESTAMP('2011-12-01')

if your intention is to return posts only in category 1, you’d use the inner join

but if your intention is to show all qualified posts (as filtered by the WHERE clause), with or without an indication that they belong to category 1, then you’d use the left outer join

but if you use the inner join, then when you go to add the next join in the query, to the categories table, you wouldn’t use a left outer join because it’s impossible for a row to exist in the relationships table that doesn’t have a matching row in the categories table

well, impossible if you utilize relational integrity, but that’s a separate thread :wink:

note: i changed your date range criteria to allow optimization (whenever you apply a function to a table column in the WHERE clause, the database engine cannot use an index and has to scan the entire table)

Yes, my intention is that if the user wants to further filter the posts they are seeing by the category to which they belong, they will be shown posts only in that specific category, plus obviously the other limiters like date.

Thanks for explaining the JOINs, SQL really needs a site like php.net that just flat out says what things do and how they are meant to be implemented. This query appears to be working excellently, nice that for once I wasn’t mega far away from the answer. Thanks as always for the prompt help r937!

http://dev.mysql.com/doc/refman/5.5/en/

I agree, it isn’t always easy to find what you’re looking for (in php.net that’s much easier IMO) but it’s all there (or almost).

Thank you for that Guido. Often when I search for things related to MySQL I get w3schools (which is fine but not very detailed on everything) and then just technical support forum postings of people also having issues with SQL :stuck_out_tongue: whereas if I google implode, explode, is_array, or whatever, it almost always immediately gives me php.net (which isn’t perfect but it’s there at least). I will save your link and have a quick read through the pages.