Whats wrong with this SQL query

Hi,

I have the following SQL query, which is selecting from 10 different tables…


SELECT * FROM
                    ConfPaper c,
                    JournalArticle j,
                    EdBookChapter eb,
                    Thesis t,
                    Patent p,
                    Monograph m,
                    Unpublished u,
                    Misc ms,
                    Website w,
                    TechReport tr
                  LEFT JOIN Category ct
                  ON
                    ct.CTid = c.Pid AND ct.CTid = j.Pid AND ct.CTid = eb.Pid
                    AND ct.CTid = t.Pid AND ct.CTid = p.Pid AND ct.CTid = m.Pid
                    AND ct.CTid = u.Pid AND ct.CTid = ms.Pid AND ct.CTid = w.Pid
                    AND ct.CTid = tr.Pid
                  WHERE
                    c.CPdeleted = 0 AND j.JAdeleted = 0 AND eb.EBCdeleted = 0
                    AND t.THdeleted = 0 AND p.PTdeleted = 0 AND m.MOdeleted = 0
                    AND u.UNdeleted = 0 AND ms.MIdeleted = 0 AND w.WSdeleted = 0
                    AND tr.TRdeleted = 0
                 ORDER BY rand()
                 LIMIT 6

But i keep getting an error saying “Unknown column c.Pid” when this column DOES exist…?

Any ideas why i am having this problem, also can i improve this sql query in any way?

Thanks

my sincere advice: write a separate query for each table, and do not try to do it all in one query

the main reason is because you cannot join these tables, they are unrelated, so that approach to a single query is a non-starter

you ~could~ write a single query with a bunch of unions, but my advice to you is to avoid this because it will be too complicated for you

separate queries

that way you can easily cover off all the things you’re having trouble with

for example, you won’t have to worry about identifying which table something came from, because each query is tailored to a specific table

you can also then more easily control how to pick only 6 examples from 10 tables

remember what albert einstein said: make things as simple as possible, but never use the dreaded, evil “select star”

:cool:

Ok, so you have

  • ConfProceedings has a 1:N relation ConfPaper
  • JournalIssue has a 1:N relation JournalArticle
  • EditedBook has a 1:N relation EdBookhapter

and

  • Thesis
  • Patent
  • Monograph
  • Unpublished
  • Misc
  • Website
  • TechReport

are stand-alone (not related to any other tables) tables.

And now you want to pull 6 records at random from the collection of all rows in the parent and/or stand-alone tables.

Is that correct?

Nearly correct Scallio :wink:

I not only have Conference Proceedings, i have Journals and Edited Books, etc…

And they are not related to one another. Just to give you more of an insight, don’t want to be inappropriate as i don’t expect you to look through this, but check this document, it’s basically my database design…

http://www.prima.cse.salford.ac.uk:8080/~ibrarhussain/database.doc

You will see how the table structure is. ConfPaper, EdBookChapter, and JournalArticle are all part of a parent table ConfProceedings, EditedBook, and JournalIssue…

I have added more fields such as the <sometable>.deleted column etc but you can see what i am trying to create from this…

So from looking at this, can you see why i have the 10 tables?

I just need a method where i can get the records from any table easily, such as the main problem i am facing my very first post…

thank you, that explains it well

oh yes… a humoungous problem, which i already explained

:slight_smile:

so you’ve explained what you want, but not why

can you explain the business purpose of obtaining 6 random rows from 10 tables? what kind of a web page are the results going to appear on?

:slight_smile: Ok well it’s not so much as a business purpose, its just to give users access to different areas of the website, the purpose of the website is a Digital Library, where users can upload papers, journals, documents etc. So the main aspect of the site will be an advanced search. So when it comes to doing the search i can see myself having this same problem again because i won’t know which table to select from.

All of the records from the 10 tables are forms of documents that users will want to search for, so all i am doing is wanting to select a few to display on the website homepage, just so users can see whats being added and view something that may interest them.

How can i overcome this problem i am facing? And more importantly the initial query which ScallioXTX recommended, can i make this work?

Okay, I think I know what’s going on. You have a database of conference papers and the papers can be on different subjects and each entry in the ConfTable table references the table that is specific to that subject? Am I correct?

If so, I’d first of all get rid of all the <sometable>.deleted columns and create a deleted column in the ConfPaper table. Storing the deleted column for each and every table separately just doesn’t make any sense.

When you’ve done that run the following query:

SELECT
   something
 , anything
 , just_not_the_dreaded_star
FROM
  ConfPaper cp
WHERE
  deleted=0
ORDER BY
  rand()
LIMIT
  6

Then, from the results of that table fire a new query for each of the results to fetch the data from the table specific for the subject the conference paper is on.
Normally I would advice against this approach because you need one main query plus n (which is 6 your case) additional queries to fetch related data but, as Rudy pointed out, creating a JOIN spanning 10 tables is a Bad Idea ™.

Ok let me explain, i know how LEFT & INNER JOINS work, but i don’t see any other way of doing this?

Basically all i am trying to do RANDOMLY select 6 records from any of the 10 tables and LIMIT it to 6…

Thats all :confused:

In relation to how i am going about doing it, can any of you guys see a problem?

billy, what exactly are you trying to do?

do you know what a cross join is?

do you know how a LEFT OUTER JOIN works?

Hmmm, well the reason i am doing this is i am just SELECTING the 6 newest entries and want to display them on a page. I know maybe having 10 separate tables may not be the best way of storing the details, but i couldn’t have just one table because each table has it’s own specific fields…

Thanks, I will try to post an indented query next time, i’ve never posted like that, i always think it will all look wrong :confused:

Ok, That query works perfectly, however for each of the 10 tables, i have different fields, so i need to know somehow which table the results are from and then output from that table.

To give an example:


                     <?
                     $document = DocumentModel::selectNewestUploads();
                     while($row_newest = mysql_fetch_object($document)):
                            echo ""; //How do i know what to echo out??
                     endwhile;
                      ?>

So i call the selectNewestUploads method from the DocumentModel class using the code that you have suggested. But then i need to select from lets say the following 3 tables. (Didn’t want to show EVERY table as it would look rather confusing :))…

ConfPaper
CPid, CPtitle, CPinfo, CPother …

JournalArticle
JAid, JAname, JAdate, JAdesc …

Website
Wid, Wtitle, Wlastvisited

So when i want to now echo something out, how would i know which table? To crack this, i tried to add a Prefix to each table so i created a new table called Category, and in each of the 10 tables added a Pid, this can be seen in the LEFT JOIN in the query…


LEFT JOIN Category ct ON ct.CTid = c.Pid

But again i am stuck as if the results which are limited to 6 are from 6 different tables how do i know which tables and what to echo out?

Have i confused you :sick:

Thanks :wink:

there is something really, really wrong with that query

billy, perhaps you can describe in words what the query is supposed to do

as it stands, you are going to get a humoungous cross join – every row of every table matched with every row of every table… and then, a left outer join to match a category to the results, but only if it matches every single table’s Pid (whatever that is), but no fear, if there is no category that matches, return the humoungous cross join anyway, because after all it’s an outer join

for the sake of argument, let’s say that each table has only 5 rows, then the cross join will produce over a billion rows

The problem you’re having is that LEFT JOIN takes precedence over the , JOIN.
What MySQL does is first perform the following JOIN


  TechReport tr
LEFT JOIN
  Category ct
ON
  ct.CTid = c.Pid AND ct.CTid = j.Pid AND ct.CTid = eb.Pid
  AND ct.CTid = t.Pid AND ct.CTid = p.Pid AND ct.CTid = m.Pid
  AND ct.CTid = u.Pid AND ct.CTid = ms.Pid AND ct.CTid = w.Pid
  AND ct.CTid = tr.Pid

before performing the other JOINs. Since the only tables referenced in the JOIN above are tr and ct all references to other tables are not resolvable by MySQL.

To overcome this you should replace all , JOINs with INNER JOINs. Also, I’d incorporate the WHERE clause in the ON clause.

SELECT
   something
 , anything
 , just_not_the_dreaded_start
FROM
   ConfPaper c
      INNER JOIN
   JournalArticle j
      INNER JOIN
   EdBookChapter eb
      INNER JOIN
   Thesis t
      INNER JOIN
   Patent p
      INNER JOIN
   Monograph m
      INNER JOIN
   Unpublished u
      INNER JOIN
   Misc ms
      INNER JOIN
   Website w
      INNER JOIN
   TechReport tr
      LEFT JOIN
   Category ct
      ON
   ct.CTid = c.Pid AND c.CPdeleted = 0
      AND
   ct.CTid = j.Pid AND j.JAdeleted = 0
      AND
   ct.CTid = eb.Pid AND eb.EBCdeleted = 0
      AND
   ct.CTid = t.Pid AND t.THdeleted = 0
      AND
   ct.CTid = p.Pid AND p.PTdeleted = 0
      AND
   ct.CTid = m.Pid AND m.MOdeleted = 0
      AND
   ct.CTid = u.Pid AND u.UNdeleted = 0
      AND
   ct.CTid = ms.Pid AND ms.MIdeleted = 0
      AND
   ct.CTid = w.Pid AND w.WSdeleted = 0
      AND
   ct.CTid = tr.Pid AND tr.TRdeleted = 0
ORDER BY
   rand()
LIMIT
   6

As for optimizing, don’t use the evil * in the SELECT clause, but actually write out the fields you need, and incorporate the WHERE clause in JOIN creteria (ON clause) like I did above.
Without further knowledge of the tables I can’t make any more suggestions beside try an EXPLAIN for the query to see if the indices are correct.

PS. The next time you post a query make sure you use proper indenting to make it easier for others to read, like my query above. Thanks :slight_smile:

Ok, i appreciate the advice, which is why i have simply shown 3 main sections that pull out the latest documents. Fact of the matter is that it’s not an important aspect of the website, i could have just had plain text describing something, but i thought it would be good to get something out of the database…

As i was saying, the main functionality would be the Search function, which is what i need to work on next, and again i can see myself having the same problem. The search function can be seen on the homepage as one textbox which allows the user to do a search…

But as you guys know i have 10 tables to select from i won’t be able to do single queries for this??

How would you advise i tackle this?

Thanks

Let’s think about your idea for a few seconds.
You have data in ten different tables, and wish to present a random selection of six rows from them, in response to a query.

Why???

Why only six??? And why randomised??

If I was still doing research, I’d like to be able to look at everything, on a source by source basis, not six random items - the six could all be from just two or three tables, but the important items may all be in a different table or two, so I’d never find them, due to the huge number of combinations of random groups of six.

So just let me search on a source by source basis, and let me see all that each source has based on my search criteria.

So on a practical research basis, I’d never use your system.

You could make the opening page offer me a type of document/source to search, using a drop down list, and then insert my search criteria into the appropriate single query.

Or you could insert my criteria into each of the single queries and present the first five from each query on the same results page, making me then consider which type of source search is likely to be most useful.

Thanks everyone for all of your responses. After thinking about the importance of this query and the time it would take to spend on building such a complex query to select from all 10+ tables i have simply created 3 separate queries to select from only 3 tables…

However, the main importance of the website is it’s search facility, which will need to select through ALL of the 10+ tables, and i can’t use single queries for this. So what i would need to do is select from the tables where the records match the search criteria the user will input.

What would the best method be to do this?

To show you the website, visit this link:

http://www.prima.cse.salford.ac.uk:8080/~ibrarhussain/

And login as:

sitepoint@sitepoint.com
testing

You can see the firs thing you see is the search… :confused:

I’m trying to think of the best way to go about doing this baring in mind i have over 10 tables to select from… :slight_smile:

You could achieve it with a single query but it would be a disaster. This is three tables imagine 10+! You will be much better off hitting the database 10 times with simple selects individually I would think.


SELECT
     'ConfPaper' table_name
     ,c.CPid id
     ,c.CPtitle title
  FROM
     ConfPaper c
 WHERE
     c.deleted = 0
 ORDER
    BY
     RAND()
 LIMIT
     6
UNION ALL
SELECT
     'JournalArticle'
     ,j.JAid
     ,j.JAname
  FROM
     JournalArticle j
 WHERE
     j.deleted = 0
 ORDER
    BY
     RAND()
 LIMIT
     6
UNION ALL
SELECT
     'Website'
     ,w.Wid
     ,w.Wtitle
  FROM
     Website w
 WHERE
     w.deleted = 0
 ORDER
    BY
     RAND()
 LIMIT
     6

Ok, thats exactly right Scallio.

I have the tables set up all correctly, just need to build a method where i can select from each easily. If i was to use separate queries as r937 has pointed out it will be simple but if i could do this in one query and use the same method in other areas of the website also i.e. when it comes to the advanced search later it would help me out hugely.

So what do you guys suggest? Can i do this?

Thanks for your help and advice so far, really appreciate it.

no, it’s too complicated

use separate queries