SitePoint Sponsor

User Tag List

Page 1 of 2 12 LastLast
Results 1 to 25 of 27
  1. #1
    ☆★☆★ silver trophy vgarcia's Avatar
    Join Date
    Jan 2002
    Location
    in transition
    Posts
    21,235
    Mentioned
    1 Post(s)
    Tagged
    1 Thread(s)

    SQL Server: TOP queries not available?

    This has been driving me crazy for a few weeks at work. Our instance of SQL Server doesn't seem to allow TOP queries (i.e. SELECT TOP 5 * FROM users ORDER BY entrydate). I've worked around it by using SET ROWCOUNT, but I'm wondering why it's not available in the first place, and if there's any way to re-enable it.

    It's a SQL Server 2000 database by the way. Any ideas?

  2. #2
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    unclear formulation....

    ORDER BY in views is not part of ANSI 92 SQL.

    A view represents a table, which itself represents a SET, which itself is unordered.

    Try it out in Query Analyzer instead, or this code

    Code:
       SELECT  TOP 5 *
     	   FROM 
      		  (
      			SELECT TOP 100 PERCENT * 
      				  FROM users ORDER BY entrydate
      		  ) AS DT

  3. #3
    ☆★☆★ silver trophy vgarcia's Avatar
    Join Date
    Jan 2002
    Location
    in transition
    Posts
    21,235
    Mentioned
    1 Post(s)
    Tagged
    1 Thread(s)
    Okay, bad example of a query on my part. I just threw that up as an example.

    The problem is, TOP queries don't work at all on this server. I have no idea why. Other SQL Server instances in the company handle TOP queries just fine, but for some reason this one doesn't.

  4. #4
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Well, TOP itself is not part of the core ANSI standard, it is an SQL server extension since version 7.
    So you shouldn't be using it anyway

    You should check what compatability level the database you are using is set to. Maybe someone set it to 65 or 60.

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by asterix
    So you shouldn't be using it anyway
    this cannot go unquestioned

    what, pray tell, would you have us use instead?

    i know a couple of ways, but i want to hear your esteemed advice first

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    what, pray tell, would you have us use instead?

    i know a couple of ways, but i want to hear your esteemed advice first
    groan

    Well, I guess putting that information here might be useful for someone who comes to view this thread later.

    So, how do you get to use the first 5 records from your query's resultset, without using the syntax "select top 5 * from ..."? If you ever need to work with SQL Anywhere 5, or SQL Server v. 6.5 (and perhaps other databases too) you'll need to know this.

    But first a digression about TOP. It is only reasonably to use TOP in queries which return an ordered resultset. If someone says: "Of all the 250 capital cities in the cities table, give me the top 15." then it is reasonable to ask "On what basis do you want to decide which are the top 15?". It could be the first 15 in an alphabetical sorting of their name, or the top 15 based on population size, or any ORDER BY criteria on any attribute. But there must, reasonably, be a sort criteria in any SELECT statement which specifies top. And so here I will only use examples which ask for sorted data. In relational theory all datatsets contain unordered tuples, in RDBMS systems the underlying ordering of the rows in a table is (in theory) arbitrary.

    The first way, and without a doubt the worst possible way, is to simply code:

    Code:
     select dateofbirth, lastname 
        		  from  people
        		  order by dateofbirth desc, lastname asc
    And then, in the external schema (your application) you just discard all the rows you don't want to process. Of course the problem is that the DB may return 1.000.000 rows, and you only want 10, which is why you wanted to use TOP anyway

    The second way may not be possible in all DB systems. Using some kind of LIMIT n or SET ROWCOUNT n feature which your DB offers, you can force the database to only return some of the records which it has actually prepared for you. If you are really lucky, your RDBMS may actually only prepare n records for returning to you. But you can't assume that. If you write:

    Code:
     select dateofbirth, lastname 
       		  from  people
       		  order by dateofbirth desc, lastname asc
       		  Limit 100,200
    How can you be sure that the RDBMS didn't actually prepare all 1.000.000 rows before just returning the 100 rows you specified? (most likely the RDBMS didn't, of course, but theoretically it had to).

    So now we have solved the problem of limiting the resultset arbitrarily to n rows, but we need to look at that again.

    More fundamentally, the problem with "TOP 5" lies in specifying "5". Why do you want 5? Why is 5 better than 16? Or returning 2? There is probably no good reason, other than some arbitrary decision such as "we have 5 lines available in the report" or we will award only 200 people with the prize. This might make sense in the real world, but in the world of database theory systems all we care about is logic.

    There are serious problems involved with TOP when the data does not produce discrete values:

    Imagine the following data:

    name SUM(salesq1)
    ----- ------
    jones 52345
    smith 22345
    bake 90122
    brownc 22345
    green 00618
    ....

    and someone specifies "Give the names of the three salespeople with the most sales in Q1; they have won a week in Florida!".

    How are you going to satisfy that query? Yes, now you have to use: "select top 3 WITH TIES" but this is just bandaging up a wounded implementation of relational theory. There are not three, but four rows which must be in the resultset.

    So: specifying a fixed number of rows to return is arbitrary and should be avoided. It is much more beneficial to actually think about the data, which rows should be retuned, how they should be ordered and which columns you need, and not how many rows should be returned. Actually, people who only care about the top n rows should be using Excel or something, but not an RDBMS.

    And if you only want the first row? That's a common situation/ If the database supports it, most people would write:

    Code:
     select top 1 dateofbirth, lastname 
        		  from  people
        		  order by dateofbirth desc, lastname asc
    To find the youngest person whose name appears first alphabetically. But we can write that differently, and in a way which is compatible with the standards:

    Code:
     SELECT lastname, dateofbirth
     FROM people
     WHERE lastname =
     	(SELECT MIN(lastname)
     		FROM people where 
     		dateofbirth = (select max(dateofbirth) from people)
     	)
    We can use predicate logic, working on set theory, to return what we want. And if your database does not support subqueries then it is not standards compliant.

    The schema of the table I used is this:

    CREATE TABLE [people] (
    [lastname] [nvarchar] (32) NOT NULL,
    [dateofbirth] [smalldatetime] NOT NULL ,
    CONSTRAINT [PK_people] PRIMARY KEY CLUSTERED
    (
    [lastname],
    [dateofbirth]
    )
    )
    Last edited by asterix; Feb 23, 2005 at 02:20. Reason: tiepo

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    <SIGH />

    okay, it is worse than i feared

    you say TOP shouldn't be used, so i ask you for alternatives, and you don't give any!!!!

    your first "solution" is clearly not the right solution, since it returns everything, and everyone will agree this isn't the top 10

    your second solution actually begs the question

    you say "using some kind of feature which your DB offers" but this is exactly what LIMIT is for mysql and what TOP is for sql server

    you were supposed to be offering a non-specific solution instead of TOP

    did you misunderstand my question "what would you have us use instead?"

    because you haven't answered that question (except for TOP 1 employee with lowest lastname who has the latest birthdate, a ludicrous example)

    but the real killer is "Actually, people who only care about the top n rows should be using Excel or something, but not an RDBMS."

    this is just so preposterous i will not even dignify it with ridicule

    by the way, i'm not disagreeing with your points, which are stated masterfully (you really should write more, you're good at it)

    especially the part about ties, nice job

    that's the problem with LIMIT and ROWCOUNT -- they don't handle ties correctly

    at least with TOP, you can get correct results

    but the question remains, how would you get the top 10 results, let's say the top 10 employees by salary?

    and please, dumping the entire employees table into excel is not the right answer

    i do have a standard sql solution, but i'll give you one more attempt before i show it
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    okay, it is worse than i feared

    you say TOP shouldn't be used, so i ask you for alternatives, and you don't give any!!!!
    I'm gonna try really, really hard to be patient here. :'(
    I gave three alternatives with working code + descriptions.

    Quote Originally Posted by r937
    your first "solution" is clearly not the right solution, since it returns everything, and everyone will agree this isn't the top 10
    The first solution is entirely valid. If you remember the discussion about constraints / business rules from the other day, it is perfectly valid to use the application logic to discard all but the first 10 rows. It is a huge performance killer, yes, but it is still possible.

    And preassuming that "everyone will agree" with your point of view is a beginners mistake

    Quote Originally Posted by r937
    your second solution actually begs the question

    you say "using some kind of feature which your DB offers" but this is exactly what LIMIT is for mysql and what TOP is for sql server

    you were supposed to be offering a non-specific solution instead of TOP

    did you misunderstand my question "what would you have us use instead?"

    because you haven't answered that question (except for TOP 1 employee with lowest lastname who has the latest birthdate, a ludicrous example)
    I didn't misunderstand the question, I gave an alternative to using TOP: that is, use ROWCOUNT or LIMIT. These keywords are clearly not "TOP", so they are alternatives.

    Quote Originally Posted by r937
    but the question remains, how would you get the top 10 results, let's say the top 10 employees by salary?

    i do have a standard sql solution, but i'll give you one more attempt before i show it
    Is it a competition? well I don't mind competing because I would like to win something. Answering ludicrous questions like "list the top 10 employees by salary" is a favourite passtime of mine, people pay me to do it.

    Code:
    select LastName, Salary from (
     
     SELECT count(*) RecNum, e1.LastName, e2.salary
     FROM employees e1 join 
     	employees e2
     		  on e1.LastName >= e2.LastName
     
     	 group by e1.LastName, e1.salary
     	 order by salary desc
     ) i
     where i.recnum < 11
    What have I won?

  9. #9
    SitePoint Guru
    Join Date
    Jan 2004
    Location
    Uppsala, sverige
    Posts
    700
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    vgarcia:

    Have you checked the compatibility mode of your server instance? If its is set to 65 then top will not work.

  10. #10
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Compatability mode is not set at the server instance level, it is set at the database level.

    And I already suggested that strategy, see #4.

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by asterix
    I didn't misunderstand the question, I gave an alternative to using TOP: that is, use ROWCOUNT or LIMIT. These keywords are clearly not "TOP", so they are alternatives.
    oh, that's sad, and you are playing games

    excuse me for not quoting you perfectly and bolding the part that was important

    here, let me do it properly:
    ... not part of the core ANSI standard... So you shouldn't be using it anyway
    your objection was to TOP not being part of the standard, i asked for alternatives, and you come back with LIMIT and ROWCOUNT which aren't part of the standard

    how sad

    Is it a competition? well I don't mind competing because I would like to win something. Answering ludicrous questions like "list the top 10 employees by salary" is a favourite passtime of mine, people pay me to do it....
    What have I won?
    so far nothing, as this query contains a syntax error

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    so far nothing, as this query contains a syntax error
    Do you actually expect me to test my code?

    Give me a minute.

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    and while we're at it, asterix, let's make sure your top 10 ansi-compatible query runs in sql server

    even after correcting your syntax error, i still can't get it to work

    (and i don't understand why you've got a theta join on lastname)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  14. #14
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK, I used the pubs database, so you can check it out.

    This answers the question: "List me the ten most expensive books, no more than 10 books, books tied for tenth place are disregarded".

    Code:
    USE pubs
    GO
    -- example using self join
    select price,title from
    	(SELECT count(*) RecNum,
    		   a.price, a.title
    		 FROM titles a , titles b
        	 where  cast(left(cast(a.price as varchar(10)) + a.title,4) as money) <= cast(left(cast(b.price as varchar(10)) + b.title,4) as money)
    		 group by a.title, a.price
    	) i
    where i.recnum <=10
    order by price desc
    GO
    Please don't come and say "ah, but the row numbers are discontinous!"

    What do I win?

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    why did you say "no more than 10, books tied for 10th place disregarded"?

    to quote someone whose logic is quite respectable, "So: specifying a fixed number of rows to return is arbitrary and should be avoided. It is much more beneficial to actually think about the data, which rows should be retuned, how they should be ordered and which columns you need, and not how many rows should be returned."

    "arbitrary and to be avoided"

    i would like to see the top 10 including ties

    you say they pay you for this type of query?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I went a little over the top on the where condition.
    I should have used

    Code:
    where  a.price<= b.price

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    okay, i shall put you out of your misery
    Code:
    select lastname,salary
      from employees X 
     where ( select count(*) 
               from employees  
              where salary > X.salary ) < 10
    order by salary desc
    feel free to adapt this to pubs, i'm not going to try to keep up to you always changing examples

    and by the way, you're welcome
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  18. #18
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Code:
     select title,price
       from titles X 
      where ( select count(*) 
     		   from titles  
     		  where price > X.price ) < 10
     order by price desc
    And this is also a self join in effect, just (much) more elegant than my query.
    And yes, I would like to see the books tied for 10th place too!
    (I don't know how to do that).

    Thanks!

  19. #19
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I attached the execution plan which compares bith queries.

    Surprisingly, SQL Server created different execution plans...
    Attached Images Attached Images

  20. #20
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by asterix
    And yes, I would like to see the books tied for 10th place too!
    that query does include ties over the 10th place

    i don't have PUBS so i can't test it on the titles

    feel free to test it on the data that i painstakingly constructed for your previous example

    create table employees
    ( id smallint not null primary key identity
    , lastname varchar(9)
    , salary integer
    )
    insert into employees (lastname,salary) values ('jones',5000)
    insert into employees (lastname,salary) values ('smith',16000)
    insert into employees (lastname,salary) values ('white',17000)
    insert into employees (lastname,salary) values ('brown',18000)
    insert into employees (lastname,salary) values ('o''toole',19000)
    insert into employees (lastname,salary) values ('smythe',15000)
    insert into employees (lastname,salary) values ('smart',11000)
    insert into employees (lastname,salary) values ('black',4000)
    insert into employees (lastname,salary) values ('stuart',23000)
    insert into employees (lastname,salary) values ('clark',11000)
    insert into employees (lastname,salary) values ('clarke',11000)
    insert into employees (lastname,salary) values ('wilson',14000)
    insert into employees (lastname,salary) values ('jones',13000)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  21. #21
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    the point about the theta join is that it's quite expensive, right?

    my advice to people is to go ahead and use TOP 10 WITH TIES because it involves only a sort, it's relatively efficient (compared to the theta join), and it does give the "correct" answer (the same cannot be said for mysql's LIMIT, which operates strictly on row count), assuming that "correct" means that ties should be included

    an efficient query that involves proprietary syntax is often a better pragmatic solution than one which avoids proprietary syntax but runs poorly

    for comparison: returning the auto-increment value of the row just inserted can be solved with ansi sql, but most practioners would recommend using the built-in functions of the particular dbms
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  22. #22
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    i don't have PUBS so i can't test it on the titles
    Here is the PUBS create script. It is OK to distribute it for educational / research purposes
    Sorry about that, thats a lot of insert scripts which I didn't want to write.

    (Your) Schema:
    Code:
     create table employees
      ( id smallint not null primary key identity
      , lastname varchar(9)
      , salary integer
      )
      insert into employees (lastname,salary) values ('jones',5000)
      insert into employees (lastname,salary) values ('smith',16000)
      insert into employees (lastname,salary) values ('white',17000)
      insert into employees (lastname,salary) values ('brown',18000)
      insert into employees (lastname,salary) values ('o''toole',19000)
      insert into employees (lastname,salary) values ('smythe',15000)
      insert into employees (lastname,salary) values ('smart',11000)
      insert into employees (lastname,salary) values ('black',4000)
      insert into employees (lastname,salary) values ('stuart',23000)
      insert into employees (lastname,salary) values ('clark',11000)
      insert into employees (lastname,salary) values ('clarke',11000)
      insert into employees (lastname,salary) values ('wilson',14000)
      insert into employees (lastname,salary) values ('jones',13000)
    -- With ties
    Code:
    select lastname,salary
         from employees X 
        where ( select count(*) 
       		   from employees  
       		  where salary > X.salary ) < 10
       order by salary desc
       go
    -- Without ties
    Code:
    select lastname,salary from
       	(SELECT count(*) RecNum,
       		   a.salary, a.lastname
       		 FROM employees a , employees b
       		 where a.salary <= b.salary
       		 group by a.lastname, a.salary
       	) i
       where i.recnum <=10
       order by salary desc
    Attached Files Attached Files

  23. #23
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    the point about the theta join is that it's quite expensive, right?
    No, it is hardly any more expensive that what you had, almost the same.

    The subquery must be evaluated for every row in the intermediate table, which is just as bad on IO unless the DB does some RAM caching (which it does).

    The fastest solution is by far:
    select top 10 --<<with ties>>

    because SQL server truncates its intermediate table when the results can not possibly meet the criteria, and then returns the top 10.

  24. #24
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    actually, we both had theta joins, that was my point

    and yes, TOP 10 WITH TIES is the best solution

    but, according to you, we shouldn't be using it

    you asked earlier what you win

    well, as in a previous thread --
    http://www.sitepoint.com/forums/show...6&postcount=16
    http://www.sitepoint.com/forums/show...8&postcount=17

    ... i would definitely say you didn't win this thread either

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  25. #25
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •