SitePoint Sponsor

User Tag List

Page 2 of 2 FirstFirst 12
Results 26 to 45 of 45
  1. #26
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    rich, read the question again

    What if a developer needs to return every column and is sure of it?

    he's asking if SELECT * requires a table scan
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  2. #27
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    rich, read the question again

    What if a developer needs to return every column and is sure of it?

    he's asking if SELECT * requires a table scan
    Yes, it someone wants to return every row and every column in that row then it will require a table scan. I thought, by your answer, that he was asking the question relative to the LIKE operator and how it uses indexes when there are wildcards involved.

    Rich

  3. #28
    SitePoint Zealot
    Join Date
    Nov 2006
    Posts
    127
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ok looks like I'm not being clear. Let me do this by example:

    Suppose a table has 5 columns, a/b/c/d/e.

    There are two ways to do a query:

    1. "select * from table where b=5"

    2. "select a,b,c,d,e from table where b=5"

    Which will execute faster? Will there be a performance difference?

  4. #29
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    i believe i understood you the first time

    2. will always be as fast or faster than 1.

    you should never use 1.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #30
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    i believe i understood you the first time

    2. will always be as fast or faster than 1.

    you should never use 1.
    If this is true, that there is a perceptible performance difference between these two statements, then someone has really written a poor optimizer somewhere, and * should be removed from the syntax.

    Here is a good article on how a sargable predicate should be analyzed, and in practice a good optimizer should make the syntax of the SQL transparent to the RDBMS engine. In fact, that is the whole purpose of the optimized and the relational language. Whether or not MySQL has a good optimizer or not, I do not know from practical experience. However, I do have practical experience with many other RDBMS optimizers and using a * should be acceptable, especially if end-users are going to access the RDBMS.

    http://www-128.ibm.com/developerwork...kuznetsov.html

    Rich

  6. #31
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by richrf View Post
    If this is true, that there is a perceptible performance difference between these two statements, then someone has really written a poor optimizer somewhere, and * should be removed from the syntax.

    No.

    The reason why select * is slightly slower then select a,b,c is exactly because the optimizer has to work out what * actually means.

    Last edited by Jelena; Apr 3, 2007 at 04:50.

  7. #32
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by asterix View Post
    No.

    The reason why select * is slightly slower then select a,b,c is exactly because the optimizer has to work out what * actually means.
    Yes, I did say perceptible. My guess is that it is totally insignifcant since the table defintions should be cached as should the SQL parser logic. But if someone has some hard numbers, I would be very interested. I would estimate the difference can be measured in nanoseconds. If someone really believes there is a meaningful difference, then I will go over to the MySQL support forum to verify. I would be pretty surprised indeed.

    Rich

  8. #33
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    asterix!! where have you been, man, what a long hiatus!!! nice to see you back again

    actually, i believe the optimizer also checks each column name when you write SELECT a,b,c,..., because otherwise you could write SELECT a,x,c,... where x is not a column in the table

    that the dreaded, evil "select star" should be removed from the syntax is wishful thinking, that's not gonna happen any sooner than microsoft gets rid of support for =* and *= or oracle gets rid of support for (+)

    meanwhile, we can all do our part and advise anyone and everyone, at each opportunity, not to fall victim to the dreaded, evil "select star"

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #34
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    asterix!! where have you been, man, what a long hiatus!!! nice to see you back again

    actually, i believe the optimizer also checks each column name when you write SELECT a,b,c,..., because otherwise you could write SELECT a,x,c,... where x is not a column in the table

    that the dreaded, evil "select star" should be removed from the syntax is wishful thinking, that's not gonna happen any sooner than microsoft gets rid of support for =* and *= or oracle gets rid of support for (+)

    meanwhile, we can all do our part and advise anyone and everyone, at each opportunity, not to fall victim to the dreaded, evil "select star"

    There is certainly nothing at all evil about a *, however, for logical data independence, there may be reasons to enumerate the columns as opposed to using the *. However, this has nothing to do with performance.

    Here is one good article that explains the issues fairly well:

    http://searchoracle.techtarget.com/a...301455,00.html

    As a matter of good programming practice, I think it is well advised to enumerate columns, but if you need all the columns, there is no difference in performance. I think that this article is a bit over the top in calling the * "evil", and note that in the context of the question that is being responded to in this thread, the article specifically states:

    "That said, if you do actually need all columns, then SELECT * is fine, provided that you never change the table. Or the query."

    Rich

  10. #35
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by richrf View Post
    Here is one good article that explains the issues fairly well:

    http://searchoracle.techtarget.com/a...301455,00.html
    thank you, thank you very much

    i think i should change that picture, i'm definitely a lot older than that now


    Quote Originally Posted by me
    "That said, if you do actually need all columns, then SELECT * is fine, provided that you never change the table. Or the query."
    i'm gonna stand behind this as strongly as ever

    IF you do actually need all columns, and
    IF you never change the table, and
    IF you never change the query...

    ... then i guess it's okay
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #36
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi Rudi, yeah, I took a break from spf, it was getting a bit, hmm, time consuming

    Rich: well, I think that (for me personally) the problem with select * is that you are not controlling what happens when you join (many) tables:

    select * from a
    left join a using aid
    left join b using bid
    left join c using cid
    ....

    gives you absolutely no idea about how many columns will be returned. It could be thousands, when all you really wanted were two or three. I have seen so many errors in applications because the developers used "select *", and then later added a column or joined a table.

    So I tend to agree that select * is not so much a performance issue (usually) as it is a quality issue. If we don't care what columns the tables hold then really we should be asking if we need to use an RDBMS at all. Just some big object tree dumped as XML would probably be as useful and conceptually less messy

  12. #37
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    thank you, thank you very much

    i think i should change that picture, i'm definitely a lot older than that now


    i'm gonna stand behind this as strongly as ever

    IF you do actually need all columns, and
    IF you never change the table, and
    IF you never change the query...

    ... then i guess it's okay
    Good article. :-) This issue is a good one to understand, but I think it should be better couched as a "logical data independence" issue, as opposed to a performance issue. Certainly, for ad-hoc user queries, there is little reason to be concerned, and therefore will probably remain in the syntax for very good reason. If an end-user or developer wants to see all of the columns in a table (as I often do), why have to enumerate every column?

    Rich

  13. #38
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by richrf View Post
    to see all of the columns in a table (as I often do), why have to enumerate every column?
    Better: if you want to list all the columns, why enumerate them using something horrible like

    Code:
    select * from mytable limit 1
    ?

    Wouldn't it be better to do something like

    Code:
    show columns from mytable
    (sort of assuming this is a mysql talk...)

  14. #39
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by asterix View Post
    Better: if you want to list all the columns, why enumerate them using something horrible like

    Code:
    select * from mytable limit 1
    ?

    Wouldn't it be better to do something like

    Code:
    show columns from mytable
    (sort of assuming this is a mysql talk...)
    Sorry, what I meant to say was to show all data in all columns. Clearly, to show just the column names, you would have to either interrogate the catalog or use a catalog command command as you suggested.

    I wonder if we have answered the question yet? :-)

    Rich

  15. #40
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    yup, hasn't happened yet

    who knows, though, maybe someday...

    I think you miss the point. The question was what query would guarantee that all data in all columns would be returned. The answer is SELECT * FROM table_name. This works, even if the underlying table structure has changed (e.g. more columns have been added), and the developer was not aware of this. The only other way to do this, would be to first read the catalog, get the column names, and then dynamically build the query based upon the result set.

    In regard to your other statement that 2 will always be as fast or faster than 1, this may or may not be true depending upon whether the program first has to read the catalog (as many ad hoc query tools do), in order to satisfy the SELECT * FROM table_name requirement.

    Rich

  16. #41
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by richrf View Post
    I think you miss the point. The question was what query would guarantee that all data in all columns would be returned. The answer is SELECT * FROM table_name.
    well, i think you missed the question

    "what query would guarantee that all data in all columns would be returned?"

    nobody asked that question

    except perhaps you

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  17. #42
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    well, i think you missed the question

    "what query would guarantee that all data in all columns would be returned?"

    nobody asked that question

    except perhaps you

    O.K. let's go back to this then:

    1) LIKE '%anything' gives a table scan

    Yes, this is true. '%anything' is not sargable.

    2) LIKE 'anything%' doesn't

    Absolutely not true. It depends upon whether the optimizer decides a table scan is more efficient depending upon the distribution of data (statistics that the optimizer keeps), and the number of rows that are maintained in a block. Often the optimizer will choose a table scan or default to a table scan if insufficient distribution data being collected.

    3) the dreaded, evil "select star" should never be used, full stop

    A matter of opinion. Clearly there is a place for '*' if an user wants to ensure that they are returning all columns from a table. It is used all the time, and has an obvious place in the SQL syntax. However, I would agree that unless there is a requirement to "always return all columns", then one should use enumerated column names.

    Hope this clears things up. You posed all sorts of questions, such as performance issues, which have been all muddled up. But suffice to say, there are no performance issues when using '*".

  18. #43
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    you know rich, i was really tempted to leave it at that, so that you have the last word, but i wanted to add one more thing, so please do reply again, so that indeed you will have had the last word

    here's what i wanted to add:

    richrf says: using SELECT * will have "no performance issues... full stop"

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  19. #44
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    you know rich, i was really tempted to leave it at that, so that you have the last word, but i wanted to add one more thing, so please do reply again, so that indeed you will have had the last word

    here's what i wanted to add:

    richrf says: using SELECT * will have "no performance issues... full stop"

    Yes, when compared to Select a, b, c, .... , as you suggested in your post. Both will read the whole table.

    Rich

  20. #45
    SitePoint Member
    Join Date
    Nov 2007
    Posts
    2
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have to agree with the really old post from Asterix.

    From a standpoint of someone who often creates custom interfaces for various database tables, I would say the practice of always using one auto-increment PK is a massive time saver.

    One of the many reasons is that functions only return one value. When you want to query for an index, using a function (in PHP, for instance), what do you return if you have a compound key?

    I understand that this is very easy to get around, I understand that there are many options for dealing with this situation, but the fact remains that having one single number that always references a specific field is simply easier to work with.

    Not to mention that if you always use a single primary key, you can make your functions more "generic" and it is easier to reuse code.

    While some might call my databases "bloated" or "slow" because they have this "extra" key, this design has saved me massive development time, and the performance hit is negligible, not even worth mentioning.

    Along the same lines, SELECT * is a great tool when you are writing queries that work on tables where you don't know how many fields there are. The time it would take to query the database for the names and/or number of fields is certainly longer than just a single SELECT *.

    While my tables might be faster if I used the above suggestions, the development time (and therefore cost) is more valuable to me, as well as my customers (who wants to pay for all this extra development time, to save .0001 seconds on your queries?).

    The caveat is often used queries (which, if you are honest, are truly the outliers). In often used queries you really should take the time to optimize as much as possible.

    For most of us the vast majority of the queries you write will only be executed occasionally and have no significant effect. It is those tables and queries that are critical to load time that actually have significant enough return on investment that it is worth optimizing.

    I really think many developers undervalue their own time.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •