rich, read the question again
What if a developer needs to return every column and is sure of it?
he's asking if SELECT * requires a table scan
| SitePoint Sponsor |


rich, read the question again
What if a developer needs to return every column and is sure of it?
he's asking if SELECT * requires a table scan



Ok looks like I'm not being clear. Let me do this by example:
Suppose a table has 5 columns, a/b/c/d/e.
There are two ways to do a query:
1. "select * from table where b=5"
2. "select a,b,c,d,e from table where b=5"
Which will execute faster? Will there be a performance difference?


i believe i understood you the first time
2. will always be as fast or faster than 1.
you should never use 1.


If this is true, that there is a perceptible performance difference between these two statements, then someone has really written a poor optimizer somewhere, and * should be removed from the syntax.
Here is a good article on how a sargable predicate should be analyzed, and in practice a good optimizer should make the syntax of the SQL transparent to the RDBMS engine. In fact, that is the whole purpose of the optimized and the relational language. Whether or not MySQL has a good optimizer or not, I do not know from practical experience. However, I do have practical experience with many other RDBMS optimizers and using a * should be acceptable, especially if end-users are going to access the RDBMS.
http://www-128.ibm.com/developerwork...kuznetsov.html
Rich


Yes, I did say perceptible. My guess is that it is totally insignifcant since the table defintions should be cached as should the SQL parser logic. But if someone has some hard numbers, I would be very interested. I would estimate the difference can be measured in nanoseconds. If someone really believes there is a meaningful difference, then I will go over to the MySQL support forum to verify. I would be pretty surprised indeed.
Rich


asterix!! where have you been, man, what a long hiatus!!! nice to see you back again
actually, i believe the optimizer also checks each column name when you write SELECT a,b,c,..., because otherwise you could write SELECT a,x,c,... where x is not a column in the table
that the dreaded, evil "select star" should be removed from the syntax is wishful thinking, that's not gonna happen any sooner than microsoft gets rid of support for =* and *= or oracle gets rid of support for (+)
meanwhile, we can all do our part and advise anyone and everyone, at each opportunity, not to fall victim to the dreaded, evil "select star"
![]()


There is certainly nothing at all evil about a *, however, for logical data independence, there may be reasons to enumerate the columns as opposed to using the *. However, this has nothing to do with performance.
Here is one good article that explains the issues fairly well:
http://searchoracle.techtarget.com/a...301455,00.html
As a matter of good programming practice, I think it is well advised to enumerate columns, but if you need all the columns, there is no difference in performance. I think that this article is a bit over the top in calling the * "evil", and note that in the context of the question that is being responded to in this thread, the article specifically states:
"That said, if you do actually need all columns, then SELECT * is fine, provided that you never change the table. Or the query."
Rich


thank you, thank you very much
i think i should change that picture, i'm definitely a lot older than that now
i'm gonna stand behind this as strongly as everOriginally Posted by me
IF you do actually need all columns, and
IF you never change the table, and
IF you never change the query...
... then i guess it's okay![]()
Hi Rudi, yeah, I took a break from spf, it was getting a bit, hmm, time consuming
Rich: well, I think that (for me personally) the problem with select * is that you are not controlling what happens when you join (many) tables:
select * from a
left join a using aid
left join b using bid
left join c using cid
....
gives you absolutely no idea about how many columns will be returned. It could be thousands, when all you really wanted were two or three. I have seen so many errors in applications because the developers used "select *", and then later added a column or joined a table.
So I tend to agree that select * is not so much a performance issue (usually) as it is a quality issue. If we don't care what columns the tables hold then really we should be asking if we need to use an RDBMS at all. Just some big object tree dumped as XML would probably be as useful and conceptually less messy![]()


Good article. :-) This issue is a good one to understand, but I think it should be better couched as a "logical data independence" issue, as opposed to a performance issue. Certainly, for ad-hoc user queries, there is little reason to be concerned, and therefore will probably remain in the syntax for very good reason. If an end-user or developer wants to see all of the columns in a table (as I often do), why have to enumerate every column?
Rich




I think you miss the point. The question was what query would guarantee that all data in all columns would be returned. The answer is SELECT * FROM table_name. This works, even if the underlying table structure has changed (e.g. more columns have been added), and the developer was not aware of this. The only other way to do this, would be to first read the catalog, get the column names, and then dynamically build the query based upon the result set.
In regard to your other statement that 2 will always be as fast or faster than 1, this may or may not be true depending upon whether the program first has to read the catalog (as many ad hoc query tools do), in order to satisfy the SELECT * FROM table_name requirement.
Rich




O.K. let's go back to this then:
1) LIKE '%anything' gives a table scan
Yes, this is true. '%anything' is not sargable.
2) LIKE 'anything%' doesn't
Absolutely not true. It depends upon whether the optimizer decides a table scan is more efficient depending upon the distribution of data (statistics that the optimizer keeps), and the number of rows that are maintained in a block. Often the optimizer will choose a table scan or default to a table scan if insufficient distribution data being collected.
3) the dreaded, evil "select star" should never be used, full stop
A matter of opinion. Clearly there is a place for '*' if an user wants to ensure that they are returning all columns from a table. It is used all the time, and has an obvious place in the SQL syntax. However, I would agree that unless there is a requirement to "always return all columns", then one should use enumerated column names.
Hope this clears things up. You posed all sorts of questions, such as performance issues, which have been all muddled up. But suffice to say, there are no performance issues when using '*".


you know rich, i was really tempted to leave it at that, so that you have the last word, but i wanted to add one more thing, so please do reply again, so that indeed you will have had the last word
here's what i wanted to add:
richrf says: using SELECT * will have "no performance issues... full stop"
![]()


I have to agree with the really old post from Asterix.
From a standpoint of someone who often creates custom interfaces for various database tables, I would say the practice of always using one auto-increment PK is a massive time saver.
One of the many reasons is that functions only return one value. When you want to query for an index, using a function (in PHP, for instance), what do you return if you have a compound key?
I understand that this is very easy to get around, I understand that there are many options for dealing with this situation, but the fact remains that having one single number that always references a specific field is simply easier to work with.
Not to mention that if you always use a single primary key, you can make your functions more "generic" and it is easier to reuse code.
While some might call my databases "bloated" or "slow" because they have this "extra" key, this design has saved me massive development time, and the performance hit is negligible, not even worth mentioning.
Along the same lines, SELECT * is a great tool when you are writing queries that work on tables where you don't know how many fields there are. The time it would take to query the database for the names and/or number of fields is certainly longer than just a single SELECT *.
While my tables might be faster if I used the above suggestions, the development time (and therefore cost) is more valuable to me, as well as my customers (who wants to pay for all this extra development time, to save .0001 seconds on your queries?).
The caveat is often used queries (which, if you are honest, are truly the outliers). In often used queries you really should take the time to optimize as much as possible.
For most of us the vast majority of the queries you write will only be executed occasionally and have no significant effect. It is those tables and queries that are critical to load time that actually have significant enough return on investment that it is worth optimizing.
I really think many developers undervalue their own time.
Bookmarks