rich, read the question again
What if a developer needs to return every column and is sure of it?
he's asking if SELECT * requires a table scan
Ok looks like I'm not being clear. Let me do this by example:
Suppose a table has 5 columns, a/b/c/d/e.
There are two ways to do a query:
1. "select * from table where b=5"
2. "select a,b,c,d,e from table where b=5"
Which will execute faster? Will there be a performance difference?
Here is a good article on how a sargable predicate should be analyzed, and in practice a good optimizer should make the syntax of the SQL transparent to the RDBMS engine. In fact, that is the whole purpose of the optimized and the relational language. Whether or not MySQL has a good optimizer or not, I do not know from practical experience. However, I do have practical experience with many other RDBMS optimizers and using a * should be acceptable, especially if end-users are going to access the RDBMS.
asterix!! where have you been, man, what a long hiatus!!! nice to see you back again
actually, i believe the optimizer also checks each column name when you write SELECT a,b,c,..., because otherwise you could write SELECT a,x,c,... where x is not a column in the table
that the dreaded, evil "select star" should be removed from the syntax is wishful thinking, that's not gonna happen any sooner than microsoft gets rid of support for =* and *= or oracle gets rid of support for (+)
meanwhile, we can all do our part and advise anyone and everyone, at each opportunity, not to fall victim to the dreaded, evil "select star"
Here is one good article that explains the issues fairly well:
As a matter of good programming practice, I think it is well advised to enumerate columns, but if you need all the columns, there is no difference in performance. I think that this article is a bit over the top in calling the * "evil", and note that in the context of the question that is being responded to in this thread, the article specifically states:
"That said, if you do actually need all columns, then SELECT * is fine, provided that you never change the table. Or the query."
i think i should change that picture, i'm definitely a lot older than that now
i'm gonna stand behind this as strongly as everOriginally Posted by me
IF you do actually need all columns, and
IF you never change the table, and
IF you never change the query...
... then i guess it's okay
Hi Rudi, yeah, I took a break from spf, it was getting a bit, hmm, time consuming
Rich: well, I think that (for me personally) the problem with select * is that you are not controlling what happens when you join (many) tables:
select * from a
left join a using aid
left join b using bid
left join c using cid
gives you absolutely no idea about how many columns will be returned. It could be thousands, when all you really wanted were two or three. I have seen so many errors in applications because the developers used "select *", and then later added a column or joined a table.
So I tend to agree that select * is not so much a performance issue (usually) as it is a quality issue. If we don't care what columns the tables hold then really we should be asking if we need to use an RDBMS at all. Just some big object tree dumped as XML would probably be as useful and conceptually less messy
In regard to your other statement that 2 will always be as fast or faster than 1, this may or may not be true depending upon whether the program first has to read the catalog (as many ad hoc query tools do), in order to satisfy the SELECT * FROM table_name requirement.
1) LIKE '%anything' gives a table scan
Yes, this is true. '%anything' is not sargable.
2) LIKE 'anything%' doesn't
Absolutely not true. It depends upon whether the optimizer decides a table scan is more efficient depending upon the distribution of data (statistics that the optimizer keeps), and the number of rows that are maintained in a block. Often the optimizer will choose a table scan or default to a table scan if insufficient distribution data being collected.
3) the dreaded, evil "select star" should never be used, full stop
A matter of opinion. Clearly there is a place for '*' if an user wants to ensure that they are returning all columns from a table. It is used all the time, and has an obvious place in the SQL syntax. However, I would agree that unless there is a requirement to "always return all columns", then one should use enumerated column names.
Hope this clears things up. You posed all sorts of questions, such as performance issues, which have been all muddled up. But suffice to say, there are no performance issues when using '*".
you know rich, i was really tempted to leave it at that, so that you have the last word, but i wanted to add one more thing, so please do reply again, so that indeed you will have had the last word
here's what i wanted to add:
richrf says: using SELECT * will have "no performance issues... full stop"
I have to agree with the really old post from Asterix.
From a standpoint of someone who often creates custom interfaces for various database tables, I would say the practice of always using one auto-increment PK is a massive time saver.
One of the many reasons is that functions only return one value. When you want to query for an index, using a function (in PHP, for instance), what do you return if you have a compound key?
I understand that this is very easy to get around, I understand that there are many options for dealing with this situation, but the fact remains that having one single number that always references a specific field is simply easier to work with.
Not to mention that if you always use a single primary key, you can make your functions more "generic" and it is easier to reuse code.
While some might call my databases "bloated" or "slow" because they have this "extra" key, this design has saved me massive development time, and the performance hit is negligible, not even worth mentioning.
Along the same lines, SELECT * is a great tool when you are writing queries that work on tables where you don't know how many fields there are. The time it would take to query the database for the names and/or number of fields is certainly longer than just a single SELECT *.
While my tables might be faster if I used the above suggestions, the development time (and therefore cost) is more valuable to me, as well as my customers (who wants to pay for all this extra development time, to save .0001 seconds on your queries?).
The caveat is often used queries (which, if you are honest, are truly the outliers). In often used queries you really should take the time to optimize as much as possible.
For most of us the vast majority of the queries you write will only be executed occasionally and have no significant effect. It is those tables and queries that are critical to load time that actually have significant enough return on investment that it is worth optimizing.
I really think many developers undervalue their own time.