SitePoint Sponsor

User Tag List

Page 1 of 2 12 LastLast
Results 1 to 25 of 45
  1. #1
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    SQL and RDBMS Database Design DO's and DON'Ts

    This is a document I've been slowly adding to as I come across things 'in the wild'. Maybe it can be of some use?

    SQL and RDBMS Database Design DO's and DON'Ts
    ------------------------------------------
    Generic DOs:
    1. Develop a backup strategy before you encounter a catastrophic database failure, and TEST it regularly so you will know what to do when disaster strikes.
    2. Always sanitize your input from the user to strip out quotes for strings and non-numeric data for number types. URL
    3. Perform table normalization in your DESIGN phase. It is much easier to change your tables 'on paper' than when it is in production use. URL
    4. Choose datatypes which are logical and fit your model. Designating a US Social Security number as a CHAR( 255 ) is very wasteful since they will not exceed 11 characters (XXX-XX-XXXX). CHAR( 11 ) is a perfect match.
    5. Run every query through your RDBMS' query tuning tool and ensure that correct indexes are being used and that a table scan or Cartesian product (in a join) is not occurring when you do not want it to. MS SQL, <snip/> MySQL, Sybase


    Generic DON'Ts:
    1. Do not fall into the trap of what I like to call 'Auto_Increment Induced Insanity'. Not every table needs a sequential integer as the primary key. Design your primary key logically based upon your data, and if need be then consider a sequential integer if traditional primary key design fails.
    2. Do not try and join mismatched datatypes. For instance, if you have an INT as a primary key in one table and a CHAR( 10 ) as the foreign key in another, problems will occur when you try and join the two columns, usually manifesting as a table-scan on one of the mis-matched tables. Or to put it another way -- ensure datatypes match across tables.
    3. Remember queries which use wild cards such as SELECT * WHERE myCharColumn LIKE '%anything' will always result in a table scan, so only use leading wild cards when absolutely necessary.
    4. Don't create redundant indexes. In most, if not all, RDBMS', designating something as a PRIMARY KEY will also create a unique index on the column(s). Creating an additional index on the leading column will be redundant and a waste of space. For example, creating an index on Col1, Col2 will be used on queries of type: WHERE Col1 = 34 AND Col2 = 'Something'. It will also be used on queries such as WHERE Col1 = 123.
    5. Don't use the $dbms_seek() functions in PHP to simulate pagination of a result set. Use LIMIT in MySQL, TOP n in MS SQL, etc. to achieve true pagination.


    MySQL Specific DON'Ts:
    1. Do not enclose numeric values in quotes. This is very non-standard and ONLY works on MySQL. For example, WHERE someIntegerColumn = '1';. This also pertains to integer values in CREATE TABLE statements such as CREATE TABLE bob ( bobID INT DEFAULT '0' ) <-- bad. DEFAULT 0 is the correct method.
    2. <snip/> Do not use the INT( M ) syntax unless you are using the ZEROFILL MySQL proprietary SQL extension.


    edit: added another Don't
    Last edited by Mittineague; Dec 19, 2010 at 03:55. Reason: pre-new-sticky cleanup

  2. #2
    SitePoint Wizard Goof's Avatar
    Join Date
    Feb 2001
    Location
    Pittsburgh, PA
    Posts
    1,154
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Re: SQL and RDBMS Database Design DO's and DON'Ts

    Originally posted by MattR
    Generic DON'Ts:
    1. Do not fall into the trap of what I like to call 'Auto_Increment Induced Insanity'. Not every table needs a sequential integer as the primary key. Design your primary key logically based upon your data, and if need be then consider a sequential integer if traditional primary key design fails.
    I guess for people programming smaller systems this would be the case, but if you program anything that has potential to be somewhat large, then I think using auto incremented ints for primary keys should be the default action. The reason I say this is because of their size.

    For instance, an average username of 10 digits is 10 bytes, while a nice int is only 4 bytes (MS SQL Server). If the users table was the only place that used the key, then it's no big deal (even with hundreds of thousands of users, the size wasted wouldn't be a big deal). However, the nature of the primary key is to use it to link to other data in other tables, which means that the extra 6 bytes are probably in just about every record (for a nice user/content management system, forum, or something along those lines). 10 Million records means 10 megs of storage (with 120 gig drives selling for $100 bucks, this might not be a big deal, but why waste space if you don't need to?) The other reason you might want to use integers instead of strings is that most servers can perform opperations on integers much faster than on strings. Again, for a small system no big deal, but for a larger system it might be something to keep in mind.

    Just thought I'd share my two cents,
    Goof
    Nathan Rutman
    A slightly offbeat creative.

  3. #3
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Goof, that makes sense and is often how I design. It wasn't what I had in mind with that comment, though.

    Something along the lines of this I have seen a lot:
    Code:
    CREATE TABLE relationship(
      relationship_id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
      table1_id       INT                NOT NULL,
      table2_id       INT                NOT NULL
    )
    Here there is no need for the relationship_id tacked on there.

    For a user's table the userid is nice as well since you don't have to worry about the user editing his or her username and then being forced to cascade it throughout many tables (big performance hit).

    However 'by default' is not a good habit to get into. What would you do if you had a state table, or a country table?

  4. #4
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by MattR
    Goof, that makes sense and is often how I design. It wasn't what I had in mind with that comment, though.

    Something along the lines of this I have seen a lot:
    Code:
    CREATE TABLE relationship(
    relationship_id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
    table1_id INT NOT NULL,
    table2_id INT NOT NULL
    )
    Here there is no need for the relationship_id tacked on there.
    Hi Matt, thanks for your great article, but I have to admit that I don't entirely agree with you here.

    What you are saying is the the FKs here are used in an identifying relationship with master tables, and that using the two FKs here as a PK is the ideal way since it avoids an essentially redundant auto_increment PK.

    You assumption is, that the parties to n:m relationship will never cahnge, since if they do there is no alternative but to delete the PK and add the two FKs, one of which has changed, as the replacement PK.

    Now this may be true most of the time, but some times it is necessary to update a n:m relationship. For example, when one company replaces another company in the n part, on the m side you might have companies who are the customers. If you can reference the tuple using a single ID value, you can change the FK references without destroying the semantics of your schema.

    So I would suggest keeping it simple. For this reason I long ago stopped worrying about compound PKs and decided to add an ID (long integer) as the PK to every entity. It certainly makes OO coding easier, every Data Object (Value Object for Java folks) can be instantiated and persisted with knowledge of a single ID. This keeps the coding simple and consistent for every persistable object.

    Just my tuppence.
    Richard

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by asterix
    Now this may be true most of the time, but some times it is necessary to update a n:m relationship. For example, when one company replaces another company in the n part, on the m side you might have companies who are the customers. If you can reference the tuple using a single ID value, you can change the FK references without destroying the semantics of your schema.
    without destroying the semantics? huh?

    even if your database does not support ON UPDATE CASCADE you can still accomplish the same yourself -- just update the FK, which also happens to be part of the PK

    i don't understand how you would even get around to referencing the surrogate id this case, because in order to find the rows you need to update the keys of, you have to follow a join from one of the company tables, and this join fer shure ain't gonna be using the surrogate id

    the only reason for wanting a surrogate PK in a many-to-many table that i can think of is when that table is the parent of another table and the child table wants, for some reason, to minimize the size of its FK to the parent

    for example,

    team_player ( teamid, playerid )
    team_player_roster ( teamid, playerid, startdate, enddate )

    here you could use a surrogate id for the team_player table, and thus also for the team_player_roster table's FK (and thereby remove both team_id and player_id from the team_player_roster table), but there's a major drawback for this design -- now you cannot search just the team_player_roster table by itself to find out which players have played more than once for the cubbies (team_id=37), without joining to the team_player table on the surrogate key just in order to find team_id=37

    all in all i don't like surrogate keys, especially for many-to-many tables, although i do use them in other situations
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    even if your database does not support ON UPDATE CASCADE you can still accomplish the same yourself -- just update the FK, which also happens to be part of the PK
    Nope, won't work. You are assuming that the new relationship doesn't already exist, if it does you'll violate a constraint.

    Quote Originally Posted by r937
    i don't understand how you would even get around to referencing the surrogate id this case, because in order to find the rows you need to update the keys of, you have to follow a join from one of the company tables, and this join fer shure ain't gonna be using the surrogate id
    Yep, coming from either side you'll need to use the FK. But thats less than challenging for an RDBMS! The point is, that when you need to update the PK, you don't need to search on two fields, since you'll likely already have retrieved your autoID previously.

    I'm not saying that this method is pretty, I'm just saying that programmers can deal with n:m tables in the same way as any other persistent data. A consistent autoID field in every entity makes life a lot easy for programmers, even if does take your DB from 3NF to 2NF.

    Quote Originally Posted by r937
    can think of is when that table is the parent of another table and the child table wants, for some reason, to minimize the size of its FK to the parent

    for example,

    team_player ( teamid, playerid )
    team_player_roster ( teamid, playerid, startdate, enddate )
    Good exmple. But I prefer a different reason: Primary Keys should not have any intrinsic meaning (semantic) whatsoever at all. If you say "the PK is built from the FK of the two participitating entities", then you could have something like this:

    Employee (SocSecNumber, Name....)
    ProjectTeam (ProjectName, Customer...)
    EmployeeToTeam (SocSecNumber,ProjectName)

    Meaning that your PK now contains meiningful information, it violates (IMHO) the basic tenet that a PK's meaning is only to identify a record. Now the PK contains meaning over and above its original purpose.

    OK, if the PK is built from 2 other autoID fields, then the meaning is somewhat reduced, but it nevertheless confers over and above a means of identifying a record (actually the entire record is the PK).

    Quote Originally Posted by r937
    all in all i don't like surrogate keys, especially for many-to-many tables, although i do use them in other situations
    I guess that this is a topic which we will disagree about all day

    I can see your point: you don't like surrogate (technical) primary keys, you prefer identifying some existing attribute(s) which will confer uniqueness to your record. The thinking being that this is "pure", "minimalistic", "using the data", and not adding any level of data which is of no use to anyone except maybe some DBAs or programmers.

    I've been there and done that, spent many painful hours rejuggling a logical schema when a new requirement killed my previous assumptions about what made a good PK way back when.

    I give some leeway about using compound PKs on intersection tables. At least until I have to explain to a programming team why their elaborate, custom built code generation tools will fail when modelling n:m relationships using compound keys. Then I take the easy route: I give them an autoID, they are happy, I am happy, the database has suffered by forcing it to manage a further 4 bytes per record.

    In summing up, I would like to suggest that issuing generalities "do..., don't..." whilst certainly being very useful in some cases, can lead to the creation of dogmas, stifling originality at the expense of "received wisdom".

    Cheers,
    Richard

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    hey richard, i agree, let's agree to disagree
    Nope, won't work. You are assuming that the new relationship doesn't already exist, if it does you'll violate a constraint.
    this is true whether or not the many-to-many table has a surrogate PK
    The point is, that when you need to update the PK, you don't need to search on two fields, since you'll likely already have retrieved your autoID previously.
    that does not follow -- if you need to update one of the two columns in the PK, then just do it: UPDATE M_N_TABLE SET FK2 = new value WHERE FK2 = old value (with the added proviso that the new value does not exist already -- but your app takes care of that, right?)
    A consistent autoID field in every entity makes life a lot easy for programmers, even if does take your DB from 3NF to 2NF.
    i'd love to hear your explanation of how the presence of a surrogate PK changes a table from 3NF to 2NF
    ...Meaning that your PK now contains meiningful information, it violates (IMHO) the basic tenet that a PK's meaning is only to identify a record. Now the PK contains meaning over and above its original purpose.
    not at all, it has meaningful information only if the PKs that the two FKs refer to do
    ... (actually the entire record is the PK)
    and the problem with that is...??
    In summing up, I would like to suggest that issuing generalities "do..., don't..." whilst certainly being very useful in some cases, can lead to the creation of dogmas, stifling originality at the expense of "received wisdom".
    on this last point, i totally agree

    therefore, in keeping with the topic of this thread, let me summarize:

    DO use natural keys instead of surrogate keys, but DON'T do it if it will get you into trouble


  8. #8
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    CA
    Posts
    210
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    >> DO use natural keys instead of surrogate keys, but DON'T do it if it will get you into trouble

    But don't natural keys violate 3nf? SSN would on the surface seem to be a good natural key, but we all know in reality that is not true. In fact, many developers agree that there is no place for natural keys in any application of serious consequence.
    -ChaCha

  9. #9
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    How do natural keys violate 3NF?

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by ChaCha
    In fact, many developers agree that there is no place for natural keys in any application of serious consequence.
    developers on mars, maybe

    here on earth there are plenty of serious database developers who understand the difference between natural and surrogate keys

    any developer who uses surrogates exclusively, by rote or by conviction, is, in my books, not serious

    and no, to answer your question, natural keys do not violate 3NF, developers violate 3NF

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    CA
    Posts
    210
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I understand the difference between natural and surrogate keys. I am from the school of thought that a good key is one whose sole purpose is identification of the row. Any other column is innapropriate for unique identification purposes, as it could potentially change. Take the SSN example, not only does the US recycle SSN's after someone deceases, but what if they type it in incorrectly? A column that is used as an identifier should never be seen, edited, or heard from by a user using the consuming application. It is an atifact of the rdbms. Even industry standard EIN numbers aren't always used as primary keys in the database. Agree or disagree, but it is not that difficult to always use a surrogate key - either a guid or an autoincrement - and in my opinion the benefits outweight the costs.
    Code:
    natural keys do not violate 3NF, developers violate 3NF
    Wrong. developers who choose natural keys run the risk of violating 3NF.
    -ChaCha

  12. #12
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    CA
    Posts
    210
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I should clarify, though, that my dogmatic position contrary to the use of natural keys is only my opinion, and only for tables that are updated frequently.
    -ChaCha

  13. #13
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by ChaCha
    I am from the school of thought that a good key is one whose sole purpose is identification of the row.
    "Natural" keys fulfill this requirement.

    Any other column is innapropriate for unique identification purposes, as it could potentially change.
    So what?

    Wrong. developers who choose natural keys run the risk of violating 3NF.
    You still haven't explained this. Please do.

  14. #14
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    yeah, chacha, i'm interested in your theory too

    just how does a natural key violate 3nf?

    please, do reveal your thought process

    oh, and by the way, if you cannot design a data model without surrogate keys, don't take this personally, but i think you are missing some basic theory
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  15. #15
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    CA
    Posts
    210
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    "Natural" keys fulfill this requirement.
    No, they don't because they originally were to identify the data in the real world and were generated outside your system. As far as your system is concerned it should have no special meaning because it exists independently of your schema. Is it possible to find a natural key that will work, sure - but you are hoping that some unforeseen event never happens.

    So what?
    So you can't guarantee your key is immutable, that's not a big deal? Where did the data for the natural key come from? Users? You? Your boss? If the answer to any of these is yes, then it is untrustworthy data. What if the user typed his/her social security number wrong. What if the supplier typed the EIN number wrong - but it was unique at the time of insert. Now, conceptually, you have a problem. You have a unique identifier that is transient.

    You still haven't explained this. Please do.
    In my mind, data in a row can't possibly be dependent on a transient value - which as I stated earlier is what a natural key is when the data is not generated inside your system. Once I identify a row of data in my database, I don't ever want that identifier to change. If I were to script out a sql query into a textfile the moment after insertion that looked like...
    Code:
    select * from Reliable where pk = SCOPE_IDENTITY()
    then in 50 years if that row is still there and the USA is dead and gone I want to be able to find it. There is no guarantee that will happen with natural keys. That's why.
    -ChaCha

  16. #16
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    CA
    Posts
    210
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    oh, and by the way, if you cannot design a data model without surrogate keys, don't take this personally, but i think you are missing some basic theory
    Sure, I COULD design a data model without using surrogate keys, buy why would I want to? Don't tell me performance, because no query is going to suffer appreciably from adding 1 column. Plus you can still index the other column. It's this whole stupid DBA bravado thing - like look at my database - I understand the domain so well that I can use natural keys. Why would you waste your time on such minutia. It is laughable. Why risk using a natural key that *might* change in the future when you can use one that will not ever change - ever?
    -ChaCha

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    well, sure, that makes a lot of sense -- if you've never heard of ON UPDATE CASCADE

    i could give you lots of examples of where i don't waste my time on declaring surrogate keys i don't need

    but let's agree to disagree on that point

    what i want to know is where's your explanation of how natural keys violate 3nf?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  18. #18
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    CA
    Posts
    210
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    if you've never heard of ON UPDATE CASCADE
    I don't doubt that you can easily cascade the update throughout the database if the natural key's value has changed. The point is that a key should not be transient. Like I said, the natural key has special meaning outside the context of your database. That special status can be revoked by the powers that be without your consent, rendering your key useless.
    i could give you lots of examples of where i don't waste my time on declaring surrogate keys i don't need
    And for every one I could probably show you a remotely possible event that would render that key not unique.
    what i want to know is where's your explanation of how natural keys violate 3nf?
    If a column value is transient then simply put the other columns in the row can't be 'functionally dependent' on that column to provide identification for the tuple.

    Why insist on using natural keys when the overhead for the alternative is so low, and gives more reliable identification?
    -ChaCha

  19. #19
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    CA
    Posts
    210
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, I will. I don't have an axe to grind, but I do thank you for resting the discussion. I think this will be instructive for people learning to design data models. They can consider both sides and do as they please. However, I still am waiting to hear why, given the argument I outlined, anyone would ever insist on using a natural key whose value is dependent on a strategy that lies outside your system. That is the fundamental issue at heart. Whether the uniqueness of the key is dependent on something that lies outside your control. Thanks for the discussion.
    -ChaCha

  20. #20
    SitePoint Enthusiast
    Join Date
    Aug 2004
    Location
    Houston, TX
    Posts
    71
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Be careful about modifying the user data on the fly prior to insert/update in the database. Changing the user's data without them being aware should be discouraged. If quotes are not allowed, then have the application inform the user rather than silently fixing the data. Or, be sure to put it into the documentation. Most users don't read documentation so it may be better to provide a confirmation message of how the data was changed.

  21. #21
    SitePoint Member
    Join Date
    Jul 2006
    Posts
    12
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Since this is a general do's and don'ts thread, I throw in one regarding a dimensionally modeled reporting enviornment, replacated from the source system.

    When building a star or snowflake schema in a dimensional model for reporting purposes, DO NOT reuse the surragate keys from the source system as the join path from the fact tables to the dimensions. If something about a record changes in the source system, it's the job of the reporting enviornment to capture the record as it was originally, as well as what it was updated to. This will be impossible if you reuse the source system's surragate keys.

  22. #22
    SitePoint Guru mwolfe's Avatar
    Join Date
    Mar 2005
    Posts
    912
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    wow.. some heated debate.
    Personally when developing apps using mysql, i've found myself almost always using auto increment columns, the only exception being when there is a good single natural key column of a numeric or short fixed length data type. I always add an alternate key for the natural key of the table though (if i didnt use it as th pk). Something like last name, first name, phone number i would never consider using as a pk.. not only would it be extremely slow to join but the coding would be quite a bit more complicated. Also, update cascade doesnt work in all dbms's (ORACLE!!!) and its annoying to have to write triggers in every case. So far i've found it easiest to code with auto increment's, and i'll stick to it.
    Web forms typically have things like checkboxes and select dropdowns, if you need to get at a row its nice to have a single value key so you don't have to do extra work parsing the data after the fact.

  23. #23
    SitePoint Zealot
    Join Date
    Nov 2006
    Posts
    127
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Remember queries which use wild cards such as SELECT * WHERE myCharColumn LIKE '%anything' will always result in a table scan, so only use leading wild cards when absolutely necessary.
    Are you guys sure using * always leads to a table scan? What if a developer needs to return every column and is sure of it?

  24. #24
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Skeptical View Post
    Are you guys sure using * always leads to a table scan?
    not what he said

    LIKE '%anything' gives a table scan

    LIKE 'anything%' doesn't

    the dreaded, evil "select star" should never be used, full stop

    actually, there are a few instances where it can be used, but these are obscure outliers
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  25. #25
    Non-Member
    Join Date
    Dec 2006
    Posts
    269
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Skeptical View Post
    Are you guys sure using * always leads to a table scan? What if a developer needs to return every column and is sure of it?
    It depends upon the WHERE condition, and whether an index can be used.

    In the case of the LIKE operator, an index can be used if the wildcard (%) appears at the end of the character string. Why? Because the index is sorted in ascending or descending order. The RDBMS can then scan the index (which has a hierarchical tree design), looking for a match for the first few characters of the string. It can quickly find the index pages that it needs to search, because of the index structure. It will only search the index pages that qualify and will only access the data pages that are needed.

    However, if the wildcard is at the beginning of the string, then there is no way for the the optimizer to know how many qualifying occurrences exist in the database (no statistics, therefore cannot use the index to refine the search), it chooses to gut it out and simply scan the whole table, as its only possible strategy which it defaults to. Different optimizers work differently, and this explanation only applies to relational databases. Obviously, text search databases have entirely different strategies for parsing queries and scanning their databases.

    Rich


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •