SitePoint Sponsor

User Tag List

Results 1 to 14 of 14
  1. #1
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    the big, bad thread of "MySQL Best Practices and Other Useful Information"

    in this thread will be posted various Best Practices and Other Useful Infomation, gleaned from the experienced members lurking around the forums and good other external resources.

    please observe the following when posting in this thread:
    • make sure to create a good title when replying to this thread. see my first couple posts for examples.
    • this is not the place to ask questions about a problem you're having. if you need help with something but want to reference a topic in this thread, please right-click the post number in the upper-right of the post and pick copy link. then start a new thread (not a reply) and paste that link in.
    • if you disagree with a point, go ahead and post in this thread, but include some really good evidence.
    • give credit where credit is due. if you get your information from an external source, make sure to reference your source.
    • if you post sample code, make sure it's not vulnerable to SQL injection or other problems, or state the that code is not inherintly secure and needs additional consideration.
    • if your post is version specific, include that information in your post.

    let the knowledge feast begin!

  2. #2
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    mysql "features" to avoid, part 1: hidden group by fields

    mysql is not very strict in certain situations. this leads to problems: other SQL servers are often more strict and won't accept mysql's syntax, the feature becomes unsupported when mysql releases newer versions (good SQL compliance is a goal of mysql), or the results returned are unexpected. this is the first in a series detailing which of these features to avoid.

    standard SQL requires that all non-aggregate fields in a select statement with a group by clause appear in that group by clause. mysql does not have this requirement. this "feature" is supposed to be used in situations where the "hidden" fields do not change. for example:
    Code:
    select u.user_id
         , u.username
         , count(p.id)
      from users u
    left outer
      join posts p
        on p.user_id = u.user_id
    group
        by u.user_id
    this query will return a list of users and the number of posts they've made. notice that one of the non-aggregate fields (u.username) does not appear in the group by clause.

    according to the mysql manual, u.username does not need to be in the group by clause because that value will not change. however, in situations where you DO need to include a field in the group by clause, mysql doesn't warn you that your query will return indeterminate results. here's a common mistake:
    Code:
    select f.name
         , max(dateposted)
         , fp.title
      from forum f
      join forumpost fp
        on fp.forum_id = f.forum_id
    group
        by f.name
    the user who wrote this query mistakenly assumes that fp.title will contain the title of the latest post from that forum. it will not. according to the mysql manual, "The server is free to return any value from the group, so the results are indeterminate unless all values are the same".

    references:
    http://dev.mysql.com/doc/refman/5.0/...n-columns.html
    Last edited by Mittineague; Sep 15, 2010 at 17:40. Reason: link update

  3. #3
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    mysql version comparison

    some useful information: features that were added and removed in the different versions of mysql. this is not an exhaustive list, but instead important things you need to keep in mind when developing for a specific platform.

    mysql 3.23 (retired)
    • the first version where innodb tables were included and turned on by default
    • partial support for inner joins
    • later versions added partial support for outer joins
    • no union support


    mysql 4.0 (end-of-life)
    • full support for union
    • full support for inner and outer joins
    • upgraded myisam tables to replace isam tables
    • query cache


    mysql 4.1 (end-of-life)
    • added support for per-table character sets and collations (but you absolutely need to use a later release in this series because early versions had some evil problems with multibyte character set support in certain functions)
    • better password hashing
    • subqueries (early versions in the 4.1 series did not allow some clauses inside subqueries, like LIMIT. this restriction was eventually lifted.)
    • spatial extensions
    • clustering using the ndbcluster in-memory tables


    mysql 5.0 (current stable release)
    • added support for per-column character sets and collations
    • stored procedures
    • triggers (in early versions of the mysql 5.0 and 5.1 series, triggers could not refer to any data other than the row that caused the trigger or run non-deterministic procedures. in later releases, this restriction was lifted and triggers can run just about any query from any table, and just about any procedure.)
    • views
    • HUGELY improved replication
    • INFORMATION_SCHEMA database added (allows access to database objects via standard SELECT statements)


    mysql 5.1 (beta)
    • horizontal partitioning
    • event scheduling (like a built-in cron)
    • XML data manipulation in VARCHAR and TEXT columns
    • disk based clustering


    if i missed anything significant, please let me know via PM and i'll add it to the list.
    Last edited by ScallioXTX; May 8, 2011 at 09:00.

  4. #4
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    mysql "features" to avoid, part 2: non-SQL standard quoting

    here are the quotes i will be refering to in this post:
    ' single quote, a.k.a. apostrophe
    " double quote
    ` backtick (on the same key as ~ on US keyboards)


    let's start by discussing ANSI SQL standard quote usage. in ANSI SQL, single quotes (and only single quotes) are used to signify a string. if a string contains a single quote, double the quote. for purposes of correctly processing a query, this is the ONLY escaping that needs to be done and is completely supported by mysql. here's an example:
    Code:
    insert
      into users
         ( username, first, last, active )
           values
         ( 'brian', 'Brian', 'O''Reilly', 0 )
       , ( 'billy', 'Billy', 'O''Toole', 1 )
    note that the last names contain quotes and therefore have the single quotes doubled. when mysql stores the data, the extra single quotes will be removed. therefore, there's no need to try and remove doubled single quotes when retrieving data.

    best practice: escape quotes only by doubling the single quotes. since it's ANSI SQL standard, it work with most other database servers. even though mysql supports escaping by using backslashes, this is not always portable to other servers. since doubling the single quotes is ANSI SQL standard, it's very likely to be portable.

    however, if you adopt a database abstraction layer for database connectivity and it uses slashes for escaping, make sure the only function from the mysql client library it uses to insert those slashes is mysql_real_escape_string() as it is the only function guaranteed to be appropriate for mysql. (by extension, yes, PHP's addslashes() and magic_quotes are NOT appropriate)

    notice also that the numbers were not in quotes. ANSI SQL does not allow numbers to be in quotes.

    best practice: for maximum portability, do not put numbers in quotes.

    next up is the double quote. in ANSI SQL, double quotes are used to quote identifiers (database, table and column names). use them in cases where the names of your objects contain spaces, operators (. , + / etc.) or are reserved words. here's an example:
    Code:
    select id
         , "product name"
         , "count"
      from inventory
     where size = 'XL'
    unfortunately, by default mysql does not allow double quotes to be used for identifiers. by default, double quotes in mysql can also be used to identify strings, just like single quotes. this can be changed by setting the SQL mode, but it is unusual in mysql installations to see this. mysql instead uses backticks to quote identifiers. unfortunately, this backtick syntax is not widely supported by other servers.

    best practice: use names that do not requires quoting. this is the only thing guaranteed to be supported by servers from different vendors since it's ANSI SQL compatible. staying away from reserved words and limiting yourself to letters, numbers and underscores is a good way to stay safe.

    bonus tip: DO NOT use microsoft word for ANYTHING relating to SQL development. microsoft word has this nasty habit of converting single and double quotes to and , which are not recognized by SQL, mysql or otherwise. do yourself a favor and get a real code editor.

    references:
    http://dev.mysql.com/doc/refman/5.0/...-sql-mode.html
    http://dev.mysql.com/doc/refman/5.0/...pe-string.html
    http://dev.mysql.com/doc/refman/5.0/...ng-syntax.html
    Last edited by longneck; Nov 7, 2006 at 16:31.

  5. #5
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    why does count(*) take so long?

    this query can sometimes take longer than you would suspect to complete:
    Code:
    select count(*)
      from sometable
    usually, a complaint of a slow count(*) is asked in the context of an innodb table, so i will start with an explanation of how myisam tables handle this query.

    with myisam tables, the number of rows in the table is kept in a special counter in the table header. a request for count(*) on a single myisam table with no where clause returns that value instantly without consulting any data or indexes.

    with innodb tables, there is no row counter. and even if there was, it would not always be accurate because of transactions. that only leaves one option: actually count all the rows.

    the worst case scenario is when the transaction isolation mode is set to repeatable read or higher. in these modes, every row that you "view" has to be duplicated so that if you go back and read it again the same result is returned. hence the mode name "repeatable read". when you ask for count(*), you've asked to examine every row in the table. innodb has to duplicate the entire table so when you ask for one of those rows again or another count(*), you get the same value.

    note: innodb doesn't actually create a copy of all of the rows in all cases, but that is the worst case scenario. this process is called multi-versioning and innodb's implementation is discussed in the innodb section of the mysql manual. the performance implications are effectively the same when compared to myisam's behavior: multiple orders of magnitude difference in execution time.
    Last edited by Shyflower; Apr 17, 2011 at 14:01.

  6. #6
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    query optimization: benchmarking functions

    this is the first in a series of posts on query optimization. there are loads of good articles and tutorials on the internet about using EXPLAIN to optimize queries, so i'll skip that particular tool and concentrate on lesser-known tools.

    questions on the sitepoint forums often yield multiple answers, each with a different approach. some are obviously better than others, but others are more difficult to compare. using EXPLAIN is a good way to make sure indexes are being used, but it shows nothing about the efficiency of the constituent statements within a query. this is where the valuable tool called benchmark() comes in.

    for example, compare the following two queries:
    Code:
    select case when 'foo'='foo' then 1 else 0 end;
    select case 'foo' when 'foo' then 1 else 0 end;
    since both statements are very similar and return the same result set, you would think that the performance would be the same. let's test that theory using the benchmark() function:
    Code:
    mysql> select benchmark(10000000, case when 'foo'='foo' then 1 else 0 end);
    +--------------------------------------------------------------+
    | benchmark(10000000, case when 'foo'='foo' then 1 else 0 end) |
    +--------------------------------------------------------------+
    |                                                            0 |
    +--------------------------------------------------------------+
    1 row in set (1.11 sec)
    
    mysql> select benchmark(10000000, case 'foo' when 'foo' then 1 else 0 end);
    +--------------------------------------------------------------+
    | benchmark(10000000, case 'foo' when 'foo' then 1 else 0 end) |
    +--------------------------------------------------------------+
    |                                                            0 |
    +--------------------------------------------------------------+
    1 row in set (0.95 sec)
    as you can see, the second method is slightly faster. the difference isn't much, but in a CPU bound query with a couple of case statements and thousands or rows, the difference can add up.

    here are some other optimization lessons that i have learned:
    Last edited by longneck; Nov 9, 2006 at 13:02.

  7. #7
    malloc
    Join Date
    Nov 2006
    Posts
    272
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    comparing long strings takes longer than short strings, so if you're matching 1000+ characters, computing the md5() hash would probably be faster
    Can I just clarify that you mean cache the md5 hashes and compare them, not simply replace string comparison with md5 calc + comparison?

    It took me a couple of reads through to figure that one out.

  8. #8
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    when should i use indexes?

    here are some common cases where indexes should be used:

    join clauses
    columns that are part of joins should be indexed in both tables. for example:
    Code:
    select a.foo
         , b.foo
      from tablea a
      join tableb b
        on a.bar = b.tablea_bar
    the columns tablea.bar and tableb.tablea_bar should both have indexes. this can improve query execution speed from minutes down to seconds.

    where clauses
    columns that are used in where clauses should generally be indexed. for example:
    Code:
    select foo
      from tablea
     where bar = 5
    the column tablea.bar should be indexed. additionally, you could create a compound index on (bar, foo) to really speed things up. with that index, mysql will not have to pull any data records off the disk to find the corresponding value of foo. most likely, it will have already been brought in to memory when mysql pulled the index off disk. this is how to create a compound index:
    Code:
    alter table tablea
    add index (bar, foo)
    a compound index will also help queries with multiple items in the where clause:
    Code:
    select wee
         , ping
      from tablea
     where bar = 5
       and foo = 11
    this query could benefit from the index created above. note that mysql will only sometimes recognize that reordering elements in the where clause will result in an index being able to be used. rule of thumb: try order your where clause so that the elements match the order of an compound indexes you have.

    group by
    Code:
    select bar
         , count(*)
      from tablea
    group
         by bar
    if bar is indexed in the above query, mysql can quickly consult the index and return the answer without ever having to read data rows.
    Code:
    select bar
         , count(*)
      from tablea
     where foo = 11
    group
        by bar
    in this case, creating independent indexes on foo and bar would be somewhat helpful, but a compound index on (foo, bar) would be most helpful. to understand why, you need to know that: a) mysql processes the WHERE clause before the GROUP BY clause, b) mysql will only use one index per table, and c) compound indexes can only be used in the order they are created. so create your compound indexes so that mysql can take advantage of more than one part.

    in all cases...
    make sure that any columns that you have indexes for appear on the left side of any equality operators by themselves. for example, in the following query, an index can not be used:
    Code:
    select foo
      from tablea
     where year(some_date_column) = 2004
    because some_date_column is being processed by the year function, mysql can not consult an index. rewriting your query like this will allow an index to be used:
    Code:
    select foo
      from tablea
     where some_date_column between '2004-01-01' and '2004-12-31'

  9. #9
    secure webapps for all Aleksejs's Avatar
    Join Date
    Apr 2008
    Location
    Riga, Latvia
    Posts
    755
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Common Queries
    Extension to chapter 9 of the book "Get It Done With MySQL 5&6" by Peter Brawley and Arthur Fuller

    From their site:
    Common queries
    Our collection of common MySQL queries soon outgrew Chapter 9. It runs now to more than 120 printed pages, and it's still growing, so we turned it into a PHP page driven by a pair of MySQL tables implementing an edge list tree model (for how to use edge list and other tree models in MySQL, see Chapter 20 of the book). Click here for the Common Queries page, and here for a panelled tree view of the same material.

  10. #10
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,032
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    Don't use != for "not equal to" as it is specific to MySQL, instead use <>
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  11. #11
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,182
    Mentioned
    191 Post(s)
    Tagged
    2 Thread(s)
    I don't know if I'd go as far to say it's specific to MySQL, there are lots of different databases. But it seems there are at least some that don't recognize it.

    So if you know your code will be used only with MySQL it doesn't much matter. If you're concerned about developing a good habit so you won't have problems while working with a different database, then using <> is the wise choice.

  12. #12
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,182
    Mentioned
    191 Post(s)
    Tagged
    2 Thread(s)
    Determining what version of MySQL you have

    There are times when knowing what version of MySQL you are using is helpful. eg. When posting a thread asking for help

    One way is to use this query
    Code MySQL:
    SHOW  VARIABLES  LIKE  "%version%"

  13. #13
    SitePoint Addict bimalpoudel's Avatar
    Join Date
    Feb 2009
    Location
    Kathmandu, Nepal
    Posts
    279
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If your data is really sensitive, I suggest to use the following additional flags:
    added_on, modified_on, sink_weight, is_active as:
    Code:
    `added_on` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Added on',
    `modified_on` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Last modified on',
    `sink_weight` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Sorting weight',
    `is_active` enum('N','Y') NOT NULL DEFAULT 'N' COMMENT 'Active Record?',
    Here are my feelings:
    Added On (added_on)
    This defaults to UNIX_TIMESTAMP(CURRENT_TIMESTAMP()) when you first make your INSERT query.
    This means, you log the time when this entry was made to the database. Later, you can reuse
    this column to calculate the age of the record, sort chronologically, or any other purpose.
    But my aim is to secretly log the first entry time of the record.

    Modified On (modified_on)
    Record when your data was last modified on. Use this during your UPDATE query.
    Set to the UNIX_TIMESTAMP(CURRENT_TIMESTAMP()) at the moment.
    Also, while deleting a record (see below).
    This might be useful in identifying which record has the fresh information.
    As seen in PHPBB forum updates (if you reply to the old posts, it appears on the top again).

    Sinking weight (sink_weight)
    The buoyancy: heavier objects sink deep (appear on the bottom of the list) and
    the reverse. Less the number, the record appears on the top. By this way, you can create
    an admin interface to allow a user control the sinking amount. Or, just symbolically the
    move up and move down arrows. Move up means decreate the sink weight value that the one
    just above it currently. Move down means increase the sink weight just one more than the one
    in the bottom of current record.

    Is Active (is_active)
    Secretly administer your data! Never DELETE any data physically. But rather mark them as
    deleted. If Y, it means, you CANNOT show the data anywhere else. If N, use the data a general
    record. By this way, you can survive for a customer's hearing: Oh, I mistakenly deleted my
    precious data, help me! - Just unflag it to N again, manually.

    Final points:
    If you don't like to use them, never mind; your database design might have been still robust.
    But these flags are administrative flags only. These days, you do not have to worry about
    the space that will be added to each record of all the tables. Even if you do not use them
    currently, put them for the future reference and use. One day, you might come to hear a similar
    design. So, why don't make everything compatible to the future demands?

    If you did not have these flags earlier, you can add them; without affecting the current
    feature of your database. Except that, you did not write queries like:
    SELECT * FROM ...
    INSERT INTO table (full list of columns, ...), etc.
    Bimal Poudel @ Sanjaal Framework over Smarty Template Engine
    ASKING INTERESTING QUESTIONS ON SITEPOINT FOURM

    Hire for coding support - PHP/MySQL

  14. #14
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,500
    Mentioned
    163 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by bimalpoudel View Post
    Code:
    `added_on` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Added on',
    `modified_on` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Last modified on',
    A sticky is not the place to start discussions, but... if you want to register the moment of insertion and/or modification, use DATETIME or DATE, not int(10).


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •