Top 10 MySQL Mistakes Made By PHP Developers

A database is a fundamental component for most web applications. If you’re using PHP, you’re probably using MySQL–an integral part of the LAMP stack.

PHP is relatively easy and most new developers can write functional code within a few hours. However, building a solid, dependable database takes time and expertise. Here are ten of the worst MySQL mistakes I’ve made (some apply to any language/database)…

1. Using MyISAM rather than InnoDB

MySQL has a number of database engines, but you’re most likely to encounter MyISAM and InnoDB.

MyISAM is used by default. However, unless you’re creating a very simple or experimental database, it’s almost certainly the wrong choice! MyISAM doesn’t support foreign key constraints or transactions, which are essential for data integrity. In addition, the whole table is locked whenever a record is inserted or updated; this causes a detrimental effect on performance as usage grows.

The solution is simple: use InnoDB.

2. Using PHP’s mysql functions

PHP has provided MySQL library functions since day one (or near as makes no difference). Many applications rely on mysql_connect, mysql_query, mysql_fetch_assoc, etc. but the PHP manual states:

If you are using MySQL versions 4.1.3 or later it is strongly recommended that you use the mysqli extension instead.

mysqli, or the MySQL improved extension, has several advantages:

  • an (optional) object-oriented interface
  • prepared statements (which help prevent SQL-injection attacks and increase performance)
  • multiple statements and transaction support

Alternatively, you should consider PDO if you want to support multiple databases.

3. Not sanitizing user input

This should probably be #1: never trust user input. Validate every string using server-side PHP — don’t rely on JavaScript. The simplest SQL injection attacks depend on code such as:


$username = $_POST["name"];
$password = $_POST["password"];
$sql = "SELECT userid FROM usertable WHERE username='$username' AND password='$password';";
// run query...

This can be cracked by entering “admin'; --” in the username field. The SQL string will equate to:


SELECT userid FROM usertable WHERE username='admin';

The devious cracker can log in as “admin”; they need not know the password because it’s commented out of the SQL.

4. Not using UTF-8

Those of us in the US, UK, and Australia rarely consider languages other than English. We happily complete our masterpiece only to find it cannot be used elsewhere.

UTF-8 solves many internationalization issues. Although it won’t be properly supported in PHP until version 6.0, there’s little to prevent you setting MySQL character sets to UTF-8.

5. Favoring PHP over SQL

When you’re new to MySQL, it’s tempting to solve problems in the language you know. That can lead to unnecessary and slower code. For example, rather than using MySQL’s native AVG() function, you use a PHP loop to calculate an average by summing all values in a record-set.

Watch out also for SQL queries within PHP loops. Normally, it’s more effective to run a query then loop through the results.

In general, utilize the strengths of your database when analyzing data. A little SQL knowledge goes a long way.

6. Not optimizing your queries

99% of PHP performance problems will be caused by the database, and a single bad SQL query can play havoc with your web application. MySQL’s EXPLAIN statement, the Query Profiler, and many other tools can help you find that rogue SELECT.

7. Using the wrong data types

MySQL offers a range of numeric, string, and time data types. If you’re storing a date, use a DATE or DATETIME field. Using an INTEGER or STRING can make SQL queries more complicated, if not impossible.

It’s often tempting to invent your own data formats; for example, storing serialized PHP objects in string. Database management may be easier, but MySQL will become a dumb data store and it may lead to problems later.

8. Using * in SELECT queries

Never use * to return all columns in a table–it’s lazy. You should only extract the data you need. Even if you require every field, your tables will inevitably change.

9. Under- or over-indexing

As a general rule of thumb, indexes should be applied to any column named in the WHERE clause of a SELECT query.

For example, assume we have a usertable with a numeric ID (the primary key) and an email address. During log on, MySQL must locate the correct ID by searching for an email. With an index, MySQL can use a fast search algorithm to locate the email almost instantly. Without an index, MySQL must check every record in sequence until the address is found.

It’s tempting to add indexes to every column, however, they are regenerated during every table INSERT or UPDATE. That can hit performance; only add indexes when necessary.

10. Forgetting to back up

It may be rare, but databases fail. Hard disks can stop. Servers can explode. Web hosts can go bankrupt. Losing your MySQL data is catastrophic, so ensure you have automated backups or replication in place.

11. Bonus mistake: not considering other databases!

MySQL may be the most widely used database for PHP developers, but it’s not the only option. PostgreSQL and Firebird are its closest competitors; both are open source and not controlled by a corporation. Microsoft provide SQL Server Express and Oracle supply 10g Express; both are free versions of the bigger enterprise editions. Even SQLite may be a viable alternative for smaller or embedded applications.

Have I missed your worst MySQL mistakes?

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.

  • audii66

    that’s true :)

  • Caio

    How is it that using MyISAM is a mistake? a mistake is to give a flimsy excuse that innoDB has more features and thus is the right choice. No friend, MyISAM is the absolute right choice for any real programmer who KNOWS what he is doing and can handle foreing keys and transactions by himself, and thus not having to worry about the footprint innoDB forces into you. If you really want to say using the simples and fastest engine out ther is a “mistake” favouring one with more features, then go all the way to ORACLE or other huge engines that will clog your server by the time you reach 1000 visitors a day.

    It’s widelly well known that MyISAM has the best performance on all aspects, and unless you are really making a huge system where you don’t want or care to worry about details, then actually using innoDB is the mistake.

    Seriously, you want people out there to make blogs, CMS or galleries with innoDB clutter? not quite.

    And past items 1 and 2, all those mistakes would apply to any database, not only MySQL.

    • http://www.optimalworks.net/ Craig Buckler

      MyISAM is the absolute right choice for any real programmer who KNOWS what he is doing and can handle foreing keys and transactions by himself

      Really? And how do you code your “transactional” PHP around a power outage?

      MyISAM is rarely faster than InnoDB. I’m not sure what clutter it offers either?!

      • bemo56

        When I started with DBs, I think MyISAM was a lot quicker than the alternative at the time. InnoDB has gotten quicker now that its speed isn’t much of a difference, Foreign keys aren’t all that important (ask the NoSQL guys), but sometimes its nice to have an extra layers of checks in your app. MyISAM/NoSQL for web apps, InnoDB for record keeping.

        I really have only stuck with MyISAM out of habit, but I’ve been using InnoDB for recent projects of mine – when I’m not testing with SQLite.

      • neil_metaflow

        I think it depends on what you need to do with the specific table in question. If you need to do transactions, I would agree with Craig and use InnoDB. However, if you need to do FULL TEXT searches that requires MyISAM.

      • Ricardo

        Absolutely agree with you on that InnoDB is by far safer to use, although MyISAM can solve some other problems that InnoDB cannot, you need to consider the problems that are being addressed. Again InnoDB has Transactions and FKs making it “safer” in my opinion.

    • Bravo

      Table locking alone is enough reason to choose InnoDB over MyIsam anytime. Only when you have a table where 99 of 100 queries are select, MyIsam can be an option.

      Overall : InnoDB > MyIsam

      • g

        +1 agree on table locking. When you have a db with 30+GB of data with probably 10G of new data throughout the day but still need availability of the data to users, concurrent inserts & table locking comes into play a lot.

    • Gasolene

      Anyone that suggests implementing data integrity in the business layer has no business writing software.

      Forgoing atomicity, consistency, isolation, or durability for a minor performance boost is unacceptable. If you need more performance, then scale up.

    • Mchl

      MyISAM is no longer better performance-wise than InnoDB. Both engines are pretty much on the same level (depending on how you design your benchmarks).

      Starting with MySQL 5.5 (Q1 2011) InnoDB becomes default MySQL engine. For a reason. Currently MyISAM’s only advantage over InnoDB is Full Text search (a feature which is often delegated to Sphinx or Lucene)

    • http://www.lunadesign.org awasson

      I don’t know about calling it an error but I think InnoDB is a much wiser choice than MyISAM if you plan on scaling your app. I’ve had to extend apps to deal with transactional logic on a normalized database without the benefit of InnoDB’s foreign key benefit and although I can handle the transactions through code I have to ask why would you put yourself through that if you didn’t have to?

      • Mchl

        You can’t handle transactions through code. You only think you do. (At least not fully ACID transactions)

      • http://www.lunadesign.org awasson

        @Mchl: You can’t handle transactions through code. You only think you do. (At least not fully ACID transactions)

        Yes. We are on the same page… That was my point.

      • JasonPSage

        “You can’t handle transactions through code”
        That is absurd. How do you think it’s done? In hardware?

      • http://www.optimalworks.net/ Craig Buckler

        @JasonPSage
        The point being made is that you categorically cannot emulate database transactions using PHP code!

        Assume you’re handling a money transfer from one account to another. That’s 2 SQL updates — one to debit account A and one to credit account B.

        If you’re just using PHP code, you must run both updates and check they’ve worked. But what if your code fails? What if the database is busy? What if the server crashes? What if the power goes down? Anything could happen between the first and second SQL update — the money will disappear forever.

        With database transactions, the two updates are only committed once you’ve run a COMMIT. If that doesn’t occur, the database will revert to its original state and no money transfer will have taken place.

        MySQL provides that functionality in InnoDB tables, but not MyISAM. Why would you try to replicate transactions when:

        (a) the functionality is already available
        (b) PHP code can’t do it, and
        (c) you’re making more work for yourself!

      • Arkh

        The problem is not the fact you’re doing 2 updates in your example. A money transfer (or equivalent thing) which uses only one database is better made using stored procedure than 2 updates. It’s better (imo) to have your data logic as stored procedure as you don’t know how many apps will use your data later on.
        Now, if your transfer imply 2 different server, or 1 database server and a webservice, then you have to use transactions on your databases operations.

      • http://www.optimalworks.net/ Craig Buckler

        @Arkh
        SP code will be far more reliable than PHP because it’s validated and pre-compiled. However, even an SP will need two (or more) table updates — unless you wrap them in a transaction, there’s still a risk of data corruption.

      • Arkh

        @Craig
        My bad, MySQL SP can’t handle transactions themselves. One thing they should copy from Oracle.

      • Gasolene

        “You can’t handle transactions through code”
        That is absurd. How do you think it’s done? In hardware?

        You cannot implement transactions in source code. The database is an externalized component. There is no theoretical method to guarantee data was or was not received by the external service.

        It is simply impossible to implement ACID compliant transactions in the application layer.

    • Dragon

      mmh innodb its great for joins and data that really doesn´t matter if its breaks, myisam its a more stable engine , lets say if you are having a heavy load and your server rests, or power goes out, etc.. your innodb table get heavy corrupt and its really very hard to fix it, you can say okay backup, well yes but myisam does not have that error. test it if you dont believe me, thats why I think that innodb its great to speed some tables but watch out on where, there is no perfect engine for all cases, you should learn which its good for your information. just my two cents :D

      • Mchl

        I think you confused MyISAM with InnoDB. Of the two, it’s InnoDB that’s crash safe, and MyISAM that has a reputation of being faster.

      • Gasolene

        I’m not sure where you got that idea. InnoDB is safer during power outages than MyISAM.

    • Fab

      I was about to comment about the first point too, but i see this point is already quite polemical.
      MyIsam is not absolutely wrong. It’s just that InnoDB is much better in most situations. But if you know what you’re doing (and why), then it is not wrong to use MyIsam.
      I guess this article was made for people who don’t know too much about databases, so it’s good to give a simple rule.
      For all others, i totally agree, as i see them almost daily…

  • thinkt4nk

    The only point with which I would disagree is the suggestion not to use * in sql queries; While the point here is to optimize performance, as the author states, the database tables *will* change over time. Explicitly selecting a set of columns will break functionality when those column names no longer exist. You’re sacrificing dynamic adaptation for performance, which is something I wouldn’t do for the small amount of performance that this suggestion will lend. Good post though, thanks!

    • ScallioXTX

      The problem with using * is not the fields that removed, but fields that get added later. Suppose you have a SELECT * and later decide to add one or more BLOBs holding images to that table, but you don’t use them in your application where you had the SELECT *
      You’ll be wasting a lot of bandwidth for nothing, and if there is a ORDER BY or GROUPING further down the query that will cause a huge performance hit also.

      Still not convinced? http://www.joelango.com/2007/04/30/why-you-should-never-use-select-star/

      • thinkt4nk

        I understood the original point. You didn’t understand mine.

      • David

        As a rule of thumb, I never store blobs or images in a database. Even if you are, you’re better off storing them in their own tables (why would you want to put that kind of overhead in your tables??).

        If performance becomes an issue do some profiling to see what’s wrong. I have never found it to be because of a SELECT *. In some of my tests, SELECT * was actually faster than SELECT ….long string of columns… Overall though, the perfomance difference is typically negligable, and comes at the cost of extra maintenance.

    • someredditor

      It really depends on your database, and no one knows your tables better than you do. If you have a table with lots of columns and is changing frequently, don’t use *. But if you have a small table with only one purpose, then why not use *. If you ever add something like an ID or timestamp to it, it won’t hurt performance by much at all, and you’ll probably need it anyway.

      • http://codefisher.org/ codefisher

        If your columns are changing enough for it to be a problem, you should fire whoever is in charge of your database management.

      • http://www.optimalworks.net/ Craig Buckler

        That’s a little harsh, codefisher! Web applications evolve. Functions are added. Features are removed. Change is inevitable — embrace it!

      • Ken

        I can sympathize with CodeFisher’s point of view, because with well-defined DBs, you will see a sudden stabilization of your data structure. I was changing mine every 6 months, then I studied data design, completely rewrote my structure and even with significant technological changes in the type of data provided, the structure didn’t change for 7 years.

        I agree with naming the fields in the select because if you’ve changed the fields, you’ve changed the interactions and explicitly adding the new changes will put in a reminder to add the business logic to your code to support the new fields. Performance shmormance.

    • http://www.creativeclans.nl guido2004

      Using the ‘*’ in this case won’t break the query, but will break the PHP script the moment it uses that missing column. So it’ll break anyway. Doesn’t seem a valid reason to use the ‘*’ to me, unless I’m missing something.

      • Michael Rüfenacht

        The * is quite useful to keep the attributes of your models, representing an entity, dynamic. I dont think its a good advice to use it all the time, but there are cases it makes sense (and thats why i dont like “pragmatic” statements in comments/articles).

      • http://www.optimalworks.net/ Craig Buckler

        There will undoubtedly be good reasons for using ‘*’. In my experience, laziness is the overriding decision maker … although the developer will be able to provide 101 other reasons why they used it!

    • Thinkindie

      if you indicate a field in the query instead of using * it means you need such field therefore if you remove a column you would break your application even if you use “*” as you would refer to such field sooner or later

    • Zak

      If the database is changed and you are still referencing a column that no longer exists, then you *want* your code to break so it never gets out of testing if you accidentally miss that bit of code during the upgrade.

  • RogWilco

    Just a quick comment on #1:

    I think it’s important to emphasize the differences between InnnoDB and MyISAM so that you can make an educated decision. The description above makes it sound like there’s no question that InnoDB should always be used.

    It’s important to note that while InnoDB provides features like foreign key constraints and transactions, it does *not* offer fulltext indexing – something that MyISAM does offer. This is all but essential for data that needs to be searchable (such as forum posts, blog posts/comments, product descriptions, etc.).

    So in cases where you need full text searching, MyISAM would be the better choice. That’s just one example

    • http://www.optimalworks.net/ Craig Buckler

      See the linked articles for a full explanation of the pros and cons of each type.

      Fulltext search is an obvious problem, but I’d rather have data integrity any day!

    • Jonathan Cremin

      Or you could use InnoDB and offload full-text searching to a specialist like Sphinx or Lucene/Solr.

    • http://www.optimalworks.net/ Craig Buckler

      And let’s not forget that MyISAM fulltext indexing is … how can I put this … a bit crap.

      Perhaps it’s because users have high Google-like expectations? But it’s too limited and usually produces poor results.

    • Frank

      It’s also worth noting that foreign keys perform terrible under MySQL and any good DBA would cringe at the idea of using them on anything that is going to scale.

    • http://www.charlesbryant.co.uk gap_tooth_clan

      fulltext indexing has saved my butt before, made a huge load of development work dissolve into a 30 minute job.

    • MSHYYC

      MyISAM is never the best choice fora production app. If you want proper full text search and want it integrated into your SQL RDBMS then th solution is to dump MySQL entirely and use PostgreSQL or another DBMS that offers BOTH FTS and proper data integrity.

      If you insist on being a “platform bigot” and insist on making that last mistake on the list (not considering MySQL alternatives) then use MySQL with InnoDB backend and add-on FTS products. “It’s just a blog” isn’t an excuse for using a flimsy data store–especially if you burn out a power supply or something causing an uncontrolled server shutdown at just the wrong time and it’s YOUR blog that is hosed as a result.

  • Jeroen

    Maybe you could add some non-relational databases to point 11, i.e. MongoDB and CouchDB?

  • http://r937.com/ r937

    nice job, craig

    NTL;REW (not too long; read every word)

    :)

  • Cezary Tomczak

    Hi Craig,
    1. “… In addition, the whole table is locked whenever a record is inserted or updated: it causes a detrimental effect on performance as usage grows …”
    Do you have some benchmarks? Innodb inserts are 5x times (or more) slower than myisam, I didn’t run the tests, but when I’m importing sql data in phpmyadmin or similar I can feel the difference in time when importing myisam or innodb tables. These are not any big tables with millions of records, so I don’t know maybe innodb gets faster than myisam for the larger tables, but I doubt.
    2. “… an (optional) object-orientated interface…”
    Mysqli is not really object-oriented, it is just a bunch of procedural code put into an object. Have you used it? You will smell it fast.
    “… prepared statements (which help prevent SQL-injection attacks and increase performance) …”
    The whole mysqli really sucks, including prepared statements. If you have used it you would know that for example you can’t use ::fetchRow() on a prepared statement, it means you have to bind every column you fetch in sql query, you can’t get a whole row.
    Example:
    $stmt = mysqli_prepare($Mysqli, ‘SELECT ttid FROM imdb WHERE ttid = ?’);
    ….
    mysqli_stmt_bind_result($stmt, $ttid);
    That’s no problem when selecting one column, but imagine you have 20 columns in the table, you have to do this:
    $stmt = mysqli_prepare($Mysqli, ‘SELECT ttid,2,3,4,5,6,7,8,9,10,11,12,13,14,15 FROM imdb WHERE ttid = ?’);
    mysqli_stmt_bind_param($stmt, ‘sssiiisssiisisisisi’, $ttid, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15);
    Really nice, right? ;)
    It gets nicer when you want the columns to be in an array:
    mysqli_stmt_bind_param($stmt, ‘sssiiisssiisisisisi’, $row['ttid'], $row['column1'], $row['column3'], $row['column4'], $row['column5'], $row['column6'], $row['column7'], $row['column8'], $row['column9'], $row['column10'], $row['column11'], $row['column12'], $row['column13'], $row['column14'], $row['column15']);
    If you would like to use ::fetchRow() to automate that, you can no more use prepared statements, you have to escape data manually:
    $e1 = $Mysqli->real_escape_string($some1);
    $e2 = $Mysqli->real_escape_string($some2);
    $Result = $Mysqli->query(“SELECT * FROM table WHERE some1 = ‘$e1′ OR some2 = ‘{$e2}’ “)
    $row = $Result->fetch_assoc();
    And there is more, an error handling nightmare, when using prepared statements you have to check every return value of every functions used to execute prepared statement, or you get some strange kind of errors that say nothing about the real problem.
    So a proper way to execute 1 query is this (it uses procedural style, but that’s just an example that is really no different than using $Mysqli-> object):
    $stmt = mysqli_prepare($Mysqli, ‘SELECT ttid FROM imdb WHERE ttid = ?’);
    if (!$stmt) { trigger_error(‘mysqli_prepare() failed: SELECT ttid FROM imdb…: ‘.mysqli_error($Mysqli), E_USER_ERROR); }

    $bool = mysqli_stmt_bind_param($stmt, ‘s’, $ttid);
    if (!$bool) { trigger_error(‘mysqli_stmt_bind_param() failed: ‘.mysqli_error($Mysqli), E_USER_ERROR); }
    $bool = mysqli_stmt_execute($stmt);
    if (!$bool) { trigger_error(‘mysqli_stmt_execute() failed: ‘.mysqli_error($Mysqli), E_USER_ERROR); }
    mysqli_stmt_bind_result($stmt, $movie);
    if (!$bool) { trigger_error(‘mysqli_stmt_bind_result() failed: ‘.mysqli_error($Mysqli), E_USER_ERROR); }
    $bool = mysqli_stmt_fetch($stmt);
    if (false === $bool) { trigger_error(‘mysqli_stmt_fetch() failed: ‘.mysqli_error($Mysqli), E_USER_ERROR); }
    $bool = mysqli_stmt_close($stmt);
    if (!$bool) { trigger_error(‘mysqli_stmt_close() failed: ‘.mysqli_error($Mysqli), E_USER_ERROR); }
    It’s object oriented library, but it doesn’t use Exceptions, you can’t use try-catch, you have to check every function for an error, because these functions do not throw errors automatically when something goes wrong, you have to check that manually by checkin the return value. I had a nightmare finding a bug and I had to add those all trigger_error() to find out what was happening and fix it.
    Cheers,
    Cezary.

    • Mchl

      #1 InnoDB works in AUTOCOMMIT mode by default. When doing a sequence of several inserts, you better off putting it into a transaction. Speeds up your inserts tremendously.

      #2 Yes MySQLi API sucks, but it is in fact object oriented. You can extend these classes to fot your needs better.

      • JasonPSage

        LOL – So if I Make a class with one function, 2000 lines long… you’d consider it object oriented because someone could extend my class? I got a kick out of this.. but… I think I may possibly understand where you’re coming from also. Technically you’re right… but I think Cezary’s sentiment is that the MySQLi doesn’t behave in a manner consistent with how many OO classes are engineered.

        Properties set? Get? Functions return object instances? any polymophism in there? There is a gray area on what is versus what isn’t “Object Oriented” … but… if you have programmed both… you learn there are “patterns” or programming “styles” for each and you can really get pretty good idea of which side of the fence a programmer favors.

      • Mchl

        My point was, MySQLi’s API is terrible. Cezary’s code is something I see often and learned to hate. However, you can leverage object oriented (in purely syntactical sense) interface this extension offers to either extend it or put in into a wrapper class that is more pleasant to work with.
        It’s not at all difficult to extend it in such a way, that you can plugin custom error handling or query logging routines.
        You can extend it to throw exceptions when YOU want them to be thrown and of the type YOU want them to be.
        And yes, you can have an easy to use way of fetching rows from prepared statements (albeit slower than the usual one).

    • Josh Johnston

      @Cezary Tomczak

      Regarding #1,

      It isn’t the speed of inserts that is an issue as much as the fact that MyISAM LOCKS the entire table while writing. This table locking BLOCKS select queries thus forcing other processes to wait until the insert is complete to read from the table.

  • Wolf_22

    Excellent post, Craig!

    Number 7 really hit home with me because half the time I struggle to determine which data type to use for things like text input, file storage, and numerical accommodation.

    So how does one normally determination this?

    For example, I think we can all agree (to some extent) that gray areas exist when discussing when to use BLOB or when to use TEXT, TINYBLOB vs. TINYTEXT, so forth and so on…

    Is there a best way, method, or practice used to figure these specific things out or is it all just a matter of trial-and-error? If the latter, then how does one even perform the trial-and-error?

  • Jonathan

    Number 5 is wrong. In situations where your DB is the bottleneck (read “almost always”), you’re going to want to offload that sort of work to PHP.

    Your point would stand in a single server scenario.

    • http://www.optimalworks.net/ Craig Buckler

      Mmm, I still stand by the point that a database is better at data manipulation than an interpreted script accessing the same records. There are probably exceptions, but I can’t think of any!

      • Jonathan Cremin

        It may be better at it, but it potentially also has other things to be doing ;) It depends on the structure of your app and what part of the stack has the most resources available.

        When I said you were wrong, I meant that it’s not always the better approach and in fact is often not the better approach.

    • Anonymous

      If your database is “almost always” a bottleneck (I mean the actual database processing, not connetion and data transfer overhead, then you’re probably “almost always” doing something wrong. There are some cases, where processing data in PHP is faster (especially if instead one multijoin query you use a few smaller, that are more likely to hit your query cache), or just easier to code.
      However, if it was the case of “almost always” being the case, we wouldn’t be using databses in the first place (why introduce something that just slows us down?)

      • Jonathan Cremin

        I never said it was faster. You talking about an ideal world where the PHP and MySQL are equally loaded. Even when this does happen, it’s short lived as they scale differently.

        To take the corollary of your statements, should bottlenecks at PHP be considered the typical scenario? I don’t think I’ve ever encountered that at an application-wide level.

  • Jonathon Hibbard

    Very diasspointing article. The author is putting “needs and wants” over “speed and optimization”.

    • http://www.optimalworks.net/ Craig Buckler

      Could you expand on that?

    • mitke

      I am creating VERY complex programs using Doctrine as ORM. One of my programs fetched more than 80 objects in 7 different select’s with a lot of join’s (3-8 per select). With object hydratation (not array), all that took less than 400ms. With some optimization, it can easily go under 200ms on shared server. Speed? Lame excuse for writting bad code.

  • Anonymous

    Craig, excellent article. And I highly recommend Rudy Limeback’s (r937 in this post) book Simply SQL. I have begun to use InnoDB AND setting up foregin keys now and thanks to Rudy’s book I got a clear, concise explanation of how to do that. Reason I use InnoDB and foreign keys? It speeds up searches and for site that will require lots of data such as ecommerce (and yes blogs and forums) where you’re using search queries this definitely DOES improve response times. Keep up the great work guys!

  • Mchl

    Regarding #9. This advice is too general in my opinion (but on the other hand creating useful indexes is not a topic you can cover in a short blog post together with 10 other topics ;) )
    Anyway, I suggest everyone interested to have a good read at MySQL’s manual chapter devoted to indexing (and query optimisation in general). Point of interest – multicolumn indexes.

  • Matthew F

    These younger kids – the nattering nabobs dissing InnoDB – just don’t understand the value and importance of true ACID transactions. I’m amazed by their insistence that speed is the only thing that matters, even at the cost of correctness.

    For #9, I would add that “indexes should be applied to any column named in the WHERE clause __ or JOIN…. ON clause __ of a SELECT query”. But then, far too few people know the please of using JOINs.

  • http://htmlblox.com samanime

    In practice, I agree with every point that Craig made.

    I see a lot of people complaining that he was “wrong”, but little proof from them (or in my experience) that backs up their statement.

    Great article.

  • sachin

    hi i am new to MySQL . so i dont know how to create multiple tables in one script and execute it . i am using only java. can anyone help me please regarding the scripts.thanks in advance

  • Kabubi

    Error 3.5 give “all privileges” to db users.

    $sql = “SELECT userid FROM usertable WHERE username=’$username’ AND password=’$password’;”
    If you know db name you can write in user field: “admin’; drop database dbname; –”

  • spacecadetno9

    Interesting post & very interesting comments!!

  • http://www.consolegaming.eu Rjs37

    @Kabubi: Isn’t there a status (the name escapes me) that stops multiple queries being run at once? Granted it wouldn’t prevent SQL injection on the first query but I wouldn’t have thought that the second query would work providing that status is set.

    Very Interesting Article though, I taught myself PHP and Mysql have to admit I’ve fallen foul and still do on some of the points though I am improving. Concerning the points:

    1) Is it worth changing over existing systems/tables that are currently using MyIsam? Or just making sure new systems use InnoDB? And is there a problem with using a mixture of tables within a database? I use some paid software and I’m not sure whether it would be a good idea to change those too.

    2) I don’t do this currently but do intend to. But I can see prepared statements being a pain lol.

    4) The database for my main site has users from all over Europe with some language specific forums too so this seems to be a big one for me. Would any issues be caused by changing over to that character set?

    7) Concerning dates is it generally better to use a datetime field or store as a unix timestamp? I’ve always used timestamps and obviously for that I’ve used Ints to store them. I don’t think it would be practical to change my existing systems but worth knowing for future projects.

    • http://www.optimalworks.net/ Craig Buckler

      Hi Rjs37.

      In answer to your questions…

      1. Don’t switch from MyISAM to InnoDB for an existing app unless it’s worth the effort, e.g. better performance, transactions, etc.

      You can mix MyISAM and InnoDB tables. However, you can’t use InnoDB features such as foreign keys on the MyISAM tables. It’s probably best sticking with one or the other unless your data has distinct boundaries.

      2. I’m not convinced prepared statements are a pain — your DB layer can handle the bulk of the code. Ultimately, they’re far safer and can increase performance (in theory).

      4. Are you having problems with character sets now? Try an off-site local test if you decide to switch to UTF-8.

      7. With regard to dates, I’d usually recommend MySQL’s native DATE/DATETIME just because they’re easier to read and you can run date-like queries. However, there may be cases when integer timestamps are more effective, e.g. you’re recording file creation dates.

      I’d recommend using MySQL’s TIMESTAMP field for timestamps … they’re added automatically whenever you INSERT or UPDATE a record. You won’t need to write any PHP code!

      • etr

        “I’d recommend using MySQL’s TIMESTAMP field for timestamps … they’re added automatically whenever you INSERT or UPDATE a record. You won’t need to write any PHP code!”

        I think i saw a tweet from Rasmus just recently that advised against using the MySQL timestamp due to a time difference between php and mysql, forget which was faster but they aren’t identical. Might want to look it up to be sure it’s not gonna bite you if you are using MySQL’s timestamp.

      • http://www.optimalworks.net/ Craig Buckler

        Yeah, I’d heard they were 27 seconds out or something similar. It’s not likely to be a problem for the majority of applications, but it’s something to consider.

      • Frank

        MySQL is not optimized for prepared statements.

      • Eric Stern

        @etr: The supported date ranges are completely different. Unix timestamps are signed 64-bit integers; MySQL datetime is a 9000-year range (1/1/1000 to 12/31/9999) and timestamp has the same range as a signed 32-bit int (1/1/70-1/19/2038). What you really need to be careful of is storing unix timestamps with the wrong column type – anything other than signed bigint – especially if you’re storing stuff like user birthdays that can go before 1/1/1970. You also need to be cautious of the timezone setting of the script when formatting data to insert into date/datetime/timestamp columns (MySQL treats it as UTC, but your application may not). Generally speaking, we (WePay, which is where Rasmus works) store all timestamps as unixtime unless we have a reason not to. Though as a result, we end up with a lot of UNIX_TIMESTAMP/FROM_UNIXTIME calls any time we’re doing custom analytics work.
        More info – http://dev.mysql.com/doc/refman/5.0/en/datetime.html

  • c_spha

    Common mistake that i just released do repeatedly are using the *, forgetting to backup and not considering other databases

  • Jenn

    Another error I see all the time is some designers or new developers who use wordpress (and other cms systems). It seems like most o fthe time they follow the instructions step by step and give that database to root, and have the root user name in config file instead of creating a new user that has limited permissions.

  • John

    Combine 2 and 3 to make the world a whole lot better: Use PDO and parameter binding.

  • http://jonymusky.com.ar

    First of all sorry if my english suck,
    I disagree with point 1 and partial with 5.
    MyISAM is better when you have a lot of SELECT than INSERT, DELETE or UPDATE.
    So… in a lot of Websites MyISAM work better.
    And the point 5 beliver or not in most of cases in PHP extract some values are faster than MySQL.
    I suggest to do some benchmarks.
    Other points I complitly agree, nice post!
    If you like I wrote some posts in my blog (in spanish of course lol)
    Jonathan

  • http://www.optimalworks.net/ Craig Buckler

    Actually, even in the case of SELECTS, the speed difference between MyISAM and InnoDB is fairly negligible now (that’s not the case for older versions of MySQL though).

    As for 5, it’ll depend on what you’re doing. However, I think it’s rare that PHP will beat MySQL when it comes to processing data. If there’s one thing a database is good at, it’s that!

    • kyu

      Again it depends. The thing is that your db and application server scale differently. granted mysql will almost always be better at processing data, yet your app server might have more unused processing power dedicated to these tasks while your db requires more resources to work. overall it’s just about balancing the workload.

  • Lou

    Thanks for writing this article. I for one hope that the Firebird SQL projects gets better PHP PDO support. I only use Firebird and Postgres because they are 100% free, depending on the project. I hope that others will give these 2 great databases. The original MYSQL is now an Oracle product and those depending on a free mYSQL should really explore Firebird and Postgres.

    Peace Out!

    • http://www.reich-consulting.net/ coffee_ninja

      I 100% agree with you! I began using PostgreSQL years ago because it had better constraints and supported stored procedures in PG/SQL. Though I love MySQL, I never did quite understand how it’s mediocrity totally overshadowed the power of Postgres.

      • Anonymous

        That’s why I changed to PostgreSQL many years ago. Never regretted it.

    • Wolf_22

      What’s so great about Postgres?

      • http://www.reich-consulting.net/ coffee_ninja

        The ability to write some very complicated stored procedures is one major benefit. And the ability to have them triggered when a record is updated or deleted is especially beneficial.

    • bieltoni

      I have worked and DeSalle poyecto with open source databases, mysql, postgres, firebird and now my favorite is firebird. Firebird is a powerful, small (> 10 mb), light and fast.

  • Joseph Scott

    Casting development decisions in black and white can be very appealing. We like to think there is always just one right answer. Unfortunately this is almost never the case.

    1- I tend to phrase this as use InnoDB by default, unless there is a specific reason/need to use MyISAM. These reasons do exist, folks have already brought up full-text indexing, but that is only one. There are other conditions where using InnoDB is simply not the right answer. I don’t think those conditions are very common though, hence my suggested default of InnoDB.

    2- Using mysql instead of mysqli is not the end of the world. Also note that using prepared statements is not always a performance win. In many situations the opposite is actually true, aggressive use of prepared statements can actually degrade performance significantly.

    3- Agreed

    4- Agreed, the default in MySQL for a long time caught people off guard.

    5- Another “it depends” situation. Scaling DBs is generally harder than scaling out web servers. As such there are times where pushing out some work to the web server level is desirable because it becomes easier to scale out that work load.

    6- True, frequent use of EXPLAIN is a good thing.

    7- Agreed

    8- Another “depends” situation. Simply always calling it wrong ignores the need of the situation.

    9- Agreed

    10- I think we’ve done pretty good job at convincing most folks to run backups. What we haven’t done a good job on is convincing people to run test restores using those backups. A backup is only useful if you can actually use it to restore in the event of a failure.

    11- Generally agree

  • http://www.reich-consulting.net/ coffee_ninja

    An extension of#5, Favoring PHP over SQL is performing “n” queries within a PHP loop when a single query with a JOIN query will do.

    • gggeek

      Mostly agree. But sometimes the single-query result set will eat so much ram that splitting it up in a loop is in fact beneficial.

  • Paul

    I think you are wrong on some things…

    1. Using MyISAM rather than InnoDB: InnoDB its more secure but if you are a good coder you don’t need transactional DB because those are slow and memory hungry, thats why now exist BigTable, etc.

    8. Using * in SELECT queries: Using * its more eficient because the mysql core works less, you need to search benchmarks on these item.

    • http://www.optimalworks.net/ Craig Buckler

      If you’re a good coder, you’ll understand that it’s simply not possible to implement transaction-based database functionality in PHP!

      Using * is certainly not efficient if you’re fetching one column from a 101 column table!

      • eRadical

        I’m sure Paul didn’t meant “to implement transactions in PHP” but build DB & APP in such a way you do not need transactions…
        As Sheeri Cabral said “How many of you do you write COMMIT or ROLLBACK” in your web applications ? If you don’t… then you don’t actually need transactions.

      • http://www.optimalworks.net/ Craig Buckler

        Certainly transactions are not necessary everywhere, although I have built applications where all database activity is committed at the end of the page load so problems won’t pollute the database.

        However, data integrity is another matter. InnoDB supports foreign keys so you can ensure data is not deleted or cascade updates through the entire DB. For example, deleting a blog post automatically removes all associated data without a single line of PHP. See How to Use MySQL Foreign Keys for Quicker Database Development for more information…

      • http://www.lunadesign.org awasson

        I’m a bit confused about the resistance to your point regarding InnoDB…

        I suppose if I was building a simple commenting system that only used one table I wouldn’t see the value of InnoDB but relational databases are made for multiple tables that “relate” within an application and that often means that updates and deletes span several tables. It doesn’t make sense to me to try to duplicate a functionality using PHP that InnoDB does natively.

    • Gasolene

      “InnoDB its more secure but if you are a good coder you don’t need transactional DB”

      A serious programmer “good coder” would not recommend using a non ACID compliant storage engine (MyISAM).

      “I’m sure Paul didn’t meant “to implement transactions in PHP” but build DB & APP in such a way you do not need transactions…”

      You cannot design a system to “not need” transactions. InnoDB also enforces data integrity (or consistency). It is not possible to enforce consistency in the application layer.

  • gggeek

    About #7: using integer timestamps is the fastest way to make your SQL portable across databases, as they have wildly varying syntaxes for dealing with dates…

    • http://www.optimalworks.net/ Craig Buckler

      Possibly, but how often are you doing that? I’d suggest it’s more efficient to keep dates as dates, but translate them in your DB layer when it’s necessary to do so.

      • Frank

        Integers for date formats perform better, especially using the BETWEEN clause.

    • rblon

      I also prefer to use integers (unix timestamps) instead of datetimes, mainly because they make it much easier to convert between timezones

      • DukeOfMarshall

        I agree. I have found that it’s much more efficient to store the date/time as a Unix timestamp. It is a lot more powerful when you pull it back out to manipulate it. You do not have to convert the date to Unix and then manipulate the Unix to whatever you want to do with it.

  • Steve

    I would like to expand on #8.
    This should be a rule that new comers should sleep on. Never use the wildcard. When the time comes to add a new field to your tables, you will be in big trouble.

    S.

  • Anon

    #2 and #5 contradicts with your PDO suggestion.

    Best part is: #11 contradicts everything and effectively makes #1-10 a bunch of mistakes.

  • http://www.assemblysys.com/dataServices/index.php mniessen

    I’ve started using procedures as an integral part of my latest project and am very happy with them.
    For instance, a user uses a single form to capture/edit information that is stored in multiple tables. Everything is sent at once and the procedure takes cares of inserting each piece of data where it belongs.
    In such a case, the procedure offers the same data integrity a transaction would (like the power outage Craig mentioned). Supposedly (I have not run tests yet), network performance is also improved vs. multiple inserts/updates. It makes the database very portable, since the developer only has to call the procedure, no matter what the language used, without having to write the SQL.
    Finally, if the table structure changes, there is no need to make modifications to the PHP (or other) code unless more/less information must be stored; you just have to adapt the procedure to your new table organization.

    There might be drawbacks to procedures (I’d like to learn about those if they exist), but from my experience, when they can be used, they mostly have advantages.

    Regards,

    Michaël

  • loganathan

    Very Good Article. I like this post very much

  • Guillaume BRETOU

    “This should probably be #1: never trust user input.”
    I could not agree more !

  • Anonymous

    RE : 2. Using PHP’s mysql functions
    Why no mention of mysqlnd ? In fact, that is the best driver if you have it available.

    RE : 9. Under- or over-indexing
    You should NOT have an index anywhere the column is used in a select statement. Your example of indexing an email address is not beneficial. Youl should only use an index if the data in the field is repeated significantly. Since an email address in a table of users would almost never be repeated, indexing the email column not only does no good, it actually hurts performance.

  • http://www.deathshadow.com deathshadow60

    2, 3 and 11 really are tied together since they can all be solved by doing one simple thing… using PDO and not mySQL_ or mysqli_.

    MySQLi helps with 3 since prepared queries auto-sanitize values removing that entire headache from the equation, but PDO gives you that AND the ability to use any database you want.

    Increasingly in my own code I’ve been loading query strings the same way I do language strings — I store the prepeared queries in an array and load the appropriate array for whatever type of SQL the current set of code wants to run. This lets me target all the different SQL types off the same php codebase.

  • PCSpectra

    Some good points Craig, I’ve been on the fence about InnoDB and MyISAM for a while perhaps this article is what finally convinces me to make the switch. :)
    That being said, just to quote you and respond with my own understanding:

    When you’re new to MySQL, it’s tempting to solve problems in the language you know. That can lead to unnecessary and slower code. For example, rather than using MySQL’s native AVG() function, you use a PHP loop to calculate an average by summing all values in a record-set.

    This steps on the toes of good design practice, IMO. For one, this is promoting the implementation of possible business logic in the data layer, which has two side effects:
    1. Business logic is no longer centralized in the model but is now in a potentially distinct place (SQL query or SP’s) which is a cross-cutting concern and confusing to developers unfamiliar with the codebase and expecting exclusive responsibilities.
    2. SQL is much harder to unit test than plain PHP. So if you calculated a SUM or AVG in PHP you could verify that behavior using SimpleTest or PHPUnit, however if you calculate the AVG or SUM in SQL (while faster) it is not as easy to unit test.
    Other than that I agree with most of what you said.
    I would emphasize a bit more that PDO is a data access layer, not an abstraction layer like AdoDB. The SQL must still be written in a vendor compliant fashion. For instance if you write initially on MySQL and rely on LIMIT and later switch to MSSQL I don’t think that translates and would require refactoring the code. I think once your at this stage of development (where SQL portability becomes an issue) you are ready for an ORM, Data Mapper or similar device to automate the SQL writing process for you.
    Lastly, your comment in regards NOT using ‘*’
    While I agree it’s an optimization to specificy explicit fields, it comes at the cost of maintenance time. My table schemas change frequently, especially at first. Specifically indicating which fields are to be returned also promotes the use of aliases, which just further convolute PHP code and make learning a new code base all that much more difficult.
    Using the ‘*’ is less explicit but results in more dynamic code, IMO.
    Good article :)
    Cheers,
    Alex

    • Gasolene

      “Business logic is no longer centralized in the model but is now in a potentially distinct place (SQL query or SP’s) which is a cross-cutting concern and confusing to developers unfamiliar with the codebase and expecting exclusive responsibilities.”

      Interesting point, i would say this is a perception. I see an SQL statement executed in the model as part of the business layer providing the statement was defined in the model. I will agree that using stored procedures would blur the abstraction line. Stored procedures should be used in rare occasions only.

      “SQL is much harder to unit test than plain PHP”

      Using indepodent database fixtures easily allows for testing resultsets.
      I’ve never encountered any SQL query I couldn’t test.

      “(where SQL portability becomes an issue) you are ready for an ORM, Data Mapper or similar device to automate the SQL writing process for you”

      Fully agree, db abstraction layer is a vital component of a software app – even if there is no desire to change platforms.

  • Joe-Know-It-All

    You can always change the tables type to InnoDB

    No App is perfect and no man knows it all, but some tend to think they do. Learn C and create your own language. Who cares what these “I know what is best for you type of people”. Go back to your ivy school and eat some cake. People innovate by not doing what others tell them. Sometimes the wrong way leads to a new way. Pride is every know-it-all persons downfall.

  • justbn_me

    RE : 2. Using PHP’s mysql functions
    Why no mention of mysqlnd ? In fact, that is the best driver if you have it available.

    RE : 9. Under- or over-indexing
    You should NOT have an index anywhere the column is used in a select statement. Your example of indexing an email address is not beneficial. Youl should only use an index if the data in the field is repeated significantly. Since an email address in a table of users would almost never be repeated, indexing the email column not only does no good, it actually hurts performance.

  • MetalCat

    AVG in the database ? …….. I’d rather hold the logic in the code and use the database as just a store, as if that’s all it is then I can move to another store when I want/need to. Either for scaling or moving technology.

    Scaling in code with cheap web servers is a lot better and you run into less bottlenecks that tweaking a maxed out dB (even with 12 cores on SSD hardware).

  • jhoye

    #11 Not considering other databases…

    SQLite should be moved up on the list of database alternatives. A lot of PHP apps require exactly what SQLite offers, but it seems like it gets overlooked by many PHP developers. PHP has several extensions for SQLite, and there are some decent free GUI admin tools for it (SQLiteManager and even a Firefox plugin).

    It’s not a golden hammer, but it is under-rated as a database solution for PHP apps (those that get under 100K requests a day). Not having to connect to a database server can give a considerable boost in performance.

    • http://www.optimalworks.net/ Craig Buckler

      SQLite is a great DB and, while I think it’s viable for sites with low traffic and few ongoing INSERTs/UPDATEs, there’s always the risk your site could become more popular than you expected!

      For ease of deployment and cross-platform compatibility, it’s hard to beat though.

  • Adam Bolte

    I wonder how much longer people will be using MySQL? It looks like MariaDB’s going to replace it.

  • Ann

    I ain’t a db expert, but even wordpress uses MYIsam

    • http://www.optimalworks.net/ Craig Buckler

      WordPress has been around a long time and has a lot of legacy code. When it was released, InnoDB was newer, slower, less well known, and had less support on web hosts.

      However, I suspect some WP developers curse MyISAM today. Actions which could/should be handled by the database must be coded in PHP.

  • Justen

    All great tips. I can’t count the number of times I’ve seen programmers violate #5, and often with extremely bad code to boot. In one epic example a previous developer had used 14 separate queries (most of which required full table scans!) combined with a bunch of cumbersome array merging and sorting to come up with a single list of ids (primary keys) which were *then* used to create a final query (using ‘where ID = 1 or ID = 2 or ID = 3…’, *cry*). This work was taking about 3 minutes to pull a set of around 9k rows (average) out of a pair of 150k row tables. Ouch! I combined it all into a single query and made use of proper indexes and now the same job takes an unnoticeable amount of time (about 1-2 seconds). Seriously people, you’re not a LAMP developer until you know the “M” part.

    On number 11, don’t leave out the beautiful, scalable, schema-less, JSON-based, map-reducing wonder that is CouchDB! It’s out of beta now with a full 1.0 release and definitely stable and functional enough to be used at a production level. Give it a look :)

  • Justen

    Oh a followup on #7 – actually, to the contrary, serialized PHP objects can be pretty handy if the situation is right. You just need to know why you’re making that choice and what the implications are. Admittedly it doesn’t come up often but I’ve used it a handful of times.

    If it’s a lot of arbitrary data whose structure is going to change from row to row and it doesn’t make sense to introduce a new column for each property (e.g. because only a handful of the entries will share it) a serialized object makes a lot of sense. If you’re going to have a significant number of rows and need to be able to compare the block of data as a whole you can create a separate hash column with an md5 sum for better indexing (obviously this only lets you do strict equality checking).

    If at some point you need to be able to search some of the properties in the object you can extract the relevant properties from the serialized portion and add them as columns. Just write a quick one-off script to unserialize the data and reinsert it; it’s pretty trivial.

  • http://icoland.com/ glenngould

    For most cases, I find timestamps a lot more better than datetime fields. With timestamps you record server/timezone independent time data.

  • Amol

    nice:)

  • barney

    After the article and all – well, most – of the comments, I’m convinced that I should not have a PHP/MySQL (or any other database, for that matter) Web site until such time as I have acquired a Masters in IT with a minor in database design and utilization.
    Gee, thanks … I hadn’t known that … obviously my Web sites don’t/won’t work … really ‘preciat the effort you all have made to bring out my inadequacies and lack of ability. Only thing you missed was pointing out what an ignoramous such as myself could do to correct such failings.

    • http://www.lunadesign.org awasson

      Now, now… I think what Craig is trying to do is get people to look at what they’re doing and show some alternatives that will make their applications more efficient, secure and future proof.

      If you’re in this (software/web development) for the long haul and you’ve been putting together websites using questionable practices, now’s your opportunity to turn things around. I recognize some of the people who have weighed in on this discussion and IMO they’re some of the brighter stars on Sitepoint. There’s a lot of good information here.

  • Pelirrojo

    Hi Craig,

    I liked your article. Thanks for your work. Maybe I,d like to add some mistakes related with db user and permission management.

  • Arthur

    Another common mistake is opening too many mysql-connections without closing them. PHP document closes a mysql-connection as default, but not necessary. This mistake can lead to “too many connections” MySQL error, especially if you use a lot of “includes”. Usually it is common for programmers, who are NOT using OOP. If you are programming Object Oriented it is easy to forecome this problem by using singleton pattern at your database class.

    and comment on your point 1
    1. Using MyISAM rather than InnoDB
    unfortunately InnoDB does not support FULLTEXT indices, so sometimes it is not an option…

    • http://www.optimalworks.net/ Craig Buckler

      Thanks Arthur. Persistent connections are also a problem — I’ve never felt comfortable using them and tend not to.

      FULLTEXT has been mentioned a few times above, but I don’t feel it’s a good enough reason to adopt MyISAM over InnoDB. Data integrity is usually more important and it’s not as though MySQL’s FULLTEXT search is particularly good. But, it’s certainly something developers should consider.

      • harry

        I really dont get your attitude to myISAM.

        The truth is -the choice of table depends ENTIRELY on what your app is meant to do.

        Even the official mySQL development guidelines suggest this.

        You saying you know better than these guys?

        MyISAM is NOT that bad if you know what your doing – to suggest its a mistake to use them is a complete load of bollocks.

        The fact innoDB is often not considered during design is really the mistake here not the blanket use of MyISAM

        Secondly, transactions are not always required – or even desirable in many cases. If i had a text-searchable “contacts” table for instance, why the hell would i need all that overhead?

        Thirdly, there are features MyISAM has the innoDB dosn’t.
        If your handling money – you should use innoDB, if you need to search through bits of text – MyISAM is the only choice if your using MySQL – again it depends!

      • http://www.optimalworks.net/ Craig Buckler

        the choice of table depends ENTIRELY on what your app is meant to do.

        I agree … but you’ve chosen to use a database, right? Does it have two or more tables? Are those tables related in some way? If so, InnoDB is almost certainly a better choice even if you don’t need transactions.

        MyISAM is NOT that bad if you know what your doing

        I suggest you take a look at MySQL Transactions & Why They Can’t Be Emulated in PHP. If you know what you’re doing, you’re unlikely to be using MyISAM.

        transactions are not always required

        I’ve never said they were. Data integrity is another matter, though.

        there are features MyISAM has the innoDB dosn’t

        So that’s (a fairly poor) FULL TEXT search and what else?

        To say InnoDB is for handling money and MyISAM is for text is an incorrect generalization.

  • http://www.scribbleit.co.uk/blog KevPartner

    Of all these, 3 and 10 are head and shoulders more important than the others. Choosing the right engine, favouring PHP over SQL, data types, indexing etc are all important when creating busy applications where performance is critical.

    Not sanitising user input or backing up are absolutely fundamental – the first largely for security, the second for sanity!

    Fantastic article. Required reading for all new to PHP.

    Kev

  • rblon

    There are also features that MyISAM supports and InnoDB not, eg auto_increment on a secondary column in a multiple-column index… so wouldn’t say that InnoDB is always the preferred choice.

  • Herbalite

    On safety and speed (they are not mutually exclusive).

    These days using stored procedures or functions is IMO good habit too.
    Same goes for events and triggers.

    I know it’s easier to just hack a SQL query into PHP and be done with it.
    But on busy websites, using stored procedures is way faster, not even talking about the overhead of implementing events and triggers with PHP.

    The traffic between PHP and the SQL server is being reduced. Stored procedures/functions can (and on a busy site most likely will) be cached. It’s harder to use exploits with the data submitted. Incoming data still needs to be validated though. But that can be done with MySQL as well, and if I want to deal with malformed data on the database level I can do that too. And in case I have to move to another database, I only have to bother with adapting the stored procedure/function. The PHP to DB interface usually won’t change at all and if it has to change, it changes just very little. The gorier details of SQL semantics can be dealt within the stored procedure. It takes complexity out of a PHP script and lets the database deal with it’s own SQL semantic. Smaller scripts mean swifter execution of the PHP code too. Plus the PHP code becomes easier to maintain.

    On UTF-8. On new projects it’s great, but if you have legacy data to deal with, it’s probably best to avoid to convert that data to UTF-8. Even converting data that looks like ASCII, but has come from different client OS can create unexpected results. Non printable characters have thrown surprises for me in the past.

  • André

    Sometimes you should think of a NoSQL database. E.g. you have a large amount of traffic, it´s necessary to use a distributed database like Cassandra or HBase. Choose the right tool for your job! EBay for example dropped Oracle in 2003 and uses now a database that handles big server clusters better than Oracle did (it´s getting really expensive using Oracle). MySQL is always a good choice for rapid development, not the best for large projects. ;o)

    greetings André

  • Matt

    This is how I look at it:
    Use MyISAM if you’re going to be primarily _reading_ from the table. There’s no locking if you’re only reading, and MyISAM offers full text searching whereas InnoDB does not. So if you’re storing something like blog posts, then MyISAM is *definitely* a better choice by far.
    Use InnoDB if you are not simply reading from the table most of the time. InnoDB has row-level locking so your whole table isn’t locked when writing to one entry. Any table that contains data that will be modified often by numerous users should be InnoDB.
    You must use the right tool for the job, and blindly using one or the other exclusively is downright wrong.

    • http://www.optimalworks.net/ Craig Buckler

      I’d add that, as well as primarily reading, your tables should also contain unrelated data for MyISAM to be a good choice. That’s not likely in a relational database — even the simplest blogs have related tables of data for comments, authors, etc.

  • jkeppens

    Another very common mistake is thinking that int(1) takes less space than int(10) while both use up 4 bytes on disk. Always do your research to determine the min/max value and use the proper datatype (tinyint, smallint, etc)

  • Amit Yaron

    Another mistake: a user named “admin”.

  • SoftwareEgineer

    I’m baffled at the amount of people here who think foreign keys and transactions, and thus essentially data/referential integrity aren’t “all that important”.

    One of our customers used to think like that as well when they did their own software. This is the reason why we are rewriting all their software today.
    Datamigration was a real pain in the ass. Orphaned records, duplicate keys, …

    This customer’s business is delivery of electricity to people’s homes, including mine. Ever since I saw these databases, I’m literally amazed that my lights actually work when I flip the switch.

    Imho, any “programmer” who suggests that having NO foreign keys, NO primary keys, NO transactions or doesn’t see any problem with table locking should be fired instantly and stripped of his virtual license to code.

    Seriously.

    • http://www.optimalworks.net/ Craig Buckler

      Thanks SoftwareEgineer.

      I think part of the problem is the evolution of MySQL. MyISAM tables were there from the start and were easy to understand and code. There are many examples all over the web that developers copy, adopt and use successfully.

      Indexes, FKs and transactions are harder concepts to grasp — although they sound more complex than they really are. However, every good relational database design should use indexes and FKs. Once you’ve understood and used them, you’ll appreciate data integrity and find MyISAM far too restrictive.

      • slawek22

        The problem is that if you’re company that provides electricity you have money to scale and you need to keep the data normalized.

        If you run a forum there’s no problem with 2 or 3 orphaned records or 2 or 3 comments / year gone.

        That’s very “simple” thinking. You’re trying to say that every website on the internet is some kind of accounting / banking interface? And everyone have money to scale innoDB just to keep data maintenance and consistency on DB level? Ridiculous!

        I, on the other hand think that programmer that can’t match tools to the task should be instantly fired. If you say that innoDB is suitable for all tasks and myISAM has only flaws… it means that you’re not a good dev and don’t know the database you’re using.

        Each of three main mySQL engines (myISAM, innoDB and memory) have its own usage.

        Saying that foregin keys, transaction, etc. should always be there… without even knowing what for is not a good design… just fanatism and wasting resources :P

        I think that the problem are uneducated developers that use myISAM / innoDB for everything (no matter which one).

      • Gasolene

        “If you run a forum there’s no problem with 2 or 3 orphaned records or 2 or 3 comments / year gone”

        Designing a system to purposely allow orphaned records is absurd.

        “And everyone have money to scale innoDB just to keep data maintenance and consistency on DB level? Ridiculous!”

        InnoDB is far more cost effective to scale. If you build a system that self maintains consistency and durability, how would that cost more to scale or maintain?

        “I, on the other hand think that programmer that can’t match tools to the task should be instantly fired. ”

        I agree, any employee that does not design proper ACID compliant systems would not work for me.

        “Saying that foregin keys, transaction, etc. should always be there… without even knowing what for is not a good design… just fanatism and wasting resources :P”

        There is no acceptable reason to forgo data consistency. There are oodles of ACID compliant performance optimization techniques. Unstructured inconsistent unreliable data is not an acceptable solution.

      • slawek22

        Well… data maintenance is not a problem when data consistency isn’t a problem.

        Exact number of posts on the forum counter. Would you pay several thousands of dollars yearly for normalized data set to have that? Maybe just add another field in the DB and use it instead of count(*)?

        On the other hand banks need normalized data to handle transactions. World of Warcraft Forum != Bank of Canada.

        All noSQL speed is based on data inconsistency. Some solutions even provide inconsistent results by default (it’s called “eventually consistent”, where you’re supposed to get outdated results for some time after an update). You’re saying that everyone not using innoDB is just MAD? :)

        >There is no acceptable reason to forgo data consistency

        Saying that foreign checks should be there even when there are no foregin keys :) Thats beyond fanatic :)

        >Unstructured inconsistent unreliable data is not an acceptable solution.

        You’re saying that any caching is unacceptable solution? :) It’s just record => data. No need for FK-checks, transactions, structure. By definition it’s also unreliable (read memcache docs).

      • Bob

        Sitepoint’s own book teaches us to use MyISAM.

    • Joe boxer

      Agreed. I have seen messy mysql databases. One for instance, handles billing for phone calls handled by a popular telecom, won’t say which. Personally, I believe that any storage engine that doesn’t provide referential integrity is a such slightly better than csv or tsv files.
      Why bother with a database at all if you don’t need the ‘other stuff,’ is all I have to say to the developers. If you are so concerned with speed, why not use Sqlite? We used it for some applications where a dbms would be wasted and yet it provided. It also provides transactional support. If you really want the fastest data storage mechanism, there it is. Or you can move to NoSQL databases.

      • slawek22

        + noSQL doesn’t even provide data durability (writes can go to /dev/null ocassionally)
        + sqlite doesn’t scale writes AT ALL.

        Take mongoDB for example:
        + 128 bit minimum PRI KEY for each record (because of scalability it needs to be large). You’re wasting 12 bytes just to store a record comparing to mySQL INT, which is enough in most cases.
        + Then (another suprise)… record layout is stored in each field, so if you have X*number of chars of wasted space (eg. simple user details/priv table will add about 30 wasted bytes)
        + Everything in memory (with so much waste you’ll soon hit the RAM limit).

        If you’re comparing noSQL / sqlite to myIsam you obviously don’t know what you’re talking about.

        Sqlite fast? :) For single threaded desktop apps – yes, not for the web.

  • Joe boxer

    Not using stored procedures is one. Why have straight queries when you can use stored procedures. If you are only using mysql, it isn’t hard to have the logic in the stored procedure. This will decrease the amount of maintenance headaches. Unless you are using an ORM, then you’re not dealing with straight SQL. Just a point.

  • Igor

    Re: MyISAM vs. InnoDB.

    A person mentioned that WordPress uses MyISAM, and I expect it to be done by design. MyISAM is good for applications that requires more reads than writes. Another point that will make MyISAM, in the right circumstances, better than InnoDB is the fact that integrity check is pretty expensive and most of the times you don’t actually need it. Use MyISAM for fast reads, combined with lots of RAM, and InnoDB on the places that really need transactions and foreign keys.

    Re: Aggregation in SQL vs. code

    MySQL degrades a lot when you have queries with several joins and worse, if you need to aggregate the data, it often will create a temporary table either in memory or disk. In these situations it is better to move the join or aggregation to the code and it will perform much better (at the cost of more memory being used by the web server). My tip is: whenever you have more than a few simple joins and need to aggregate data, forget MySQL and do that in the code.

    • http://www.lunadesign.org awasson

      I don’t think WP chose MyISAM by design for performance or integrity but perhaps more because it was available on the largest array of servers. The same reason they’ve supported PHP4 for as long as they have. I can’t be sure but I’ve read some posts by Matt Mullenweg, the founding developer at WP and I think they used what was available so that the barrier to entry didn’t prevent people from embracing their system

  • SI G

    You actually wrote this and stand by it. Urgh! Sysadmins face this simplistic sh*t everyday when they have to explain why boxes are on the floor due to a moronic lack of knowledge of what second rate developers are doing.

    *bangs head on wall*

    Urgh!

    • Bob

      You disagree with all 10? Or what are you talking about? Post was useless.

  • Anonymous

    To add to common mistakes, think carefully before making date/time comparisons based on “now”.

    For example, select id from session_table where created_at > Now()

    MySQL comes with a wonderful querycache, but if you use the current datetime in a query, and use the now function from either language, your generating a timestamp down to the microsecond and so the data cannot be cached.

    If you put a little thought into the query, it is quite possible that you only need to know data based on the DATE, or perhaps the HOUR, or even MINUTE. So generate that datetime data yourself and only make it as precise as is logical so the query cache can save you time and cpu cycles.

  • FeN_X

    Maybe you should talk about mysql_real_escape_string() and intval() to sanityze user inputs.

    And programmers often use mysql_fetch_array() instead of mysql_fetch_assoc(), and don’t even know the differencies.

    ‘Not using UTF8′ ? is it really a programmer mistake ?

  • slawek22

    @Craig:
    InnoDB vs myisam.
    First is much faster for writes, the second excells in reads (BLOB’s especially)… there are differences up to 100-200% in speed in favour of myIsam for READS. It scales very well on multiple CPUs.

    Recent mysql builds you should check it out.
    Simple websites are not a banking system. In fact, most efficient forum engines (like myBB) use myIsam… and even keep the post count in separate columns, so the data isn’t even normalized.
    The data could be incorrect, but does it really matter that the post counter is off by 2 or 3 posts? When you can have forum with millions of posts on virtually any “garbage” server? Try it with innoDB and normalized data that is “100% correct” every time and “managed” on database level using transactions. Yes, you’ll have to scale. You’ll have to scale like hell. Is keeping post counter correct worth it?
    SELECT * FROM a_101_column_table. Well first of all i think that 101 coulmn tables are bad design. Second when you using column-lists you’re throwing mysql cache mechanism to trash.
    SELECT email FROM user WHERE id=10
    and
    SELECT privledges FROM user WHERE id=10
    are completely different queries for the query cache. What do you think is faster? Parsing the statement then issuing a query or just getting the result out of cache?
    How often do you add large BLOB column to finished database design? Files have to be stored on file servers PERIOD.
    Using field lists… in most cases you’re doing more harm than good.
    You should read: http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/
    You’re seeing only one part of the equation. In fact your advices can bring more bad than good. You’re always trading functionality and reliability for speed. If myISAM is so inferior to innoDB, and it has no advantages – why is it still here? Why noSQL is here? noSQL drops data consistency _and_ durability, you can’t even assume that data will be stored when you use popular nosql databases… and still there is some use for it :)
    Data durability in myisam is SUPERIOR to all noSQL databases invented. Yet even the biggest players use noSQL… why ?

    Beside myISAM is better for SSD disks (data isn’t stored by primary key order) and you can sort it independently of PK easily.

    • http://www.optimalworks.net/ Craig Buckler

      In summary, what you appear to be saying is that bad practices can sometimes lead to better performance. I agree. But we’re not writing “Call of Duty” where speed will always win over efficiency … we’re trying to build a decent web app. But you can still have speed without sacrificing good development techniques.

      As for keeping a post count, I see no problem with that. Use page caching – it doesn’t necessarily need to be at the DB level, though.

      With regard to SELECT * being faster because it’s applicable to all your queries on the same table, that’s only true if you’re running the same SQL statement again and again. Why would you do that?

      Finally, MyISAM is still here because it’s still in use. IE6 is still here too: that’s doesn’t make it the best choice of browser.

      • David

        IE6 has been upgraded and replaced, it’s legacy and a problem with no ongoing development. MyISAM is not dead, has not been replaced, and still offers many significant benefits over InnoDB in terms of speed, this is a terrible comparison.

        On large tables, with few or updates, it is ignorant to suffer 200-400% speed casualties for the sake of transactions and avoiding “bad practices” as you put.

  • MetalCat
    • Bob

      aka “My spam”

  • sylvaind

    I had a problem of performance with one of my extranet. it was solved with indexing 3 columns ! many thanks for that article.

    Sylvain

  • Me

    +1 for ‘Servers can explode.’ xD

  • Matan

    A must read article to new PHP developers.

    Thank you!

  • Nicolas BUI

    MyISAM is a very good engine and people still use it because it’s damn good and usefull. It really fast and require less resources.
    InnoDB is not made for the same use, it much more powerful on features but require more cpu and ram. And in common case, it’s slower than MyISAM.
    I use in my productions both solution for specific use.
    Developper should know and these differences, features and limitations of both to know how to properly use it.

    I do agree that you should never use * in a SELECT, it will help you debug your code when a column changes but yes I’m so damn lazy from time to time :)

  • David

    As a PHP/SQL developer for > 10 years, MOST of our tables are MyISAM. Any table that requires constant updates and is constantly changing, InnoDB is hands down the proper choice, however for the BULK of tables, for MOST websites, they are almost ENTIRELY SELECT statements, in which case MyISAM out performs hands down. On our production environments, running modern 2 x quad core systems, changing some of the primary MyISAM tables to InnoDB causes serious performance issues.

    If you’re new to MySQL, and you don’t know how to use databases efficiently, InnoDB may be a “safer” choice, but for actual production environments with traffic and where the amounts of SELECTS far outweighs INSERTS and UPDATES, InnoDB is a poorer choice. We implement several TEMPORARY tables (memory based) to handle constant tallies and minimal updates, and then it’s polled via cron to the appropriate tables. It’s more important to understand proper structure and database efficiency than to trash on MyISAM and withhold it’s usefulness. Transactions or not, however on large scale environments using MyISAM we have no known errors of a transaction just cutting out. Each page loads in a fraction of a second, if the power went out mid-update and it caused an issue, it would be essentially a statistical anomaly.

  • Cracker Koli

    thank you sir ,
    i have read u r article about “top 10 mysql mistakes”.
    i am intrested in php & mysql so how could i start development in php?

  • Naeem

    Thank you for giving a tips for mysql. its very use full for me..

    really its very help full tips…

  • Canyoun

    Debat here. As usual, the good part is in commentaries. ;)

  • McMurphy 510

    Great list…

    gotta disagree in a couple of points.

    On point 1… Whether to use InnoDB or MyISAM depends on what you’re doing. That said, I do almost always use InnoDB, but that’s due to the kind of work I do. There are times (more frequently than your article indicates) that MyISAM is the right choice. For a quick breakdown, start here: http://stackoverflow.com/questions/20148/myisam-versus-innodb

    On point 8… Again, I agree that MOST of the time you should spell out your columns. However, on frequently changing small tables, it’s perfectly acceptable and even proper to use *.