MySQL compared with PostgreSQL

PSNC have published a comparison of MySQL 4.1 and PostgreSQL 8.0.

If you’re interested to know what features you’re missing out on by using MySQL rather than PostgreSQL, the comparison table provides a handy summary. MySQL 5.0 (in development) is getting closer to PostgreSQL’s impressive feature set, with Views and Stored Procedures on the list. According to the article, however, MySQL still beats PostgreSQL in terms of performance. Details of a number of benchmarks are given.

MySQL has certainly made progress since I began using it, adding support for subqueries (in 4.1), full-text searching with boolean mode, and transactions (via InnoDB).

From the article:

MySQL is simpler than PostgreSQL but much faster. MySQL doesn’t support many advanced features that may be important in huge relational and complicated databases. But for JRA1 measurement architecture MySQL should be adequate (we probably wouldn’t use the most advanced features).

Currently, MySQL is a staple of the LAMP (Linux, Apache, MySQL, and PHP) architecture, pre-installed on many hosting accounts and bundled with many operating systems. This makes it very attractive to novices and intermediate web programmers, as it is relatively easy and cheap to get started playing around with it. With its expanding feature set and dual license, MySQL is looking to expand its usage in other market segments.

The article also contains links to more detailed comparison information, such as the feature comparison page at mysql.com.

Via Open Source Versus, check it out.

  • http://www.tonymarston.net Tony Marston

    Rubbish. A good programmer validates all data BEFORE giving it to the database. That way you can generate more user-friendly error messages instead of the cryptic stuff that most databases generate.

    If you expect the DBMS to do all this validation for you then I suggest you don’t give up your day job as you’re obviously not cut out to be a programmer.

  • http://simon.incutio.com/ Skunk

    Tony: Are you saying you’ve never forgotten to check the length of your data before sending it to MySQL? Do you have if (strlen($text) > 65536) { die('Text too long'); } (or equivalent) in your application code for every place where you insert into a MySQL TEXT column? If so, I applaud your diligence!

  • MiiJaySung

    @ Skunk…

    If you code well, you shouldn’t need to place

    if (strlen($text) > 65536) { die('Text too long'); }

    in front of all your insert and update statements. A wise programmer will use some form of data access object (Active Record, mappers, etc.) to centralise database operations and reduce the bad coding ‘smell’ of code duplication.

    I totally agree with Tony here. Too many MySQL users seem to lack discipline these days, relying on nasty MySQL quirks and magic quotes (which is the dumbest thing I’ve seen, as it doesn’t even protect against injection because it doesn’t escape the text to ANSI SQL specs. Were the people at Zend drunk when they did this?).

    Despite the occasional disagreements people have had with Tony here, there’s no denying he’s right this time.

  • MiiJaySung

    Another point I would like to make is that the tests are very unfair. They play right into MySQL’s hands.

    MySQL is very much optimised for very small, simple queries, and all of the tests use exactly those, exercising an incredibly small percentage of both databases’ capabilities. Every SELECT is very simple, works on a single table, and at most adds the odd ORDER BY clause.

    It seems no indexes were created on the tables, either. This is quite unfair, as MySQL might have cached a lot of the data in memory compared to PostgreSQL; the results seem to show a massive margin in the timings. I wonder how both databases were configured for this.

    It does not state which MySQL engine was used (presumably MyISAM); I wonder how this performs with other engines. I would also like to see performance under Windows (InnoDB tables were hellishly slow with autocommit on under Windows for some reason when I used them about a year ago, though that was probably down to a config issue).

    I wonder how both perform when accessed non-locally from the client application, and whether one generates more network contention than the other. In fact, although both were run locally, were they set up to use TCP or the Unix socket interface? I would guess local TCP incurs more overhead than the local socket that MySQL defaults to.

    Was a persistent connection used? PostgreSQL probably has more overhead when creating connections, as it uses the OS’s authorisation system, which means a load of IPC operations. MySQL probably has a far lower connection overhead due to its internal authorisation mechanism.

    The current site I have written using LAMP is pretty fast, but I am curious whether Postgres would have offered more speed for the search system, as I have large queries (sometimes SELECTs several KB long when a lot of criteria are used). Maybe stored procs would have sped things up in PostgreSQL.

    This benchmark is far from accurate, IMO, even for something that is meant to be used as a basic summary, because it is biased.

  • tombr

    MySQL shouldn’t do the INSERT into a VARCHAR(255) if there are more than 255 chars. If it’s the program’s job to validate all data, what’s the point of having a column type of VARCHAR(255) in the first place?

  • cholmon

    “Complaining that MySQL doesn’t produce the error messages you expect when you give it bad data is just plain nonsense.”

    Are you saying that it’s more sensible for MySQL to keep its mouth shut when you try to stuff data where it isn’t supposed to go?

    In Simon’s first post, he didn’t say that he failed to receive the error message he expected, but rather he failed to receive ANY error message. Expecting an error message when domain integrity is violated is perfectly reasonable. In fact, NOT expecting an error message would qualify someone as a “duffer of the first degree.” Regardless of whether or not the application code was written sensibly, the RDBMS’s single most important job is enforcing data integrity.

    “Any programmer who doesn’t check the data (especially that which is input by an human) BEFORE sending it to the database is a complete duffer of the first degree. You do NOT use the database to validate user input – that’s the program’s (and the programmer’s) job.”

    What if the data didn’t come from a human? What if the data came from another table in the database? It is entirely possible for a (competent) programmer to write a stored procedure that copies data from one table into another table, only to have the destination table changed at a later date. If one of the columns is changed from a TEXT to a VARCHAR(255), it’s very likely that some data will be truncated when the stored procedure runs. I would prefer that my RDBMS refuse to perform such an operation. If I was fine with the truncation, I could simply add a SUBSTRING() into the stored procedure and perform the truncation explicitly.

    Simon’s original complaint had nothing to do with application layer validation, and had everything to do with enforcing integrity.

  • Rod Taylor

    I got to the first line, where it mentions operating systems. I know for a fact that the lead PostgreSQL developers do a good chunk of development on OS X, and I’ve run PostgreSQL on both AIX and Solaris.

    PostgreSQL release notes also include HP-UX and Tru64.

    If the author of the comparison got that wrong (or heavily slanted), how much weight should I put behind the rest of the list? I’m not going to read it.

  • Gabriel Ricard

    They also grossly understate the platforms that PostgreSQL supports. In addition to Linux and Windows, PostgreSQL supports Mac OS X, BSD, AIX, HPUX, IRIX, QNX, SCO, Solaris, Novell Netware, OS/2 (eComStation), and probably more.

    http://www.postgresql.org/docs/faqs.FAQ.html#1.3

    Aside from that, the comparison does not seem to be well executed. Why not find a bunch of excellent DBAs (a few for MySQL and a few for PostgreSQL), have a third party develop some real-world tests, then have the DBAs set up, tune and run the tests and THEN see what the result is?

  • Rod K

    Tony, do you suppose you could actually debate the points rather than devolve to personal attacks?

    I never said that the app shouldn’t validate (in fact I made a special point of saying so), only that the DB MUST validate to ensure data integrity. If YOU expect the app to do the validation necessary to maintain data integrity, then I suggest you stick to programming and stay away from DB design and administration.

  • Craig R

    I have to agree with Rod K on this. Besides, you can’t validate ALL the data before updating the DB; that would be a big waste of time. Since my experience is with Oracle, I catch DB errors in PHP and reformat them into user-friendly messages. That way I let the DB do the main validating (but not ALL of it) and still handle any errors in a user-friendly way.

  • http://www.tonymarston.net Tony Marston

    MySQL shouldn’t do the INSERT on a varchar(255) if there are more than 255 chars. If it’s the program job to validate all data, what’s the point in having a column type varchar(255) in the first place?

    Your code knows (or should do) that a particular column is varchar(255) so it validates it accordingly. If the column size is exceeded you then have a choice:
    (a) Generate an error message and abandon the insert/update.
    (b) Truncate the data, continue with the insert/update, but generate a warning message to the user.
    (c) Same as (b) but without the warning message.

    MySQL’s behaviour is well documented, so if you don’t like it then stop whinging and write your own code. If you are a competent programmer you won’t have to duplicate the code for every column in your database; you will write it once in a centralised validation module.

  • http://www.tonymarston.net Tony Marston

    “Complaining that MySQL doesn’t produce the error messages you expect when you give it bad data is just plain nonsense.”

    Are you saying that it’s more sensible for MySQL to keep its mouth shut when you try to stuff data where it isn’t supposed to go?

    In Simon’s first post, he didn’t say that he failed to receive the error message he expected, but rather he failed to receive ANY error message.

    MySQL behaves as documented. It clearly says that if you give a column a value that is not quite perfect then it does what it can to fit it in and continue without aborting.

    Expecting the opposite behaviour to what is clearly documented, and then complaining about it is not very clever. If you do the right thing and validate your data BEFORE giving it to the database then there will be no surprises.

    This reminds me of the lame brain I came across years ago who complained that, after designing a table for VAT rates, each of which had a start_date and end_date, the database did not automatically return the one with the current date. He did not realise that you had to specify what was wanted explicitly; he assumed that the database would do what he wanted automatically.

    I repeat, don’t rely on the database to do your work for you.

  • http://www.tonymarston.net Tony Marston

    If YOU expect the app to do the validation necessary to maintain data integrity, than I suggest you stick to programming and stay away from DB design and administration.

    Yes, I *do* expect the application to validate all data before it is passed to the database. That’s how it was 25+ years ago when I started programming, and I see no reason to change. My systems work because I design both the database(s) and the applications. I know the capabilities of both, so I know what needs to be done where, and why.

  • MiiJaySung

    I still agree with Tony here. MySQL documents its behaviour, quirky as it may seem to some.

    You should not assume the database will do certain things, as Tony states. The database vendor is free to implement behaviour as they see fit. I don’t know what the ANSI standards say, but this is probably one of those areas where there is no rule, and therefore it is the programmer’s responsibility. In C++ there are many quirks and behaviours that are specific to certain compilers (e.g. the order in which expressions for method parameters are evaluated) and are not defined in the C++ standard. C++ programmers don’t bang on like people here, blaming the compiler when code doesn’t behave as expected on another platform or compiler. They accept that it is undefined behaviour because it is not defined in the standard.

    Certainly, if you write apps which may migrate from DB to DB, you cannot assume anything about the DB.

  • Rod K

    Tony, now we’re getting somewhere. I don’t work in an environment where I can guarantee that ONLY my apps will ever access the DB.

    Your point about triggers is well taken except that the problem isn’t triggers themselves, but with designers who abuse them.

  • cholmon

    MySQL behaves as documented. It clearly says that if you give a column a value that is not quite perfect then it does what it can to fit it in and continue without aborting.

    Nobody has claimed that this behavior is undocumented. Simon’s original complaint, and my personal complaint, is with the behavior itself. It doesn’t matter if MySQL clearly documents the fact that a truncation will occur; the simple fact that the truncation occurs by default, with or without a warning, is a poor way for any RDBMS to handle a violation of domain integrity.

    If you do the right thing and validate your data BEFORE giving it to the database then there will be no surprises.

    Please re-read the second half of my last comment. It is possible for domain integrity to be compromised AFTER the data has been validated by the application. It is the job of the database to do everything in its power to, among other things, make sure that data is NEVER inadvertently lost.

    You are correct that MySQL behaves as documented…that is what needs to change.

  • http://www.tonymarston.net Tony Marston

    It is possible for domain integrity to be compromised AFTER the data has been validated by the application.

    Then the validation performed by the application would be poorly written. That would be the fault of the programmer, not the database engine.

    You are correct that MySQL behaves as documented…that is what needs to change.

    That behaviour will become configurable in a future version of MySQL. Refer to section 5.3.2 Server Modes at http://dev.mysql.com/doc/mysql/en/server-sql-mode.html.
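    A sketch of what that mode changes, assuming MySQL 5.0 (the table and column names here are invented for illustration):

    ```sql
    -- Default (non-strict) mode: the oversized value is truncated
    -- and the INSERT succeeds with only a warning.
    CREATE TABLE t (name VARCHAR(5));
    INSERT INTO t (name) VALUES ('abcdefghij');

    -- Strict mode: the same INSERT is rejected with an error instead
    -- of being silently adjusted.
    SET sql_mode = 'STRICT_ALL_TABLES';
    INSERT INTO t (name) VALUES ('abcdefghij');
    ```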

  • http://simon.incutio.com/ Skunk

    “MySQL’s behaviour is well documented, so if you don’t like it then stop whinging and write your own code.”

    I stopped whinging and switched to PostgreSQL ;)

  • http://joseph.randomnetworks.com/ josephscott

    First off, the comparison they link to is full of problems. SitePoint isn’t helping anyone by linking to it; this just adds to the confusion. I’ve already detailed some of the problems, so I’m not going to rehash them here. To make things worse, PHP Developer has now linked to the comparison as well.

    Next, let’s address this issue of documented ways MySQL will corrupt your data. I think both sides are right to some degree. Tony is right that you should check your data at a higher level to provide meaningful warnings and errors to your users (or to other applications, in the case of an API). Doing this validation, though, does not give the database layer the freedom to simply alter the data I’ve provided. Databases are for storing the data they are given: not some of the data, not all of the data plus random garbage, simply the data provided.

    If the database is unable to properly store the data I provide, then it needs to throw an error. Doing anything else is simply unacceptable; it doesn’t matter that it is documented. It is time to lower the pride level a notch or two, fix this bug and move on. Stop trying to come up with excuses for why the database should be free to store something besides the data that I give it.

    As for making this behavior configurable in future versions, I don’t like the idea. I think in the long run it will be better to simply do the right thing and move on. Don’t let this bug drag on any longer than it needs to. Most MySQL-based applications still can’t use sub-selects because there are so many MySQL 3.x installs still around.

  • cholmon

    Then the validation performed by the application would be poorly written.

    As I said in my first response, it is possible for this to occur completely inside the database. Let me present that scenario in a more detailed manner.

    • DBA Bob creates two tables, t1 and t2. t1 has a TEXT column and t2 has a TEXT column.
    • DBA Bob creates a stored procedure that performs an INSERT … SELECT from the TEXT column in t1 into the TEXT column of t2.
    • Developer Jim creates an application that properly validates all data that is inserted into the TEXT column of t1.
    • DBA Bob runs his stored procedure and values are properly copied from t1 into t2.
    • DBA Bob gets hit by a car.
    • DBA Sue is hired and decides to change the column type of t2’s TEXT column to be a VARCHAR(255).
    • DBA Sue runs the stored procedure. Data is silently copied from t1’s TEXT column into t2’s VARCHAR(255), some of which is truncated.

    As you can see, the integrity violation occurs completely outside the realm of control of the developer. His application code is still 100% sound. From his perspective, the data is as valid as it can possibly be. It is Sue’s fault that the data was truncated, but that is irrelevant. The stored procedure should have failed when it tried copying the data from the TEXT to the VARCHAR. Yes, Sue is a bad DBA, but does that excuse the RDBMS from enforcing integrity? No.
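    A minimal SQL sketch of that scenario (the names are invented, and the stored-procedure syntax assumes MySQL 5.0, where stored procedures first appear):

    ```sql
    CREATE TABLE t1 (body TEXT);
    CREATE TABLE t2 (body TEXT);

    -- Bob's procedure, written while t2.body was still TEXT:
    CREATE PROCEDURE copy_bodies()
      INSERT INTO t2 (body) SELECT body FROM t1;

    -- Sue's later schema change:
    ALTER TABLE t2 MODIFY body VARCHAR(255);

    -- In MySQL's default mode this call now silently truncates any value
    -- longer than 255 characters instead of aborting with an error.
    CALL copy_bodies();
    ```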

    That would be the fault of the programmer, not the database engine.

    Fault has nothing to do with this. I will reiterate: it is the job of the database to do everything in its power to, among other things, make sure that data is NEVER inadvertently lost. This is called enforcing integrity, whether referential or domain. It does not matter if an application is poorly written. If the database allows integrity to be violated, it is poorly written/configured.

    That behaviour will become configurable in a future version of MySQL. Refer to section 5.3.2 Server Modes at http://dev.mysql.com/doc/mysql/en/server-sql-mode.html.

    I am aware that this behavior (STRICT_ALL_TABLES) is available with the current MySQL beta, 5.0.4, but the article did not make a comparison with that version, it made its comparison with MySQL 4.1. While this complaint regarding enforcing integrity will be addressed by MySQL 5, the complaint is still perfectly justified as it illustrates a very important difference between the versions of MySQL and PostgreSQL that PSNC used in their benchmarks.

  • andre

    i think it’s unfair to say that mysql silently corrupts data because it accepts strings that are too long and truncates them…after all, that is a documented behavior. let’s just say that it’s mysql’s way of dealing with bad data :P

    but i agree here…you don’t like mysql’s quirks? then switch to postgresql or firebird or some other database (maybe even sqlite! :P )

  • willthiswork

    While this complaint regarding enforcing…

    By the way did anyone read or feel the comparison was saying that MySQL is better than PostgreSQL?

  • MiiJaySung

    By the way did anyone read or feel the comparison was saying that MySQL is better than PostgreSQL?

    Ditto! Exactly. That’s what I was getting at: it didn’t test features both DBs have, but instead tested unrealistic scenarios. I mean, what sort of real application has a table with a few columns holding 20,000 rows, no indexes, and just issues very basic select/insert/update statements?

  • bgallie

    Another point on the discussion of validating data: the data must be validated by both the application and the database engine. For example, assume an incident-tracking system where incidents are assigned to people (users who exist in another table). The application builds a drop-down list from entries in the user table. Between the time the form was presented to the user and the time the user submitted it for processing, the user with an ID of 10 was deleted from the database. The application user selected the user with an ID of 10 to assign the incident to and submitted the form.

    From the application’s point of view, is the data in the assigned-to field valid? In my opinion, it is. The data to build the list was obtained from the database and the application user could only select entries from the list, but the user table changed between the time the drop-down list was created and the form was submitted (which the application could not know about).

    The only way that the incorrect data can be detected is to:
    1. Have the application query the database to determine that the data from the drop-down list is still valid
    or
    2. Rely on the database engine to catch the referential integrity error that would occur.

    Using option 1 is not very efficient and is prone to errors. Every item from a drop-down list built from a database table would have to be re-checked against the database before the application could apply the form’s data. Every form that used the drop-down list would need a duplicate of the validation code. People are only human and do make mistakes; it would be a simple oversight to miss a check, leading to an obscure error condition. You would also be coding into your application functionality that the database engine should be very good at.

    Option 2 pushes the referential integrity checks to the database, where they should be handled. I know that Oracle, PostgreSQL, DB2 and Ingres (to name a few) are very good at handling referential integrity. Having the database enforce referential integrity and domain validation also ensures that updates made outside the application (i.e. in an interactive SQL tool) do not inject data that doesn’t conform to the validation rules (data too large for the column it’s inserted into, or outside the range of valid values).

    That being said, the application should validate the things it can easily check (a field that should contain a number actually contains numeric data, a date field contains a valid date, etc.) and let the database engine check for domain and referential integrity errors. If your database engine can’t perform those validations, maybe it’s time to change the database engine you use.
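    The incident-tracking example above can be sketched with a foreign key (table and column names invented; InnoDB assumed, since MyISAM does not enforce foreign key constraints):

    ```sql
    CREATE TABLE users (
      id   INT PRIMARY KEY,
      name VARCHAR(50)
    ) ENGINE=InnoDB;

    CREATE TABLE incidents (
      id          INT PRIMARY KEY,
      assigned_to INT,
      FOREIGN KEY (assigned_to) REFERENCES users (id)
    ) ENGINE=InnoDB;

    -- User 10 is deleted after the form's drop-down list was built...
    DELETE FROM users WHERE id = 10;

    -- ...so the stale form submission is caught by the database itself:
    -- this INSERT fails with a foreign key constraint error.
    INSERT INTO incidents (id, assigned_to) VALUES (1, 10);
    ```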

  • jackchung

    Hi, I’m a newbie Postgres user. Can someone point out or link to optimization and tuning techniques for Postgres performance?

  • bcyde

    jackchung, check out http://www.powerpostgresql.com and http://www.varlena.com/GeneralBits/ for some optimization and tuning techniques for PostgreSQL.

  • Itshim

    I also feel it is unfair to say ‘MySQL will silently corrupt data’, because that is simply not the case. When inserting a value that exceeds the column’s maximum length MySQL issues a warning, telling the user that the data has been truncated. I understand that this warning was not generated for single row inserts prior to version 4.1.1, but again this is documented behavior.

    No database is perfect; I’m sure situations can be found in almost every DB where we, as programmers, would consider the DB to be silently corrupting data. The same can be said about PostgreSQL <= 7.3 and inserting into a DATE column: PostgreSQL will switch the month and day values in date strings ‘YYYY-MM-DD’ if the month value is > 12, assuming the values to be ‘YYYY-DD-MM’.

    So what I’ve learned from this discussion is: RTFM. And as for the comparison chart, I think both dbs are quality products and anything you read on the internet should be taken with a grain of salt. That is not to say SitePoint shouldn’t link to the information.

  • http://www.boringguys.com/ boringguy

    By the way did anyone read or feel the comparison was saying that MySQL is better than PostgreSQL?

    No, but you wouldn’t get that from the way it was written. There was an incredible bias towards Postgres. Just look at the first row of the comparison table: it says Postgres runs on “Windows, more than 2 dozen Unix-like operating systems…” while for MySQL it simply says “Linux, Windows, FreeBSD, MacOS X, Solaris, HP-UX, AIX, and other.” Why didn’t MySQL get that pumped-up “more than 2 dozen” language? And why is Windows mentioned first for Postgres when, until very recently, there wasn’t even a stand-alone Windows installer?

  • Etnu

    “Many apps can connect to a database.”

    They CAN, but they rarely DO. In the vast majority of databases, only one application is used to access the data. In rare cases, there might be a desktop application and a web based one, but ultimately they’ll still go through the same server application.

    I’m firmly of the opinion that the database should be used largely for storage and retrieval of data, not application logic. There’s nothing “wrong” with having validation in the DB as well (unless it’s performance-critical and largely unnecessary), but you’re an idiot if you let data go from application to DB without validating it, period. If I were the person in charge of QA and you submitted code that didn’t validate input, you’d either be re-writing it or looking for a new job.

    Simply put, PostgreSQL is the best option right now for free database products when you need heavy transaction support and data integrity is critical, and mySQL is the best when you’re dealing with largely performance-driven sites that do a ton of reading from a db and have a single / limited point of updating.

    If you’re choosing a database for the checkout area of ebay, for example, postgres would probably be your best choice – but for item listings, mysql would make more sense.

    It isn’t an apples-to-apples comparison at this point. MySQL focused on being fast and easily handling huge amounts of data (witness FULLTEXT indexes), while trying to stay as close as possible to the SQL standard; Postgres focused on ACID compliance. There is nothing wrong with either of these choices; they’re just different ways of doing things. If you’re dumb enough to write a CMS in Postgres and then complain about performance, it’s your own fault.

    Worry about comparisons when MySQL 5 comes out. Then focus on the only relevant one: MySQL/InnoDB vs. Postgres. Don’t compare MyISAM, as Postgres simply won’t come close on performance (unless you do something terribly stupid with your MySQL configuration).

    Only an anal DBA with absolutely zero business knowledge would refuse to use a database because it doesn’t enforce referential integrity on data that changes rarely (note: this applies to about 99% of all web sites).

    It seems like people are complaining about rare-case scenarios that truly will not matter in a real environment. Stop worrying about the theoretical. Does it REALLY matter if MySQL truncates a text field to the size you specified? Of the millions of lines of code I’ve written in my life, I can’t think of a single scenario where this would result in a bigger problem than a document’s title being truncated.

    “Besides, you can’t validate ALL the data before updating the DB, that would be a big waste of time.”

    You very rarely need to validate “all” data.

    The only data that should ever need to be validated is that coming from user input. If your application is generating invalid data, FIRE YOUR PROGRAMMER (or quit, if this is you). It is not the database’s job to check for division by zero.

  • thebigfoo

    When do the guys @ MySQL get it into their heads that it would be nice to add strict typing to their DB tables, like Postgres has, rather than trying to solve the problem with marketing expenses? LOL.

    @ Itshim: It’s not unfair, it’s simply correct: put in a, get b is corruption of data. And yes, it’s documented, but I still have to patch around the DB with PHP code that has to check what the DB should be doing.

    On handling lots of data: LOL, some of us have already experienced MySQL with 2 gigs of data using transactions. Do a dump and restore it back into your DB, and then tell me again that it handles lots of data. Haha…

    P.S. Sorry for the pseudonym… it needs to be.

  • http://doitslower.com/ lartexpert

    I agree mostly with Tony on this one – your application code should validate the data before even thinking about sending it to the database, and report any errors/warnings to the user to allow them to change their input.

    The example involving the two DBAs and a TEXT column is somewhat specious, too. A competent DBA/programmer would review the reasons *why* the field was TEXT and not VARCHAR(255) before changing it, for precisely this reason. You wouldn’t automatically truncate all numerical variables to integers in code you didn’t know and understand, would you?

    I agree that getting errors from the database is a good thing – inserting 300 characters into a 200 character field is something that should not happen, and it would be right for the database to complain that your data is bad. It would also mean that your code is bad too.

    Validation in itself is not difficult to do: it’s just a question of making sure you have a validation class or library that checks everything that needs checking, the same way, every time. I do that in my code; do you do it in yours?

  • Itshim

    @ thebigfoo:

    I understand that inserting ‘a’ and getting back ‘b’ is corruption of data, my problem is saying MySQL does it ‘silently’. By default MySQL issues a warning, and if you want it to issue an error you can configure it to do so.

  • ce

    I am stupid and want a hint.

    How would you validate, in the application, something like

    UPDATE tbl_name SET field_name = CONCAT(field_name, 'hei');

    Don't tell me a way that involves LOCKing the table!

  • C

    I cannot believe that this kind of discussion still exists nowadays. MySQL is made for a different goal than PostgreSQL! You cannot compare PHP with Perl and say that Perl is ‘silent’; of course! To be a practical scripting language it has to have default values and default behaviours. I think we’re missing the point.

  • photo312

    I am with Tony. Each and every programmer should check the data before sending it to the database.

    It’s a matter of preference, I guess, whether MySQL cuts the excess input and complains or not. The bottom line is that at least now we are all aware that the database should notify the administrator about the overabundance of data being sent.

    I am a very disciplined programmer and I apply the “defensive” programming strategy. I am totally with Tony on this.

    Paul

  • Rambug

    What the hell are you saying here? Any change made to the data you expect to have in your database should throw an error. Most of the validation should be done by the application, but data integrity must be handled by the database server. I don’t understand how programmers with many years of experience can agree to having the database server modify the data you are sending without letting you know. THIS IS BAD DESIGN, at the same level as building an application that doesn’t validate the users’ input data. Many of you say the applications you develop should validate everything, yet you say the database server (which is an application) should not. If I understand correctly, Tony would not build an application without validation, so if he built a database server he would do it the same way. That’s my comment.

  • http://doitslower.com/ lartexpert

    I think that programs should always take the worst-case view of all data input from whatever source. ALL user input should be validated, and modified/rejected as appropriate for your application.

    Also, as far as handling the database server goes, if you make assumptions about things like silent truncation of field values, then how well will your code work if you switch database platforms?

    Pumping data into MySQL regardless of what it contains is fine for script-kiddie forum sites, but if you want something that will work and that you can leave running unattended on a client's server, you need something a bit more grown-up.

  • Rambug

    By your answer, you are telling me that MySQL is not well designed because it does not validate the input. If I send a string with 50 characters into a varchar(25), I should receive an error, and MySQL MUST not accept it. All the answers are good and tell me that this server is not well designed. Like some of you said: good programs made by good programmers should validate all the input. That's also what I think.

  • Anonymous

    Validation has to be done in the database, and CAN be done on the application level. However, the database MUST STOP you from inserting invalid values.

    And even more, you can't check some stuff on the application level: suppose a query like this:
    update news set title='COPY:' || title where id=1234;

    It's the job of the database to validate the data it gets. And MySQL fails miserably here:
    mysql> create table test (x date);
    Query OK, 0 rows affected (0.00 sec)

    mysql> desc test;
    +-------+------+------+-----+---------+-------+
    | Field | Type | Null | Key | Default | Extra |
    +-------+------+------+-----+---------+-------+
    | x     | date | YES  |     | NULL    |       |
    +-------+------+------+-----+---------+-------+
    1 row in set (0.00 sec)

    mysql> insert into test (x) values ('this-is-bullshit');
    Query OK, 1 row affected (0.00 sec)

    mysql> select * from test;
    +------------+
    | x          |
    +------------+
    | 0000-00-00 |
    +------------+
    1 row in set (0.01 sec)

  • Jeff Triplett

    I hate flame wars and x vs. y debates, but after looking through the response times on PostgreSQL something looks amiss. I've worked with both, and after tweaking I've rarely seen a case where PostgreSQL wasn't faster or the two weren't neck and neck in terms of performance. The single-commit transactions fared better, but it still looks like there are some other issues or bottlenecks.

    Nonetheless, the results are interesting. I ran into similar problems with PostgreSQL on a fresh install, before tuning both PostgreSQL and my OS until they hummed. It seems hard, in an article, to say where the line is of how far to go to get the optimal results.

  • http://simon.incutio.com/ Skunk

    Does MySQL still silently corrupt your data? That's the reason I switched to PostgreSQL. Here's a classic example: create a VARCHAR(255) field, then insert into it a string that is 300 characters long. Last time I tried this in MySQL, it truncated the data I inserted to 255 characters but didn't tell me it had done so; the INSERT “worked” just fine. That's just Broken As Designed.

  • http://www.sitepoint.com/ mmj

    Yes, I believe that is still the case. I noticed it in the MySQL docs a while ago.

  • http://simon.incutio.com/ Skunk

    On this page truncating strings comes under the “warnings” header. I think you can get it to throw an error rather than running the query by using the server SQL modes feature, which has been significantly upgraded in recent versions.

    So it looks like you /can/ get MySQL to be as respectful of your data as PostgreSQL, but it takes quite a bit of configuring. I wonder if it still performs better once you've turned on the missing features.

  • Chris

    MySQL performs better ‘out-of-the-box’. PostgreSQL needs some tuning to make it perform.

    Without appropriate tuning (of both sides) it’s an uneven comparison.

  • Itshim

    Skunk, you are correct:
    [quote=MySQL Manual]
    If you assign a value to a CHAR or VARCHAR column that exceeds the column’s maximum length, the value is truncated to fit. If the truncated characters are not spaces, a warning is generated. You can cause an error to occur rather than a warning by using “strict” SQL mode.
    [/quote]
    I've never tried this; I check the length in PHP first, before sending it to MySQL, but it is quite painful.

  • http://www.lopsica.com BerislavLopac

    http://sql-info.de/mysql/gotchas.html lists quite a few MySQL “gotchas”, as they call them. There is also a similar, albeit much shorter, list for PostgreSQL.

  • http://www.tonymarston.net Tony Marston

    Complaining that MySQL doesn't produce the error messages you expect when you give it bad data is just plain nonsense. Any programmer who doesn't check the data (especially that which is input by a human) BEFORE sending it to the database is a complete duffer of the first degree. You do NOT use the database to validate user input; that's the program's (and the programmer's) job.

  • PaulG

    Sorry to say this, but a good database will have the functions to do some of the basic validation for you too, thus taking load off your PHP tier.

    I use only MySQL; I only read this in an article, but I haven't forgotten it, because that's where I want to go.

  • Rod K

    Tony, you are wrong that one shouldn't complain that the DB doesn't validate. Yes, you should validate in the app, but the DB is ultimately responsible for the cleanliness of the data. Having the app have the final say in data integrity is a common misconception. Many apps can connect to a database. Having each responsible for integrity is not only wasteful of resources, it's dangerous to assume each app will be 100% vigilant. How would you like your app to stop working because another app developer corrupted the DB?

    Also, if the validation rules change, you only need change them in the DB, not in multiple apps.

  • Db Power Ranger

    Hum, what about a web app? Let's say a hacker submits your HTML page without any of the JavaScript validation that you included in it. He can then send anything to your database.

    If the database validates nothing, or accepts data that it should not, don't you think that's a bad thing? And you can be certain that someone will figure out a way around your protections and security, no matter how good you think they are.

    The best way is to have layered security. Sure, you validate the input from the user in your app, but you do not trust that validation one bit when you insert the data into the database. You redo a basic validation and then you insert. Anyone bypassing your app would still have to put valid data into the database.
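    (The layered approach described above can be sketched as follows. This is an illustrative example, not a prescription: the function, table, and column names are invented, and SQLite via Python's sqlite3 stands in for the database.)

```python
import sqlite3

# Layer 1: application-level validation, which gives users friendly errors.
def validate_username(name):
    if not (1 <= len(name) <= 25):
        raise ValueError("username must be between 1 and 25 characters")
    return name

# Layer 2: a database-level constraint, the backstop that still holds if
# some other client bypasses the application entirely.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "  username TEXT NOT NULL CHECK (length(username) BETWEEN 1 AND 25)"
    ")"
)

# Normal path: the app validates first, then inserts.
conn.execute("INSERT INTO users (username) VALUES (?)",
             (validate_username("alice"),))

# A client that skips layer 1 still cannot insert bad data:
try:
    conn.execute("INSERT INTO users (username) VALUES (?)", ("x" * 50,))
except sqlite3.IntegrityError:
    pass  # the database refused the oversized value
```

    Neither layer replaces the other: the app gives usable error messages, while the constraint guarantees integrity no matter which client is talking to the database.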

  • JiffyPop

    If you are using JavaScript (or the maxlength on text fields) as the only means of validation then may god have mercy on your soul.

  • Masticate

    Redundant layers of integrity are essential to software development. I find it absurd (even appalling) that some of you “software developers” are content with relying solely on the application layer (and of course, you shouldn't rely solely on the database either). This lack of redundancy has caused numerous problems in the past; e.g. Therac-25.

    Let's not be arrogant. We all like to believe we are outstanding developers, and some of us are. But in the end we are human (fallible); we cannot always build a perfect system, especially if the complexity is large.

  • Anonymous

    I prefer MySQL because the GUI tools developed for it are more mature and powerful.

    MySQL Administrator (http://www.mysql.com)
    Navicat MySQL front end (http://www.navicat.com)