Migrate your site from MySQL to PostgreSQL -- Part 2

In June 2001, I switched my Website’s back-end database from MySQL to PostgreSQL. It only took me a day. Since that time, Postgres has worked flawlessly, confirming that I made the right choice.

In the first tutorial in this series, I explained how to convert data from MySQL to Postgres while avoiding potential pitfalls. I also explained the reasoning behind my move to Postgres. In this article, I’ll guide you through the other half of the conversion process and explain how to take existing MySQL PHP code and make it Postgres-friendly.

Preparation

Before you go muddling around in code, you need to take some preparation steps. The conversion process will vary depending on your coding skills and site complexity, but it will certainly be noticeable if you try to change things on a live site. To avoid annoying errors while you work on the site, make a copy of your site in a working area of your Web server. In my case, I set up a special subdomain for testing while I modified the code. Since the site automatically calculates URLs based upon some basic settings in a configuration file, this was easy, but you may have to make some changes to your code to make it work whether you choose to place the work-in-progress version in a subdomain, in a sub-folder of your site, on another Web host, or on a local development machine.

Remember: if you put the work-in-progress copy on another machine, make sure that the Web server is configured just like the live site, or you may run into trouble. If your Web host only allows Postgres connections from the live Web server, you may have to copy your Postgres data over to the development server as well. For more information on copying a Postgres database, see the documentation for pg_dump and pg_restore on www.postgrsql.org.

Once you have copied your site and have tested it to make sure it works normally, you’re ready to pull out the scalpel and start hacking away.

PHP Surgery: Hacking The Code

First, create a central configuration file if you don’t already have one. This makes it easy to change Postgres server settings within your code and have the changes instantly affect your entire site. Make sure that the configuration file is not in a Web-accessible spot on your Web server, which is a security risk. The default include directory for PHP is /usr/local/lib/php/, so you may want to have your Web host create a directory there for any configuration files you don’t want accessible from the Web. Also make sure it’s not readable by other people using your Web host’s services. If your site, like mine, includes a standard header file, you can include the Postgres settings file from there. It goes something like this:

<?php  

// /usr/local/lib/php/mysite/configfile.php  

$hostname = "localhost";  

$username = "username";  

$database = "mydb";  

$password = "mypasswd";  

?>  

<?php  

//Standard HTML Header  

include("mysite/configfile.php");  

?>  

  

<html>  

<head>  

<title>Bill's Kazoos</title>  

<head>  

<body>  

header continues...

A central repository for information like the previous example makes switching database systems a lot easier. Speaking of switching, it’s time to start editing files.

Connections and Queries

PHP’s MySQL functions are similar enough to the Postgres ones to make converting code easy but different enough to make it just a little quirky. In fact, you may want to write some special functions to make converting easier. But before you do that, here’s an overview of the connection and query differences:

Connecting to a MySQL database requires two commands:

$connection_id=mysql_connect($hostname, $username, $password);  

mysql_select_db($database, $connection_id);

Or if you use persistent connections:

$connection_id=mysql_pconnect($hostname, $username, $password);  

mysql_select_db($database, $connection_id);

However, PHP’s Postgres connect function takes a single string argument, unlike the MySQL function, which takes multiple string arguments. The Postgres functions also let you specify which database to use when you initially connect. Here’s an example:

$connection_id=pg_connect("host=$hostname dbname=$database user=$username   

password=$password");

The persistent connection function works exactly the same, except it requires a call to pg_pconnect().

PHP’s MySQL and Postgres query functions are a bit different as well. The MySQL query function is $result_data = mysql_query("query goes here", $connection_id);, but Postgres’s is reversed from that: $result_data = pg_exec($connection_id, "query goes here").

As you can see, the differences between PHP’s MySQL and Postgres support in connecting and querying aren’t great, but the differences in function arguments make the editing process slower. To speed things up, you can write some wrapper functions that take the same arguments as the MySQL functions but connect to Postgres instead. If you have a central library of utility functions, you can place them there. You could also put them in the configuration file explained previously, since it is automatically included into every page.

//connect to database  

function postg_connect($hostname, $username, $password, $database)  

{  

  return pg_connect("host=$hostname dbname=$database  

                     user=$username password=$password");  

}  

  

//alternately, if you use only one database  

//you can make best use of the variables in  

//your configuration file by doing the following,  

//which eliminates the need for any variable   

//passing at all.  

  

function postg_autoconnect()  

{  

  global $hostname, $username $password $database;  

  return pg_connect("host=$hostname dbname=$database  

                     user=$username password=$password");  

}  

  

//query-making function  

function postg_query($query, $connection_id)  

{  

  return pg_exec($connection_id, $query);  

}

Whether or not you use wrapper functions, converting the connection code and query calls is pretty simple. Postgres is mostly able to support the old SQL queries used under MySQL, but you might have to tweak your queries a bit. Since data models and code vary from site to site, I won’t explain things here. However, converting the SQL isn’t that hard. Convert the code first, then see what queries fail in Postgres. Next compare the MySQL Language Reference side by side with the PostgreSQL Users Guide. You probably won’t have to do more than just finding the Postgres equivalent of MySQL features, as Postgres supports all of the common MySQL features.

Now that you have made the connection and query code Postgres-friendly, it’s time to get your hands dirty. The differences between PHP’s MySQL and Postgres support place several hurdles to working with result sets that will require some more code tweaking.

Let’s See Those Results

PHP’s Postgres result set handlers can do everything their MySQL counterparts can; they’re just slightly different. These small differences may require nothing more than a little modification to code, but they might also require some involved programming.

First, look at what is similar between MySQL and Postgres. The following is a list of common MySQL result handling functions and their Postgres counterparts:

MySQL

mysql_num_rows($result) Returns the number of rows in a result set. This is only valid for SELECT STATEMENTS

mysql_affected_rows($result) Returns the number of affected rows in an INSERT, UPDATE, or DELETE SQL query

mysql_fetch_object($result) Fetches one row of data and returns it as an object. Field names can be accessed using class syntax. (as in $field1 = $var->field1;) This function keeps an internal value to make sure it returns the next row each time it is called.

mysql_fetch_row($result) This function returns a row of the result set as a numeric array. The values can be accessed using array syntax, starting with 0. (as in $field1 = $var[0]). An internal counter keeps track of the rows and passes a new row result each time it is called.

mysql_fetch_array($result) This one is identical to the other two fetching functions, except it returns an associative array with row results ($field1 = $var["field1"];).

Postgres

pg_numrows($result) Works exactly the same as its MySQL counterpart

pg_cmdtuples($result) Works just like its MySQL counterpart

pg_fetch_object($result, $row) Fetches a designated row of the result set. The $row value must be included, and there is no internal counter. Otherwise, it works just like the corresponding MySQL function.

pg_fetch_row($result, $row) Returns a numeric array of the result from the specified row. The row number <b>must</b> be included. Unlike its MySQL equivalent, no internal counter keeps track of row results.

pg_fetch_array($result, $row) Identical to its MySQL equivalent except for the requirement of specifying the row to return and the lack of an internal counter.

For more information on PHP’s MySQL and Postgres functions, see the PHP Documentation on PHP.Net.

The starkest difference between PHP’s MySQL and Postgres support is in the actual reading from result sets. Where MySQL automatically determines for you which row to return results from, Postgres requires you to specify the row you want to read from. This tends to get in the way of many common MySQL data-reading algorithms. Here are a few short examples of the kind of problem you might run into, as well as two ways to solve it.

//first common example:   

$rslt=mysql_query("SELECT * FROM blah", $connection_id);   

   

while($value=mysql_fetch_array($rslt))   

{   

  //do data handling stuff   

}   

   

//In the Postgres functions, the previous algo won't work,    

//since they require one to specify row number. The fix   

//goes like this (if you're not using the wrapper function   

//explained earlier):   

   

$rslt=pg_exec($connection_id, "SELECT * from blah");   

$limit=pg_numrows($rslt);   

for($rownum=0;$rownum<$limit;$rownum++)   

{   

  $value=pg_fetch_array($rslt, $rownum);   

  //do stuff here   

}

In the previous example, note that the code is a little longer for Postgres, since the row number is required. However, things become a lot easier if you write your own count-keeping wrapper function and use it in conjunction with the wrapper functions shown previously. Here’s the utility file with the wrapper functions added. Note the changes to postg_query() as well as the addition of three global variables and result-gathering functions.

   

<?php   

// /usr/local/lib/php/mysite/configfile.php   

$hostname = "localhost";   

$username = "username";   

$database = "mydb";   

$password = "mypasswd";   

   

//internal counting variables   

$fetch_array_counter=0;   

$fetch_object_counter=0;   

$fetch_row_counter=0;   

   

//wrapper functions to ease   

//transition to PostgreSQL   

   

//connect to database   

function postg_connect($hostname, $username, $password, $database)   

{   

  return pg_connect(host=$hostname, dbname=$database user=$username,    

password=$password");   

}   

//connect without having to pass values   

function postg_autoconnect()   

{   

  global $hostname, $username, $password, $database;   

  return pg_connect(host=$hostname, dbname=$database user=$username,    

password=$password");   

}   

   

//query-making function   

function postg_query($query, $connection_id)   

{   

  //set the globals to 0   

  global $fetch_array_counter, $fetch_row_counter, $fetch_object_counter;   

  $fetch_array_counter=$fetch_row_counter=$fetch_object_counter=0;   

  return pg_exec($connection_id, $query);   

}   

   

//pg_fetch_array() replacement   

function postg_fetch_array($rslt)   

{   

  global $fetch_array_counter;   

  $fetch_array_counter++; //add one to the counter   

  return pg_fetch_array($rslt, $fetch_array_counter);   

}   

   

//pg_fetch_row() replacement   

function postg_fetch_row($rslt)   

{   

  global $fetch_row_counter;   

  $fetch_row_counter++; //add one to the counter   

  return pg_fetch_row($rslt, $fetch_row_counter);   

}   

   

//pg_fetch_object() replacement   

function postg_fetch_object($rslt)   

{   

  global $fetch_object_counter;   

  $fetch_object_counter++; //add one to the counter   

  return pg_fetch_object($rslt, $fetch_object_counter);   

}   

?>

Of course, the previous functions will not work if you want to work with two result sets in the same loop, as there is just one internal counter per fetch type. If for some reason you need to read from more than one result set interchangeably, you’ll have to use the traditional Postgres method.

Another problem you may run into when changing your code is the absence of a Postgres equivalent of MySQL’s mysql_insert_id(), which returns the index value of the last INSERT query. The PHP documentation’s language may mislead one to think that pg_getlastoid() does the job, but that is not the case. The lack of such a function is actually not a downside, for it is a result of Postgres’s power in allowing multiple auto-incrementing fields through the SEQUENCE system.

Fortunately, getting the last ID is easy. Sequence information can be accessed through SQL, so the following replacement for mysql_insert_id() is possible:

function postg_insert_id($tablename, $fieldname)   

{   

  global connection_id;   

  $result=pg_exec($connection_id, "SELECT last_value FROM ${tablename}_   

${fieldname}_seq");   

  $seq_array=pg_fetch_row($result, 0);   

  return $seq_array[0];   

}

Because Postgres uses a special naming system to name sequences, the function I created above requires the tablename and fieldname. When called, the function above will retrieve the last sequence value used for any SERIAL fields you may have in your table, even if there are more than one.

By using the previous techniques, you should be able to get your MySQL site successfully running PostgreSQL. However, that’s just the first step; read on to see a list of useful PostgreSQL resources.

Resources for Further Reading

The first and foremost PostgreSQL resource can be found at the PostgreSQL Non-FAQ Documentation Site. This valuable resource provides an extensive list of books, quick references, tech guides, and even jobs. It also has a list of resources on changing back-end databases from MySQL to Postgres, as well as troubleshooting help and information on everything PostgreSQL.

Another good PostgreSQL-provided resource is the PostgreSQL Interactive Documentation, where you’ll find out anything there is to find out about using PostgreSQL.

Xach Beane, famous for his work on The GIMP, has also written a script to convert MySQL dumps to Postgres dumps. His script handles a few more things than covered in my article series, so check it out. Of course, it doesn’t handle things flawlessly, so you may have to use it warily.

Dobrica Pavlinusic has also written a script that handles MySQL to Postgres conversion. Note again that the source claims its own inability to handle some things, so you may have to just resort to hand editing as explained in my previous article.

A very thorough PostgreSQL & PHP tutorial can be found here. It starts at installation and goes on right through to explain from the beginning how to use PostgreSQL. This tutorial should be required reading for beginners.

Bruce Momjohan has written a book on PostgreSQL called PostgreSQL: Introduction and Concepts, published by Addison Wesley. You can even view it online!

Finally, OpenDocs is publishing Practical PostgreSQL in conjunction with O’Reilly. The book is due to be out in the beginning of October, 2001, but you can read it on the OpenDocs Linuxports.com site.

Moving your site’s back-end from MySQL to PostgreSQL is a wise choice. Converting your site takes time and dedication, but by following the proper steps, your site can rest upon a more robust database system with many more options for future expansion. Like me, you won’t regret it!

Happy Coding!

Frequently Asked Questions on Migrating from MySQL to PostgreSQL

What are the key differences between MySQL and PostgreSQL that I should be aware of before migration?

MySQL and PostgreSQL are both open-source relational database management systems, but they have some key differences. MySQL is known for its speed and ease of use, while PostgreSQL is highly extensible and compliant with SQL standards. PostgreSQL supports advanced data types and functions, such as arrays and hstore (for storing key-value pairs), which MySQL does not. Additionally, PostgreSQL supports transactional DDL, meaning you can roll back changes to database schemas if something goes wrong. Understanding these differences can help you anticipate potential challenges during the migration process.

How can I convert MySQL data types to PostgreSQL data types during migration?

During migration, you may need to convert MySQL data types to equivalent PostgreSQL data types. For instance, MySQL’s TINYINT can be converted to SMALLINT in PostgreSQL, and MySQL’s DATETIME can be converted to TIMESTAMP in PostgreSQL. Tools like pgloader can automate this process, but it’s important to review the conversions to ensure data integrity.

What tools can I use to migrate from MySQL to PostgreSQL?

There are several tools available for migrating from MySQL to PostgreSQL. Pgloader is a popular open-source tool that can load data from MySQL to PostgreSQL. Other tools include SQLines, which offers data transfer and schema conversion, and AWS Schema Conversion Tool, which can convert your MySQL database schema to a PostgreSQL-compatible schema.

How can I handle errors during the migration process?

During migration, you may encounter errors due to differences in SQL syntax or data types between MySQL and PostgreSQL. Most migration tools provide error logs that can help you identify and resolve these issues. It’s also a good practice to test the migration process with a subset of your data before fully migrating to identify potential issues early.

How can I optimize the performance of my PostgreSQL database after migration?

After migration, you can optimize your PostgreSQL database’s performance by tuning various parameters, such as shared_buffers and work_mem. You can also use the EXPLAIN command to understand how your queries are being executed and identify potential performance issues. Regular maintenance tasks, like vacuuming and analyzing your database, can also help maintain optimal performance.

How can I ensure data integrity during the migration process?

Ensuring data integrity during migration involves carefully mapping data types between MySQL and PostgreSQL, handling null values, and preserving primary and foreign keys. Tools like pgloader can help automate this process, but it’s important to thoroughly test your migrated data to ensure integrity.

Can I migrate stored procedures and functions from MySQL to PostgreSQL?

Yes, but this can be a complex process due to differences in how MySQL and PostgreSQL handle stored procedures and functions. You may need to manually rewrite your stored procedures and functions in PL/pgSQL, PostgreSQL’s procedural language.

How can I handle character set conversion during migration?

Character set conversion can be a challenge during migration, especially if your MySQL database uses a character set that is not supported by PostgreSQL. Tools like pgloader can help with character set conversion, but it’s important to verify the results to ensure that your data is correctly encoded.

Can I use PostgreSQL’s features like arrays and hstore after migration?

Yes, one of the benefits of migrating to PostgreSQL is the ability to use its advanced features like arrays and hstore. However, these features may not be directly compatible with your existing MySQL data, so you may need to modify your database schema or application code to take advantage of them.

How can I ensure that my application works correctly with PostgreSQL after migration?

After migration, it’s important to thoroughly test your application with your new PostgreSQL database. This includes testing all database-related functionality, checking for SQL syntax differences, and ensuring that your application correctly handles PostgreSQL’s features and data types.