These days everyone’s building database driven Websites. A few years ago this was the main luxury of search engines, portals and the sites of large companies like Microsoft. But now, as the popularity of MySQL grows, database driven sites have become mainstream. MySQL offers many features, it’s fast and it’s free. However, what happens when you reach the limit of this software and have tried every little optimization and tweak?
Do you give up and resist growth? No, you migrate. MySQL isn’t the only database system out there. You actually have many different systems to choose from – systems that have their own strengths and weaknesses, and which offer a number of advanced features that aren’t available in MySQL. These features include Stored Procedures, Triggers, Atomic Updates, and Foreign Key Integrity, and they’re the kinds of features that provide the degree of scalability required for quick searches on search engines and other large sites.
Not all of these solutions are costly – some are even free. And the more expensive solutions like Oracle and Microsoft SQL Server can be provided by hosting companies, which also brings down the cost and complexity of their maintenance. That’s one great advantage of MySQL though: because of its simplicity anyone can maintain it. Many of these other servers require the attention of a full time, trained DataBase Administrator (DBA), but by using a hosted solution, you can lower this cost significantly.
A Little SQL Background
SQL (ess-que-el) is a standard for data storage. SQL is an acronym for Structured Query Language, and this language can be used to access data from databases and file systems. SQL is based on another similar language called Sequel. This has created some confusion in the development community, as many believe that SQL is a simple shortening of the older language’s name.
Sequel was created in the mid-1970s by researchers at an IBM facility in the area of San Jose, California. This work was based on a paper published by Dr. E. F. Codd in June 1970 called "A Relational Model of Data for Large Shared Databanks".
IBM further developed a Research Database and API called by the same name of Sequel. By the late 1970s this work had progressed into a product called SQL/DS RDBMS. This new system was based on what would become the SQL standard, and was envisioned to make data access universal.
When SQL/DS was announced, many competitors rushed similar products to market using the same SQL standard, and IBM was beaten to market by an unknown company. A small company called Relational Software Inc. was the first to launch – and once its software became successful, the company changed its name to Oracle Corporation. To this day, Oracle has dominated the RDBMS market with its products. But more about Oracle later.
Even though each of these products is based around the SQL standard, each implements certain features differently, and as such, some things are very different from package to package (sound familiar?). They all cover the basics of the DDL (Data Definition Language) and DML (Data Manipulation Language). However, the main point at which they differ is in their optimization features and the DCL (Data Control Language). This is good news for you, because the basic terminology is the same: your basic queries and table markup will be almost the same across the different servers (most people don’t use the DCL subset very much in everyday development).
Common Terms and Features
Let me explain what some of these features are and how they can improve your performance.
- Stored Procedures – These are bits of code that are used to modify the results from a query, or to modify data in a table. Unlike a query in your script, these are stored in a compiled format within the database itself, which allows for faster execution and response times. Each of these databases comes with a procedural language outside SQL that allows you to write and run stored procedures. This saves time over a normal query, as the RDBMS doesn’t have to figure out how to run the query, it just has to execute it. It also enables you to add features and fix bugs with little or no change to the front end code.
- Triggers – These are specialized Stored Procedures. Triggers are functions that are automatically run by the database server in response to events you define, such as adding or modifying records in a table.
- Atomic Updates – These are commonly called "Transactions". What this allows you to do is to collect a series of queries into one group so that they’re treated as a single entity. This way, if one query fails or is cancelled, the rest are reversed – which prevents partial transactions from being entered into your system. Atomic Updates are necessary for OnLine Transaction Processing, and in the maintenance of both your business’s, and the customer’s financial security.
- Foreign Keys – These enforce the relations in a relational database. Foreign Keys allow you to make sure that the data is valid without having to double-check against code, or worry about duplicate entries. These keys are enforced using the existing indices and table querying methods in the server, which makes them very fast and dependable.
- Views – These are special tables. With a View you can create the appearance of a table using standard SQL queries. This allows you to do joins across tables without incurring a performance hit when indices are checked and rows matched. A View maintains all these matches for you so they’re available when you need them. Views are also good for the provision of subsets of data to different people. For instance, Views can allow you to make sure that your support staff never see a customer’s credit card information – as far as the View is concerned, that information simply doesn’t exist.
And now, on to the solutions!
Postgre brings some unique features to the world of SQL databases. It allows the developer to create relational objects to base the database on. These objects allow for inheritance, which, alone, is a fairly advanced feature in a database server. Postgre also includes other features that MySQL lacks.
These include triggers, stored procedures and atomic updates. Postgre is based on code developed by the University of Berkeley in California, and is distributed under the GNU Public License. PostGre SQL is distributed free of charge under the GPL license. The server runs on Linux or Unix machines without modifications, and the clients will also run on Windows without modification.
Postgre continues to gain acceptance on the Internet development front. This is because it offers so many of the features provided by other, more expensive solutions, but is still easy to use, and was released under the GNU Public License. This gives you a lot of power for little or no cost. Many sites and companies have migrated to Postgre because of the limitations imposed by MySQL, and the high cost of systems produced by the more established providers.
Postgre SQL is the only open source database server that supports procedural language constructs like stored procedures and triggers. It uses a language called PL/pgSQL for these commands. PL/pgSQL is formatted and structured like the PL/SQL code used in Oracle’s database server. I’m sure that this is no mistake – and it presents a developer with a good upgrade path involving minimal conversion efforts if you outgrow Postgre.
One thing I really like about Postgre is its level of documentation, which is similar to that often found with the more commercial servers. Each different function, i.e. programmer or developer, is available in a separate Adobe Acrobat file, which makes the location of functions easier.
Overall, if you need an open source solution and you’ve outgrown MySQL, you should look at Postgre as an alternative. It provides the features needed for a small enterprise solution, a stable engine, and active development.
This is IBM’s entry into the high-level database arena. They call DB2 the Universal Database because it is supposed to run on all platforms. It even runs on mainframe hardware under systems like OS/390. I personally haven’t used DB2 but list it here as an option you might consider. The newer versions come complete with IBM’s Websphere software, ADO and ODBC drivers, an Interface to connect to a database from the Webserver, and a custom module that allows you to connect to PERL code through PERL’s DBI module.
IBM just released version 7 of this suite, which is popular on IBM hardware, and often comes pre-installed on the larger machines. Coupled with their powerful Websphere rapid development toolset and the robust features of Lotus Notes, DB2 can provide a balanced, integrated platform to meet most companies’ data handling needs.
With this system IBM aims to provide a single solution that is all things to all people. This version includes features such as XML data, Multimedia file handling, Web driven output, Java support and more.
A little cup of Java
As with many database servers available today, IBM has integrated Java capabilities into its DB2 product, which allows the developer to create new and unique features, and to extend the DB2 library using common, everyday tools. DB2 utilizes two different Java technologies, Java Database Connectivity (JDBC) and embedded SQL for Java (SQLJ). Both can be used to create Java applications and applets that access DB2.
JDBC accesses information in the DB2 database by accessing the command line interface of the database. This can result in slower accesses than you can achieve with stored procedures or triggers using SQLJ. SQLJ uses the JDBC classes to connect to the database, but holds compiled queries in its byte code to allow for faster planning and execution.
As I said before, DB2 is advertised as being able to run on every platform. For the most part this is true, but some modules are only available for certain platforms. DB2 comes with a complicated pricing and licensing scheme, where IBM calculate the charge based on which modules are used. While this does allow you to pick and choose what you need, if you’re looking for a complete package, you might end up spending more than you can afford.
In my opinion DB2 needs some improvement to be a "universal database solution". However, if you’re using other IBM software (such as Websphere and Notes) that’s equipped to handle fast access to the DB2 server, then you should take advantage of that, and use this database.
Sybase Adaptive Server
According to the Sybase Website:
Sybase Adaptive ServerÂ® Enterprise is designed for those who know what they need – - an elegant solution for managing data in transaction-intensive, mission-critical enterprise environments.
If you had to describe Sybase in one sentence, that’s a good one to use. Their products are the best for accomplishing those tasks for a small to medium business (of 1000-5000 users). Sybase ASE is the database on which I learned SQL, and I still think it’s a good solution. The newest production release is version 12.0 of the Database server, and its accompanying application suite has new features that better integrate it into the eBusiness world. A new version 12.5 is also in Beta testing right now.
The core of Sybase’s programming offering is a procedural language called Transact SQL. This is a full programming language with almost as many options as PHP or Java. It allows the developer to create entire database driven applications without ever leaving the SQL environment.
Within this language you can manipulate the data of a query before the program ever returns the results to your script — which translates into faster processing and easier formatting. Imagine being able to apply a template at the database level. Imagine determining the honorific (i.e. Mr., Mrs., or Dr.) for a name and automatically adding it to the name even if the honorific isn’t stored in the database. Imagine comparing words for a search engine without concern for spelling.
You can do all of this in Transact SQL before the results are ever turned over to the script, and as they’re compiled code, it’s executed much faster, and without any latency.
Sybase provides a full range of features that include stored procedures and triggers, C++ and Java integration, Identity fields (similar to MySQL’s auto increment), Atomic Updates and Foreign Keys. Sybase runs on Windows NT, Linux and Unix. You can download several versions for free use on Redhat Linux, which is a benefit to the Internet development community. The new beta of Sybase includes some exciting features for the world of Internet development. Three of the most interesting of these are:
- XML Queries – Sybase introduced XML storage classes in ASE with version 12. Continuing this integration in ASE 12.5 provides programmers with the tools and capability to create SQL-like queries using XQL, a language extension to SQL. This allows the database to handle data directly in XML form. Content can be published on the Web, is more easily changed over time, and can be customized for a specific user or device, making this a good tool for the ebusinesses arsenal. The data can be stored in XML format and ASE can access it directly for quick publication.
- SQLJ (including Java Stored Procedures) – Java has been at the forefront lately in the provision of access to server based applications. Sybase has been capitalizing on this, and working towards the standardization of Java-Relational Database technology. This has resulted in SQLJ, which combines the power and rapid development capabilities of Java with the data management capabilities of a database manage server such as Sybase ASE.
- Support for External File Systems – Most published content is not relational, and is stored outside the database itself. Most large files like images are stored in the Operating Systems File System directly. This leads to problems in the control and management of files, as anyone who looks in the typical Windows directory will agree. Sybase ASE 12.5 allows developers to control this data and still store it in the file system. They can create proxy tables that map the file locations and other information, which allows them to maniputlate these files using normal SQL commands to access, create, delete and write files.
Overall, Sybase is a good system with a lot of features. It will support a large user base and has the technologies to allow your site to grow as large as you need it to. For more information on Sybase, visit their Website and read the different case studies, or download the documentation in PDF format. Just be warned — the documentation on Transact SQL alone is over 1500 pages printed.
Microsoft SQL Server
SQL Server is the cornerstone of Microsoft’s Enterprise data strategy. Technologies from it are used in everything from the Exchange to .NET architectures. It’s the preferred database for many companies that use other Microsoft Software because it’s all designed to fit together.
You’ll most frequently find SQL Server offered by those hosting companies that use Windows Operating Systems for their servers. SQL Server is not cheap, but it will provide you with a scalable architecture for any kind of data. Microsoft’s SQL Server started life as a licensed copy of Sybase ASE ported to Windows NT, and both systems are still very similar, as it was only recently that Microsoft started to make large modifications to the underlying engine.
SQL Server is designed to be scalable. You can share one database over multiple machines and maintain data integrity using replication. Each machine can support up to 32 processors and 64 gigabytes of RAM for the ultimate in processing power. Like Sybase, SQL Server is Web-enabled, allows direct data access through queries from HTTP, and can output data in XML format.
SQL Server also comes with its own procedural language, which, thanks to its Sybase Roots, is based on Transact SQL. Microsoft simply shortened the name to T-SQL. T-SQL allows the developer to rapidly create enterprise level applications with full Internet capabilities. The SQL Server client application contains a full debugger to make the development process easier. With the debugger you can control and monitor your application as it runs, to create the best optimizations in your queries and other code.
Some of SQL Server’s Strengths
Full Text Search – SQL Server 2000′s full text search capabilities give you the power to search not only character-based fields in tables rapidly, but to also search text files stored on the hard drive, formatted files such as Word Documents, and more.
.NET Integration – This is a .NET server, and as such it can be integrated with any other capable server, and will exchange information directly. This allows for the rapid development of online applications, from communities to online stores.
Clickstream Analysis – This functionality allows you to data-mine the information captured every time a visitor clicks on a Web page within your site. You can find out where they came from, where they’re going, and plot this over time to generate valid predictions about your traffic. You can also determine if your customers buy more red socks on Wednesday or Thursday, and what else they buy to go with those socks. This gives you powerful information that will allow you to streamline your business and data.
If you use a Windows Server for your Website, or need to access Windows Services using ActiveX, then SQL Server is your obvious choice. SQL Server provides a strong, scalable platform for use on NT and Windows 2000. It’s also offered by most Windows NT Hosting Companies either free or for a small monthly fee, which gives you the ability to use a high-end database server for little or no cost.
Oracle was the first commercial relational database server sold on the open market. Today it’s still considered to be the number one database server in terms of power and features. Oracle is used by most Fotune 500 companies as their preferred method of data storage. This prestige and power comes at a price though: Oracle is also the most expensive solution available. The newest version of the database software is 9i (the i stands for Internet). Oracle ships a fully standard Internet Web server with their database packages. This server can access data directly from the database as if it were a normal HTML page.
The Oracle Database system can access files up to 1 terabyte in size, and can support tens of thousands of users. It’s a breeze to develop on, with its fully capable PL/SQL language, strong support for stored procedures and triggers, security control, and many other features. However this also comes at a price — a large one. Oracle is the most expensive and resource-intensive database server listed here. The installation and maintenance of an Oracle database is not for the faint of heart or inexperienced, either. That said, Oracle will provide the power to support the largest global enterprise and handle tens of thousands of users and transactions from around the world.
Oracle gets most of its advantage from its speed. This speed comes from the fact that once it runs something like a query or stored procedure, that information is kept in memory in a compile format for an indefinite period. When a user logs in and accesses a table, this information is also cached in memory and only updated when a transaction is committed to permanent disk storage. This gives the database super fast access because the hard disk access — the slowest past of the process — is eliminated.
Oracle ships for every Operating System from Windows NT, to the largest mainframes, including Linux and Unix. Their installation package and many of the utilities that come with Oracle Database are written in Java, which allows them to use the same code on every installation, and makes the software easier to support.
Besides its memory-caching features, Oracle supports full integrations with PERL, PHP, C++, Java, TCL/TK, and many other languages. It also has a powerful sequencing utility for generating
unique ids, and its internal programming language is unmatched.
The programming language inside Oracle’s stored procedures and triggers is simply called PL/SQL (Procedural Language/SQL). It has all the capabilities of any 3rd generation programming language like C++, and allows the developer to complete complex queries and routines, including Dynamic SQL queries built on the results of previous queries.
Oracle has many features, however three that I find interesting are its Gateway Services, Failover and Transaction Logs.
Gateway Services – This is a series of applications that allows Oracle to access data from any source, including the different databases described in this article. By using the Gateway, Oracle can access data as if it was stored directly in its own tables. This allows you to access data as you would in its native system.
Failover – You can design your applications so that they’re always connected to a database with failover and replication. Using failover, you can have a duplicate copy of your database provided by replication, which is accessible if the first database goes offline. This enables continuous 24/7 access to your most important asset: the company’s data. Once the primary database goes offline or a query fails, the application will automatically failover. This requires no special programming in the application or script, just a few more lines in one of the systems initialization files.
Transaction Logs – These are meant to maintain your data, adn they contain snapshots of each and every transaction. By maintaining these logs between backups you can restore all your data in the case of data corruption. These logs are applied sequentially and restore everything that happened between the backup and the corruption, so they provide a good failsafe for your data.
Oracle is the number one product when it comes to the RDBMS world. It is the one that others compete with and try to surpass. Each time, though, they fail. Oracle provides power, speed and efficiency unlike any other system, however, as was stated before, these benefits come at a large price. Oracle is also currently the most expensive computer application anywhere in the world, but if your data depends on the utmost integrity and security, then it’s the application for you.
As your site evolves and matures, it may outgrow the programs and services that you have come to rely on. This is a natural part of a site’s or business’s growth and development. MySQL is a great database wiht which to learn, and start your online business. However, if you find that you spend an increasing amount of time maintaining the database and its data than you do running the business, then it’s time to look for a new solution.
Each of the systems outlined here present viable upgrade paths from MySQL. If you must use an open source solution, then you have a choice of MySQL or Postgre SQL — and Postgre will allow your business to go further. If you can afford a commercial solution, I’d have to recommend Sybase and Oracle for Unix/Linux based Systems. If you’re on a Windows NT/2000/XP server, then use SQL Server 2000, as you’ll gain the greatest benefits from this solution.
Before I go, here’s a list of database resources from around the Internet that will help you develop your next site in the High-End database system of your choice.
DBForums: This site provides discussion on many different SQL based database systems. It includes all the servers mentioned in this article, and more.
Here are three online magazines that will allow you to keep up to date on the latest happenings in the RDBMS market, and help you make informed decisions:
Other downloads include:
- Postgre SQL – http://www.postgresql.org
- IBM DB2 – http://www-4.ibm.com/software/data/db2/udb/downloads.html
- Sybase ASE – http://www.sybase.com/detail_list/1,3691,2282,00.html
- Microsoft SQL Server – http://www.microsoft.com/sql/downloads/default.asp
- Oracle 9i – http://otn.oracle.com/software/products/oracle9i/content.html