Database Function Load

It depends on how trivial the application is: the more trivial, the easier it is to switch RDBMSes. The whole point of picking one RDBMS over another in the first place is its unique features. Limiting yourself to some lowest common denominator of features doesn’t make sense.

I have. I’ve moved a lot of logic to the database, although unlike most people in this topic, I use PostgreSQL instead of MySQL, which is ever so slightly more sophisticated when it comes to stored functions. My database now performs most of the “intelligent functionality”: logging in users, generating lists, doing CRUD, etc.
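To make “generating lists” concrete, here is a minimal sketch of the kind of set-returning function I mean (the `orders` table and the function name are illustrative, not my actual schema; default parameter values need PostgreSQL 8.4+):

```sql
-- Illustrative set-returning function: the application just SELECTs from it
-- and never needs to know how the list is actually produced.
CREATE FUNCTION recent_orders(_limit int DEFAULT 10)
RETURNS SETOF orders AS $$
BEGIN
    RETURN QUERY
    SELECT * FROM orders
    ORDER BY created_at DESC
    LIMIT _limit;
END;
$$ LANGUAGE plpgsql STABLE;

-- PHP (or any other client) calls it like a table:
-- SELECT * FROM recent_orders(25);
```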

This decision has had benefits: first of all, it’s trivial to change the database schema without having to fix your code, which is a big plus. Your application code becomes easier to write and read, as it is no longer cluttered with either the 80+ classes of an ORM that obscure your queries, or with a lot of SQL that you have to read before understanding what is actually happening.

Another benefit is added security. Granted, this point is moot, as you can develop applications that are secure with PHP itself, but I find it easier to deal with that in the database. When a user is logged in, that session on the database gets a customer id, which my views and functions then take into account. That means it’s impossible for a customer to ever see something that doesn’t strictly belong to him, without having to repeat that check in every query from PHP or add yet another if statement in PHP.
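As a minimal sketch of how that can work in PostgreSQL (the setting name `app.customer_id` and the `orders` table are illustrative, not my actual setup; depending on your PostgreSQL version you may need to allow custom settings):

```sql
-- At login, the login function stamps the database session with the customer:
SELECT set_config('app.customer_id', '42', false);

-- Views filter on that session variable, so every query the application
-- runs automatically sees only that customer's rows:
CREATE VIEW my_orders AS
SELECT o.*
FROM   orders o
WHERE  o.customer_id = current_setting('app.customer_id')::int;
```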

I also tend to write not only web applications, but iPhone, iPad, and Android apps as well. This means I’d have to write the business logic in Java, PHP, and Objective-C if I weren’t moving it to the database, or at the very least putting it behind a common API. The common API is now my database, so creating a web service that does exactly the same as the web application is extremely simple and straightforward.

Furthermore, there is a clean split between responsibilities. I can spend a day writing stored functions while my colleague writes the front end and the PHP code that interfaces with the database. I can write stubs so he can make sure he’s calling the correct functions. My colleague can mock out the entire database and only assert that he’s calling the correct function with the correct parameters: beyond that, there’s nothing left to test on the PHP side!
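Such a stub can be as dumb as this (the `login_user` signature is a made-up example, not my actual code):

```sql
-- Stub: the signature is the contract the PHP side codes against;
-- the real implementation replaces the body later.
CREATE OR REPLACE FUNCTION login_user(_email text, _password text)
RETURNS integer AS $$
BEGIN
    RETURN 1;  -- pretend every login succeeds as customer 1 for now
END;
$$ LANGUAGE plpgsql;
```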

Those of you who’ve mentioned a lack of ability to test, please take a look at pgTAP, a unit testing framework for PostgreSQL. Yes, you can actually stick that in your deployment, and your database will be tested separately, which also means that if you get a failure somewhere, you know exactly where it is, because the boundary is clearer than ever before.
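A pgTAP test file looks something like this (exercising the hypothetical `login_user()` stub from above):

```sql
-- Run inside a transaction so the test leaves no trace behind.
BEGIN;
SELECT plan(2);

SELECT has_function('login_user');
SELECT is( login_user('user@example.com', 'secret'), 1,
           'a valid login returns the customer id' );

SELECT * FROM finish();
ROLLBACK;
```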

Those of you who state that a database should be used solely to store data and enforce relationships, allow me to refer you to what Martin Fowler writes about the Table Module pattern on that one.

A database is more than a key-value store, provided it’s slightly more advanced than Berkeley DB. PostgreSQL is, and so is MySQL. The database can do pretty much anything you can imagine: I’ve even seen an Oracle database that would parse an e-mail template and send the e-mail through SMTP. One of my buddies had a PHP script that connected to a database and called a select query with the URL, and the rest was done by the database. Dynamic pages, a blog, comments, a forum: yes, all in the database. Now, I’m not saying that’s such a good idea per se, but at least it demonstrates the possibilities.

It has downsides though. While I state above that the boundary is clear, from my point of view the database shouldn’t do everything: sending e-mails, for one. However, part of the logic of creating a user could be to send an e-mail to verify the e-mail address. In those cases the boundary becomes unclear, although this is easily solved: I store the user’s information, I let the database generate a token, I let PHP retrieve said token, and PHP sends it. Later, when the user clicks on the link with the token, the database will verify it and (if all is well) activate the user. Still, the boundary is unclear in a few cases.
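In code, that hand-off looks roughly like this (illustrative names and a simplistic token; a sketch, not my production code):

```sql
-- Registration: store the user, generate a token, hand it back to PHP,
-- which then sends the verification e-mail.
CREATE FUNCTION create_user(_email text, _password_hash text)
RETURNS text AS $$
DECLARE
    _token text := md5(random()::text || clock_timestamp()::text);
BEGIN
    INSERT INTO users (email, password_hash, activation_token, active)
    VALUES (_email, _password_hash, _token, false);
    RETURN _token;
END;
$$ LANGUAGE plpgsql;

-- Activation: the database verifies the token and flips the switch.
CREATE FUNCTION activate_user(_token text)
RETURNS boolean AS $$
BEGIN
    UPDATE users
    SET    active = true, activation_token = NULL
    WHERE  activation_token = _token AND NOT active;
    RETURN FOUND;  -- true only if a matching, inactive user existed
END;
$$ LANGUAGE plpgsql;
```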

Another downside is SQL. Developers tend to hate SQL. I don’t know why, as it’s a versatile and pretty easy language to learn, but if you’re not “into SQL”, working with this strategy becomes rather complex. Then again, when you put your logic in the database, all that developers who hate SQL need to know is which function to call and with which parameters. The rest can be done by people who like SQL. I have my own company, so I do have to admit: once the SQL gurus leave, you’re in for some trouble, as it’s easier to find people who really know PHP than it is to find people who really know SQL. It’s a business risk, but nothing that can’t be overcome, in my opinion.

Your mileage may vary, but all in all I’m glad I’ve made the move. Taking away the need for an ORM alone made it all worthwhile.

Sorry for following up my own post, but I just saw this comment and I have to reply. In my opinion, trying to write SQL that all RDBMSes support usually isn’t the best of ideas. It can be done, but it leaves you with many headaches, and you can’t take advantage of any of the “cool stuff” that some RDBMSes provide. I both write and host my applications, so for me, writing SQL that PostgreSQL can swallow is sufficient.

If you’re writing distributed applications, however, having portable SQL is a nightmare. Simply put, you have three options:

  1. Try to write everything in ANSI SQL. Downside: you can’t use cool features that save you time, and have to write everything in “pure” SQL.
  2. Write many SQL files, one for each RDBMS. Downside: more maintenance, and it’s hard to keep track of changes.
  3. Leave the abstraction to another layer (ORM). Downside: ORM is the Vietnam of computer science.

I’ve only ever made one transition: from MySQL to PostgreSQL. As I hadn’t used the “stored functions for almost everything” approach before I moved to PostgreSQL, this was actually easy, apart from the NOT NULL columns where MySQL had somehow allowed NULL, and the “0000-00-00” date issues :wink: The main problem lies with applications that are distributed and should offer the possibility of supporting multiple RDBMSes.

I’m just playing devil’s advocate here…

> I also tend to write not only web applications, but iPhone, iPad, and Android apps as well. This means I’d have to write the business logic in Java, PHP, and Objective-C if I weren’t moving it to the database, or at the very least putting it behind a common API. The common API is now my database, so creating a web service that does exactly the same as the web application is extremely simple and straightforward.

If you’re allowing multiple sets of client code direct access to the DB, I can see the benefit entirely… but if you’re going through a web service anyway, how does that help?

> This decision has had benefits: first of all, it’s trivial to change the database schema without having to fix your code, which is a big plus.

I’m not sure I agree with this… say I add a column to a table. With an ad hoc query, I update the query in the application code and I’m done. If it’s in a stored procedure, I need to update the query in the stored procedure, the parameter list of the stored procedure, and the application code to send the new parameter.

I’ll have to look into pgTAP; do you know of anything similar for MySQL? On a similar note, how do you handle version control?

That makes the world interesting, so no worries!

Well, you don’t have to duplicate the logic. Agreed, you could obviously write an application that extracts the logic out to a Domain Model, and you’d end up with the same benefit. Nonetheless, extracting it to the database does the same thing and keeps things simpler, at least for me. Then again, I’m not afraid of SQL :wink:

When you add a column to your table, you’re screwed either way. If you’ve ever tried to accomplish this, regardless of where you’ve placed your business logic, it’s hard to do with minimal effort. If you’re using ad hoc queries, you’ll have to update those. I don’t agree that you’re done after updating just one query, though: you’ll likely have one that updates the table, one that inserts into the table, a few that select from it, etc. If you’re using MVC, you’ll have to update your view as well, by the way.

If you’re using stored functions, you’ll have to add a parameter, and you’ll have to update all queries that call the stored function. The point I was trying to get across is that changing the underlying structure of your database will not break your application in any way, as the application itself is fully unaware of the underlying schema.
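One thing that softens the “add a parameter” pain, by the way: PostgreSQL (8.4+) lets you give the new parameter a default, so existing call sites keep working. A sketch with made-up names (you’d drop the old two-parameter version first to avoid an ambiguous overload):

```sql
CREATE FUNCTION save_customer(
    _name  text,
    _email text,
    _phone text DEFAULT NULL  -- the newly added column
) RETURNS void AS $$
BEGIN
    INSERT INTO customers (name, email, phone)
    VALUES (_name, _email, _phone);
END;
$$ LANGUAGE plpgsql;

-- Old callers still work unchanged:
-- SELECT save_customer('Alice', 'alice@example.com');
```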

I don’t know of a unit testing framework for MySQL, sorry. Version control is actually simpler than you would imagine. I use phing for deploying applications to servers, and phing has a “task” called dbdeploy. That task is ridiculously simple; I’ll try to explain the gist of how it works.

First off, your database contains a dedicated table for version tracking (table public.version). You run phing, and it connects to the database to determine which version it’s at. After that, it will look into a folder of your choice (for us, the standard is /deploy/deltas) and look for .sql files. Each .sql file begins with a number, so the first commit will usually contain the file 1-create-initial-structure.sql.

If you want to change the schema, you’ll have to write a new SQL file for that. You pick the next consecutive number, write a description, and write the SQL that should be executed. You also write a script that does the exact opposite: an undo, in case anything goes wrong. The dbdeploy task will know that the currently deployed version is 1, see that there’s a file that starts with 2, and execute that.
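A delta file is just plain SQL; in phing’s dbdeploy convention the undo part lives in the same file, below an `--//@UNDO` marker. An illustrative delta (made-up change):

```sql
-- deploy/deltas/2-add-phone-to-customers.sql
ALTER TABLE customers ADD COLUMN phone text;

--//@UNDO

ALTER TABLE customers DROP COLUMN phone;
```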

You’ll have to make rules for this to work. First off, you’ll need either central tracking of the next consecutive version number (we have a board to write it on), or one person responsible for assigning those version numbers.

Second: you’ll have to make sure that, no matter how much you want to, you never change a committed SQL file (except for syntax errors). If you want to undo what you did, you write another SQL file that undoes the previous one. If you fail to live by that rule, you can end up with databases in different states, which is a Bad Thing™. Changing a previously committed SQL file should at the very least mean you buy your colleagues a beer :wink:

That’s about it.

> Well, you don’t have to duplicate the logic. Agreed, you could obviously write an application that extracts the logic out to a Domain Model, and you’d end up with the same benefit. Nonetheless, extracting it to the database does the same thing and keeps things simpler, at least for me. Then again, I’m not afraid of SQL

Hehe, SQL isn’t scary, it’s just limited in its power; I’d rather take advantage of more complex programming techniques which aren’t available in SQL.

> When you add a column to your table, you’re screwed either way. If you’ve ever tried to accomplish this, regardless of where you’ve placed your business logic, it’s hard to do with minimal effort.

Not really. In my system, in most cases I add a DB column, add a field to a form or just make immediate use of the field in my views, and everything just works behind the scenes… which might not be the case using the DB for logic.

Version tracking is quite interesting, but is there no way to automate the process? Hmm… might be fun to play around with a trigger on mysql.proc.

Me too: I use plpgsql if it gets complex. Doing that in pure SQL seems like a lot of work over nothing. plpgsql can do pretty much everything I’ve ever wanted :wink:
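For anyone who hasn’t seen it, this is the kind of thing plpgsql buys you over plain SQL: variables, branching, and meaningful errors (a made-up example):

```sql
CREATE FUNCTION adjust_stock(_product_id int, _delta int)
RETURNS int AS $$
DECLARE
    _new_qty int;
BEGIN
    UPDATE products
    SET    stock = stock + _delta
    WHERE  id = _product_id
    RETURNING stock INTO _new_qty;

    IF _new_qty IS NULL THEN
        RAISE EXCEPTION 'unknown product %', _product_id;
    ELSIF _new_qty < 0 THEN
        RAISE EXCEPTION 'stock for product % would go negative', _product_id;
    END IF;

    RETURN _new_qty;
END;
$$ LANGUAGE plpgsql;
```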

Not? If you’ve written some magic layer that maps the request body to field names in your database, you might just as well map them to parameters of a stored function, right? I don’t see the difference there.

How automated do you want to have it? Just run the command “phing”, and it’s all done. As far as I’m aware, there’s no better way.

> Not? If you’ve written some magic layer that maps the request body to field names in your database, you might just as well map them to parameters of a stored function, right? I don’t see the difference there.

First, you need to edit the table, the query, and the stored procedure, instead of just the table and the query. Secondly, how do you automatically map the fields? Since stored procedures (in MySQL at least; pgsql might be better) can’t be passed arrays, you need to make sure the values are sent in the correct order. It also makes it almost impossible to reuse the same procedure when you’re only updating a subset of the data.

> How automated do you want to have it? Just run the command “phing”, and it’s all done. As far as I’m aware, there’s no better way.

I initially misread your post; I thought you had to manually set up the .sql files each time. Still, it might be interesting to play with triggers on the MySQL database tables :slight_smile:

You can’t send a normal SQL query an array either, yet you’re telling me you don’t have to update the query. All I’m saying is that if you’re using some magic to deduce which fields to send through, you should be able to do the same with stored function mappings. Stored procedures (with PDO) can take arrays, by the way, so saveAnswer( :question, :answer ) can be called with an array( ':question' => 1, ':answer' => 2 );

As opposed to what? I don’t really understand what you’re getting at: it’s also next to impossible to reuse the same query when you’re only updating a subset of the data. Or are you currently using an ORM which marks overwritten fields as “dirty”? If so, what exactly is the downside of overwriting everything, even columns that haven’t been changed? Apart from sending a bit more data, there is no downside I’m aware of.

Ah, that explains it. No, you just write the .sql files once, run phing, and your database is up to date. I don’t use MySQL any more, so I don’t really know what a trigger on a MySQL database table would do.

I meant I can use array('foo' => 'foo', 'bar' => 'bar'); to generate INSERT INTO $table (foo, bar) VALUES ('foo', 'bar'). Add a third field, and it still just works without a change.

The stored procedure cannot be called in this way without a change to the application-to-database binding step where the procedure is called: you need to add the parameter… or make an assumption about the order of the stored procedure’s parameters.

> Ah, that explains it. No, you just write the .sql files once, run phing, and your database is up to date. I don’t use MySQL any more, so I don’t really know what a trigger on a MySQL database table would do.

Well, I was just thinking out loud, but adding a trigger to the mysql.proc table (which is where the stored procedures are stored) to automatically copy the old record to a different table whenever it’s updated could be a useful starting point for version control.

Edit: basically, I just find stored procedures an unnecessary extra step which adds an extra level of binding and complexity, with little to no practical benefit and enough drawbacks for me not to use them in most cases.

My current application is tiered. It will have an open-source instructive framework that is designed to be portable, but I don’t want it to close the door on custom SQL setups like the one you’re describing. I would like to use them someday.

The overriding goal of the framework, Gazelle, is to be a teaching framework, so simplicity and clarity trump most other concerns. I want Gazelle to be the BASIC of PHP frameworks: the framework coders can use to get a true grasp of MVC architecture and patterns in as unambiguous an example as possible. If this means it will fall short of the sort of scalability Symfony 2 or Zend accomplish, so be it; there’s something to be said for clarity, in my opinion, and not every website needs code robust enough to survive the Slashdot effect.

On the database side this means the core framework will have its own ORM, which I consider an 80/20 approach. It will be fine for 80% of use cases while getting the hell out of your way for the last 20%. This avoids the last-mile problem by not getting anywhere near the last mile, as it were. And I think this approach will avoid the problems your article mentioned, for the very reason your article presented: all the ORMs I’ve seen try to do too much, and end up stumbling and mucking things up to the point where it would have been better if they’d never been used at all. What is worse is that most of them make it unduly difficult to do even the simplest tasks.

The goal is to guide, not constrain. The expected audience will know little of SQL or normalization. The framework’s ORM and associated DB setup tools will aid in this and encourage proper setups. Advanced setups will be beyond its scope, but once programmers reach the point where they want or need to do that, they’ll be able to turn off the ORM without affecting the rest of the framework.