I’m doing some planning and the database in question contains a few hundred tables. It currently is spread out over a few dozen logical databases instead of being contained in one. Obviously this is easier to look at as a human, but at the same time you’re losing fundamental relationships in your core tables, such as users. So, what would you do?
i’d leave it alone
of course, if your question was, what would i do if i were designing a completely new database, that’d be different
The question was; “one large database or multiple smaller databases?”
i have to agree with rudy. if you have an existing application that really is hundreds of tables spread across a couple of databases, then leave it alone.
but if you’re designing an application from the ground up, i would most likely pick one database.
I would take a different approach: federate the data, mainly due to backup and restoration issues. For example, I like to be able to restore application data without effecting the application’s users and security role mapping.
OK, just to be clear, I’m not planning on touching the existing database. This is simply a matter of theory.
Depends on the app design to me. If you’re building one monolithic app that uses all the database tables, do it all in one database. If you’re building multiple small apps that can/might loosely connect to each other, do multiple databases and grab external data with REST/Web services, or better yet (if your RDBMS supports it and your app only needs to read external data), use a view to pull the data you need from other databases into a virtual table/tables and work with that.
I actually have the opposite problem at work. We have a database with about 700 tables because nobody thought to logically separate them out all those years ago when most of the apps were built, and now it’s a big mess.
a matter of theory?
in that case it doesn’t matter whether it’s all tables in one database or one database per table, the app logic will be the same, and the SQL will be the same except for minor details in naming, i.e.
select …
from [servername.][databasename.][ownername.]tablename …
Does the size of the database you are querying effect the loading time of that information? or is it the size of the table within the database? Or, does it not matter at all?
Table size definitely matters. It’s a lot faster to SELECT from 100 rows than it is to SELECT from 20 million rows, especially if neither table is optimized.
mmm… i think it depends.
if you put your databases on different server, you can share load between servers. but if you put them in one DB server, it’s better to use one database.
I once built an app that (while far from being enormous) had about 60 tables. I split it into two databases because their was a logical distinction. One was for products, suppliers and pricing; the other for customers, users, quotes, orders and everything else.
Although the data still “connected” in the application I always thought I made the right choice by separating them. I never wished they were part of a single database.
If you have a reasonable number of tables (20+) and more importantly there is a logical distinction in purpose then I’d suggest having different databases. Just remember to have some mechanism in your app to easily switch and identify the databases. As rudy posted you can still have join queries across tables in different databases.
This is what I do too. If the data is going to be used by more than one loosely connected applications I keep a bunch of databases, starting with a separate one for users.
The main reason I did this was in the case I needed to move one of the applications somewhere else (including backup/restore), I can just move the necessary databases.
I agree with vgarcia. I don’t have any apps with huge databases, but if I did, I would keep it in one database for simplicity’s sake.
I know table size matters for speed, but does database size?
table size yes, database size no.
Speed is related to table size not database size.
I prefer one large database rather than using multiple databases…
Hi,
Whether or not tables should be stored in a single database or spread across multiple databases really depends upon how data is being accessed functionally by different applications. There are many factors to consider, and I’ll name a few:
-
It is much easier to optimize queries on tables that are stored in a single database. The ability to optimize across multiple databases is very much RDBMS dependent. Queries (especially ad-hoc queries) are extremely difficult to optimize within an application, since data (type and volume) can change over time.
-
It is possible for an RDBMS to physically cluster information (if necessary), if the tables are in a single database.
-
Depending upon the RDBMS, it may be possible or easier to enforce domain integrity, and cross-table business rules, directly within the RDBMS, as opposed to within application programs. Integrity rules, stored in applications, are more difficult to maintain, and can create data integrity issues, if improperly enforced within the application program.
-
It is easier to enforce transaction management when all tables are within one database. In some cases, it may not be possible to enforce proper database recovery, when transactions are accessing multiple databases. (Sorry, I have been out of the game for a while, and I do not know how different RDBMSs do this).
On the plus side for multiple databases,
-
Smaller databases are easier to manage and physically optimize.
-
By partitioning databases, you avoid performance collisions between applications (sometimes databases are replicated for this reason).
These are just some ideas off the top of my head. In the past, when faced with similar types of problems, I would first look at the functional requirements of the applications (e.g, which databases and tables are they accessing and when), and partition the database based upon what is most suitable for each application, and all of the applications as a whole.
Hope this helps a bit,
Rich
For an advisor, you’re being awful helpful in this thread…
Thanks everyone for the replies.
So it seems the message is to use multiple databases if there is a logical divide, which makes sense. However, even if there’s a logical divide, at some point you’ll want to create a relationship between two tables that exist in different databases. How would you handle that? Would you do the major no-no and just handle it in the application logic, or another no-no by creating a duplicate table, or some other way?