Gigantic data sets best practices

I’ve been tasked with creating a database that will hold huge datasets per user, and I’m really not sure of the best way to go about it.

The way I see it, right now there are two tables: USERS, which holds user information, and ENTRIES, which will hold the main data for the users. The problem is there are thousands of users and millions of entries. I can’t use just two tables; they would never hold all the data.
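For concreteness, the two-table layout described above might be sketched like this with SQLite (column names are illustrative guesses, not from the post):

```python
import sqlite3

# Minimal sketch of the USERS / ENTRIES design.
# Column names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE entries (
        entry_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        payload  TEXT
    );
    -- Index so per-user lookups don't scan the whole entries table.
    CREATE INDEX idx_entries_user ON entries(user_id);
""")
```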

I realize I don’t know the best way to handle huge amounts of data like this. How do companies do it? Where can I go to read up on this type of problem, and can someone point me to some best practices for these situations?


Just an idea… one of the strategies for handling huge datasets is partitioning (sharding), which splits your data (and workload) across many physical servers.
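One common way to do the splitting is to hash a stable key (here the user ID, since all of a user's ENTRIES rows should live together) to pick a shard. A minimal sketch, with the shard count and naming scheme made up for illustration:

```python
import hashlib

# Hypothetical shard count -- illustration only.
NUM_SHARDS = 4

def shard_for_user(user_id: int) -> str:
    """Map a user ID to one of NUM_SHARDS database servers.

    Uses a stable hash so the same user always routes to the
    same shard, keeping all of that user's entries together.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    shard_index = int(digest, 16) % NUM_SHARDS
    return f"entries_shard_{shard_index}"

# Every query for a user's entries gets routed to their shard:
print(shard_for_user(12345))
```

Note that with plain modulo hashing, changing NUM_SHARDS later moves most keys to a different shard; consistent hashing is the usual fix for that.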

Yes, I missed

thousands of users and millions of entries
in the original post. This number of rows should easily fit into a single database instance running on commodity hardware. Sharding makes sense when dealing with much bigger datasets.
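To put that in perspective, a single indexed table handles row counts at this scale without trouble. A quick sketch with SQLite, using 100,000 rows as a stand-in (the same pattern applies in the millions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (user_id INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_user ON entries(user_id)")

# 100,000 rows spread over 1,000 users -- illustrative numbers.
rows = ((i % 1000, f"entry {i}") for i in range(100_000))
conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)

# An indexed per-user lookup touches only that user's rows,
# not the whole table.
count = conn.execute(
    "SELECT COUNT(*) FROM entries WHERE user_id = ?", (42,)
).fetchone()[0]
print(count)
```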

Do you have actual evidence of this, or are you just apprehensive? Do you know what the upper limit on database size is?

MySQL tables with millions of rows are fairly common.