More tables or more rows which is better

  1. I have a project that takes 100,000 users.
  2. And we have 500 subscriptions or services which one user can subscribe to all or any of them.
  3. Let assume all users subscribed to all the services which will be 100,000 * 500 = 50,000,000 rows.

Now the table design is thesame , the only difference is values they have

serviceid | servicename | userid | date | expire | balance

We query more often and perform some calculations

I want to split the tables into 500 to record for a particular service. which means a table can only have 100,000 * 1 = 100,000 records

The reason for this splitting is that if we are looking for serviceA we only have 100,000 records to work or access, instead of going to look at extra 49 million records to get small info about a service

Secondly, I can easily backup the tables as they are much smaller than backing up full tables with larger size even when no update was made in them.

Lastly, i can import all tables into one if i need to.

Also my function uses a listed array of tables names if is in array then proceed with the table, also I have show variables table like query to search if the table exists before continuing any query.
So this makes my code one and not multiple for each service.

Given my explanations, what do you think is the best way and reasons for your preference.

1 Like

I am no expert in these things, and it will be interesting to hear what those with more experience say, but it seems to make things unnecessarily complicated. 50m records on an appropriate server probably isn’t a major issue, and the selection time needn’t be massively different if the indexing is set up correctly.

2 Likes

MySQL won’t have a problem with 50m records. I’ve see tables that are bigger and MySQL still worked fine. The biggest concern as @droopsnoot points out is getting your indices right so querying stays fast.

Also, 50m is an upper limit, it will probably be lower than that.

If you still feel that it’s too much you can also look into partitioning.

1 Like

I’m having a hard time following along because it seems overly complex for what you initially described.

Let’s see if I understand correctly what you’re describing. You’ve got a set of services that users can subscribe to. To me, this could be accomplished with three tables.

  1. Services - this holds basic information about each service
    1. ServiceID - PK which will be referenced later
    2. ServiceName - Title of Service
    3. ServiceDescription - you’ll want this as it gives a description of the service which the user subscribes to
    4. Duration - number of days the subscription is good for.
    5. Active - boolean which allows you to show on the services which can be subscribed to at that time.
  2. Users - holds basic info about each service
    1. UserID - PK referenced later
    2. UserName - visible value which the user references (trust me, you’ll want this as I guarantee the user will want to change their user name and you don’t want to mess with FKs.
    3. name, address, email, etc…
  3. Subscriptions - this holds the services users subscribe to
    1. ServiceID - FK to the Service table
    2. UserID - FK to the User table
    3. StartDate
    4. Active - allows the user to “unsubscribe”

With this structure, you just need to ensure your keys are indexed, and you should be able to handle well over your anticipated levels.

Active subscriptions for a user would be something like (SQL Server but mySQL syntax would be similar

SELECT s.ServiceName
     , sub.StartDate
     , DateAdd(day, sub.StartDate, s.Duration) AS DateEnd
  FROM User u
  JOIN Subscription sub ON sub.userID = u.userID
  JOIN Service s ON s.serviceID = sub.serviceID
 WHERE u.userName = @UserName
   AND GETDATE() BETWEEN sub.StartDate AND DateAdd(day, sub.StartDate, (s.Duration + 1)) -- Add 1 because BETWEEN is not inclusive 
   AND sub.active = true
   AND s.active = true

active users for a particular service would be something like

SELECT u.Name -- would probably have first/middle/last separate but for basic purposes...
     , sub.StartDate
     , DateAdd(day, sub.StartDate, s.Duration) AS DateEnd
  FROM Service S
  JOIN Subscription sub ON s.serviceID = sub.serviceID
  JOIN User U ON sub.userID = u.userID
 WHERE s.ServiceID = @ServiceID
   AND GETDATE() BETWEEN sub.StartDate AND DateAdd(day, sub.StartDate, (s.Duration + 1)) -- Add 1 because BETWEEN is not inclusive 
   AND sub.active = true
   AND s.active = true
2 Likes

partitions how? am just looking at it on a very different angle like since everything is all about serviceA lets narrow the table to it alone.

if in the process of update or insertion i can restore table serviceA alone without having to restore full database.

if table gets corrupt it stays only on it instead of destroying all other records

This is the summary of all i said in the post, we don’t need all those other tables, lets say we are dealing with subscription table alone which if all users subscribe will get us 50m records.

This is what directly comes to my mind…

And having all the data in one table is just begging for other issues. My suggestion was to group like data in one place so you’re not repeating information multiple times in multiple places.

Using your numbers (500 services and 100,000 users)

So if you have to update data with your plan, you have these to do.

  • If you change a service name, you have to update 500,000 rows in the database (1 row per user). If you have multiple tables like you were talking about, you first have to scan those tables to find the right one THEN update it.
  • If you change a user you have to update 500 rows (1 row per service). If you have multiple tables, you’re now updating the same information in each table.

My way has

  • If you change a service name, you have to update 1 row
  • If you change a user name you have to update 1 row

It also

  • allows you to have a user re-subscribe to a service in a way that makes sense and easy to report on
  • saves storage (not a major cost savings as storage is cheap but could be a performance savings) if you add services/users

Okay this is not the only table in the site we have users, services, then subscription

users have 100000 with their email, userid, phone, thats all

service have 500 rows with serviceid,name and description

Then when a user subscribe to a service it is added to the subscription table, which from calculation can hit 50,000,000 rows

:smile:
Premature optimization?

  1. you need to check a word in 1000 records (services) and you are sure if that info is not there it can’t be there.
    Why then do you go searching for such info in 50,000,000 records when you can easily distinguish the table and search.

get all amount from 50,000,000 can not more faster than get all amount from 100,000

if you change a service name, simple go to the table with the old name and run one code of update servicename = ‘new name’
You don’t need to search because you already know the service name.

But there is a valid point when changing userid, if the whole thing is in one table it will be one line of update

but if is in 500 tables then I have to update 500 times targeting each table per time

but then no one changes userid bcs that will mean updating every table where userid is

So?

Number one, you’re assuming that every user is going to subscribe to every service. That’s unlikely.

But even if they did, 50M rows is doable if you have the correct indexes in place. It all depends on the server capacity (memory, HD space, load, etc.). Software wise, MySQL supports something like 300 trillion rows, Postgres is almost 10 billion, SQL Server up to 9 quintillion rows. and Oracle claims to be “unlimited” but my experience tells me otherwise, though it’s most certainly higher than 50 million.

All you need is that subscription table, and queries similar to what I provided above will gain you whatever you need. The indexes will filter out the data you don’t need/want and you should be fine.

1 Like

Talking about indexing a table, the only thing I know is using the serviceid as the primary key.

every other columns are just free and not unique.

What really is indexing? bcs i know using primary key, unique keys only where and when necessary

Generally speaking, and not specifically in terms of MySQL, indexing is a method of helping to find rows. If you have an index for the service id in your subscriptions table, when you want to find all of the subscribers for one service id, MySQL will use the index rather than just searching through all 50m records one by one until it finds one.

Think of it as being similar to an index in a book. If you look in the index, it’ll tell you which pages specific terms (or people, or whatever) are mentioned, so you can find them quickly and easily. When you optimise your table design, you take into account what you might search on to decide which columns should be indexed. The gain is faster searching on those columns, the loss is that the more things you index, the longer it takes to perform updates, though I expect that’s done in the background anyway.

1 Like

see https://use-the-index-luke.com

1 Like

Does this mean select from table where serviceid is 6

Thats not indexing, when I set serviceid as primary key.

So how do i index it further so sql can know where to find it?

Am thinking this indexing to be sort of category or categories or tags or group?

Yes, the primary key is also always indexed, but you can add further indices beside the primary key if you want to search for values in other columns.

I have not indexed before, the link you sent is a compressive lecture on indexing, and I appreciate you for that.
But however I can’t go into such rigorous reading or study now.
Show me with few lines what indexing is or should be like in my issue, and I will see if is something I should bother about regarding this particular project

i need a sample of indexing serviceid, servicename, servicgrouo bcs only servicegroup is repetitive column values

which where clause can help find.

You only need to generate indexes on those fields which will be searched on consistently. So if you’re always searching on UserName, then add an index on that field. The basic syntax (in mySQL) is:

ALTER TABLE tableName ADD INDEX indexName (columnName);

Primary keys automatically generate indexes. Your subscription table should have a PK of serviceID, UserID, startDate (or whichever fields you have in the lookup table)

FKs do not, but it may behoove you to generate them for sizeable tables, plus it can enforce data integrity in the tables.

1 Like