I'm beta testing an online homework system using MySQL, and I'm beginning to get very, very nervous about the scale of the thing. As an example, in my database I originally had two tables which held assignment information. assignment_questions contained the assignment number, question_id, and course_id for a particular course. The submitted_solutions table contained all of the student responses to questions for the assignments, along with their user_id and assignment_id.
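To give a rough idea of the current layout, here's a simplified sketch of the two tables (the exact column names and types are approximations, not my real schema):

```sql
-- Simplified sketch of the current schema (column names are illustrative)
CREATE TABLE assignment_questions (
    question_id   INT UNSIGNED NOT NULL,
    assignment_id INT UNSIGNED NOT NULL,
    course_id     INT UNSIGNED NOT NULL,
    PRIMARY KEY (question_id),
    KEY idx_course (course_id)
);

CREATE TABLE submitted_solutions (
    solution_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    question_id   INT UNSIGNED NOT NULL,
    assignment_id INT UNSIGNED NOT NULL,
    user_id       INT UNSIGNED NOT NULL,
    response      TEXT,
    PRIMARY KEY (solution_id),
    KEY idx_user_assignment (user_id, assignment_id)
);
```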
If 20,000 students use the system in a given year (essentially 10 colleges, each giving 20 courses per year with 100 students per course), then with:
assignment_questions for each course: about 600 records per course
submitted_solutions per course: upwards of 60,000 (600 questions times 100 students)
Number of courses per year: upwards of 200 per year (10 schools * 20 courses)
this means that my assignment_questions table will get 200 * 600 = 120,000 records per year, and my submitted_solutions table will get 200 * 60,000 = 12 million records per year.
To me this seems like a mind-boggling number of records for the database to sort through, and I have no concept of how "slow" things could get.
So, first question: is there any way to test this? Is 12 million records "a lot"?
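For what it's worth, the only testing approach I could think of myself is to bulk-generate dummy rows and then check whether a representative lookup can use an index (table and column names below are the same hypothetical ones as in my sketch above):

```sql
-- Double the table's row count by inserting a copy of itself;
-- starting from a small seed, repeating this reaches millions of rows quickly.
INSERT INTO submitted_solutions (question_id, assignment_id, user_id, response)
SELECT question_id, assignment_id, user_id, response
FROM submitted_solutions;
-- (repeat the statement above until the row count is large enough)

-- Then check the query plan for a typical per-student lookup:
EXPLAIN SELECT *
FROM submitted_solutions
WHERE user_id = 12345 AND assignment_id = 7;
```

Is that a reasonable way to simulate the load, or is there a better standard practice?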
Another possibility that I thought of (which would require rewriting many of my queries, but totally doable) was to create "course level" tables when courses are created.
In other words, for course id 1:
and for course id 2:
I actually have 6 "course level" tables for a given course. So, what this would mean is that if I have 200 courses, each course's submitted_solutions table would be back down to 60,000 records. BUT, then I'd have 200 * 6 = 1,200 tables in my database.
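For illustration, creating a course's copies could be as simple as this (the `_1` suffix naming is just my hypothetical convention, and `CREATE TABLE ... LIKE` clones the structure of an existing table):

```sql
-- Per-course clones for course id 1 (naming convention is hypothetical)
CREATE TABLE assignment_questions_1 LIKE assignment_questions;
CREATE TABLE submitted_solutions_1  LIKE submitted_solutions;
```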
So, Question 2: Is 1,200 tables a lot for a database to have?
My third option that I came up with is to have separate databases for each course. Then, if I have 200 courses, I'd have 200 separate databases with names such as:
etc. where the 1, 2 represent the unique course ids.
Within each course I'd have tables: assignment_questions, submitted_solutions.
The advantage to this solution is that things would look more organized: instead of having to stare at 1,200 tables, I'd have 200 databases, one per course, each containing 6 tables. My gut also says this solution would be cleaner to implement, since the table names in my queries wouldn't change: I'd just have to rewrite the code with a variable for the database name. However,
Question 3 Is having 200 databases "a lot"?
Question 4 If I implement this solution, then I'd have to do some cross-database querying: for example, joining my "main" database, which has all the generic user info, against specific "course" databases. I assume that this sort of thing is possible?
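From what I've read, MySQL lets you qualify a table name with its database name as long as both databases live on the same server, so I'm hoping something like the following would work (database, table, and column names here are all made up for the example):

```sql
-- Hypothetical cross-database join: generic user info in main_db,
-- per-course solutions in course_1
SELECT u.user_id, s.response
FROM main_db.users AS u
JOIN course_1.submitted_solutions AS s
  ON s.user_id = u.user_id
WHERE s.assignment_id = 7;
```

Can anyone confirm whether this works in practice, and whether there are performance pitfalls?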
Any answers to the above questions or thoughts about the different approaches would be appreciated.
Thanks so much for taking the time to read this rather long post.