I'm beta testing an online homework system using MySQL, and I'm beginning to get very, very nervous about the scale of the thing. As an example, in my database I originally had two tables which held assignment information. assignment_questions contained the assignment number, question_id, and course_id for a particular course. The submitted_solutions table contained all of the student responses to questions for the assignments, along with their user_id and assignment_id.
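To give a rough idea of the current layout, here's a simplified sketch of the two tables (the exact column names and types are approximations, not my real schema):

```sql
-- Simplified sketch of the current schema (column names are illustrative)
CREATE TABLE assignment_questions (
    question_id   INT UNSIGNED NOT NULL,
    assignment_id INT UNSIGNED NOT NULL,
    course_id     INT UNSIGNED NOT NULL,
    PRIMARY KEY (question_id),
    KEY idx_course (course_id)
);

CREATE TABLE submitted_solutions (
    solution_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    question_id   INT UNSIGNED NOT NULL,
    assignment_id INT UNSIGNED NOT NULL,
    user_id       INT UNSIGNED NOT NULL,
    response      TEXT,
    PRIMARY KEY (solution_id),
    KEY idx_user_assignment (user_id, assignment_id)
);
```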
If 20,000 students use the system in a given year (essentially 10 colleges, each giving 20 courses per year with 100 students per course), then with:
assignment_questions for each course: about 600 records per course
submitted_solutions per course: upwards of 60,000 (600 questions times 100 students)
Number of courses per year: upwards of 200 per year (10 schools * 20 courses)
this means that my assignment_questions table will get 200 * 600 = 120,000 records per year, and my submitted_solutions table will get 200 * 60,000 = 12 million records per year.
To me this seems like a mind-boggling number of records for the database to sort through, and I have no concept of how "slow" things could get.
So, first question: is there any way to test this? Is 12 million records "a lot"?
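For what it's worth, the only testing approach I could think of myself is to bulk-generate dummy rows and then check whether a representative lookup can use an index (table and column names below are the same hypothetical ones as in my sketch above):

```sql
-- Double the table's row count by inserting a copy of itself;
-- starting from a small seed, repeating this reaches millions of rows quickly.
INSERT INTO submitted_solutions (question_id, assignment_id, user_id, response)
SELECT question_id, assignment_id, user_id, response
FROM submitted_solutions;
-- (repeat the statement above until the row count is large enough)

-- Then check the query plan for a typical per-student lookup:
EXPLAIN SELECT *
FROM submitted_solutions
WHERE user_id = 12345 AND assignment_id = 7;
```

Is that a reasonable way to simulate the load, or is there a better standard practice?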
Another possibility that I thought of (which would require rewriting many of my queries, but totally doable) was to create "course level" tables when courses are created.
In other words, for course id 1:
and for course id 2:
I actually have 6 "course level" tables for a given course. So, what this would mean is that if I have 200 courses, each course's submitted_solutions table would be back down to 60,000 records. BUT, then I'd have 200 * 6 = 1,200 tables in my database.
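For illustration, creating a course's copies could be as simple as this (the `_1` suffix naming is just my hypothetical convention, and `CREATE TABLE ... LIKE` clones the structure of an existing table):

```sql
-- Per-course clones for course id 1 (naming convention is hypothetical)
CREATE TABLE assignment_questions_1 LIKE assignment_questions;
CREATE TABLE submitted_solutions_1  LIKE submitted_solutions;
```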
So, Question 2: Is 1,200 tables a lot for a database to have?
My third option that I came up with is to have separate databases for each course. Then, if I have 200 courses, I'd have 200 separate databases with names such as:
etc. where the 1, 2 represent the unique course ids.
Within each course I'd have tables: assignment_questions, submitted_solutions.
The advantage to this solution is that things would look more organized: instead of having to stare at 1,200 tables, I'd have 200 databases, one per course, each containing 6 tables. My gut also says this solution would be cleaner to implement, since the table names in my queries wouldn't change: I'd just have to rewrite the code with a variable for the database name. However,
Question 3 Is having 200 databases "a lot"?
Question 4 If I implement this solution, then I'd have to do some cross-database querying: for example, joining my "main" database, which has all the generic user info, against specific "course" databases. I assume that this sort of thing is possible?
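From what I've read, MySQL lets you qualify a table name with its database name as long as both databases live on the same server, so I'm hoping something like the following would work (database, table, and column names here are all made up for the example):

```sql
-- Hypothetical cross-database join: generic user info in main_db,
-- per-course solutions in course_1
SELECT u.user_id, s.response
FROM main_db.users AS u
JOIN course_1.submitted_solutions AS s
  ON s.user_id = u.user_id
WHERE s.assignment_id = 7;
```

Can anyone confirm whether this works in practice, and whether there are performance pitfalls?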
Any answers to the above questions or thoughts about the different approaches would be appreciated.
Thanks so much for taking the time to read this rather long post.