SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Evangelist
    Join Date
    Mar 2011
    Location
    Bellingham, WA
    Posts
    450
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Scaling a database

    Hello!

    I'm beta testing an online homework system using MySql and I'm beginning to get very, very nervous about the scale of the thing. As an example, in my database, I originally had two tables which had assignment information. Assignment_questions contained the assignment number, question_id, and course_id for a particular course. The submitted_solutions table contained all of the student responses to questions for the assignments, in addition to their user_id, and assignment_id.

    If 20,000 students use the system in a given year, (essentially used by 10 colleges, each giving 20 courses per year with 100 students per course) then with:

    assignment_questions for each course: about 600 records per course
    submitted_solutions per course: upwards of 60,000 (600 questions times 100 students)
    Number of courses per year: upwards of 200 per year (10 schools * 20 courses)

    this means that my assignment_questions table will have 200*600 or 120,000 records per year and my submitted_solutions table will have 60,000 *200 = 12 million records.

    To me this seems like a mind-boggling number of records to sort through by the database and have no concept of how "slow" things could get.

    So, first question: Is there anyway to test this? Is 12 million records "alot"?

    Another possibility that I thought of (which would require rewriting many of my queries, but totally doable) was to create "course level" tables when courses are created.

    In other words, for course id 1:

    assignments_questions_1
    submitted_homework_1

    and for course id 2:

    assignments_questions_2
    submitted_homework_2

    I actually have 6 "course level" tables for a given course. So, what this would mean is that if I have 200 courses, then I'd be back to submitted_homeworks of size 60,000 records. BUT, then I'd have 200*6=1200 tables in my database.

    So, Question 2: Is 1200 tables a lot for a database to have?

    My third option that I came up with is to have separate databases for each course. Then, if I have 200 courses, I'd have 200 separate databases with names such as:

    course_1
    course_2

    etc. where the 1, 2 represent the unique course ids.

    Within each course I'd have tables: assignment_questions, submitted_solutions.
    The advantage to this solution is that things would look more organized (instead of having to stare at 1200 tables, I'd have 200 databases, arranged by the course level, then within each database I'd have 6 tables). My gut also says that this solution would be cleaner to implement since the table names in the queries wouldn't change: I'd just have to rewrite the code with a variable for the database name. However,

    Question 3 Is having 200 databases "a lot"?
    Question 4 If I implement this solution, then I'd have to do some cross-database querying..for example, joining my "main database", with all generic user info, with specific "course databases". I assume that this sort of thing is possible?


    Any answers to the above questions or thoughts about the different approaches would be appreciated.

    Thanks so much for taking the time to read this rather long post.

    -Eric

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Q1. is 12 million rows a lot? no

    after that, i stopped reading when you started talking about separate tables for separate courses, and then separate databases

    that's not the way i would go
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Evangelist
    Join Date
    Mar 2011
    Location
    Bellingham, WA
    Posts
    450
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Rudy,

    Your message was a bit cryptic for me this time.

    First, 12 million rows isn't a lot: Excellent!

    But, then you said that you stopped reading (can't say I blame you!). So, did you stop reading because the size of the table didn't warrant a different approach OR because there was a better approach than my crazy scheme of dividing up the database into 5 zillion tables. And, if there is a better approach, if you wouldn't mind laying the "concept of it" on me, it'll give me a good place to start.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by kreut View Post
    And, if there is a better approach, if you wouldn't mind laying the "concept of it" on me
    one table for every different type of entity

    so one table for all students, one table for all courses, etc.

    do a search for the phrase "premature optimization"

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Evangelist
    Join Date
    Mar 2011
    Location
    Bellingham, WA
    Posts
    450
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    And now that I did a search for premature optimization I'm smiling!

    My database is more or less set up as "one table for every different type of entity" (sometimes a little more and sometimes a little less), so I'm going to take your slightly cryptic advice and not worry so much about it until I've got those 5 zillion lines in my database and it proves to be slow.

    -Eric
    Last edited by kreut; Aug 19, 2011 at 15:11. Reason: combined two posts

  6. #6
    SitePoint Enthusiast
    Join Date
    Aug 2011
    Posts
    67
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    12 million records can be a lot depending on your queries.
    If you have good indexes it should be no problem.

    For example, I imagine you'll have an index for course_id or something and that index should be unique together with the student_id etc.

  7. #7
    SitePoint Evangelist
    Join Date
    Mar 2011
    Location
    Bellingham, WA
    Posts
    450
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Yes...both of those guys together will be unique!


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •