SitePoint Sponsor

User Tag List

Results 1 to 11 of 11

Thread: Database Design

Hybrid View

  1. #1
    SitePoint Enthusiast
    Join Date
    Jul 2001
    Location
    Ireland
    Posts
    90
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Database Design

    I am still struggling with tables!

    Say if I wanted to design an email system like hotmail with 2 million users and each user would have say 200 emails in their inbox.

    Is it practical to put all emails into one table or should each user have its own table to keep the no of records in any table low?

    Would someone mind giving me a simple table diagram to help me?

    Thanks,

    Willie

  2. #2
    code addict Abstraction's Avatar
    Join Date
    Apr 2001
    Location
    Des Moines, IA
    Posts
    346
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I would go with a users table and a messages table.

  3. #3
    SitePoint Zealot alexk's Avatar
    Join Date
    Nov 2000
    Location
    Sydney, Australia
    Posts
    134
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    2 million users * 200 emails * 2.5 kb avg email size = 1 Terabyte of data... first, i doubt mysql can handle this... second, you would need a cluster system... third, it's very complex and expensive... unless that was an abstract question

  4. #4
    You talkin to me? Anarchos's Avatar
    Join Date
    Oct 2000
    Location
    Austin, TX
    Posts
    1,438
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    MySQL's maximum table size is 8 million terabytes: the operating system and hardware are the bottlenecks. In any case I don't think someone who's struggling with tables is the best person to design a 2 million user email application.
    ck :: bringing chris to the masses.

  5. #5
    code addict Abstraction's Avatar
    Join Date
    Apr 2001
    Location
    Des Moines, IA
    Posts
    346
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    MySQL would start on fire =)

    A Terabyte is not that much anymore if you are talking about a large system. Hotmail has ~110 million users.

    I think he was talking in an abstract way. I wouldn't believe someone in charge of designing a system of that size wouldn't know how to model a database.

  6. #6
    SitePoint Zealot alexk's Avatar
    Join Date
    Nov 2000
    Location
    Sydney, Australia
    Posts
    134
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    also, it's not only data size problem... if 1/100 of 2 million users will try to read their emails, this will demolish mysql server...

  7. #7
    SitePoint Enthusiast
    Join Date
    Jul 2001
    Location
    Ireland
    Posts
    90
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi again,

    You appear to be very harsh on me!

    All I was looking for was how to structure the tables for users and messages. How are messages pulled from tables where there are many users and many messages. For example, one approach would be:

    User Table (index on userid)
    userid|login|password

    MessageIndex Table (index on userid)
    userid|messageid|subject

    Message Table (index on messsageid)
    messageid|narrative|date|fromwho

    Based on your replies, it would appear that the MessageIndex Table would come under pressure (which is what I would expect). Now what I am asking is there a better way to do it?

    On a more general point, if there was a 500MB table and a record within it had to be updated, would all 500MB be loaded into RAM? How do updates, inserts, etc work memory wise?

    Thanks again.

    Willie

  8. #8
    code addict Abstraction's Avatar
    Join Date
    Apr 2001
    Location
    Des Moines, IA
    Posts
    346
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Based on your replies, it would appear that the MessageIndex Table would come under pressure (which is what I would expect). Now what I am asking is there a better way to do it?
    I would do it like this:

    User Table (index on userid)
    userid|login|password

    Message Table (index on messsageid)
    messageid|message|userid

    You can parse message for date, who it's form etc, but that's besides the point.


    On a more general point, if there was a 500MB table and a record within it had to be updated, would all 500MB be loaded into RAM? How do updates, inserts, etc work memory wise?
    I am assuming that, no, it isn't all loaded into RAM. That could get messy in a matter of seconds if it were. If you want to know how database operations work, I would look into the specific database for details.

  9. #9
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Why put the e-mails in a DB anyway... or for that matter, don't think of a single DB server handling all of it.

    Split it up into several machines.. Each one handling a percentage of the total users... e.g. DB1 handles users 1 - 500,000, DB2 handles 500,001 - 1 million, etc.

    You get the idea -- MySQL can't scale well enough to handle it all on one box (most couldn't probably; that is one reason why clustering/replication was invented)...

    But again, store the posts in plaintext files or something... It is a waste to store them in the DB if you don't have to (granted if you wanted to do some weird things with the data from an admin end it would be nicer in the DB I think).

  10. #10
    code addict Abstraction's Avatar
    Join Date
    Apr 2001
    Location
    Des Moines, IA
    Posts
    346
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think he was just using that email thing as an example. But anyway...

    But again, store the posts in plaintext files or something... It is a waste to store them in the DB if you don't have to (granted if you wanted to do some weird things with the data from an admin end it would be nicer in the DB I think).
    plaintext files? Wouldn't disk access kill you in this situation?

  11. #11
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Originally posted by Abstraction
    I think he was just using that email thing as an example. But anyway...

    plaintext files? Wouldn't disk access kill you in this situation?
    I wouldn't think so -- that is how sendmail et. al handle mail files, right? That way though you could easily distribute them to many, many machines...

    I suppose if you did do the splitting up dealie that I mentioned the message table wouldn't get terribly large (50 million rows perhaps?) so it could work... It would tend to get a bit slow I would think though.. Get 10GB RAM and you could store it in memory!


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •