  1. #1
    Massimiliano Bruno Giordano sid egg's Avatar
    Join Date
    Aug 2004
    Location
    Canada
    Posts
    1,280
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Fastest Database for LARGE (10 million plus) project?

    I am interested in some information on a database that (with properly written queries, and enough disk space) will work at a usable speed with an extremely large number of records.

    If it isn't open source, I should be able to set it up for remote access and to spread single tables across multiple computers. If it is open source, and in C, I should be able to make the appropriate modifications.

    Wtf, titles never update? MILLION not BILLION.
    Last edited by sid egg; Feb 25, 2005 at 21:37. Reason: billion = million
    GamesLib.com - the slickest, most complete and
    easily navigatible flash games site on the web.

  2. #2
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi,

    Any of the current popular databases can handle millions of records if the hardware is right: MySQL, MS SQL, Oracle, and Postgres are the ones I would consider.

  3. #3
    SitePoint Zealot csi95's Avatar
    Join Date
    Jan 2005
    Location
    Albany, NY
    Posts
    151
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I could be wrong, but if you're really talking about a database with 10 Billion records, you're probably not going to get the software off the shelf at Staples.

    You need to be talking directly to the folks at IBM, Oracle and Microsoft, and it ain't gonna be cheap.

  4. #4
    SitePoint Guru asterix's Avatar
    Join Date
    Jun 2003
    Posts
    847
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by csi95
    I could be wrong, but if you're really talking about a database with 10 Billion records, you're probably not going to get the software off the shelf at Staples.

    You need to be talking directly to the folks at IBM, Oracle and Microsoft, and it ain't gonna be cheap.
    He meant 10 Million.
    Oracle might be able to help with billions, but I don't think MS could...

  5. #5
    Afrika
    Join Date
    Jul 2004
    Location
    Nigeria
    Posts
    1,737
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    the header says BILLION. but the title says million

    which is which ?

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    maybe it's milliard
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    Gone!
    Join Date
    Aug 2001
    Location
    Witty Location Parody
    Posts
    3,889
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Heh - edited (after a while you can only modify the title of a thread's "post", not the actual thread)

  8. #8
    chown linux:users\ /world Hartmann's Avatar
    Join Date
    Aug 2000
    Location
    Houston, TX, USA
    Posts
    6,455
    Mentioned
    11 Post(s)
    Tagged
    0 Thread(s)
    Any of the major databases can handle 10 million records; it all really depends on how well the system is designed and implemented.

  9. #9
    Non-Member DaveMichaels's Avatar
    Join Date
    Nov 2004
    Location
    US
    Posts
    535
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    10 billion? What are the records about?

  10. #10
    chown linux:users\ /world Hartmann's Avatar
    Join Date
    Aug 2000
    Location
    Houston, TX, USA
    Posts
    6,455
    Mentioned
    11 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by DaveMichaels
    10 billion? What are the records about?
    No. 10 million.

  11. #11
    SitePoint Guru puco's Avatar
    Join Date
    Feb 2005
    Location
    Slovakia
    Posts
    785
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I would say any of the major commercial databases can handle millions of records; with a properly designed database it is not an issue. But you should ask yourself what you want to do with the data. Is it static? How big are the rows? What type of queries do you want to execute: OLAP or OLTP? etc.

    Each RDBMS has its ups and downs. You could look into MSSQL, Oracle, DB2, etc.

    But 10M records is nothing; PostgreSQL could handle that AFAIK.
    Martin Pernecky

  12. #12
    Massimiliano Bruno Giordano sid egg's Avatar
    Join Date
    Aug 2004
    Location
    Canada
    Posts
    1,280
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Okay, so since I am very comfortable with MySQL, would that be a fine choice? Or should I learn PostgreSQL simply because it's "better"? If you are going to suggest that, please tell me WHY it is better; I've never understood that :P

    Even if the entries are 5-20 KB apiece (for 10 million rows that's roughly 48 to 190 GB of data, though it'll be split across multiple machines)?
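A quick way to sanity-check that size estimate is sketched below. It assumes exactly 10 million rows, binary units (1 KB = 1024 bytes), and raw row data only; indexes and storage-engine overhead would add to the total. The function name `total_gib` is made up for this illustration.

```python
# Back-of-the-envelope storage estimate for the figures in the post above.
# Assumptions: 10 million rows, 1 KB = 1024 bytes, no index or engine overhead.
RECORDS = 10_000_000
KIB = 1024
GIB = 1024 ** 3


def total_gib(records: int, row_kib: float) -> float:
    """Raw data size in GiB for `records` rows of `row_kib` KiB each."""
    return records * row_kib * KIB / GIB


low = total_gib(RECORDS, 5)    # smallest rows: 5 KB each
high = total_gib(RECORDS, 20)  # largest rows: 20 KB each

print(f"{low:.1f} GiB to {high:.1f} GiB")
```

Under those assumptions the range comes out at about 47.7 to 190.7 GiB, so the upper end is closer to 190 GB than 240 GB.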

    The data is sorta static, many of the rows will be updated but no more frequently than every month, and new rows will be added weekly.

    Also, if you read the "Wtf" at the bottom of my post, I stated that I tried to edit it, it seems to have been done now (by a mod I guess).

    Edit: for those who are wondering, it is a statistical database for a large company my friend is doing a job for. (How is it so big, you ask? Don't worry, it is; I really don't know.)

  13. #13
    SitePoint Guru puco's Avatar
    Join Date
    Feb 2005
    Location
    Slovakia
    Posts
    785
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Before choosing MySQL, check its table size limit, because I'm not sure whether it goes over 4GB (or over 64GB). But since you have static, statistical data, maybe you should look into OLAP; it might suit your needs.
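For context on where a figure like 4GB can come from, here is a rough sketch. It assumes the MyISAM storage engine with dynamic-format rows, where each row is addressed by a byte offset stored in a fixed-width pointer; a 4-byte pointer was the long-standing default in older MySQL releases, and later releases default to a wider one. The helper name `max_table_bytes` is hypothetical, and the exact limits for a given version and operating system should be checked in the MySQL manual.

```python
# Rough ceiling on table data size when rows are addressed by a byte offset
# stored in a fixed-width pointer (assumption: MyISAM, dynamic row format).
GIB = 1024 ** 3
TIB = 1024 ** 4


def max_table_bytes(pointer_bytes: int) -> int:
    """Number of distinct byte offsets an n-byte pointer can address."""
    return 2 ** (8 * pointer_bytes)


for width in (4, 6):
    size = max_table_bytes(width)
    if size >= TIB:
        print(f"{width}-byte pointer: about {size // TIB} TiB")
    else:
        print(f"{width}-byte pointer: about {size // GIB} GiB")
```

The operating system's own file size limit applies on top of this, whichever is smaller.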
    Martin Pernecky

  14. #14
    SitePoint Wizard big_al's Avatar
    Join Date
    May 2000
    Location
    Victoria, Australia
    Posts
    1,661
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    MS SQL Server should cope... Walmart's database is half a petabyte (500 terabytes) at last report...

    Edit:

    Can someone confirm if Walmart do use MS SQL, or another database server?

    Walmart is not listed on this list http://www.microsoft.com/sql/evaluat...wintercorp.asp

    Also note one database is 33 BILLION records... so 10 BILLION records should not be a concern, provided the server is set up by a DBA that REALLY KNOWS WHAT THEY ARE DOING.
    .NET Code Monkey

  15. #15
    SitePoint Zealot Oggle's Avatar
    Join Date
    Jul 2003
    Location
    Cali
    Posts
    138
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Try using a flatfile database....























    j/k

  16. #16
    SitePoint Addict phpster's Avatar
    Join Date
    Feb 2005
    Location
    Toronto, Canada
    Posts
    374
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Walmart is DB2 and Oracle...running HUGE iron

    MySQL on *nix has no set limit for file sizes.

    Since it's a stats db, my guess is that you are gonna need some decent iron to run it on and a well designed environment. You will likely need to look at clustering, replication and high availability. A great place to look is High Performance MySQL at www.highperformancemysql.com (a great book as well, worth picking up).

    MySQL can routinely handle 50 million records, though at this size, the hardware is gonna matter more. You ain't running this puppy on a $500 Dell special.

    As for features compared with PostgreSQL, PG gives you stored procedures, triggers and some other nifty features. Yes, these features are in MySQL v5, but that is not ready for the production environment quite yet. I have heard PG referred to as the Oracle of open source DBs.

    The advantage to MySQL is that there is a company behind it that you can go to for support when needed, and support is still very fairly priced.

    MS SQL Server is not the tool for this, lacking some basic features like regex and easily editable text areas (my two pet peeves). MySQL is great at these. Others to examine are Oracle and DB2, though the main drawback is the $$$ involved.
    phpster

    I wish my computer would do what I want it to.
    Not what I tell it to do...

  17. #17
    Massimiliano Bruno Giordano sid egg's Avatar
    Join Date
    Aug 2004
    Location
    Canada
    Posts
    1,280
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey, thanks a lot guys, we've decided on MySQL and yes, the "heavy iron" as you call it is all worked out. There will be 8 servers dedicated to it, and the load will be spread evenly across them.
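One common way to spread rows evenly across a fixed set of servers is to hash each record's key and take the result modulo the server count. The sketch below is only an illustration of that idea, not the setup described in this thread: the 8 comes from the post above, while the integer record id, the `shard_for` helper, and the choice of SHA-256 are all assumptions.

```python
import hashlib

NUM_SERVERS = 8  # figure taken from the post above


def shard_for(record_id: int, num_servers: int = NUM_SERVERS) -> int:
    """Map a record id to a server index in [0, num_servers) via a stable hash."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_servers


# Count how many of 80,000 hypothetical record ids land on each server.
counts = [0] * NUM_SERVERS
for rid in range(80_000):
    counts[shard_for(rid)] += 1

print(counts)
```

The mapping is deterministic, so any client can work out which server holds a given row. The catch with plain modulo hashing is that changing the number of servers remaps most keys, which matters if the cluster is expected to grow.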

