SitePoint Sponsor

User Tag List

Results 1 to 15 of 15
  1. #1
    SitePoint Addict
    Join Date
    Apr 2001
    Location
    None
    Posts
    254
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi, I'm thinking of importing over 1 million records into the MySQL database, and then do queries against it. Any suggestions? Will MySQL handle that many records just fine?

    Thanks

  2. #2
    SitePoint Columnist Skunk's Avatar
    Join Date
    Jan 2001
    Location
    Lawrence, Kansas
    Posts
    2,066
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It should be fine, though your main worry should be the complexity of the queries and the number of people that will be using the database simultaneously. I help administrate the forums at http://forums.gameplay.com/ - they run using PHP and mySQL and have well over a and a half million posts (all sotred in mySQL).

  3. #3
    SitePoint Enthusiast Jack@dwd.com's Avatar
    Join Date
    Dec 2000
    Location
    UK
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yeah, it all depends on how many people are going to be accessing it. gameplay.com forums have hundreds of users on at once, and it does tend to slow down. This is primarily because MySQL has a lower connection limit than a big enterprise DB such Oracle. On the other hand, it is as far as I know the fastest database out there, so if say only 30 people are going to be using your site/application at once, you may appreciate the speed.

  4. #4
    AdSpeed.com Son Nguyen's Avatar
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Also the indexes make quite a big difference with such large DBs.
    - Son Nguyen
    AdSpeed.com - Ad Serving and Ad Management Made Easy

  5. #5
    SitePoint Wizard
    Join Date
    Apr 2000
    Posts
    1,483
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It will handle it, but only just.
    I would speak to mrogish on these forums (Matt Rogish) - he runs a huge forum that used MySQL with over a million posts. He recently migrated to Sybase though.
    It really depends on how much RAM is in the server as well - what spec is it?
    And are the records you're importing complex (eg forum posts) or simple?

  6. #6
    ********* Callithumpian silver trophy freakysid's Avatar
    Join Date
    Jun 2000
    Location
    Sydney, Australia
    Posts
    3,798
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think Matt goes by the nick of MattR around here. I'm just doodling here - but how big is each record? Lets say it is 100KB, and there are 1,000,000 records. Thats over 95GB! Of course if each reacord is 10KB thats 9.5GB, and if each record is 1KB thats less than 1GB.

  7. #7
    SitePoint Evangelist
    Join Date
    Feb 2000
    Location
    England
    Posts
    568
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    the actual usage is going to be more important i would think than size, but it can certainly handle it, there are a few vbulletins with over 1,000,000 posts in them and lots have 300 or so concurrent users. It is the later that will create the breaking point not the number of records as far as I can tell. Of course indexes are not just going to speed it up but are a necessity here!

  8. #8
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yes, it was basically the high number of concurrent users which killed MySQL so we made the switch to Sybase ASE 11.9.2 back in January of this year. I think one of my links in my sig explains why I don't like MySQL and our specific problems that we encountered.

    And BTW, 1460089 posts takes up somewhere in the neighborhood of 3.3GB.

  9. #9
    SitePoint Addict
    Join Date
    Apr 2001
    Location
    None
    Posts
    254
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi guys,

    Thanks for your answers.
    Last edited by Pweb; Apr 19, 2001 at 10:25.

  10. #10
    AdSpeed.com Son Nguyen's Avatar
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Definitely I think multiple tables will help.
    I don't know much about db normalization but it's very logical to avoid repetitive data.
    - Son Nguyen
    AdSpeed.com - Ad Serving and Ad Management Made Easy

  11. #11
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Originally posted by Pweb
    Hi guys,

    Thanks for your answers.

    The database file is about 300 MB.
    Records are small - 1 line of text in each column, and there are about 7 columns.

    I need a suggestion - I have one table, which has about 2 million records in it, when I do a search against it, even when trying to be very specific, it takes a very long time to pull data out of it (I'd say 1-3 minutes).

    The table has a column with years in it (e.g. 2001, 2000, 1999) there are over 300.000.00 records for each year. Also, it has a column with "types" in it - for example "Male, Female".

    Here is what I'm thinking of doing, but need your suggestions...

    1. if I divide the table by year, into several tables (1 year for 1 table)
    2. then create another table with separate types in it - 1 for male, 1 for female
    3. And index it all

    Then use joins to join "type" with "year" will this speed up the search??

    Thanks!
    I'd say 99.9% of performance problems (well maybe 90%) come from improper SQL queries and lack of indexes.

    If you could provide the table schema (e.g. CREATE TABLE ... along with the queries you are running AND the indexes (e.g. CREATE INDEX.. you might not have any problems other than improper indexes.

  12. #12
    SitePoint Addict
    Join Date
    Apr 2001
    Location
    None
    Posts
    254
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    select * from people where type="some_type" and state = "some_state" and year = "some_year" and name like "%some_name" and org like "%some_org"

    So I need to index year, state and type (they have a LOT of duplicates)

    Any suggestions?
    Last edited by Pweb; Apr 19, 2001 at 10:26.

  13. #13
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    yikes! TEXT columns? If you do not need the virtually unlimited length of the TEXT datatype I suggest cutting it down to a VARCHAR( ... ) column..

    For instance,
    TYPE VARCHAR( 30 ),
    STATE CHAR( 2 ), -- you could use abbreviations, CA, OH, etc.
    NAME VARCHAR( 200 ),
    ORG VARCHAR( 250 ),
    YEAR SMALLINT
    etc.

    Keep the datatypes relavent AND the right size.

    MySQL indexes on text columns are flaky at best so avoid them if you can help it.

    Also from what I understand about indexes in MySQL they can't use them on "%something" queries, only "something%" -- so if you can avoid the leading wildcard that would help.

    After you get the datatypes ironed out, create an index on everything in that where clause -- it will provide best performance in comparison to a composite index:
    CREATE INDEX type_state_year_name_org people( type, state, year, name, org );

  14. #14
    SitePoint Addict
    Join Date
    Apr 2001
    Location
    None
    Posts
    254
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks Matt -

    I need to use wildcards in front of the data - "%something" because the database I have has blank lines, commas, dots in front of the data - for example: " Hello" or ".Hello" or ",.Hello"

    Don't ask me why - it's not my database

    So you suggest using ONE index for all the columns
    CREATE INDEX type_state_year_name_org people( type, state, year, name, org );

    Instead of 1 index for 1 column?

  15. #15
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If that's the one query you're going to be running, then yes the single index would be the way you want to go about doing it.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •