  1. #1
    The Omnipresent [ArcanE]'s Avatar
    Join Date
    Mar 2005
    Location
    Belgium
    Posts
    214
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Inserting massive amount of records with PHP

I'm building a mailing tool for a client of ours and need to send an email to all their client contacts (over 130,000 emails). This is the first time I've needed to insert such a huge number of records into the mailqueue table.

Previously I inserted them one by one with an INSERT query. After a while that wasn't performant enough when inserting 1,000+ records, so I modified the query to insert multiple records at once: INSERT INTO table (field, ...) VALUES (val, ...), (val, ...), (val, ...), ...
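A minimal sketch of that multi-row form (the mailqueue table, its columns and the $users array here are simplified stand-ins, not the actual code):

PHP Code:
<?php

    // Sketch of the multi-row INSERT described above. mailqueue and its
    // columns are hypothetical; $users is assumed to be loaded already
    // and an open mysql_connect() link is assumed.

    $values = array();
    foreach ( $users as $user ) {
        $values[] = "('" . mysql_real_escape_string( $user['email'] ) . "', '"
                  . mysql_real_escape_string( $user['body'] ) . "')";
    }

    $query = 'INSERT INTO mailqueue (recipient, body) VALUES ' . implode( ', ', $values );
    mysql_query( $query ) or die ( 'Insert Error: ' . mysql_error() );

?>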

I was already running into PHP memory limit problems when inserting the 1,000+ records one by one. That's now solved by the query above, but I'm afraid I'm going to run into memory limit problems again when inserting the 130,000 records.

Does anyone have another solution for adding 130,000 records to a database? Perhaps with sessions and a page that reloads itself?

Any ideas are welcome and will be considered.
    Thanks!
    Webdevelopment : Skyrocket Concepts Inventis Web Architects
    Ain't got time for the future or the past.
    Live for the moment, make it last.

  2. #2
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
I think your best option is to have a cron job that runs the mail queue at a time of night when you have the fewest visitors. That way the code you have above would still work, while running faster and affecting users less.
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
Sometimes it's enough to make that wheel more rounded" - Molona

  3. #3
    SitePoint Addict
    Join Date
    Aug 2005
    Posts
    207
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    LOAD DATA INFILE

PHP Code:
<?php

    define( 'DB_HOST', 'localhost:3306' );      // should not need to change
    define( 'DB_USER', 'user' );                // enter the user name for this database
    define( 'DB_PASS', 'pass' );                // enter the password for this database
    define( 'DB_NAME', 'my_db' );               // enter the database name you are connecting to

    define( 'DB_TABLE', 'my_table' );           // enter the NAME of the database TABLE to INSERT data into
    define( 'DBF_CSVP', './path_to/csv.txt' );  // enter the PATH and NAME of the CSV FILE to IMPORT

    mysql_connect( DB_HOST, DB_USER, DB_PASS ) or die ( 'Connection Error: ' . mysql_error() );
    mysql_select_db( DB_NAME ) or die ( 'Select DB (' . DB_NAME . ') Error: ' . mysql_error() );

    /* change or add any (terminated, enclosed or escaped) option that you need */

    $query = "LOAD DATA INFILE '" . DBF_CSVP . "' INTO TABLE " . DB_TABLE
           . " FIELDS TERMINATED BY '\\t\\t' OPTIONALLY ENCLOSED BY ''"
           . " ESCAPED BY '\\\\' LINES TERMINATED BY '\\r\\n'";

    mysql_query( $query ) or die ( 'Query Error: ' . mysql_error() );

?>
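As for producing the import file itself, a minimal sketch that writes it row by row so PHP memory use stays flat regardless of row count; the SELECT, the output path and the skipped value-escaping are simplifications:

PHP Code:
<?php

    // Sketch: stream the source rows straight to the import file so PHP
    // never holds more than one row in memory. The query and path are
    // placeholders; real data would also need the ESCAPED BY character
    // applied to the values.

    $fp = fopen( './path_to/csv.txt', 'w' ) or die ( 'Cannot open output file' );

    $result = mysql_unbuffered_query( 'SELECT subject, body, email FROM users' );

    while ( $row = mysql_fetch_row( $result ) ) {
        // match the FIELDS/LINES options in the LOAD DATA query above
        fwrite( $fp, implode( "\t\t", $row ) . "\r\n" );
    }

    fclose( $fp );

?>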

  4. #4
    The Omnipresent [ArcanE]'s Avatar
    Join Date
    Mar 2005
    Location
    Belgium
    Posts
    214
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by printf View Post
    LOAD DATA INFILE

This is a nice solution, but doesn't it just move the problem to creating the file for the LOAD DATA INFILE query to load?

    Quote Originally Posted by arkinstall View Post
I think your best option is to have a cron job that runs the mail queue at a time of night when you have the fewest visitors. That way the code you have above would still work, while running faster and affecting users less.
The problem is not the sending of the mails but the moment the mails (with their bodies and headers) get inserted into the mailqueue table. I already have a cron job in place that sends out 50 mails from the mailqueue table each minute.
    Webdevelopment : Skyrocket Concepts Inventis Web Architects
    Ain't got time for the future or the past.
    Live for the moment, make it last.

  5. #5
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
This is a nice solution, but doesn't it just move the problem to creating the file for the LOAD DATA INFILE query to load?
Yup, but it's made for importing large sets of data (I'd say 130,000 records is a medium-sized set of data).
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
Sometimes it's enough to make that wheel more rounded" - Molona

  6. #6
    The Omnipresent [ArcanE]'s Avatar
    Join Date
    Mar 2005
    Location
    Belgium
    Posts
    214
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
Can anyone come up with another solution?
    Webdevelopment : Skyrocket Concepts Inventis Web Architects
    Ain't got time for the future or the past.
    Live for the moment, make it last.

  7. #7
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Is it only a one-time insert?
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
Sometimes it's enough to make that wheel more rounded" - Molona

  8. #8
    SitePoint Addict
    Join Date
    Aug 2007
    Location
    GR
    Posts
    352
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What are the records you have to insert?

Aren't the e-mails supposed to be the same for everyone?

  9. #9
    SitePoint Wizard wonshikee's Avatar
    Join Date
    Jan 2007
    Posts
    1,223
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by [ArcanE] View Post
Can anyone come up with another solution?
Where are the 130k emails getting pulled from?

    I take it you're creating a temporary table to store the emails to be sent?

  10. #10
    SitePoint Wizard TheRedDevil's Avatar
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by [ArcanE] View Post
I was already running into PHP memory limit problems when inserting the 1,000+ records one by one. That's now solved by the query above, but I'm afraid I'm going to run into memory limit problems again when inserting the 130,000 records.

Does anyone have another solution for adding 130,000 records to a database? Perhaps with sessions and a page that reloads itself?
130,000 records is nothing; your database should have no problem handling that, and much more, if it's set up properly and, of course, if your database layout is good.

Since you mention that you're afraid of running into memory problems, are you talking about PHP or MySQL specifically?

I assume you mean PHP; in that case you could of course just increase the memory limit. A better solution would be to limit the number of rows inserted at a time and instead do several smaller inserts (see the sketch at the end of this post). That way you make sure the PHP memory limit is never exceeded.

Another thing to consider is how long the script will take to execute. Even if you set the PHP settings to keep running until the script is finished, the browser window could time out before it does. So if the code could run for a long time, it could be a good idea to start the script with an asynchronous JavaScript request (popularly called Ajax these days) and then update the status of the "run program" when you get a reply back. That way you can also prevent the user from restarting the process before it's finished.
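A rough sketch of that batching idea (the chunk size, table and column names are made up):

PHP Code:
<?php

    // Sketch of "several smaller inserts": read the source rows in chunks
    // with LIMIT and write each chunk as one multi-row INSERT, so PHP
    // never holds more than $chunk rows at a time.

    $chunk  = 500;
    $offset = 0;

    do {
        $result = mysql_query( "SELECT subject, body, email FROM users LIMIT $offset, $chunk" )
            or die ( 'Select Error: ' . mysql_error() );

        $values = array();
        while ( $row = mysql_fetch_assoc( $result ) ) {
            $values[] = "('" . mysql_real_escape_string( $row['subject'] ) . "', '"
                      . mysql_real_escape_string( $row['body'] ) . "', '"
                      . mysql_real_escape_string( $row['email'] ) . "')";
        }
        mysql_free_result( $result );

        if ( $values ) {
            mysql_query( 'INSERT INTO mailqueue (subject, body, recipient) VALUES '
                         . implode( ', ', $values ) )
                or die ( 'Insert Error: ' . mysql_error() );
        }

        $offset += $chunk;
    } while ( count( $values ) === $chunk );  // a short chunk means we're done

?>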

  11. #11
    SitePoint Zealot
    Join Date
    Sep 2007
    Posts
    136
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
Need more info; post some code.

  12. #12
    The Omnipresent [ArcanE]'s Avatar
    Join Date
    Mar 2005
    Location
    Belgium
    Posts
    214
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
Well, the database layout is simple... I've got a table containing HTML templates, which contain wildcards. The wildcards get replaced by the corresponding columns from the user table (name, email, etc.) so we can personalize the mails. That means no two mails are the same.

What I do when a user clicks "send mailing" is run over all the users in the user table, replace all the wildcards in the mailing template, and save the result in the mailQueue table along with the subject, sender address and receiver address.
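For illustration, a minimal sketch of that wildcard step (the %name%/%email% placeholder syntax and the column names are made up, not the actual template format):

PHP Code:
<?php

    // Sketch of the wildcard replacement described above; the placeholder
    // syntax and column names are illustrative only.

    function personalize( $template, $user )
    {
        return str_replace(
            array( '%name%', '%email%' ),
            array( $user['name'], $user['email'] ),
            $template
        );
    }

    $body = personalize( $templateHtml, $user ); // then save $body to mailQueue

?>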

The mailqueue table gets processed 100 records at a time by a cron job that connects to an SMTP server we set up specifically for this purpose.

The bottleneck, like TheRedDevil said, is the moment the template gets "parsed" and 130,000 records get inserted into the mailQueue table in one go.
Would an async request solve this? The script runs with ignore_user_abort() and a time limit of 0. I also use output buffering to send a message to the browser before the insertion starts, so the user isn't staring at a blank window until the script finishes.

    I hope this is enough info :-)
    Webdevelopment : Skyrocket Concepts Inventis Web Architects
    Ain't got time for the future or the past.
    Live for the moment, make it last.

  13. #13
    The Omnipresent [ArcanE]'s Avatar
    Join Date
    Mar 2005
    Location
    Belgium
    Posts
    214
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
I've come up with an idea: store only the IDs of the users that get the mail in a temp table, along with the template_id, then run a cron job every X seconds that parses the template and inserts 100 records at a time. Good idea?
    Webdevelopment : Skyrocket Concepts Inventis Web Architects
    Ain't got time for the future or the past.
    Live for the moment, make it last.

  14. #14
    SitePoint Wizard TheRedDevil's Avatar
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
I fail to see how that method would be different from the other one; you would only split the work into several segments.

What you're trying to do should be possible to achieve in one go. I've processed much more data than that before, in cron jobs that only ran for 4-5 seconds.

Have you looked into where the possible resource/memory hog might be located?

An asynchronous request to start the script will not solve a problem in the script itself, but it would improve the user experience.

The questions you need to go over are:

    How do you handle the user information?

    How many emails do you process at a time?

Are you using the best methods/functions for the task? (i.e. the fastest and most resource/memory-friendly ones)

It is very difficult for us to help you; after all, we are just describing how to proceed in similar situations. If you post the actual code, it would be easier to see where you could optimize it.

  15. #15
    SitePoint Addict
    Join Date
    Aug 2007
    Location
    GR
    Posts
    352
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
ArcanE, I think you are approaching it the wrong way. There is no need to add all the records to a mail queue table; in fact, there is no need for a mail queue table at all. I have written a mailing list manager, and what the cron does is search for e-mail messages that are flagged "on hold". If it finds any, it marks them "processing" and then retrieves a batch of addresses from the subscribers table to send to. All I keep in the db after each pass is a count of how many addresses have been mailed, which works in conjunction with "... LIMIT $next, 50", and that's all.
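A rough sketch of that cron pass (the table, column and flag names and the batch size of 50 are illustrative, not the actual implementation):

PHP Code:
<?php

    // Sketch of the flag-based cron described above: pick up a message
    // that is "on hold", mark it "processing", mail the next 50
    // subscribers, then record how far we got.

    $msg = mysql_fetch_assoc( mysql_query(
        "SELECT id, subject, body, sent_count FROM messages
         WHERE status = 'on hold' LIMIT 1"
    ) );

    if ( $msg ) {
        mysql_query( "UPDATE messages SET status = 'processing' WHERE id = {$msg['id']}" );

        $next   = (int) $msg['sent_count'];
        $result = mysql_query( "SELECT email FROM subscribers LIMIT $next, 50" );

        $sent = 0;
        while ( $row = mysql_fetch_assoc( $result ) ) {
            mail( $row['email'], $msg['subject'], $msg['body'] );
            $sent++;
        }

        // fewer than 50 sent means we reached the end of the list
        $status = ( $sent < 50 ) ? 'done' : 'on hold';
        mysql_query( "UPDATE messages
                      SET sent_count = sent_count + $sent, status = '$status'
                      WHERE id = {$msg['id']}" );
    }

?>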

  16. #16
reads the ********* Crier silver trophy bronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
Can you post the actual code you're currently using to retrieve and insert the data?

