  1. #1
    SitePoint Zealot auth1's Avatar
    Join Date
    Nov 2004
    Location
    Melbourne, Australia
    Posts
    167
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Optimizing file reading, looping and database entry

    Hi there,

    I've just written something quick to enter End of Day stock data from a text file into a MySQL database.

    This is what the stock data looks like; I've cropped it, but it runs to 3800 lines.

    Code HTML4Strict:
    AAC,20101101,1.6,1.6,1.55,1.555,558449
    AAD,20101101,1.08,1.08,1.04,1.055,1120721
    AAE,20101101,0.011,0.011,0.011,0.011,1504065
    AAG,20101101,0.295,0.295,0.255,0.275,778858
    AAI,20101101,13.1,13.1,13.1,13.1,0

    This is the PHP code:

    Code PHP:

    <?php
    //Stock Format: Code, Date, Open, High, Low, Close, Volume
    //(assumes mysql_connect() and mysql_select_db() have already been called)

    //File where Stock End of Day data is stored
    $lines = file('stockdata.txt');

    foreach ($lines as $line_num => $line) {
        $pieces = explode(",", trim($line)); //trim strips the trailing newline from the volume field
        //Where $pieces[0] = Stock Code, $pieces[1] = Date etc

        $result = @mysql_query("SELECT * FROM `". $pieces[0] ."`");

        if (!$result) {
            //No table exists - create it
            @mysql_query("CREATE TABLE `". $pieces[0] ."` (date int(8), open float(7,3), high float(7,3), low float(7,3), close float(7,3), volume int(8))");
        }

        //Insert the data (the table exists either way by this point)
        @mysql_query("INSERT INTO `". $pieces[0] ."` (date, open, high, low, close, volume) VALUES ('". $pieces[1] ."', '". $pieces[2] ."', '". $pieces[3] ."', '". $pieces[4] ."', '". $pieces[5] ."', '". $pieces[6] ."')");
    }
    ?>

    This script will be run every day to update my database with daily prices. There are 3800 lines in the text file.

    How does the code look? Are there areas to optimize so it runs quicker or is less resource intensive? Granted, it takes only 1 second on my WAMP server, but could it become a problem at 250 entries a year? And how are my data type selections?

  2. #2
    Non-Member Kalon's Avatar
    Join Date
    Aug 2010
    Location
    At my computer
    Posts
    2,012
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    one suggestion:

    mysql_query returns true or false when you do an insert.

    I would check what value is returned, so that if the insert failed for some reason (corrupt or missing data in your text file) you can display an appropriate error message and then also output that problem $line to a separate text file to investigate later.

  3. #3
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,135
    Mentioned
    16 Post(s)
    Tagged
    3 Thread(s)
    Here are some suggestions:

    Insert ~200 rows at a time, rather than one by one, to reduce the number of queries and thus the chances of failure.
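    Something along these lines (a rough sketch only: it assumes a single hypothetical `eod` table holding all codes; with your one-table-per-code scheme you'd group the rows by $pieces[0] first):

    ```php
    <?php
    // Sketch: build multi-row INSERTs (~200 rows per query) instead of
    // one query per line. Table name `eod` is illustrative.
    $lines = [
        "AAC,20101101,1.6,1.6,1.55,1.555,558449",
        "AAD,20101101,1.08,1.08,1.04,1.055,1120721",
        "AAE,20101101,0.011,0.011,0.011,0.011,1504065",
    ];

    $values = [];
    foreach ($lines as $line) {
        $p = explode(",", trim($line));
        // one (...) group per row; real code should escape/validate first
        $values[] = "('" . implode("', '", array_slice($p, 1)) . "')";
    }

    // one query per ~200 rows instead of 3800 single-row queries
    foreach (array_chunk($values, 200) as $chunk) {
        $sql = "INSERT INTO eod (date, open, high, low, close, volume) VALUES "
             . implode(", ", $chunk);
        // mysql_query($sql);
    }
    echo $sql;
    ```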

    Switch the table engine to InnoDB so the entire process can be wrapped in a transaction. If any single insert fails, the whole thing can then be rolled back, at which point you could have an email sent out to notify yourself or whoever is responsible for this.

    Wrapping the process in a transaction somewhat takes care of this one, but I would advise against assuming keys 0–6 will exist. Sure, they *should*, but that doesn't mean they *will*. Adding some logic to make sure those keys exist will result in more robust handling of the entire process.
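    For example, a small validation helper run before any query (sketch only; the function name is just illustrative):

    ```php
    <?php
    // Sketch: reject lines with the wrong field count or with blank
    // fields before touching the database.
    function validLine(array $pieces) {
        if (count($pieces) !== 7) {
            return false;               // wrong number of fields
        }
        foreach ($pieces as $field) {
            if (trim($field) === '') {  // catches "AAC,20101101, , ,..." cases
                return false;
            }
        }
        return true;
    }

    var_dump(validLine(explode(",", "AAC,20101101,1.6,1.6,1.55,1.555,558449"))); // bool(true)
    var_dump(validLine(explode(",", "AAC,20101101, , ,1.55,1.555,558449")));     // bool(false)
    ```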

    The key thing when dealing with imports is to build in as many ways as possible to notify yourself (or whoever is responsible) of failure. The more detailed the message, the easier it will be to resolve the issue when it happens.

    Also, when dealing with imports it's best to succeed gracefully or not at all. Generally it's easier to rerun an import that failed and was rolled back than one that was partially imported.

    Really, the code you have will probably run fine given *ideal* circumstances. However, things are always less than ideal when dealing with user-supplied data. Going the extra step by adding some error handling and notification will make issues much easier to troubleshoot when they occur.

  4. #4
    Non-Member Kalon's Avatar
    Join Date
    Aug 2010
    Location
    At my computer
    Posts
    2,012
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    with 3800 lines to import, I wouldn't do it as an all-or-nothing transaction unless you need every line to import successfully for the data to be useful.

    I don't see the point in reloading just about every line if, say, only 1 or 2 failed to load.

    What I've done in the past in similar situations is output any problem lines to a separate file, repair those lines, and load them into the database.
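    A minimal sketch of that approach (the insert result is stubbed with a field-count check here since there's no database in the snippet, and the file name is just illustrative):

    ```php
    <?php
    // Sketch: write any problem lines to a separate file for later repair.
    // In the real script $ok would be the return value of mysql_query().
    $errorFile = 'failed_lines.txt';
    if (file_exists($errorFile)) {
        unlink($errorFile); // start fresh for this run
    }

    $lines = [
        "AAC,20101101,1.6,1.6,1.55,1.555,558449",
        "BAD,corrupt",
    ];

    foreach ($lines as $line_num => $line) {
        // stand-in for: $ok = mysql_query("INSERT ...");
        $ok = (count(explode(",", trim($line))) === 7);

        if (!$ok) {
            echo "Insert failed on line $line_num\n";
            file_put_contents($errorFile, $line . "\n", FILE_APPEND);
        }
    }
    ```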

    But how you do it is your call depending on your circumstances.

  5. #5
    SitePoint Wizard silver trophy bronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    In case you were not aware, you can use MySQL's LOAD DATA INFILE command to import a CSV file directly.

    It might not give you the exact db table you want, but it might be easier to massage the data once it is loaded into a table.
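    For reference, roughly what that can look like (a sketch only: the table name and column list are illustrative, not your actual schema):

    ```php
    <?php
    // Sketch: LOAD DATA INFILE pulls the whole CSV in with one query.
    // Table name `eod` and the column list are illustrative.
    $sql = "LOAD DATA INFILE '/path/to/stockdata.txt'
            INTO TABLE eod
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'
            (code, date, open, high, low, close, volume)";
    // mysql_query($sql);
    echo $sql;
    ```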

  6. #6
    SitePoint Zealot auth1's Avatar
    Join Date
    Nov 2004
    Location
    Melbourne, Australia
    Posts
    167
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Cheers for the quick replies. It was written very quickly, and I wanted to make sure the fundamentals looked good before I added in the error handling.

    @Kalon: Yeah, all the error suppression was done so I could use if statements and end the script early if something didn't insert or create.

    @oddz: I have never worked with InnoDB before, will look into it. Do I just change the tables over from MyISAM? Will do some reading regardless.

    I can use count($pieces) to ensure there are 7 values, but what's an elegant way to test they're not empty? So for cases like below:

    Code HTML4Strict:
    AAC,20101101, , ,1.55,1.555,558449

    I could do:

    Code PHP:
    if ( (empty($pieces[0])==TRUE) || (empty($pieces[1])==TRUE) ...

    but it's rather messy, and I'd like to avoid 7 × 3800 checks if possible.

    Thanks again

  7. #7
    SitePoint Zealot auth1's Avatar
    Join Date
    Nov 2004
    Location
    Melbourne, Australia
    Posts
    167
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Something like this, where FILE_ERROR_NOT_ENOUGH_VALUES and FILE_ERROR_NULL_VALUES are constants.

    Code PHP:
        //Checks that the number of values is correct
        if (count($pieces) !== 7) {
            echo "<b>Error:</b> ". FILE_ERROR_NOT_ENOUGH_VALUES ." Line: ". $line_num;
            exit;
        }

        //Checks that all values exist
        //(note: empty() also treats "0" as empty, so a 0 volume would be rejected)
        if ( empty($pieces[0]) || empty($pieces[1]) || empty($pieces[2]) || empty($pieces[3]) || empty($pieces[4]) || empty($pieces[5]) || empty($pieces[6]) ) {
            echo "<b>Error:</b> ". FILE_ERROR_NULL_VALUES ." Line: ". $line_num;
            exit;
        }

  8. #8
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Use SPL to iterate the CSV; this will provide an array of values for each line. You can then apply array_filter to this and remove empty values. Additionally, you could wrap this in a FilterIterator which will automatically skip anything that is not x in size.

    Something like...
    PHP Code:
    <?php
    $csv = new SplFileObject('data.csv');
    $csv->setFlags(SplFileObject::READ_CSV);

    class MyFilter extends FilterIterator
    {
      protected $count;

      public function __construct($count, $it){
        parent::__construct($it);
        $this->count = (int)$count;
      }

      public function accept(){
        $line = array_filter(
          parent::current()
        );
        return $this->count === count($line);
      }
    }

    foreach(new MyFilter(6, $csv) as $line){
      #do something with 6 entries.
    }
    ?>
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

