Optimizing file reading, looping and database entry

Hi there,

I've just written something quick to enter End of Day stock data from a text file into a MySQL database.

This is what the stock data looks like. I've cropped it, but it runs to 3800 lines.


AAC,20101101,1.6,1.6,1.55,1.555,558449
AAD,20101101,1.08,1.08,1.04,1.055,1120721
AAE,20101101,0.011,0.011,0.011,0.011,1504065
AAG,20101101,0.295,0.295,0.255,0.275,778858
AAI,20101101,13.1,13.1,13.1,13.1,0

This is the PHP code:



<?php
//Stock Format: Code, Date, Open, High, Low, Close, Volume

//File where Stock End of Day data is stored
//FILE_IGNORE_NEW_LINES stops the trailing newline ending up in the volume field
$lines = file('stockdata.txt', FILE_IGNORE_NEW_LINES);

foreach ($lines as $line_num => $line) {
    $pieces = explode(",", $line);
    //Where $pieces[0] = Stock Code, $pieces[1] = Date etc

    $result = @mysql_query("SELECT * FROM `". $pieces[0] ."`");

    if (!$result) {
        //No table exists - create the table, then insert the row
        @$create = mysql_query("CREATE TABLE `". $pieces[0] ."` (date int(8), open float(7,3), high float(7,3), low float(7,3), close float(7,3), volume int(8))");

        //Inserting data into the new table
        @$new = mysql_query("INSERT INTO `". $pieces[0] ."` (date, open, high, low, close, volume) VALUES ('". $pieces[1] ."', '". $pieces[2] ."', '". $pieces[3] ."', '". $pieces[4] ."', '". $pieces[5] ."', '". $pieces[6] ."')");
    } else {
        //Table found - insert the row
        @$new = mysql_query("INSERT INTO `". $pieces[0] ."` (date, open, high, low, close, volume) VALUES ('". $pieces[1] ."', '". $pieces[2] ."', '". $pieces[3] ."', '". $pieces[4] ."', '". $pieces[5] ."', '". $pieces[6] ."')");
    }
}
?>

This script will be run every day to update my database with daily prices. There are 3800 lines in the text file.

How does the code look? Are there areas to optimize so it runs quicker or is less resource intensive? Granted, it only takes about a second on my WAMP server, but could it become a problem at ~250 entries a year? And how are my data type selections?

One suggestion:

[fphp]mysql_query[/fphp] returns TRUE or FALSE when you do an INSERT.

I would check the return value, so that if an insert fails for some reason (corrupt or missing data in your text file) you can display an appropriate error message and also write the problem $line to a separate text file to investigate later.
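For example (just a sketch using the same mysql_* calls as the script above; the failed_lines.txt name is only an illustration):


$result = mysql_query("INSERT INTO `". $pieces[0] ."` (date, open, high, low, close, volume)"
                    . " VALUES ('". implode("', '", array_slice($pieces, 1)) ."')");

if (!$result) {
    //Keep the raw line and MySQL's reason so it can be investigated later
    $message = "Line ". ($line_num + 1) ." failed: ". mysql_error() ."\n". $line ."\n";
    file_put_contents('failed_lines.txt', $message, FILE_APPEND);
}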

Here are some suggestions:

Insert ~200 rows at a time rather than one by one, to reduce the number of queries and thus the chances of failure.
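Something like the sketch below. For illustration it assumes a single consolidated table (eod_prices is a made-up name) with the stock code as a column, rather than one table per stock:


$batch = array();
foreach ($lines as $line) {
    $pieces = explode(",", trim($line));
    $batch[] = "('". implode("', '", $pieces) ."')";

    //Flush one multi-row INSERT for every 200 lines read
    if (count($batch) === 200) {
        mysql_query("INSERT INTO eod_prices (code, date, open, high, low, close, volume) VALUES ". implode(", ", $batch));
        $batch = array();
    }
}
//Flush whatever is left over
if ($batch) {
    mysql_query("INSERT INTO eod_prices (code, date, open, high, low, close, volume) VALUES ". implode(", ", $batch));
}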

Switch the table engine to InnoDB so the entire process can be wrapped in a transaction. If any single insert fails, the whole thing can then be rolled back, at which point you could have an email sent out to notify yourself or whoever is responsible for this.
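Sketched with the same mysql_* extension as the original, assuming the tables are already InnoDB (the email address is a placeholder):


mysql_query("START TRANSACTION");

$ok = true;
foreach ($lines as $line) {
    $pieces = explode(",", trim($line));
    $sql = "INSERT INTO `". $pieces[0] ."` (date, open, high, low, close, volume)"
         . " VALUES ('". implode("', '", array_slice($pieces, 1)) ."')";
    if (!mysql_query($sql)) {
        $ok = false;
        break;
    }
}

if ($ok) {
    mysql_query("COMMIT");
} else {
    //Undo every insert from this run, then notify whoever is responsible
    mysql_query("ROLLBACK");
    mail('you@example.com', 'EOD import failed', mysql_error());
}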

Wrapping the process in a transaction somewhat takes care of this one, but I would advise against assuming keys 0-6 will exist. Sure, they should, but that doesn't mean they will. Adding some logic to make sure those keys exist will result in more robust handling of the entire process.

The key thing when dealing with imports is to build in as many ways as possible to notify yourself, or whoever is responsible, of failure. The more detailed the message, the easier it will be to resolve the issue when it happens.

Also, when dealing with imports it's best to succeed gracefully or not at all. Generally it's easier to rerun an import that failed and was rolled back than one that was partially imported.

Really, the code you have will probably run fine given ideal circumstances. However, things are always less than ideal when dealing with user-supplied data. Taking the extra step of adding some error handling and notification will make issues much easier to troubleshoot when they occur.

With 3800 lines to import, I wouldn't do it as an all-or-nothing transaction unless you need every line to be imported successfully for the data to be useful.

I don't see the point in reloading just about every line if, say, only 1 or 2 failed to load.

What I've done in the past in similar situations is output any problem lines to a separate file, repair those lines, and then load them into the database.

But how you do it is your call, depending on your circumstances.

In case you were not aware, you can use the LOAD DATA INFILE command in MySQL:

[google]load data infile csv mysql[/google]

It might not land in the exact table layout you want, but it might be easier to massage the data once it is loaded into a table.
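A sketch of what that could look like from PHP. eod_staging is a made-up table name, and LOCAL infile support has to be enabled on the server:


//Bulk-load the whole CSV in one statement, then massage it from the staging table
mysql_query("LOAD DATA LOCAL INFILE 'stockdata.txt'
             INTO TABLE eod_staging
             FIELDS TERMINATED BY ','
             LINES TERMINATED BY '\\n'
             (code, date, open, high, low, close, volume)")
    or die(mysql_error());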

Cheers for the quick replies. The script was quick to write, and I wanted to make sure the fundamentals looked good before I added in the error handling.

@Kalon: Yeah, all the error suppression was done so I could check the results with if statements and end the script early if something didn't insert or create.

@oddz: I have never worked with InnoDB before, will look into it. Do I just change the tables over from MyISAM? Will do some reading regardless.

I can use count($pieces) to ensure there are 7 values, but what's an elegant way to test they're not empty? For cases like the one below:

AAC,20101101, , ,1.55,1.555,558449

I could do:


if ( (empty($pieces[0])==TRUE) || (empty($pieces[1])==TRUE) ...

but it's rather messy, and I'd like to avoid 7 x 3800 checks if possible.

Thanks again

Something like this, where FILE_ERROR_NOT_ENOUGH_VALUES and FILE_ERROR_NULL_VALUES are constants.


    //Checks that the number of values is correct
    if (count($pieces) !== 7) {
        echo "<b>Error:</b> ". FILE_ERROR_NOT_ENOUGH_VALUES ." Line: ". $line_num;
        exit;
    }

    //Checks that all values exist
    //Note: empty('0') is TRUE, so a legitimate volume of 0 would trip this check;
    //comparing against '' (e.g. $pieces[6] === '') avoids that
    if ( empty($pieces[0]) || empty($pieces[1]) || empty($pieces[2]) || empty($pieces[3]) || empty($pieces[4]) || empty($pieces[5]) || empty($pieces[6]) ) {
        echo "<b>Error:</b> ". FILE_ERROR_NULL_VALUES ." Line: ". $line_num;
        exit;
    }


Use SPL to iterate the CSV; this will provide an array of values per line. You can then apply array_filter to that array to remove empty values. Additionally, you could wrap it in a FilterIterator, which will automatically skip any line that does not have the expected number of fields.

Something like…


<?php
$csv = new SplFileObject('data.csv');
$csv->setFlags(SplFileObject::READ_CSV);

class MyFilter extends FilterIterator
{
  protected $count;

  public function __construct($count, $it){
    parent::__construct($it);
    $this->count = (int)$count;
  }

  public function accept(){
    //Filter out missing fields, but keep legitimate '0' values
    //(the default array_filter callback would drop a volume of 0)
    $line = array_filter(
      parent::current(),
      'strlen'
    );
    return $this->count === count($line);
  }
}

//The EOD lines have 7 fields: code, date, open, high, low, close, volume
foreach(new MyFilter(7, $csv) as $line){
  #do something with the 7 entries
}
?>