SitePoint Sponsor

  1. #1
    Jump

    Need some good info and/or tutorials on parsing.

    I have done some simple parsing, but I really don't know much about this aspect of PHP. I have a remote .txt file that I urgently need to parse and load into MySQL. The .txt file itself is not set up well for this, so I really need to learn more about parsing. Any suggestions?

  2. #2
    Phil.Roberts
    Depends on what you mean by "parsing". There's parsing where you scan through the file a character at a time, and there's parsing using regular expressions.

    Perhaps if you posted a sample of the file you're trying to parse?

  3. #3
    redemption
    I don't really have experience with file parsing in PHP either, but I've done lots of it in Perl. I don't think it should be too difficult in PHP, though.

    Depending on what you want to parse, you'll probably have to loop through every line and extract the relevant info using regular expressions, substr() (for record-based, fixed-width data), explode() (for delimited data), or a combination of all of these.
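    To make those three options concrete, here is a minimal sketch applying each one to a single line. The function names, the column widths and the "name,quantity,cPRICE" line layout in the regex are assumptions chosen for illustration, not anything from the actual file.

```php
<?php
// Sketch of the three per-line techniques: explode(), substr() and preg_match().
// All function names and layouts here are hypothetical.

// explode(): split delimited data on a fixed separator
function parseDelimited(string $line): array
{
    return explode(',', $line);
}

// substr(): cut fixed-width (record-based) data at known offsets.
// $widths is a hypothetical list of column widths, e.g. [10, 5].
function parseFixedWidth(string $line, array $widths): array
{
    $fields = [];
    $offset = 0;
    foreach ($widths as $width) {
        $fields[] = rtrim(substr($line, $offset, $width)); // drop padding spaces
        $offset += $width;
    }
    return $fields;
}

// preg_match(): pull out pieces that follow a pattern.
// Assumed layout: item name, quantity, price prefixed with "c".
function parseWithRegex(string $line): ?array
{
    if (preg_match('/^(.+),(\d+),c(\d+)$/', $line, $m)) {
        return ['name' => $m[1], 'quantity' => (int) $m[2], 'price' => (int) $m[3]];
    }
    return null; // the line didn't match the expected pattern
}
```

    Inside a loop over the file's lines you would call whichever of these fits each kind of line.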

    What is the structure of the file like?

  4. #4
    Jump

  5. #5
    redemption
    Hmm... considering the clean structure of the data, as well as the presence of delimiters, it should be an easy, if tedious, task.

    What problems are you facing, and what are you trying to parse?

  6. #6
    Jump
    The problem is I haven't got a clue where to start. This is why I am looking for some good reference material, but I can't seem to find any.

    I need to grab every single item and quantity and put it in a database for further use.

  7. #7
    Phil.Roberts
    Well why don't you post a sample of the file then someone can give you an idea of how to do it?

  8. #8
    redemption
    Quote Originally Posted by Phil.Roberts
    Well why don't you post a sample of the file then someone can give you an idea of how to do it?
    He actually did (check post #4).

    Normally, I would try to post some code for you to work through, but it's a busy time for me (I should be studying, as you may already know).

    I suggest you take a look at the filesystem functions (http://www.php.net/manual/en/ref.filesystem.php), particularly fopen(), fread(), fgets() and fseek(). Also, take a look at explode() to deal with separating the comma-delimited data.

    Luck!
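    The fopen()/fgets()/explode() combination suggested above might look roughly like this. It is a sketch, assuming a plain comma-delimited file; eachDelimitedLine() is a made-up name, and the callback is where you would do the real work (for example a database INSERT per row). Since the file is remote, note that fopen() can also open an http:// URL when allow_url_fopen is enabled.

```php
<?php
// Sketch: read a comma-delimited file one line at a time with fopen()/fgets()
// and hand each row to a callback, so only the current line is held in memory.
// eachDelimitedLine() is a hypothetical name.

function eachDelimitedLine(string $path, callable $handleRow): int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $count = 0;
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n"); // drop a Unix or Windows line ending
        if ($line === '') {
            continue;                 // skip blank lines
        }
        $handleRow(explode(',', $line));
        $count++;
    }
    fclose($handle);
    return $count; // number of rows passed to the callback
}
```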

  9. #9
    Phil.Roberts
    Well, from looking at that file, I'd say start by breaking the file up into sections using explode() with a blank line as the delimiter... then you can process each section individually.

    Can't do anything with it myself as I'm at work. (grrrr)

  10. #10
    siteguru
    If you file() the .txt file, you get an array with each element containing a line from the file. Since the lines follow a predetermined sequence, you can simply iterate through the array and use explode() on each element (on the comma) to create further arrays holding each discrete item.
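    A minimal sketch of that file() approach, assuming simple comma-delimited lines; fileToFields() is an illustrative name, not an existing function.

```php
<?php
// Sketch: file() gives one array element per line, explode() splits each line.
// FILE_IGNORE_NEW_LINES drops the line endings. fileToFields() is hypothetical.

function fileToFields(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $fields = [];
    foreach ($lines as $line) {
        $line = rtrim($line, "\r"); // in case a Windows "\r" is left over
        if ($line === '') {
            continue;               // blank lines (record separators) are skipped
        }
        $fields[] = explode(',', $line);
    }
    return $fields;
}
```

    Because blank lines are skipped, the boundaries between records are lost in the result, which is the main trade-off of working line by line.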
    Ian Anderson
    www.siteguru.co.uk

  11. #11
    anode
    I would explode on a blank line rather than use file(), as that way you can operate on each record as a distinct unit.
    TuitionFree a free library for the self-taught
    Anode Says... Blogging For Your Pleasure

  12. #12
    siteguru
    Fair point, I was merely posing other options available.

  13. #13
    Jump
    Quote Originally Posted by anode
    I would explode on a blank line rather than use file(), as that way you can operate on each record as a distinct unit.
    That would separate them into different units. How do you explode on a blank line?

  14. #14
    Phil.Roberts
    $sections = explode("\n\n", $data); // $data must be the whole file as one string, e.g. from file_get_contents()
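    A slightly more defensive version of that blank-line split, as a sketch: it assumes the file might have Windows line endings, blank lines containing stray spaces, or several blank lines in a row, none of which a plain explode("\n\n", ...) handles. splitSections() is a made-up name.

```php
<?php
// Sketch: split file contents into sections separated by one or more blank lines.
// splitSections() is a hypothetical name.

function splitSections(string $data): array
{
    $data = str_replace("\r\n", "\n", $data); // normalise Windows line endings
    // A newline followed by one or more "blank" lines (optional spaces/tabs + newline)
    return preg_split('/\n(?:[ \t]*\n)+/', trim($data), -1, PREG_SPLIT_NO_EMPTY);
}
```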

    The big problem is that without knowing the format used to store the data, it's hard to come up with a way to parse it. The data in the text file doesn't seem to form any kind of logical structure like you would find in a CSV file....

  15. #15
    siteguru
    Unless I didn't look far enough into the file, it looked like it did follow a sequence.

  16. #16
    Phil.Roberts
    It follows a sequence, yeah, but what is the schema behind it all?

  17. #17
    siteguru
    Only Jump knows that.

  18. #18
    Jump
    OK, for example:
    The item between the "" is the actual identifier/name, which would basically be the primary key.
    The next number, 8, tells how many facilities/shops it can hold.
    Next, The Main Gate, is the location.
    Next, 46.1,-22.1,-27.0, are 3D coordinates within that location.
    Next, public, is whether it is public or private.
    Next, A:# F:# P:# are 3 types of services it might offer, with the cost of each.


    " Q.U.I.T. Arty Arcade",8,The Main Gate,46.1,-22.1,-27.0,public,A:1000 F:1000 P:100

    Since it has space for 8 facilities/shops, they are all listed here, even if unused. It can be 2, 4, or 8, so if it were 2 there would only be 2 listings here, not 8.

    Ore Silo
    Refueling Tank
    Repair Shop
    Ammunition Shop
    Empty Slot
    Empty Slot
    Empty Slot
    Empty Slot

    The rest are the items offered for sale: name, quantity, price. The c in the price needs to be stripped.

    PC-DSS4 'Poor',4,c35000
    PC-DSS2,2,c45000
    HellRazor,2,c20000
    Flatiron,1,c11500
    PC-DSS3,4,c50000
    PCP-1,2,c450000
    FlashFire,22,c10000

    That's basically it. Thanks for the info on exploding on a blank line Phil.Roberts. I didn't know you could do that.
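    Putting that description into code, one section could be parsed along these lines. This is only a sketch, assuming every section follows exactly the layout described (header line, then as many facility lines as the header says, then "name,quantity,cPRICE" items). parseStationSection() and the array keys are made-up names; str_getcsv() is used for the header so that commas inside the quoted station name don't split it.

```php
<?php
// Sketch: parse one blank-line-separated section using the layout described above.
// parseStationSection() and all array keys are hypothetical.

function parseStationSection(string $section): array
{
    $lines = explode("\n", trim(str_replace("\r", '', $section)));

    // Header: str_getcsv() keeps a "quoted, name" together as one field
    $h = str_getcsv(array_shift($lines), ',', '"', '\\');
    $station = [
        'name'     => trim($h[0]),
        'slots'    => (int) $h[1],
        'location' => $h[2],
        'coords'   => [(float) $h[3], (float) $h[4], (float) $h[5]],
        'access'   => $h[6],
        'services' => [],
    ];
    // "A:1000 F:1000 P:100" -> ['A' => 1000, 'F' => 1000, 'P' => 100]
    foreach (preg_split('/\s+/', trim($h[7] ?? ''), -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        [$type, $cost] = array_pad(explode(':', $pair, 2), 2, '0');
        $station['services'][$type] = (int) $cost;
    }

    // The facility slots, including "Empty Slot" entries
    $station['facilities'] = array_slice($lines, 0, $station['slots']);

    // Everything after the slots is an item for sale
    $station['items'] = [];
    foreach (array_slice($lines, $station['slots']) as $line) {
        $f = explode(',', $line);
        if (count($f) < 3) {
            continue; // skip blank or malformed lines
        }
        $station['items'][] = [
            'name'     => $f[0],
            'quantity' => (int) $f[1],
            'price'    => (int) ltrim($f[2], 'c'), // strip the leading "c"
        ];
    }
    return $station;
}
```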

  19. #19
    Phil.Roberts
    Hmm, I'm gonna have a shot at this when I get home, as it's definitely got me interested.

    Seems to me like you'd need 2 related MySQL tables for the data: the first containing the header line and the second for the on-sale items......
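    One way the two-table idea could be wired up with PDO is sketched below. The table and column names (stations, station_items, station_id, ...) and the shape of the $station array are assumptions for illustration. It also assumes PDO is set to throw exceptions on errors (the default since PHP 8.0), so a failed insert rolls the whole station back.

```php
<?php
// Sketch of the "two related tables" layout. Hypothetical MySQL schema:
//   CREATE TABLE stations (
//     id INT AUTO_INCREMENT PRIMARY KEY,
//     name VARCHAR(100) NOT NULL, location VARCHAR(100), access VARCHAR(10)
//   );
//   CREATE TABLE station_items (
//     station_id INT NOT NULL,  -- refers to stations.id
//     name VARCHAR(100) NOT NULL, quantity INT, price INT
//   );
//
// $station is assumed to look like:
//   ['name' => ..., 'location' => ..., 'access' => ...,
//    'items' => [['name' => ..., 'quantity' => ..., 'price' => ...], ...]]

function saveStation(PDO $db, array $station): int
{
    $db->beginTransaction();
    try {
        $stmt = $db->prepare('INSERT INTO stations (name, location, access) VALUES (?, ?, ?)');
        $stmt->execute([$station['name'], $station['location'], $station['access']]);
        $stationId = (int) $db->lastInsertId();

        // Prepare once, execute once per item
        $itemStmt = $db->prepare(
            'INSERT INTO station_items (station_id, name, quantity, price) VALUES (?, ?, ?, ?)'
        );
        foreach ($station['items'] as $item) {
            $itemStmt->execute([$stationId, $item['name'], $item['quantity'], $item['price']]);
        }
        $db->commit();
        return $stationId;
    } catch (Throwable $e) {
        $db->rollBack(); // undo the partial insert
        throw $e;
    }
}
```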

  20. #20
    Phil.Roberts
    Okay, I've put together a class that parses the data from the file into a (very big) array. I'll leave getting the data from that array into the database up to you.

    Here's the class source:
    http://www.flatnet.net/source.php?file=parsefile.php
    How to use it:
    http://www.flatnet.net/source.php?file=testparse.php
    Example output:
    http://www.flatnet.net/testparse.php

    I should warn that this class is blindingly inefficient as reading this much data into memory really ain't a good idea....

    [edit:

    This class isn't perfect as any commas in the section titles will result in the array being corrupted.

    ]

  21. #21
    Jump
    Wow, my head is spinning. I wish I could whip stuff out that fast. Might take me a bit to chew on this. Thanks.

  22. #22
    Jump
    That really is a big file to read into memory. Is there anything I can do to make it more efficient? I haven't had a lot of time to look at it this week, but plan to this weekend.

  23. #23
    Phil.Roberts
    Umm, yeah, there is probably a way.... Can't really think of it offhand, but it'd probably involve processing each section separately rather than sucking them all into memory....
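    Processing one section at a time could be sketched with a generator that reads the file line by line and hands back each blank-line-separated section as soon as it is complete, so only one section is in memory at once. readSections() is a made-up name, and this assumes sections are separated by blank lines as in the sample.

```php
<?php
// Sketch: yield each blank-line-separated section without loading the whole file.
// readSections() is a hypothetical name. Requires generators (PHP 5.5+).

function readSections(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $buffer = [];
        while (($line = fgets($handle)) !== false) {
            $line = rtrim($line, "\r\n");
            if (trim($line) === '') {
                // A blank line ends the current section
                if ($buffer) {
                    yield implode("\n", $buffer);
                    $buffer = [];
                }
                continue;
            }
            $buffer[] = $line;
        }
        if ($buffer) {
            yield implode("\n", $buffer); // last section (no trailing blank line)
        }
    } finally {
        fclose($handle); // runs even if the caller stops iterating early
    }
}
```

    Each section from `foreach (readSections('data.txt') as $section)` can then be parsed and written to the database before the next one is read.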

  24. #24
    Jump
    Ouch. Yep, you're right. Commas in the section titles do corrupt the array. Not sure how to fix that. The titles are wrapped in double quotes. Is there any way to identify the text between the quotes and strip/ignore commas in those titles?
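    One way to ignore commas between the quotes is the character-at-a-time scanning mentioned earlier in the thread: walk the line, track whether you are inside quotes, and only treat a comma as a separator when you are not. This is a sketch that assumes quotes are never escaped inside a title; splitOutsideQuotes() is a made-up name. PHP's built-in str_getcsv() does the same job in one call.

```php
<?php
// Sketch: split a line on commas, ignoring commas between double quotes.
// splitOutsideQuotes() is hypothetical; assumes no escaped quotes in titles.

function splitOutsideQuotes(string $line): array
{
    $fields = [];
    $current = '';
    $inQuotes = false;
    $len = strlen($line);
    for ($i = 0; $i < $len; $i++) {
        $ch = $line[$i];
        if ($ch === '"') {
            $inQuotes = !$inQuotes; // toggle; the quote itself is dropped
        } elseif ($ch === ',' && !$inQuotes) {
            $fields[] = $current;  // a comma outside quotes ends the field
            $current = '';
        } else {
            $current .= $ch;       // anything else, including quoted commas, is kept
        }
    }
    $fields[] = $current; // the last field
    return $fields;
}
```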

  25. #25
    Jump
    OK, with the help of Website and Trav, the commas issue is fixed.

    PHP Code:
    <?php
    class ParseDatafile
    {
        public $section_array = array();
        public $final_output = array();

        // __construct() replaces the old same-name-as-class constructor,
        // which PHP 8 no longer treats as a constructor
        public function __construct($file)
        {
            // Suck the input file into memory
            // Very inefficient, but I'm lazy.
            $data = file_get_contents($file);

            // Remove unwanted carriage-return chars
            $data = str_replace("\r", "", $data);

            // Turn commas inside "quoted" titles into colons so they don't
            // split the header line; repeat until none are left
            while (preg_match('#"(.*?),(.*?)"#', $data)) {
                $data = preg_replace('#"(.*?),(.*?)"#', '"$1:$2"', $data);
            }

            // Split the data into sections
            $this->section_array = explode("\n\n", $data);
        }

        /***
         * ParseDataFile parseFile()
         * Iterates through each section, and adds the processed
         * result to the $final_output variable.
         * @access  public
         * @returns array
         ***/
        public function parseFile()
        {
            foreach ($this->section_array as $section) {
                $parsed = $this->parseSection($section);
                // parseSection() returns null for blank sections
                if ($parsed !== null) {
                    $this->final_output[] = $parsed;
                }
            }
            return $this->final_output;
        }

        /***
         * ParseDataFile parseSection()
         * Scans each line of the section and adds the data to its
         * named array field.
         * @access  private
         * @return  array|null
         * @param   string  $section
         ***/
        private function parseSection($section)
        {
            // Make sure this section isn't just a blank line
            $section = trim($section);
            if ($section == "") {
                return null;
            }

            $output = array('header' => array(), 'station_modx' => array(), 'items' => array());

            // Split the section into an array of lines
            $lines = explode("\n", $section);

            // Grab the header data
            $header_tmp = explode(",", $lines[0]);
            $output['header']['station']        = $header_tmp[0];
            $output['header']['facility_count'] = $header_tmp[1];
            $output['header']['sector']         = $header_tmp[2];
            $output['header']['x_coord']        = $header_tmp[3];
            $output['header']['y_coord']        = $header_tmp[4];
            $output['header']['z_coord']        = $header_tmp[5];
            $output['header']['access']         = $header_tmp[6];

            // "A:1000 F:1000 P:100" -> array('A' => '1000', 'F' => '1000', 'P' => '100')
            $service_array = array();
            $services_tmp = explode(" ", $header_tmp[7]);
            foreach ($services_tmp as $service) {
                if (strpos($service, ":") === false) {
                    continue; // skip empty or malformed entries
                }
                list($name, $price) = explode(":", $service);
                $service_array[$name] = $price;
            }
            $output['header']['services'] = $service_array;

            // Grab the facilities data, basing the item count on the
            // figure gleaned from the header
            $facility_count = (int) $output['header']['facility_count'];
            for ($i = 1; $i <= $facility_count && isset($lines[$i]); $i++) {
                $output['station_modx'][] = $lines[$i];
            }

            // Grab the items, each item is an array of its values
            $sizeof_lines = count($lines);
            for ($i2 = $i; $i2 < $sizeof_lines; $i2++) {
                $tmp_array = explode(",", $lines[$i2]);
                if (count($tmp_array) < 3) {
                    continue; // skip blank or malformed lines
                }
                // Remove the "c" from the item price value
                $tmp_array[2] = str_replace("c", "", $tmp_array[2]);
                $output['items'][] = $tmp_array;
            }
            return $output;
        }
    }
    ?>
    Anyone else have any ideas on making this more efficient?

