  1. #1
    SitePoint Zealot
    Join Date
    May 2002
    Posts
    108

    help with parsing a text file

    i'm not quite sure where i'm trying to go with this (the final ability), but i've been tossing around the idea for awhile now...

    i need to parse a text file...that would be simple cuz i've done it...but i need some more functionality out of it.

    to make things easier: i want to parse gedcom files, which, for those who don't know, look like this:
    Code:
     0 HEAD
    1 SOUR Legacy
    2 VERS 4.0
    2 NAME Legacy (R)
    1 DEST Legacy
    1 DATE 16 Jul 2003
    1 FILE Merchant_Halterman.ged
    1 GEDC
    2 VERS 
    2 FORM LINEAGE_LINKED
    1 CHAR ANSI
    1 SUBM @SUBM@
    0 @SUBM@ SUBM
    1 NAME jon
    1 ADDR http://www.gencircles.com/users/vr6stress
    0 @I1384@ INDI
    1 NAME Albert M /Halterman/
    2 GIVN Albert M
    2 SURN Halterman
    1 SEX M
    1 BIRT
    2 DATE 4 Oct 1861
    2 PLAC Hardy Co., VA.
    1 DEAT
    2 DATE 10 Jun 1911
    1 AFN 4F7X-V3
    1 BAPL
    2 DATE 27 Feb 1974
    2 TEMP PROVO - Provo Utah
    1 ENDL
    2 DATE 19 Mar 1974
    2 TEMP PROVO - Provo Utah
    1 SLGC
    2 DATE 17 Apr 1974
    2 TEMP PROVO - Provo Utah
    1 CHAN
    2 DATE 5 Aug 1999
    3 TIME 01:00
    1 FAMS @F430@
    1 FAMC @F265@
    1 NOTE @NI1384@
    1 SOUR @33389723@
    1 NOTE This individual was found on GenCircles at: http://www.gencircles.com/users/vr6stress/3/data/1384
    0 @NI1384@ NOTE
    1 CONC Source 935255/15/7325303. B2 Apr 75 PV,E 3 Jun 1975 PV,S
    1 CONC P 1 Jul 1975 PV = 822878/44/7432312 
    1 CONT 
    1 CONT From the book Ancestors and Decendents of Jonathan Halterma
    1 CONC n of Hardy County Va now W. Va. by Hamilton Gamble Grady.Pa
    1 CONC ge 128.
    0 @S33389723@ SOUR
    1 TITL Merchant_Halterman
    1 AUTH jon
    1 NOTE http://www.gencircles.com/users/vr6stress/3
    0 TRLR
    what i'm looking for here is the ability to output or print specific lines...like if i wanted to only print:
    Code:
    1 NAME Albert M /Halterman/
    2 GIVN Albert M
    2 SURN Halterman
    1 SEX M
    1 BIRT
    2 DATE 4 Oct 1861
    2 PLAC Hardy Co., VA.
    1 DEAT
    2 DATE 10 Jun 1911
    in a specific format...how do i do that?
    but it may get harder: this is a single-person file, and some of these files can hold lots of people....
    i want to be a nerd....

  2. #2
    ********* wombat firepages's Avatar
    Join Date
    Jul 2000
    Location
    Perth Australia
    Posts
    1,717
    Half a reply ...

    PHP Code:
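    <?php
    // read the file into an array, one element per line
    $yaks = file('parse_this.txt');
    $arr = array();
    foreach ($yaks as $line) {
        // explode() with a limit of 3 keeps the rest of the line together as the data;
        // array_pad() covers lines such as "1 BIRT" that have no data part
        list($num, $tag, $content) = array_pad(explode(' ', trim($line), 3), 3, '');
        $arr[$tag] = array('num' => $num, 'tag' => $tag, 'data' => $content);
    }
    //print_r($arr);
    $disp = array('NAME', 'GIVN', 'SURN');
    foreach ($disp as $key) {
        echo implode(' ', $arr[$key]) . "\n<br />";
    }
    ?>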
    only half a reply, as when you get multiple tags of the same name they get overwritten, but you could keep another array counting occurrences of tag names and rename them $tag.'0' / $tag.'1' etc, all depends.
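One way around the overwriting, as a rough sketch rather than the approach from the code above: append every occurrence of a tag to a list instead of renaming tags. The function name parse_lines() and the sample lines are made up for this example.

```php
<?php
// Sketch: keep every occurrence of a tag by appending to a list,
// instead of overwriting $arr[$tag] each time.
function parse_lines(array $lines)
{
    $arr = array();
    foreach ($lines as $line) {
        // a limit of 3 keeps the rest of the line together as the data part;
        // array_pad() covers lines such as "1 BIRT" that carry no data
        list($num, $tag, $content) = array_pad(explode(' ', trim($line), 3), 3, '');
        $arr[$tag][] = array('num' => $num, 'tag' => $tag, 'data' => $content);
    }
    return $arr;
}

// Hand-picked sample lines from the file in the first post
$lines = array(
    '1 BIRT',
    '2 DATE 4 Oct 1861',
    '1 DEAT',
    '2 DATE 10 Jun 1911',
);
$arr = parse_lines($lines);
echo count($arr['DATE']) . "\n";      // number of DATE lines that were kept
echo $arr['DATE'][1]['data'] . "\n";  // data of the second DATE line
```

This keeps both DATE values, but not which line (BIRT or DEAT) each one sits under; recovering that would mean tracking the level numbers as well.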

    as for multiple records in the same file, it's hard to say unless we know what the delimiter is for that. is it a newline, or something more verbose? either way that's not a real problem to overcome, more the dodgy naming conventions.

  3. #3
    SitePoint Zealot
    Join Date
    May 2002
    Posts
    108
    first things first...

    firepages...hmmm i know you - well sort of, i know of your website - i always snag a copy of phpdev when ever i reload a system...(when's the next version coming out???)

    nextly...i probably should have mentioned how bad i am at php...lol

    and so here come the real newbie-type questions

    can we go over the code you posted line by line?
    i mean i know the first one, and second one:
    PHP Code:
    $yaks = file('parse_this.txt');
    $arr = array(); 
    then once we get to:
    PHP Code:
    $disp = array('NAME', 'GIVN', 'SURN');
    that is, i'm guessing, where it locates those values in the array and uses each one as an "id", if you will, for grabbing the values next to it:
    NAME = Albert M Halterman
    GIVN = Albert M
    etc....

    i'm a little fuzzy on the next part though:
    PHP Code:
    foreach($disp as $key){ 
    but i think i get the rest...the implode "outputs" the info in a specific "format", right?
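To make the foreach/implode step concrete, here is a minimal sketch. The $arr contents are hand-written stand-ins for what the parsing loop would produce from the sample file, and format_tags() is a made-up helper name, not part of the code posted above.

```php
<?php
// Hand-written stand-in for the parsed data (assumption: the real $arr
// would be filled by the parsing loop from the GEDCOM file).
$arr = array(
    'NAME' => array('num' => '1', 'tag' => 'NAME', 'data' => 'Albert M /Halterman/'),
    'GIVN' => array('num' => '2', 'tag' => 'GIVN', 'data' => 'Albert M'),
);

// Hypothetical helper: look up each wanted tag and glue its fields together.
function format_tags(array $arr, array $disp)
{
    $out = '';
    foreach ($disp as $key) {   // $key is each entry of $disp in turn
        // implode() joins the values of $arr[$key] with a space between them
        $out .= implode(' ', $arr[$key]) . "\n";
    }
    return $out;
}

echo format_tags($arr, array('NAME', 'GIVN'));
```

So $disp is just the list of tags to show, foreach walks through that list one tag at a time, and implode() turns each stored array back into a single line of text.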

    i hope you don't mind helping me on this...or mind that i ask so many questions...

    but thanks so far for the help...

    oh and as far as the delimiter, for multiple records it would be the 0 @I1384@ INDI, line closer to the top, each person starts with this line, but it does change, the @XXXXX@ is basically their id number...then it ends with the 1 NOTE line. everything in between would be 1 individual.
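A rough sketch of splitting on that delimiter, assuming a record runs from its "0 @ID@ TYPE" line up to the next line that starts with level 0 (which is how the sample above appears to be laid out). The function name split_records() and the 'type'/'lines' keys are made up for this example.

```php
<?php
// Sketch: start a new record at every level-0 line that carries an @ID@,
// so each person (0 @I...@ INDI) ends up with its own group of lines.
function split_records(array $lines)
{
    $records = array();
    $current = null;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (preg_match('/^0 @([^@]+)@ (\S+)/', $line, $m)) {
            // e.g. "0 @I1384@ INDI" gives id "I1384" and type "INDI"
            $current = $m[1];
            $records[$current] = array('type' => $m[2], 'lines' => array());
        } elseif (strncmp($line, '0 ', 2) === 0) {
            // level-0 line without an id (HEAD, TRLR): not part of any record
            $current = null;
        } elseif ($current !== null) {
            $records[$current]['lines'][] = $line;
        }
    }
    return $records;
}

// Shortened sample based on the file in the first post
$records = split_records(array(
    '0 HEAD',
    '1 SOUR Legacy',
    '0 @I1384@ INDI',
    '1 NAME Albert M /Halterman/',
    '1 SEX M',
    '0 @NI1384@ NOTE',
    '1 CONC Source',
    '0 TRLR',
));
echo implode(', ', array_keys($records)) . "\n";
```

Filtering on 'type' === 'INDI' afterwards would leave only the people and skip the NOTE and SOUR records.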

  4. #4
    ********* wombat firepages's Avatar
    Join Date
    Jul 2000
    Location
    Perth Australia
    Posts
    1,717
    Hi , ok well this is for multiple INDI's , there are probably a hundred other ways of approaching this , this was the first one that sprang to mind..

    note that this still does not cover the issue of multiple tags.

    here we load the file and then scroll through checking each time if we have a new user ID , if we do we place that users data into its own array.

    with >PHP4.3.3 and sqlite it would be a damn fine idea to insert the data into a temporary table as we parse it , then you could use SQL to grab the data as you wished !
    PHP Code:
    <?php
    $yaks = file('parse_this.txt');
    $arr = array();
    foreach ($yaks as $line) {
        /*
        each time we find a new user id grab that id for reference
        */
        if (preg_match("/@(.*)@ INDI/", $line, $regs)) {
            $this_user_id = $regs[1];
        }
        /* sadly this misses all the data before the first INDI tag !!*/
        if (isset($this_user_id)) {
            /*
            use explode to break the line into 3 parts, using a space as the delimiter
            the first part will be the number, the second the 'tag', the third the rest
            of the string which is data (array_pad fills in lines that have no data)
            */
            list($num, $tag, $content) = array_pad(explode(' ', trim($line), 3), 3, '');
            /*
            now $num, $tag and $content contain the data from this $line
            I have stored them in an array $arr for easy access but how
            you store the data is up to you
            I have used $tag as the array index as it makes for easy retrieval
            */
            $arr[$this_user_id][$tag] = array('num' => $num, 'tag' => $tag, 'data' => $content);
        }
    }
    /*see the whole array of data*/
    echo '<pre>';
    print_r($arr);
    echo '</pre>';
    ?>
    everything below is purely an example of how to display the data
    e.g. just a shorthand way of saying ...
    echo $arr['I1384']['NAME']['num'] . ' ' . $arr['I1384']['NAME']['tag'] . ' ' . $arr['I1384']['NAME']['data'] . '<br />';
    echo $arr['I1384']['GIVN']['num'] . ' ' . $arr['I1384']....etc
    this way you could create a function that takes $disp(lay) as an argument
    and returns formatted data for the given tag names
    e.g.
    PHP Code:
    <?php
    function display($data, $user_id, $display) {
        $str = '';
        foreach ($display as $key) {
            //implode the array into a string for display//
            $str .= implode(' ', $data[$user_id][$key]) . "\n<br />";
        }
        return $str;
    }
    //would display the data for the given tag names NAME,GIVN,SURN etc for user I1384//
    echo display($arr, 'I1384', array('NAME', 'GIVN', 'SURN'));
    ?>
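The SQLite idea mentioned earlier in this post could look roughly like the sketch below. It assumes the pdo_sqlite extension is available and uses an in-memory database; the function name load_into_sqlite() and the table and column names (gedcom_lines, indi_id, num, tag, data) are invented for the example.

```php
<?php
// Sketch only: assumes the pdo_sqlite extension is enabled.
// Table and column names are made up for this example.
function load_into_sqlite(array $lines)
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE gedcom_lines (indi_id TEXT, num INTEGER, tag TEXT, data TEXT)');
    $insert = $db->prepare(
        'INSERT INTO gedcom_lines (indi_id, num, tag, data) VALUES (?, ?, ?, ?)'
    );

    $this_user_id = null;
    foreach ($lines as $line) {
        // same id detection as in the parsing loop above
        if (preg_match('/@(.*)@ INDI/', $line, $regs)) {
            $this_user_id = $regs[1];
        }
        if ($this_user_id !== null) {
            list($num, $tag, $content) = array_pad(explode(' ', trim($line), 3), 3, '');
            $insert->execute(array($this_user_id, (int) $num, $tag, $content));
        }
    }
    return $db;
}

$db = load_into_sqlite(array(
    '0 @I1384@ INDI',
    '1 NAME Albert M /Halterman/',
    '1 BIRT',
    '2 DATE 4 Oct 1861',
    '1 DEAT',
    '2 DATE 10 Jun 1911',
));

// Every row is kept, so repeated tags such as DATE are no longer overwritten
$stmt = $db->prepare('SELECT data FROM gedcom_lines WHERE indi_id = ? AND tag = ? ORDER BY rowid');
$stmt->execute(array('I1384', 'DATE'));
$dates = $stmt->fetchAll(PDO::FETCH_COLUMN);
print_r($dates);
```

Because each line becomes its own row, this sidesteps the duplicate-tag problem, and the same prepared-statement pattern should carry over to another PDO driver with a different connection string.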

  5. #5
    SitePoint Zealot
    Join Date
    May 2002
    Posts
    108
    what about dumping it into mysql?

