  1. #1
    SitePoint Zealot ricklach
    Join Date
    Nov 2004
    Location
    Victoria BC
    Posts
    116

    Help on reading a text file

    The following is an example extract from a GEDCOM text file:
    Code:
    0 @I1@ INDI
    1 REFN 1
    1 NAME Maurice /Lampron/ Lacharité
    2 GIVN Maurice
    2 SURN Lampron
    2 NSFX Lacharité
    2 SOUR @S188@
    3 PAGE Page 1
    2 SOUR @S1457@
    1 NAME Maurice /Laspron/ dit Lacharité
    2 GIVN Maurice
    2 SURN Laspron
    2 NSFX dit Lacharité
    2 SOUR @S31@
    3 PAGE See File 1133-1
    2 SOUR @S43@
    3 PAGE See File 1927-1
    2 SOUR @S98@
    3 PAGE See Page 659
    1 NAME Maurice /Lapron/ dit Lacharité
    2 GIVN Maurice
    2 SURN Lapron
    2 NSFX dit Lacharité
    2 SOUR @S247@
    3 PAGE E-copy of the marriage entry for Maurice Lapron dit Lacharité and Jeann
    4 CONC e Archambault.
    1 SEX M
    1 CHAN
    2 DATE 27 NOV 2006                  
    1 BIRT
    2 DATE 26 AUG 1685
    2 PLAC Nicolet River, Nicolet, Nicolet-Yamaska, Québec, Canada, 461300N0723700W
    2 SOUR @S31@
    3 PAGE See File 1133-1
    2 SOUR @S284@
    3 PAGE Extract from the church register
    2 SOUR @S1457@
    1 EVEN
    2 TYPE Anecdote
    2 NOTE Maurice, son of Jean Laspron dit Lacharite and Anne Michelle Renaud, wa
    3 CONC s born on 26 Aug 1685 and baptized at the cripécuriale de Cressé, as Maur
    3 CONC ice Lapron dit Lacharite on 2 Sep 1685 on the Nicolet River, Nicolet, Qué
    3 CONC bec, Canada.  The following is a transcript of the original church record
    3 CONC :
    3 CONT Le deuxième jour de septembre del'an mil six cent quatre vingt cinq pa
    3 CONC r moy [moi], J.G. de Brurlon, curé de l'Eglise paroissiale de Notre Dam
    3 CONC e des Trois Rivières, a esté [été] baptisé en la maison cripécuriale de C
    3 CONC ressé, oû l'on dit la messe, Maurice, fils de Jean Lapron dit Lacharité e
    3 CONC t de Michelle Anne Renaud sa femme, habitants du dit lieu de Cressé.  L'E
    3 CONC nfant est né du vingt sixième d'aoust [août] dela mesme [meme] année.  So
    3 CONC n parrain fut Maurice Cardin, fils de Pierre Loiseau et la marraine (Loru
    3 CONC sse ?) Lemirre, femme de Pierre Pepin, tous habitants du dit lieu de Cres
    3 CONC sé, lesquels ont déclaré ne ....., si signer, de ce (..quis ?) suivant l'
    3 CONC ordonnance. - J.G. de Brurlon"
    3 CONT He may also have been known as Maurice Laspron dit Lacharité.
    3 CONT At around age 20 he must have moved to Pointe aux Trembles because on 1
    3 CONC 3 Apr 1711 he married Marie Aubuchon, daughter of Jean Aubuchon dit Lespe
    3 CONC rance and Marguerite Sédillot at Eglise Enfant-Jésus, Pointe aux Trembles
    3 CONC , Ile de Montréal, Québec, Canada. He had two known children with Marie b
    3 CONC ut there were probably more. Following Marie Aubuchon's death sometime be
    3 CONC fore 1749, he married Marie Jeanne Archambault, daughter of Laurent Archa
    3 CONC mbault and Anne Courtemanche on 7 Jan 1749 at L'Enfant Jesus, Pointe au
    3 CONC x Trembles, Isle de Montréal, Québec, Canada.
    3 CONT 
    3 CONT  The coureurs de bois were a hardy and sometimes savage group of Frenchm
    3 CONC en that illicitly traded with the Indians to get the pick of the firs an
    3 CONC d sometimes get the better of them in a trade by getting them drunk.  Sin
    3 CONC ce Montréal was the headquarters of these lawless men, it is entirely pos
    3 CONC sible that Maurice was indeed a coureurs de bois.  It is known that he wa
    3 CONC s employed by one of the fur trading companies, probably the Company of N
    3 CONC ew France or of a Hundred Associates as it became known, circa 23 May 171
    3 CONC 7.  There is plenty more history to be discovered.
    3 CONT 
    3 CONT He died on 19 Dec 1749, just 11 months after his second marriage, and wa
    3 CONC s buried on 20 Dec 1749 in the cemetery, Pointe du Trembles, Pointe du Tr
    3 CONC embles, Isle de Montréal, Québec, Canada.  He was 64 years old.
    1 DEAT
    2 DATE 19 DEC 1749
    2 PLAC Pointe-aux-Trembles, Montréal, Québec, Canada, 453900N0733000W
    2 SOUR @S247@
    3 PAGE E-copy of the burial entry for Maurice Lapron dit Lacharité.
    1 BURI
    2 DATE 20 DEC 1749
    2 PLAC Cemetery, Pointe-aux-Trembles, Pointe-aux-Trembles, Montréal, Québec, Canada, 453900N0733000W
    2 SOUR @S247@
    3 PAGE E-copy of the burial entry for Maurice Lapron dit Lacharité.
    1 EVEN
    2 TYPE Baptism
    2 DATE 02 SEP 1685
    2 PLAC Cripecuriale de Cressé, Nicolet, Nicolet-Yamaska, Québec, Canada, 461300N0723700W
    2 NOTE The following is from the original church record:  "Baptism of Maurice La
    3 CONC pron dit Lacharité - Le deuxième jour de septembre del'an mil six cent qu
    3 CONC atre vingt cinq par moy [moi], J.G. de Brurlon, curé de l'Eglise paroissi
    3 CONC ale de Notre Dame des Trois Rivières, a esté [été] baptisé en la maison c
    3 CONC ripécuriale de Cressé, oû l'on dit la messe, Maurice, fils de Jean Lapro
    3 CONC n dit Lacharité et de Michelle Anne Renaud sa femme, habitants du dit lie
    3 CONC u de Cressé.  L'Enfant est né du vingt sixième d'aoust [août] dela mesm
    3 CONC e [meme] année.  Son parrain fut Maurice Cardin, fils de Pierre Loiseau e
    3 CONC t la marraine (Lorusse ?) Lemirre, femme de Pierre Pepin, tous habitant
    3 CONC s du dit lieu de Cressé, lesquels ont déclaré ne ....., si signer, de c
    3 CONC e (..quis ?) suivant l'ordonnance. - J.G. de Brurlon"
    2 SOUR @S44@
    1 OBJE
    2 FORM JPEG
    2 TITL Burial
    2 FILE c:\The Master Genealogist\Documents\1-02 Maurice Lampron Lacharite Burial.jpg
    1 OBJE
    2 FORM JPEG
    2 TITL Marriage
    2 FILE c:\The Master Genealogist\Documents\1-03 Maurice Lampron Lacharite Marriage.jpg
    1 OBJE
    2 FORM JPEG
    2 TITL Marriage
    2 FILE c:\The Master Genealogist\Documents\1-01 Maurice Lampron Lacharite Marriage.jpg
    1 FAMS @F1@
    1 FAMS @F2@
    1 FAMS @F3@
    1 FAMC @F4@
    There are about 40-50 different well-defined tags, and the flow of the file is reasonably apparent. What I want to do is read the file and parse it into its components, so that each section of text starting at @I1@ INDI is broken into complete strings stored as name/value pairs (probably inside a hash?) that can then be assigned to various fields in various tables. Since there seems to be an easy way to do this kind of parsing in RoR, I wonder if anyone can suggest how to start. My idea is to look for the pattern "@I..@ INDI", put everything between one occurrence and the next into a hash, then iterate through the hashes to collect things into strings with name/value pairs, and finally add the results to the tables. I would like to hear other solutions that may be more elegant and Rubyesque.
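A minimal sketch of the split-into-records idea, assuming one GEDCOM line per text line ("level [@XREF@] TAG [value]") and a UTF-8 file. Rather than matching only "@I..@ INDI", it keys every level-0 record by its cross-reference ID, because the FAMS/FAMC and SOUR lines in the extract point at @F..@ and @S..@ records elsewhere in the file. The method name split_records and the { type:, lines: } layout are hypothetical, chosen only for illustration.

```ruby
# Split GEDCOM text into level-0 records keyed by cross-reference ID.
# Sketch only: assumes one GEDCOM line per text line ("level [@XREF@] TAG [value]").
def split_records(text)
  records = {}
  current = nil                              # record currently being filled
  text.each_line do |raw|
    line = raw.chomp.lstrip                  # tolerate indentation, keep trailing text
    next if line.empty?
    if (m = line.match(/\A0 (@[^@]+@) ([A-Z0-9_]+)/))
      # "0 @I1@ INDI", "0 @F1@ FAM", "0 @S31@ SOUR", ... start a new record
      current = records[m[1]] = { type: m[2], lines: [] }
    elsif line.start_with?('0 ')
      current = nil                          # HEAD, TRLR, etc. have no xref: skipped
    elsif current
      current[:lines] << line                # body line of the open record
    end
  end
  records
end

sample = <<~GED
  0 @I1@ INDI
  1 NAME Maurice /Lampron/ Lacharité
  1 SEX M
  1 FAMS @F1@
  0 @F1@ FAM
  1 HUSB @I1@
  0 TRLR
GED

records     = split_records(sample)
individuals = records.select { |_id, rec| rec[:type] == 'INDI' }
individuals.each_key { |id| puts id }       # prints "@I1@"
```

For a real file, something like split_records(File.read('file.txt', encoding: 'UTF-8')) should work. GEDCOM files declare their character set on the header's CHAR line, so it is worth checking that first; older files may not be UTF-8.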

    Rick
    Ruby, Ruby when will you be mine

  2. #2
    SitePoint Guru
    Join Date
    Aug 2005
    Posts
    986
    Maybe this would work:

    Code:
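    result = []
    
    File.foreach('file.txt') do |line|
      # level, optional @XREF@, tag of any length (SEX, NAME, ...), optional value
      m = line.chomp.match(/\A\s*(\d+) (?:@[^@]+@ )?([A-Z0-9_]+)(?: (.*))?\z/)
      next unless m                              # skip lines that don't parse
      level, tag, rest = m[1].to_i, m[2], m[3].to_s
      ((result[level] ||= {})[tag] ||= '') << rest
    end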
    It puts the data from the file in an array. The level number at the start of each line is used as the array index, and each element of the array is a hash. The keys of this hash are the tags (e.g. FORM, FAMC) and the values are the rest of the line. If it sees multiple lines with the same level and tag, the rest of each line is appended to the previous contents for that tag.
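One limitation of the array-of-hashes layout: it keys only on level and tag, so the 2 DATE under 1 BIRT, the 2 DATE under 1 DEAT and the 2 DATE under 1 CHAN all run together in the same result[2]['DATE'] string. A sketch of one alternative, assuming one GEDCOM line per text line: since each line's level number says how deep it sits, a stack of open nodes is enough to attach every line to its parent and build a tree. Node, LINE_RE and parse_tree are hypothetical names, not part of any library.

```ruby
# One node per GEDCOM line; children are the lines one level deeper.
Node = Struct.new(:level, :xref, :tag, :value, :children)

# level, optional @XREF@, tag, optional value (leading whitespace tolerated)
LINE_RE = /\A\s*(\d+)\s+(?:(@[^@]+@)\s+)?([A-Za-z0-9_]+)(?: (.*))?\z/

def parse_tree(text)
  roots = []
  stack = []                                  # stack[i] = open node at level i
  text.each_line do |raw|
    m = raw.chomp.match(LINE_RE)
    next unless m                             # skip blank or malformed lines
    node = Node.new(m[1].to_i, m[2], m[3], m[4].to_s, [])
    stack.pop while stack.size > node.level   # close siblings and deeper nodes
    next unless stack.size == node.level      # level skipped a step: ignored here
    if node.level.zero?
      roots << node
    else
      stack.last.children << node             # parent is the open node one level up
    end
    stack.push(node)
  end
  roots
end

sample = <<~GED
  0 @I1@ INDI
  1 BIRT
  2 DATE 26 AUG 1685
  1 DEAT
  2 DATE 19 DEC 1749
GED

indi  = parse_tree(sample).first
birth = indi.children.find { |n| n.tag == 'BIRT' }
death = indi.children.find { |n| n.tag == 'DEAT' }
birth.children.first.value   # => "26 AUG 1685"
death.children.first.value   # => "19 DEC 1749"
```

The trade-off is that lookups walk the tree instead of indexing a hash, but each DATE, PLAC or SOUR stays attached to the event it belongs to.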

    Is this what you want?

  3. #3
    SitePoint Zealot ricklach
    I am not sure if it is what I want. I will play with it, look at the values in the arrays, and then see if I can parse the arrays to extract the data for saving. To give you a better understanding of the format, I have included this snippet that describes an individual:
    Code:
    @<XREF:INDI>@  INDI {1:1}
        +1 RESN <RESTRICTION_NOTICE>  {0:1}
        +1 <<PERSONAL_NAME_STRUCTURE>>  {0:M}
        +1 SEX <SEX_VALUE>   {0:1}
        +1 <<INDIVIDUAL_EVENT_STRUCTURE>>  {0:M}
        +1 <<INDIVIDUAL_ATTRIBUTE_STRUCTURE>>  {0:M}
        +1 <<LDS_INDIVIDUAL_ORDINANCE>>  {0:M}
        +1 <<CHILD_TO_FAMILY_LINK>>  {0:M}
        +1 <<SPOUSE_TO_FAMILY_LINK>>  {0:M}
        +1 SUBM @<XREF:SUBM>@  {0:M}
        +1 <<ASSOCIATION_STRUCTURE>>  {0:M}
        +1 ALIA @<XREF:INDI>@  {0:M}
        +1 ANCI @<XREF:SUBM>@  {0:M}
        +1 DESI @<XREF:SUBM>@  {0:M}
        +1 <<SOURCE_CITATION>>  {0:M}
        +1 <<MULTIMEDIA_LINK>>  {0:M}
        +1 <<NOTE_STRUCTURE>>  {0:M}
        +1 RFN <PERMANENT_RECORD_FILE_NUMBER>  {0:1}
        +1 AFN <ANCESTRAL_FILE_NUMBER>  {0:1}
        +1 REFN <USER_REFERENCE_NUMBER>  {0:M}
          +2 TYPE <USER_REFERENCE_TYPE>  {0:1}
        +1 RIN <AUTOMATED_RECORD_ID>  {0:1}
        +1 <<CHANGE_DATE>>  {0:1}
    If you need additional information, the whole standard is at this link: http://homepages.rootsweb.com/~pmcbr...om/55gcch2.htm
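A sketch of turning one INDI record into a flat hash that could become a table row. It assumes the GEDCOM 5.5 continuation rules (CONC appends to the parent's value with no separator, CONT starts a new line) and follows the {0:1}/{0:M} counts in the grammar above: repeatable items such as NAME and FAMS are gathered into arrays, while single ones such as SEX keep one value. The parser is included so the block runs on its own; individual_row, first_child, children_with and the row's keys are all hypothetical and would need mapping onto your own table columns. The sample is a trimmed copy of the extract in the first post.

```ruby
Node = Struct.new(:level, :xref, :tag, :value, :children)
LINE_RE = /\A\s*(\d+)\s+(?:(@[^@]+@)\s+)?([A-Za-z0-9_]+)(?: (.*))?\z/

# Level-stack parser that folds CONC/CONT lines into the parent's value.
def parse_tree(text)
  roots = []
  stack = []                                # stack[i] = open node at level i
  text.each_line do |raw|
    m = raw.chomp.match(LINE_RE)
    next unless m
    level, tag = m[1].to_i, m[3]
    value = m[4].to_s.dup                   # dup: this string may be appended to later
    stack.pop while stack.size > level
    next unless stack.size == level         # level skipped a step: ignored
    parent = stack.last
    if parent && tag == 'CONC'
      parent.value << value                 # join with no separator
    elsif parent && tag == 'CONT'
      parent.value << "\n" << value         # continuation on a new line
    else
      node = Node.new(level, m[2], tag, value, [])
      if parent
        parent.children << node
      else
        roots << node
      end
      stack.push(node)
    end
  end
  roots
end

def first_child(node, tag)
  node&.children&.find { |c| c.tag == tag }
end

def children_with(node, tag)
  node.children.select { |c| c.tag == tag }
end

# Hypothetical row layout: {0:M} items become arrays, {0:1} items single values.
def individual_row(indi)
  birth = first_child(indi, 'BIRT')
  death = first_child(indi, 'DEAT')
  {
    id:              indi.xref,
    names:           children_with(indi, 'NAME').map(&:value),
    sex:             first_child(indi, 'SEX')&.value,
    birth_date:      first_child(birth, 'DATE')&.value,
    birth_place:     first_child(birth, 'PLAC')&.value,
    death_date:      first_child(death, 'DATE')&.value,   # nil when there is no DEAT
    spouse_families: children_with(indi, 'FAMS').map(&:value)
  }
end

sample = <<~GED
  0 @I1@ INDI
  1 NAME Maurice /Lampron/ Lacharité
  1 NAME Maurice /Laspron/ dit Lacharité
  1 SEX M
  1 BIRT
  2 DATE 26 AUG 1685
  2 PLAC Nicolet River, Nicolet, Nicolet-Yamaska, Québec, Canada, 461300N0723700W
  1 EVEN
  2 TYPE Anecdote
  2 NOTE Maurice, son of Jean Laspron dit Lacharite and Anne Michelle Renaud, wa
  3 CONC s born on 26 Aug 1685.
  3 CONT He may also have been known as Maurice Laspron dit Lacharité.
  1 FAMS @F1@
  1 FAMS @F2@
GED

indi = parse_tree(sample).first
row  = individual_row(indi)
note = first_child(first_child(indi, 'EVEN'), 'NOTE').value
```

Each row hash could then be handed to whatever model or insert statement you use; the SOUR citations and OBJE links would need similar child lookups.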

    Thanks for the start.

    Rick

