  1. #1
    Mal Reynolds Mandibal's Avatar

    RegEx help in Ruby

    I've got a source PHP file that was created from a Dreamweaver template, so it's a big mess. I want to pull out just the content that sits between the template tags marking it.
    HTML Code:
    <!-- #BeginEditable "main" --><img src="../images/adv_com/lady_reading.jpg" alt="one lady reading to another" width="195" height="147" align="right"> 
                Welcome to the Module on advanced communication skills. Hopefully, 
                you already feel proficient in the active listening skills you learned 
                in the Basic Communication Module. If you haven't already, review <a href="../basic_com/index.php">
    			take another look at this 
                Module</a>. The ability to discuss bad news or the topic of death 
                and dying requires a level of therapeutic communication competency 
                that is built upon your own comfort level talking about a very big 
                subject. 
                <p>We will be building on that foundation in order to become skillful 
                  in learning to talk to our residents about end of life and the care 
                  that they need. In this Module we will learn to use a communications 
                  protocol as a tool for discussing bad news. Helping people make 
                  decisions about their code status and assisting them with transitions 
                  from curative to palliative care will be reviewed.</p>
                
                <p>Is there anyone in your facility who is a good role model as someone 
                  who communicates well with residents and families surrounding issues 
                  of death? If so, how can this person help you to improve your skill 
                  level? <!-- #EndEditable -->
    The above is an example of what I want to pull out of the file. The #BeginEditable and #EndEditable comments are the markers. There may or may not be whitespace, including newlines, before and after the markers. There is a lot of HTML/PHP before and after these tags. I need the content between the tags (including the tags themselves is OK too).

    Any help with a regular expression (or Hpricot although I don't think it's possible) to capture this would be greatly appreciated. I'm really struggling with finding a working regular expression and need it to parse about 150 files. Thanks for the help.
    Erh

  2. #2
    SitePoint Evangelist
    I think the pattern to use is:
    Code:
    /#BeginEditable(.*)#EndEditable/m
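    The key points are that .* matches any number of any characters, and putting it inside a pair of parentheses makes it easy to extract just the enclosed text. The m at the end lets . match newline characters as well, so the match can span multiple lines. Note that .* is greedy: if a file contains more than one editable region, the match runs from the first #BeginEditable to the last #EndEditable. The captured text also includes the rest of the opening comment (such as "main" -->) and the start of the closing one (<!--).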
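
    As a rough sketch of how this pattern behaves when a file has more than one editable region, the snippet below compares it with a stricter, non-greedy variant. The sample text, the variable names and the stricter pattern are assumptions made for illustration; they are not from the original file.

```ruby
# Hypothetical sample with two editable regions (not from the original file).
sample = <<~HTML
  <!-- #BeginEditable "title" -->First<!-- #EndEditable -->
  <p>template markup</p>
  <!-- #BeginEditable "main" -->
  Second
  <!-- #EndEditable -->
HTML

# The pattern from this post: greedy, so it spans from the first
# #BeginEditable to the last #EndEditable.
greedy = /#BeginEditable(.*)#EndEditable/m

# An assumed stricter variant: matches the whole comment markers,
# captures the region name, and uses .*? to stop at the nearest end marker.
lazy = /<!--\s*#BeginEditable\s+"([^"]*)"\s*-->(.*?)<!--\s*#EndEditable\s*-->/m

# String#[] with a regex and a group index returns that capture (or nil).
greedy_capture = sample[greedy, 1]

# String#scan returns one [name, body] pair per region.
regions = sample.scan(lazy).map { |name, body| [name, body.strip] }

puts greedy_capture.include?("template markup")
regions.each { |name, body| puts "#{name}: #{body}" }
```

    If each file has only one editable region, the simpler pattern is enough; the stricter one only matters when there are several regions or when the marker comments should be left out of the result.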

    If you make a new file called test.rb and put the following code in it, you can play with the pattern matching.
    Code:
    test_text = <<EOF
    <!-- #BeginEditable "main" --><img src="../images/adv_com/lady_reading.jpg" alt="one lady reading to another" width="195" height="147" align="right"> 
                Welcome to the Module on advanced communication skills. Hopefully, 
                you already feel proficient in the active listening skills you learned 
                in the Basic Communication Module. If you haven't already, review <a href="../basic_com/index.php">
          take another look at this 
                Module</a>. The ability to discuss bad news or the topic of death 
                and dying requires a level of therapeutic communication competency 
                that is built upon your own comfort level talking about a very big 
                subject. 
                <p>We will be building on that foundation in order to become skillful 
                  in learning to talk to our residents about end of life and the care 
                  that they need. In this Module we will learn to use a communications 
                  protocol as a tool for discussing bad news. Helping people make 
                  decisions about their code status and assisting them with transitions 
                  from curative to palliative care will be reviewed.</p>
                
                <p>Is there anyone in your facility who is a good role model as someone 
                  who communicates well with residents and families surrounding issues 
                  of death? If so, how can this person help you to improve your skill 
                  level? <!-- #EndEditable -->
    EOF
    
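    # .* is greedy; the m flag lets . match newlines so the match can span lines
    test_pattern = /#BeginEditable(.*)#EndEditable/m
    
    # match returns a MatchData object, or nil if the pattern is not found
    result = test_pattern.match(test_text)
    
    puts "Everything found:"
    puts result[0]   # the whole match, markers included
    
    puts "\n\nExtracted text:"
    puts result[1]   # only what the parentheses captured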
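
    Since the question mentions roughly 150 files, here is a minimal sketch of running an extraction over a whole directory. The helper names (extract_editable, extract_from_files), the EDITABLE constant and the stricter pattern are assumptions for illustration, and the sketch assumes one editable region per file.

```ruby
# Assumed pattern: matches the full marker comments and captures only
# the content between them. .*? is non-greedy, m lets . match newlines.
EDITABLE = /<!--\s*#BeginEditable\b.*?-->(.*?)<!--\s*#EndEditable\s*-->/m

# Hypothetical helper: returns the stripped content of the first
# editable region, or nil when the markers are not present.
def extract_editable(html)
  match = EDITABLE.match(html)
  match && match[1].strip
end

# Hypothetical helper: maps each matching file path to its extracted
# content, skipping files that have no editable region.
def extract_from_files(glob)
  Dir.glob(glob).sort.each_with_object({}) do |path, found|
    content = extract_editable(File.read(path))
    found[path] = content if content
  end
end
```

    A call such as extract_from_files("site/**/*.php") would return a hash keyed by file path; the glob here is only a placeholder. Files that are not in Ruby's default encoding may need an explicit encoding option passed to File.read.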

  3. #3
    Mal Reynolds Mandibal's Avatar
    Gah! I didn't think (.*) was working for me. Maybe it was the rest of the regex I had and the way I was using it in Ruby. Thanks man. I tested it with a larger set of text and it works well.
    Erh

