  1. #1
    SitePoint Evangelist
    Join Date
    Jan 2005
    Location
    UK
    Posts
    539

    some regex help!

    I need to clean up some text.

    - I need to remove every < that occurs as the first and only character on a line (but keep the line itself, empty):

    eg

    the cat sat on the mat
    <
    the dog sat on the cat


    - I also need to remove any unnecessary < that is followed by a space and then a <b> (with the < as the first character on the line)

    eg

    < <b>The cat sat on the mat

    - And finally, I need to make sure that each line containing only <b>...any text in here...</b> has one empty line above it (except for the first occurrence in the whole string being processed)

    Anyone able to help put this into a regex?

  2. #2
    Keeper of the SFL StarLion's Avatar
    Join Date
    Feb 2006
    Location
    Atlanta, GA, USA
    Posts
    3,748
    Simple - you have 3 conditions. You need to do 3 evaluations, not 1.
    ~^<$~ will match condition 1.
    ~^< <b~ will match condition 2.
    ~\R+<b>.*?</b>\R~ will match condition 3.

    NOTE: Conditions 1 and 2 are matched on an array. Condition 3 is matched on a string.
    Never grow up. The instant you do, you lose all ability to imagine great things, for fear of reality crashing in.

  3. #3
    SitePoint Evangelist
    Thanks! I'm not very familiar with regex. How would I put this into a preg_replace to act on my string?

  4. #4
    Keeper of the SFL StarLion's Avatar
    The first two should be run through an array-based preg_replace: file() the text, or else explode it on \n and trim each element. That way the ^ and $ anchors apply to each line individually; otherwise you would have to look for \n characters, which would miss the first line of the text.
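
    A minimal sketch of that array approach, assuming the text is already held in a string. The names $text, $lines and $cleaned and the sample input are made up for illustration; the two patterns are the ones from post #2.

```php
<?php
// Hypothetical sample input; $text stands in for the string being cleaned.
$text = "the cat sat on the mat\n<\nthe dog sat on the cat\n< <b>The cat sat on the mat</b>";

// Split into lines and trim each one, so ^ and $ anchor to a single line.
// Note: trim() also strips leading whitespace from every line.
$lines = array_map('trim', explode("\n", $text));

// Condition 1: a line that is only "<" becomes an empty line.
// Condition 2: a leading "< " in front of "<b" is dropped, keeping the "<b".
// With an array subject, preg_replace() returns an array of results.
$lines = preg_replace(
    ['~^<$~', '~^< <b~'],
    ['',      '<b'],
    $lines
);

// Put the lines back together into one string.
$cleaned = implode("\n", $lines);
```

    The line holding the lone < is kept as an empty line rather than removed, which is what the first condition asks for.
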
    The last one can be run through preg_replace directly on the string, capturing the middle part (so you will actually need ( ) around the <b>.*?</b> part) and putting it back in your replacement.
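
    A rough sketch of the string version, assuming \n line endings in the output. The names $text and $spaced and the sample input are made up for illustration; the pattern is the one from post #2 with a capture group added.

```php
<?php
// Hypothetical sample input; $text stands in for the string being cleaned.
$text = "<b>First heading</b>\nsome text\n<b>Second heading</b>\nmore text\n\n\n<b>Third heading</b>\nlast line";

// \R+ swallows however many line breaks come before the <b> line.
// ( ) captures the <b>...</b> part so it can be put back as $1.
// The replacement writes exactly two line breaks before it,
// which leaves one empty line above the <b> line.
$spaced = preg_replace('~\R+(<b>.*?</b>)\R~', "\n\n\$1\n", $text);
```

    A <b> line at the very start of the string has no line break in front of it, so the pattern skips it. That only covers the "except the first occurrence" rule when the first occurrence is the first line. Because the trailing \R is consumed by each match, two <b> lines directly after one another would need a second pass or a lookahead; treat this as a starting point rather than a finished solution.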

    Give it a go, and come back with questions.

