  1. #1
    SitePoint Zealot marcoBR's Avatar
    Join Date
    Jun 2002
    Location
    Brazil
    Posts
    149

    Regular Expression Trouble

    Hello!

    Please take a look at the following code:
    PHP Code:
    $test = '<tag/>Hello, but i shouldnt appear here...<tag>I should appear here!!!</tag>';

    preg_match_all('/<.*?[^\/]>.*?<\/.*?>/s', $test, $tags);

    foreach ($tags[0] as $tag) {
        echo htmlentities($tag);
    }

    Test it and see what's happening...

    I'm trying to match only:
    <tag>I should appear here!!!</tag>

    but it's matching the whole string:
    <tag/>Hello, but i shouldnt appear here...<tag>I should appear here!!!</tag>

    Note: I'm using the 's' modifier so that '.' also matches newline characters ('\n').

    What's wrong with my pattern??? Please, help me!!!

    Regards,

    marcoBR

  2. #2
    "Of" != "Have" bronze trophy Jeff Lange's Avatar
    Join Date
    Jan 2003
    Location
    Calgary, Canada
    Posts
    2,063
    <.*?[^\/]>

    should be

    PHP Code:
    '<[^\/]*?>' 
    hope that helps.
    Who walks the stairs without a care
    It shoots so high in the sky.
    Bounce up and down just like a clown.
    Everyone knows its Slinky.

  3. #3
    SitePoint Zealot
    Join Date
    Dec 2001
    Location
    UK
    Posts
    105
    PHP Code:
    preg_match_all('/<[a-zA-Z0-9]+[^\/]>.*<\/.*>/s', $test, $tags); 
    Also, I removed some of the ? characters, as you didn't need them because you are using .* (0 or more occurrences, which basically means it's optional).

  4. #4
    "Of" != "Have" bronze trophy Jeff Lange's Avatar
    The ? also plays a part: after a quantifier such as *, it makes the match ungreedy rather than optional.

  5. #5
    SitePoint Zealot marcoBR's Avatar
    Thanks cyborg, it works like a charm.

    For learning purposes, can you explain what you did?

    For example: Why did you remove the '.' meta-character?

  6. #6
    "Of" != "Have" bronze trophy Jeff Lange's Avatar
    The . was matching everything, including the / and the > of <tag/>, so .*? could run straight past that tag; your [^\/] only checked the single character just before the final >.
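A minimal sketch of this explanation, assuming the test string from the first post. It runs the original pattern next to a version whose opening part is replaced as suggested in post #2; the variable names `$original` and `$fixed` are only illustrative.

```php
<?php
// Test string from the first post.
$test = '<tag/>Hello, but i shouldnt appear here...<tag>I should appear here!!!</tag>';

// Original pattern: ".*?" may consume "/", ">" and anything else, so the
// "[^\/]" check only applies to the one character before the final ">".
preg_match('/<.*?[^\/]>.*?<\/.*?>/s', $test, $original);

// Opening part replaced as suggested in post #2: no "/" is allowed anywhere
// between "<" and ">", so "<tag/>" cannot start a match.
preg_match('/<[^\/]*?>.*?<\/.*?>/s', $test, $fixed);

echo $original[0], "\n"; // the whole test string
echo $fixed[0], "\n";    // <tag>I should appear here!!!</tag>
```

The only difference between the two patterns is what may sit between "<" and ">" in the opening tag, which is enough to change where the match starts.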

  7. #7
    "Of" != "Have" bronze trophy Jeff Lange's Avatar
    And actually, I'd do this:

    PHP Code:
    preg_match('/<([^\/]*?)>.*?<\/\1>/s', $test, $tags); 
    to make sure you only match an ending tag which matches the starting tag.

  8. #8
    SitePoint Zealot marcoBR's Avatar
    What does *? mean in this case: <([^\/]*?)>

  9. #9
    "Of" != "Have" bronze trophy Jeff Lange's Avatar
    * means 0 or more occurrences of [^\/], that is, find 0 or more characters which aren't /. The ? there means that it should be ungreedy.

    On second thought, I think this would be better:
    PHP Code:
    preg_match_all('/<([\w]+)[^\/]*?>.*?<\/\1>/s', $test, $match); 
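As a rough check of the backreference idea, the sketch below runs the pattern from this post against the original test string and against a second string whose closing tag has a different name. The second string (`$bad`) is a made-up input for illustration, not something from the thread.

```php
<?php
// Pattern from post #9: group 1 captures the tag name, and \1 requires the
// same name again in the closing tag.
$pattern = '/<([\w]+)[^\/]*?>.*?<\/\1>/s';

$good = '<tag/>Hello, but i shouldnt appear here...<tag>I should appear here!!!</tag>';
$bad  = '<tag>I should appear here!!!</other>'; // hypothetical mismatched input

preg_match_all($pattern, $good, $match);
echo $match[0][0], "\n"; // <tag>I should appear here!!!</tag>
echo $match[1][0], "\n"; // tag

// The closing tag name differs from the opening one, so nothing matches.
$count = preg_match_all($pattern, $bad, $none);
echo $count, "\n"; // 0
```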

  10. #10
    SitePoint Zealot marcoBR's Avatar
    Could you give a short explanation of greedy versus ungreedy matching and, if possible, a real example? I'm having some difficulty understanding it from the PHP PCRE manual, sorry.

  11. #11
    "Of" != "Have" bronze trophy Jeff Lange's Avatar
    example:

    Code:
    <b>Hello</b> <b>what?</b>
    PHP Code:
    preg_match('/<b>.*<\/b>/', $code); 
    would match <b>Hello</b> <b>what?</b> as a single match, whereas

    PHP Code:
    preg_match_all('/<b>.*?<\/b>/', $code, $matches); 
    would match <b>Hello</b> and <b>what?</b> separately, instead of matching from the outside <b> to the outside </b>. Hope that explains it a little better.

    (Also, my regex in the last post works well right now.)
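The greedy versus ungreedy difference can be sketched as follows, assuming the sample string from this post. `preg_match_all` is used for both patterns so that every match is collected and the two results can be compared directly.

```php
<?php
$code = '<b>Hello</b> <b>what?</b>';

// Greedy: ".*" grabs as much as possible, so one match spans both tags.
preg_match_all('/<b>.*<\/b>/', $code, $greedy);

// Ungreedy (lazy): ".*?" stops at the first "</b>" it can reach,
// so each <b>...</b> pair becomes its own match.
preg_match_all('/<b>.*?<\/b>/', $code, $lazy);

print_r($greedy[0]); // one element: the whole string
print_r($lazy[0]);   // two elements: <b>Hello</b> and <b>what?</b>
```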

  12. #12
    SitePoint Zealot marcoBR's Avatar
    Excellent explanations, cyborg. Thank you very much, and sorry for taking up your precious time!

  13. #13
    SitePoint Guru
    Join Date
    Nov 2002
    Posts
    841
    The regexps listed here will also match the string:

    <tag/>Hello, but i shouldnt appear here...<tag>I should appear here!!!</Anythingyoucaretoputhere>
    If this is not what you want, you might want to try something like:

    PHP Code:
    '/<(\w+)\s*([^>]*)>(.*)<\/\1>/Usi' // Untested 
    This will match pairs of the same tag.

    \1 is the tag name
    \2 is the tag's attributes
    \3 is the contents of the tag

    This regexp fails for this case, because [^>]* stops at the first >, which here sits inside the attribute value:

    <tag attribute="<yipes>">stuff</tag>
    Sadly, not everyone uses entities when they should.
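Since the tag-pair pattern in this post is marked untested, here is a quick check of its capture groups against the test string from the first post. In this sketch it treats the "/" of the self-closing `<tag/>` as attribute text, so the match starts at the very first tag. The second pattern is a hypothetical variant that is not from the thread: it adds `\b` after the tag name and a negative lookbehind `(?<!\/)` before the closing ">", and it replaces the U modifier with an explicit `.*?`.

```php
<?php
$test = '<tag/>Hello, but i shouldnt appear here...<tag>I should appear here!!!</tag>';

// Pattern from this post: the U modifier makes every quantifier ungreedy.
preg_match('/<(\w+)\s*([^>]*)>(.*)<\/\1>/Usi', $test, $m);
echo $m[1], "\n"; // tag
echo $m[2], "\n"; // /   (the "/" of "<tag/>" ends up in the attributes group)

// Hypothetical variant (an assumption, not the poster's regex): (?<!\/)
// refuses a ">" that directly follows "/", and \b keeps the tag name whole.
preg_match('/<(\w+)\b([^>]*)(?<!\/)>(.*?)<\/\1>/si', $test, $v);
echo $v[0], "\n"; // <tag>I should appear here!!!</tag>
echo $v[3], "\n"; // I should appear here!!!
```

The variant still shares the other limitations described in this post, such as a ">" inside an attribute value and nested tags of the same name.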

    Beware this too:

    <FONT FACE="Arial">Parsing HTML with regular expressions can get you into<FONT COLOR="RED">Trouble</FONT> when you have the possibility of nested Tags</FONT>
    This regexp will further break down the attributes of the tag if you need to:
    PHP Code:
    "/(\w+)\s*(=\s*(\"|')?((?(3)[^'\"]*|[^\s]*))(?(3)\\3))?\s*/" 

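A small sketch of how the attribute pattern from post #13 might be applied. The input string `$attributes` is a made-up example standing in for whatever the attributes group of a tag match captured, and the group numbers used below (1 for the name, 4 for the unquoted value) are read off the pattern itself rather than stated in the thread.

```php
<?php
// Pattern copied from post #13. It is double-quoted, so \" and \\3 are PHP
// string escapes; the regex engine sees " and \3.
$attrPattern = "/(\w+)\s*(=\s*(\"|')?((?(3)[^'\"]*|[^\s]*))(?(3)\\3))?\s*/";

// Hypothetical attribute text with one quoted and one unquoted value.
$attributes = 'FACE="Arial" SIZE=3';

preg_match_all($attrPattern, $attributes, $m, PREG_SET_ORDER);
foreach ($m as $attr) {
    // $attr[1] is the attribute name, $attr[4] its value without quotes.
    echo $attr[1], ' => ', $attr[4], "\n";
}
```

The conditional `(?(3)...)` checks whether an opening quote was captured: if so, the value runs up to the matching quote, otherwise up to the next whitespace.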
  14. #14
    "Of" != "Have" bronze trophy Jeff Lange's Avatar
    Read post #9.

    I didn't know how complex to make the regex, because I don't know exactly what he wants to match.

  15. #15
    SitePoint Guru
    Originally posted by cyborg from dh:
        Read post #9.

    Oops, sorry. I don't know how I missed that.

