Thread: regexp question

  1. #1
    SitePoint Wizard
    Join Date
    Jan 2005
    Location
    blahblahblah
    Posts
    1,447

    regexp question

    Hi,

    I'd like to do the following: split an HTML document on every occurrence of a given tag. If [tag] appears 12 times, the array will contain 12 elements.

    Here's what I'm trying to do:

    PHP Code:
    $blocks = preg_split("/<$tag(.)+>/", $html);
    As you might expect, it doesn't work... Would someone know how to fix this?

    Regards,

    -jj.

  2. #2
    SitePoint Guru risoknop's Avatar
    Join Date
    Feb 2008
    Location
    end($world)
    Posts
    834

  3. #3
    SitePoint Wizard
    Join Date
    Jan 2005
    Location
    blahblahblah
    Posts
    1,447
    It doesn't take into account the fact that there may be 'class="blah"' and 'id="foo"' within the html tag.

  4. #4
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Have you tried:

    PHP Code:
    $blocks = preg_split("/<{$tag}([^>]*)>/", $html);
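
    A minimal sketch of how a pattern like this behaves, assuming a hypothetical $tag of "div" and a made-up $html string (neither comes from the thread). The preg_quote() call and the \b word boundary are additions, so that "div" does not also match a longer tag name, and PREG_SPLIT_NO_EMPTY drops the empty piece before the first match. Note that the closing tags stay inside the pieces, which is one reason regular expressions are a fragile fit for HTML.

    ```php
    <?php
    // Hypothetical inputs, chosen only for illustration.
    $tag  = 'div';
    $html = '<div class="blah">One</div><div id="foo">Two</div>';

    // Match the opening tag with any attributes: "<div", a word boundary,
    // then everything up to the next ">".
    $pattern = '/<' . preg_quote($tag, '/') . '\b[^>]*>/';

    // Split on every opening tag and discard empty pieces.
    $blocks = preg_split($pattern, $html, -1, PREG_SPLIT_NO_EMPTY);

    print_r($blocks);
    ```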
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
    Sometimes its enough to make that wheel more rounded"-Molona

  5. #5
    SitePoint Wizard
    Join Date
    Jan 2005
    Location
    blahblahblah
    Posts
    1,447
    It didn't work...

    I changed my approach a little. I'm trying to find a regular expression that could do the following (please consider this string):

    HTML Code:
    id="container">
    I know there are a few characters before we reach the ">". I'd like to split my string at the first ">" occurrence, and get what comes before the ">" and what comes after it as two array elements... But regular expressions are beyond me...
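
    For this particular split, a regular expression may not be needed at all. The sketch below uses explode() with a limit of 2, so the string is cut only at the first ">". The trailing text in $fragment is an assumption added for illustration; the thread only shows the part up to the ">".

    ```php
    <?php
    // Hypothetical input: the fragment from the post plus some made-up trailing text.
    $fragment = 'id="container">Some content';

    // A limit of 2 yields at most two elements: the text before the first ">"
    // and everything after it (later ">" characters stay in the second element).
    $parts = explode('>', $fragment, 2);

    print_r($parts);
    ```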

  6. #6
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    A piece of advice that SilverBullet is always giving people doing the same thing as you: if you want to find elements etc. in HTML or XML, use the DOM!

    I really need to delve into programming with the DOM sometime soon...

  7. #7
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    *Runs in, sticks chest out and looks to the sky in a Superman-esque way*

    Someone mention my name and DOM?
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  8. #8
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    rofl

  9. #9
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    PHP Code:
    <?php
    // Sample document: ten <tag> elements, some nested, with assorted attributes.
    $sSomeDocument = '
    <rootNode>
        <tag class="" atrrib="">One</tag>
        <tag atrrib="">Two</tag>
        <tag id="" atrrib="">Three</tag>
        <tag id="" class="" atrrib="">Four</tag>
        <tag>Five</tag>
        <nested>
            <tag>Six</tag>
            <tag id="" class="" atrrib="">Seven</tag>
            <tag id="" atrrib="">Eight</tag>
            <tag id="" class="" atrrib="">Nine</tag>
            <tag id="">Ten</tag>
        </nested>
    </rootNode>
    ';

    $oDOM = new DOMDocument();
    $oDOM->loadXML($sSomeDocument);

    // getElementsByTagName() matches at any depth, whatever attributes are present.
    foreach ($oDOM->getElementsByTagName('tag') as $oNode)
    {
        echo $oNode->nodeValue . '<br />';
    }
    /*
        One<br />
        Two<br />
        Three<br />
        Four<br />
        Five<br />
        Six<br />
        Seven<br />
        Eight<br />
        Nine<br />
        Ten<br />
    */
    ?>

  10. #10
    SitePoint Wizard
    Join Date
    Jan 2005
    Location
    blahblahblah
    Posts
    1,447
    Cool! Sounds great.

    However, I get this error:

    Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: Input is not proper UTF-8, indicate encoding !
    Yet I made sure that my html string contains a tag specifying that the encoding is utf-8...


  11. #11
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Do you have any incorrectly encoded characters? Euro sign, Pound sign etc...?

  12. #12
    SitePoint Wizard
    Join Date
    Jan 2005
    Location
    blahblahblah
    Posts
    1,447
    I had. I have now fixed it by making sure my $html string is utf-8 encoded.
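
    The thread does not say how the string was re-encoded, so the following is only one possible sketch. It assumes the stray characters came from ISO-8859-1 (a guess, not something stated above) and uses a hypothetical helper, toUtf8(), that leaves already valid UTF-8 untouched and converts anything else with iconv().

    ```php
    <?php
    // Hypothetical helper; the ISO-8859-1 default is an assumption about the source data.
    function toUtf8(string $html, string $assumedEncoding = 'ISO-8859-1'): string
    {
        // With the /u modifier, preg_match() returns false when the subject
        // is not valid UTF-8, and 1 here when it is.
        if (preg_match('//u', $html) === 1) {
            return $html; // already valid UTF-8, nothing to do
        }

        $converted = iconv($assumedEncoding, 'UTF-8', $html);
        if ($converted === false) {
            throw new RuntimeException('Could not convert input to UTF-8');
        }

        return $converted;
    }

    // "\xA3" is the pound sign in ISO-8859-1 and is invalid on its own in UTF-8.
    $html = toUtf8("<p>Price: \xA35</p>");
    ```

    The converted string can then be handed to loadHTML() as before.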

    However, I am now facing another problem. Please consider the following html code:

    PHP Code:
    <div id="some-id">
      <div>
        Welcome
      </div>
    </div>
    The word "Welcome" is retrieved twice... How would you avoid retrieving the content of nested elements more than once?
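
    The duplication happens because nodeValue returns an element's text together with the text of all its descendants, so the outer div reports the inner div's "Welcome" as well. One possible approach, sketched here on the assumption that only each element's own text is wanted, is to read just the direct text-node children. The helper name ownText() is hypothetical.

    ```php
    <?php
    // Hypothetical helper: concatenate only the text nodes that are direct
    // children of $node, ignoring text that belongs to nested elements.
    function ownText(DOMNode $node): string
    {
        $text = '';
        foreach ($node->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $text .= $child->nodeValue;
            }
        }
        return trim($text);
    }

    $html = "<div id=\"some-id\">\n  <div>\n    Welcome\n  </div>\n</div>";

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $texts = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        $texts[] = ownText($div);
    }

    // The outer div has no text of its own; only the inner div yields "Welcome".
    print_r($texts);
    ```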

    And also, if I print_r($oDOM->getElementsByTagName('div')), I can't get an array to be displayed. How could I do this?
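
    getElementsByTagName() returns a DOMNodeList object rather than an array, which is why print_r() does not list the nodes. A minimal sketch of one way to get an array: DOMNodeList is traversable, so iterator_to_array() can copy it, and array_map() can then reduce each element to something printable. The sample markup and the choice of the id attribute are assumptions for illustration.

    ```php
    <?php
    $dom = new DOMDocument();
    // Hypothetical markup, not taken from the thread.
    $dom->loadHTML('<div id="a">One</div><div id="b">Two</div>');

    // Copy the node list into a plain PHP array of DOMElement objects.
    $divs = iterator_to_array($dom->getElementsByTagName('div'));

    // Reduce each element to a printable value, here its id attribute.
    $ids = array_map(function (DOMElement $div) {
        return $div->getAttribute('id');
    }, $divs);

    print_r($ids);
    ```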

    Regards,

    -jj.
    Last edited by jjshell; Apr 17, 2009 at 04:16.

