  1. #1
    SitePoint Member
    Join Date
    Sep 2006
    Location
    Currently Toronto, Canada.
    Posts
    5
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question regex: remove everything not div tags

    Hi,

    Searched but haven't found a solution to this.
    I want to remove everything from html code that is not a <div> or </div> tag (opening or closing).
    Since this matches the divs:
    Code:
    <div.*?>|</div>
    I thought I could just negate it somehow, such as:
    Code:
    [^(<div.*?>)]|[^(</div>)]
    (does not work)

    Any ideas?
    Cheers
    Last edited by pog; Jul 20, 2010 at 15:48. Reason: pasted wrong code

  2. #2
    SitePoint Evangelist
    Join Date
    Jun 2007
    Location
    North Yorkshire, UK
    Posts
    483
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I want to remove everything from html code that is not a <div> or </div> tag (opening or closing).
    Am I misreading this? Surely you will then just be left with a string containing only <div>s and </div>s, which doesn't seem much use.
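    For reference, here is a minimal sketch of that literal reading: rather than negating the pattern from post #1, it collects every match and joins them. The sample HTML and the variable names are made up for illustration and do not come from the thread.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<div class="a"><p>Hello</p><div>World</div></div>';

// Collect every opening or closing div tag with the pattern from post #1.
preg_match_all('~<div.*?>|</div>~is', $html, $matches);

// Join the matched tags; everything else (other tags and text) is dropped.
$only_divs = implode('', $matches[0]);

echo $only_divs; // <div class="a"><div></div></div>
```

    Collecting the matches sidesteps the negation problem entirely, since a character class like [^...] negates single characters, not whole sub-patterns.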

  3. #3
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,892
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    This works for me:

    Code:
    ~</(?!div).*?>|<(?!/)(?!div).*?>~is
    Use in PHP as follows:
    PHP Code:
    $some_string = preg_replace('~</(?!div).*?>|<(?!/)(?!div).*?>~is', '', $some_html);
    Breakdown of this regex:

    ~ - Start regex
    </ - match </ literally
    (?!div) - Negative lookahead for the literal string div
    .*? - match anything, lazily. This is needed because the lookahead consumes no characters, so .*? has to match the tag name and attributes up to the closing >
    > - match > literally
    | - OR match the following:
    < - match < literally
    (?!/) - Negative lookahead for the literal string /
    (?!div) - Negative lookahead for the literal string div
    .*? - match anything, lazily.
    > - match > literally
    ~ - End regex
    is - Modifiers: Case Insensitive (i) and Single Line mode (s)

    Single line mode lets . match newlines, so tags that span multiple lines are removed as well, like

    <script language="javascript"
    src="/some/path/to/some/javascript.js">

    For info on negative lookahead, see here: http://www.regular-expressions.info/lookaround.html
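    To illustrate the effect of the s modifier, here is a small sketch that runs the same pattern with and without it. The sample string and the variable names ($pattern, $with_s, $without_s) are assumptions made for this example only.

```php
<?php
// Hypothetical sample input: a script tag split across two lines inside a div.
$html = "<div><script language=\"javascript\"\n"
      . "src=\"/some/path/to/some/javascript.js\"></script>text</div>";

// The pattern from this post, without delimiters or modifiers.
$pattern = '</(?!div).*?>|<(?!/)(?!div).*?>';

// With s, the dot also matches the newline inside the opening script tag.
$with_s = preg_replace('~' . $pattern . '~is', '', $html);

// Without s, the dot stops at the newline, so the opening script tag survives.
$without_s = preg_replace('~' . $pattern . '~i', '', $html);

echo $with_s, "\n";    // <div>text</div>
echo $without_s, "\n"; // still contains the opening <script ...> tag
```

    In both runs the closing </script> tag is removed, because it sits on a single line; only the multi-line opening tag depends on the modifier.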

    Hope that helps
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  4. #4
    om nom nom nom Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,233
    Mentioned
    47 Post(s)
    Tagged
    1 Thread(s)
    Agreed with Phillip. Is this the story where someone asks how to move a mountain because they want to lay a pipeline from point A to point B?

  5. #5
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,892
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Stomme poes
    Agreed with Phillip. Is this the story where someone asks how to move a mountain because they want to lay a pipeline from point A to point B?
    As I understood it, the OP wished to remove all tags except div tags, leaving the div tags and everything outside tags (the content) intact. That is exactly what my regex in post #3 does.
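    A quick way to see this is to run the regex from post #3 on a small snippet. The markup below is a made-up sample, not something posted in the thread.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$some_html = '<div id="main"><h1>Title</h1><p>Some <b>bold</b> text</p></div>';

// Strip every tag except opening and closing div tags.
$some_string = preg_replace('~</(?!div).*?>|<(?!/)(?!div).*?>~is', '', $some_html);

echo $some_string; // <div id="main">TitleSome bold text</div>
```

    The h1, p and b tags are gone, while the div tags and the text between the tags are untouched.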

  6. #6
    om nom nom nom Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,233
    Mentioned
    47 Post(s)
    Tagged
    1 Thread(s)
    I'll have to see it to understand it then.

