Thread: Extract Comment

  1. #1
    if ($zee == "Guru") { $zee--;}
    Join Date
    Nov 2005
    Location
    Karachi - Pakistan
    Posts
    1,134
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Extract Comment

    Hi,

    I have a script that reads the HTML source of any website and saves it to a text file.

    Now I want to extract all the comments from that text.

    That is, I want to extract everything between <!-- and -->. Please give me some guidance on whether I can use a regex for this. If so, please post some sample regex code or point me to a tutorial that covers what I need.

    Thanks
    Zeeshan

  2. #2
    SitePoint Member
    Join Date
    Jun 2009
    Posts
    19
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi buddy,

    Could you send the code that reads the HTML of any website
    to shkhsiraj@mail.com?

    It would be very helpful, and I will share an idea for how to extract the comments.

  3. #3
    rajug.replace('Raju Gautam'); bronze trophy Raju Gautam's Avatar
    Join Date
    Oct 2006
    Location
    Kathmandu, Nepal
    Posts
    4,013
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I don't know a more sophisticated approach, but I hope this helps:
    PHP Code:
    $string = 'hell <!-- and --> world';
    $string = preg_replace('/<!--(.*)-->/', '', $string);
    echo $string;
    Mistakes are proof that you are trying.....
    ------------------------------------------------------------------------
    PSD to HTML - SlicingArt.com | Personal Blog | ZCE - PHP 5

  4. #4
    if ($zee == "Guru") { $zee--;}
    thanks rajug,

    but your code strips the comment and outputs 'hell world'.
    What I am looking for is to get the 'and' inside the comment.

  5. #5
    rajug.replace('Raju Gautam'); bronze trophy Raju Gautam's Avatar
    Oh.. sorry I just misunderstood.

    Try:
    PHP Code:
    $string = 'hell <!-- and --> world';
    preg_match_all('/<!--(.*)-->/', $string, $matches);
    print_r($matches);

  6. #6
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,397
    Mentioned
    65 Post(s)
    Tagged
    0 Thread(s)
    You can use regular expressions. rajug almost got it, except that his pattern would eat up more than just comments if there were more than one in the HTML string (on one line), and wouldn't eat enough if a comment spanned multiple lines.

    A simple regex would be: /<!--.*?-->/s

    You could use a capturing group to get at only the comment content /<!--(.*?)-->/s, or a lookbehind/ahead /(?<=<!--).*?(?=-->)/s.
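
To make the difference concrete, here is a small sketch comparing the greedy pattern from the earlier post with the lazy /s version. The sample string and variable names are made up for illustration; they are not from the original poster's data.

```php
<?php
// Hypothetical sample input: two comments on one line, plus one comment spanning two lines.
$html = "<p>a<!-- one -->b<!-- two -->c</p>\n<!-- multi\nline -->";

// Greedy, no /s modifier: .* runs from the first <!-- to the last --> on the line,
// and it cannot cross a newline, so the multi-line comment is missed entirely.
preg_match_all('/<!--(.*)-->/', $html, $greedy);

// Lazy with /s: .*? stops at the nearest -->, and /s lets the dot match newlines.
preg_match_all('/<!--(.*?)-->/s', $html, $lazy);

print_r($greedy[1]); // one element: " one -->b<!-- two "
print_r($lazy[1]);   // three elements: " one ", " two ", " multi\nline "
```

Index 1 of the result array holds the contents of the capturing group, that is, the comment text without the <!-- and --> delimiters.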


    An alternative approach would be to do things properly. Parse the HTML document into the DOM and then ask for all of the comment nodes.

    PHP Code:
    $dom = new DOMDocument;

    libxml_use_internal_errors(TRUE);
    $dom->loadHTMLFile('http://www.sitepoint.com/forums/showthread.php?t=627974');
    libxml_use_internal_errors(FALSE);

    $xpath = new DOMXPath($dom);

    $comments = $xpath->query('//comment()');
    foreach ($comments as $comment)
    {
        echo $comment->data . "\n--------------------\n";
    }

    You could also be more specific about which comments to grab by changing the XPath query: for example, only those comments within the body element, or only those that are direct children of it.
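
As a rough sketch of those narrower queries, the example below runs three XPath expressions against a small inline document. The sample markup and the commentText() helper are hypothetical, introduced only for this illustration.

```php
<?php
// Hypothetical sample document with comments in the head, directly in the body, and nested.
$html = '<html><head><title>t</title><!-- in head --></head>'
      . '<body><!-- direct child --><div><!-- nested --></div></body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);

// Hypothetical helper: return the trimmed text of every comment node a query matches.
function commentText(DOMXPath $xpath, string $query): array
{
    $out = [];
    foreach ($xpath->query($query) as $node) {
        $out[] = trim($node->data);
    }
    return $out;
}

$all    = commentText($xpath, '//comment()');        // every comment in the document
$inBody = commentText($xpath, '//body//comment()');  // comments anywhere inside <body>
$direct = commentText($xpath, '//body/comment()');   // comments that are direct children of <body>

print_r($all);
print_r($inBody);
print_r($direct);
```

In XPath, a double slash selects descendants at any depth and a single slash selects direct children only, which is why the last query skips the nested comment.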
    Salathe
    Software Developer and PHP Manual Author.

  7. #7
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    PHP Code:
    <?php
    $sHTML = '
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
            <title></title>
        </head>
        <body>
            <p>Some text<!-- embedded comment --> that we dont require<!-- so there --></p>
        </body>
    </html>
    ';

    // preg_match_all() returns the number of full matches found.
    if (preg_match_all('~(?<=<!--)(.*?)(?=-->)~', $sHTML, $aComments) > 0)
    {
        // The first element of $aComments holds the full matches.
        print_r(array_shift($aComments));
    }
    /*
        Array
        (
            [0] =>  embedded comment 
            [1] =>  so there 
        )
    */
    ?>
    Edit: Nice work Salathe.

    Off Topic:


    *Mumbles in Salathes general direction*
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  8. #8
    if ($zee == "Guru") { $zee--;}
    WOW

    Thanks a lot to all of you!

    Thanks rajug and silverbulletUK.
    Your code works!

  9. #9
    if ($zee == "Guru") { $zee--;}
    hey

    One more thing: what is the best way to remove all non-alphabetic characters from a string?

  10. #10
    @php.net Salathe's Avatar
    PHP Code:
    $filtered = preg_replace('/[^a-z]/i', '', $string);
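
Note that [^a-z] with the i modifier keeps only the 26 ASCII letters. If "alphabetic" should also cover accented or non-Latin letters, one possible variant, assuming the input is UTF-8, uses the Unicode letter class. The sample string below is made up for illustration.

```php
<?php
// Hypothetical sample input; "\u{00FC}" is the letter u-umlaut (escape syntax needs PHP 7+).
$string = "Z\u{00FC}rich, 2009!";

// ASCII-only: drops everything outside a-z and A-Z, including the accented letter.
$asciiOnly = preg_replace('/[^a-z]/i', '', $string);

// Unicode-aware: \p{L} matches any letter; /u treats pattern and subject as UTF-8.
$anyLetter = preg_replace('/[^\p{L}]/u', '', $string);

echo $asciiOnly, "\n"; // Zrich
echo $anyLetter, "\n"; // Zürich
```

Which of the two is appropriate depends on what the filtered string is used for; the ASCII version is the one given in the post above.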
    Off Topic:

    *sticks tongue out in SilverBulletUK's direction* It's ok, you get the thanks

  11. #11
    if ($zee == "Guru") { $zee--;}
    Ah, thanks a lot!

