  1. #1
    if ($zee == "Guru") { $zee--;}
    Join Date
    Nov 2005
    Location
    Karachi - Pakistan
    Posts
    1,134

    Help with Data Extraction

    Hi

    I have a situation where I need to extract some data from a website. On the page there is one <a> tag with target="_blank", while the others are plain <a> tags.

    I want to get the href of the <a> that has target="_blank".

    Please let me know how this can be matched with XPath, or whether XPath can't do it. Please help.

  2. #2
    SitePoint Addict wibble wobble
    Join Date
    Dec 2008
    Posts
    242
    This seems to work:

    PHP Code:
    <?php

    $content = '
    <a href="asd.php">asd</a>
    <a href="asd.html" target="_blank">asd</a>
    <a href="asd2.php">asd</a>
    <a href="asd2.html" target="_blank">asd</a>';

    // Start of open tag
    $regex = '~<a href="';
    // Match a url
    $regex .= '(.*?)"';
    // Match any attributes
    $regex .= '([a-zA-Z0-9\=\-\"\s]*)';
    // Match target="_blank"
    $regex .= ' ?target="_blank" ?';
    // Any attributes again
    $regex .= '([a-zA-Z0-9\=\-\"\s]*)';
    // End of open tag
    $regex .= '>~i';

    preg_match_all($regex, $content, $matches);

    // $matches[1] holds the captured href values
    print_r($matches[1]);

  3. #3
    We're from teh basements.
    Join Date
    Apr 2007
    Posts
    1,205
    I'm not really up on XPath, but this seems logical:

    (//a[@target="_blank"])[1]/@href
    Last edited by World Wide Weird; Dec 8, 2008 at 06:40. Reason: Typos.
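A quick way to check the positional indexing, sketched with PHP's DOMDocument and DOMXPath (the sample markup is made up for illustration). XPath positions start at 1, so a `[0]` predicate never matches anything, and wrapping the path in parentheses indexes the whole result set rather than the matching children of each parent:

```php
<?php
// Hypothetical sample markup with two target="_blank" links.
$html = '<html><body>
<a href="a.php">a</a>
<a href="b.html" target="_blank">b</a>
<a href="c.html" target="_blank">c</a>
</body></html>';

// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Position 0 does not exist in XPath, so this selects nothing.
$zero = $xpath->query('//a[@target="_blank"][0]');
echo $zero->length, "\n"; // 0

// Index the whole result set, then read the href attribute as a string.
$href = $xpath->evaluate('string((//a[@target="_blank"])[1]/@href)');
echo $href, "\n"; // b.html
```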

  4. #4
    AnthonySterling (Twitter: @AnthonySterling)
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Here you go, a quick PHP, SimpleXML example for you:
    PHP Code:
    <?php
    $sData = '
    <html>
        <head>
            <title>Sample</title>
        </head>
        <body>
            <a href="path/to/something">One</a>
            <a href="path/to/something" target="_blank">Two</a>
            <a href="path/to/something">Three</a>
            <a href="path/to/something" target="_blank">Four</a>
            <a href="path/to/something">Five</a>
        </body>
    </html>
    '
    ;
    $oDoc = new SimpleXMLElement($sData);
    $oElements = $oDoc->xpath("//a[@target='_blank']");
    print_r($oElements);
    /*
    Array
    (
        [0] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [href] => path/to/something
                        [target] => _blank
                    )

                [0] => Two
            )

        [1] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [href] => path/to/something
                        [target] => _blank
                    )

                [0] => Four
            )

    )
    */
    ?>
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.
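SimpleXMLElement only accepts well-formed XML, so it will throw on many real-world pages (unclosed <p>, bare <br>, and so on). A sketch of the same query using PHP's DOMDocument::loadHTML, which tolerates that kind of markup, together with DOMXPath; the sample HTML is hypothetical, and the href values are pulled out directly by selecting the @href attribute nodes:

```php
<?php
// Hypothetical sample: not well-formed XML (unclosed <p>, bare <br>),
// so new SimpleXMLElement($html) would throw on it.
$html = '<html><body>
<p>Links:<br>
<a href="one.php">One</a>
<a href="two.html" target="_blank">Two</a>
<a href="three.php">Three</a>
</body></html>';

// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
// Select the href attribute nodes of matching links directly.
foreach ($xpath->query('//a[@target="_blank"]/@href') as $attr) {
    $hrefs[] = $attr->value;
}
print_r($hrefs);
```

Selecting `/@href` in the expression avoids having to dig through the `@attributes` array that print_r shows for SimpleXMLElement results.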

  5. #5
    if ($zee == "Guru") { $zee--;}
    Join Date
    Nov 2005
    Location
    Karachi - Pakistan
    Posts
    1,134
    @SilverBulletUK

    Thanks a lot, your code worked for me.

    I have another question.

    I have a page with this structure:

    body > table > tbody > tr > td > table > tbody > tr > td > div> center > table > tbody > tr > td > span

    Inside that span there is some text, and I want to extract it.

    Please help.
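One possible approach to the span question, sketched with DOMDocument and DOMXPath under the assumption that the markup looks roughly like the path above (the sample HTML is hypothetical). Browser inspectors usually show a <tbody> even when the raw HTML has none, so the sketch uses `//` between each table and its td, which matches with or without a tbody:

```php
<?php
// Hypothetical markup mirroring the structure in the question.
// The <tbody> elements are deliberately left out, as they often are in raw HTML.
$html = '<html><body>
<table><tr><td>
  <table><tr><td>
    <div><center>
      <table><tr><td><span>Text to extract</span></td></tr></table>
    </center></div>
  </td></tr></table>
</td></tr></table>
</body></html>';

// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// "//" before td matches whether or not a <tbody> sits in between.
$spans = $xpath->query('/html/body/table//div/center/table//td/span');

$texts = [];
foreach ($spans as $span) {
    // textContent returns the concatenated text of the span and its children.
    $texts[] = trim($span->textContent);
}
print_r($texts);
```

If the page has several spans at that position, `$texts` will hold all of them; tightening the path with a position or attribute predicate would pick out a single one.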

