Loop through file()

Hi Guys!

I am using file() to get the contents of a URL, then looping through the array to read the file line by line. How can I get PHP to find all <a title=""> links on the page? Basically, I want PHP to grab whatever is inside title="".

Hope someone can help.

I am guessing I need something like:


if(preg_match('<a title="/^[a-zA-Z]$/">', $line, $matches)) {
print_r($matches);
}

The above does not work, but I would appreciate it if anyone could correct the code.

Thanks!

Almost correct. Use:


if(preg_match_all('/<a.*?title="([^"]*)/i', $str, $matches)) {
print_r($matches);
}

To break it down:

/ - start the regex
< - match “<” literally, 1 time
a - match “a” literally, 1 time
.*? - match any character, any number of times; the ? makes this part lazy (see here, section “Laziness Instead of Greediness”). This is needed because the title does not have to be directly next to the start of the tag. This way you can also grab <a href="someurl" title="my title">
title=" - match title=" literally, 1 time
([^"]*) - match as many characters as possible, but not " (double quote); this makes the regex stop when it finds the closing ". The parentheses capture the title text. Take care when crawling websites that do not quote attribute values (<a title=mypicture href=someurl>), because this pattern will not match them
/i - end the regex and make it case-insensitive (the “i” at the end)
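
To see the pattern in action, here is a minimal sketch. The $lines array is made-up sample data standing in for what file() would return, and $titles is just an illustrative variable name; neither comes from the original posts. The lines are joined first so that one preg_match_all() call sees the whole page.

```php
<?php
// Hypothetical sample standing in for the array that file() returns
// (one element per line, trailing newlines included).
$lines = [
    "<p>Intro</p>\n",
    "<a href=\"/one\" title=\"First link\">one</a>\n",
    "<A TITLE=\"Second link\" href=\"/two\">two</A> <a href=\"/three\">three</a>\n",
];

// Join the lines into one string for a single preg_match_all() call.
$str = implode('', $lines);

$titles = [];
if (preg_match_all('/<a.*?title="([^"]*)/i', $str, $matches)) {
    // $matches[0] holds the full matches, $matches[1] the captured titles.
    $titles = $matches[1];
}

print_r($titles);
```

This should collect "First link" and "Second link" and skip the third link, which has no title. One limitation to keep in mind: because .*? will run past the end of a tag, an untitled <a> followed on the same line by some other element with a title (say an <img title="pic">) would yield that other element's title.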

Thanks, worked a treat :)

It would be much more appropriate to load the HTML document into a proper parser (handily, we have the DOM) and use that to grab what you need. For example:


$dom = new DOMDocument;
$dom->loadHTMLFile('./myhtmlfile.html');

foreach ($dom->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        echo $anchor->getAttribute('href') . PHP_EOL;
    }
}
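
The example above prints href values. Adapted to the original question, a sketch along the same lines could collect title attributes instead. The $html string below is invented sample markup, and loading it with loadHTML() from a string (rather than loadHTMLFile()) is only for illustration; libxml_use_internal_errors(true) is there to keep parser warnings about sloppy markup out of the output.

```php
<?php
// Hypothetical markup; in practice this could come from file_get_contents().
$html = '<html><body>'
      . '<a href="/one" title="First link">one</a>'
      . '<a title=mypicture href=someurl>unquoted</a>'
      . '<a href="/three">no title</a>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

$titles = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    // Only keep anchors that actually carry a title attribute.
    if ($anchor->hasAttribute('title')) {
        $titles[] = $anchor->getAttribute('title');
    }
}

print_r($titles);
```

Unlike the regex, the parser also picks up the unquoted title=mypicture, so $titles should end up holding "First link" and "mypicture".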

See the attached file :)