Regular expressions

Hi

Could somebody have a look at my regular expression for a preg_match_all call and see if you can spot a problem with it? It's supposed to identify links in a webpage. It was working, but now it seems to fail constantly. I suspect the problem is the GET variable check ([=|-|_|@|a-zA-Z0-9]); perhaps I need to allow other characters, such as the hash?

$pattern = '#\<a href( )?=( )?("|\')?(?!javascript|mailto)([^\>]*?\.(?:html|htm|php|asp)([0-9])?([?][=|-|_|@|a-zA-Z0-9]*)?)("|\')?( )?\>#s';

preg_match_all($pattern, $thepage, $thearray);
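One way to test the suspicion about the GET variable check is to run the pattern against a small sample. The sketch below is only an illustration: the sample HTML and the $revised pattern are assumptions and are not taken from the real pages. Inside a character class "|" is a literal character, not alternation, and "|-|" is read as a range from "|" to "|", so the class never matches "-", "&", ";", "." or "#". The revised pattern is one possible loosening that accepts any query-string character up to the closing quote, whitespace, or ">".

```php
<?php
// Hypothetical sample page; the real $thepage would come from the fetched HTML.
$thepage = <<<'HTML'
<a href="page.php?id=1&amp;cat=my-cat#top">First</a>
<a href = 'index.html'>Second</a>
<a href="mailto:someone@example.com">Mail</a>
HTML;

// The query-string class from the original pattern, tested on its own.
$class = '/^[=|-|_|@|a-zA-Z0-9]*$/';
$hyphenOk = preg_match($class, 'cat=my-cat'); // 0: "-" is not in the class
$pipeOk   = preg_match($class, 'a|b');        // 1: "|" is matched literally

// Original pattern, unchanged.
$pattern = '#\<a href( )?=( )?("|\')?(?!javascript|mailto)([^\>]*?\.(?:html|htm|php|asp)([0-9])?([?][=|-|_|@|a-zA-Z0-9]*)?)("|\')?( )?\>#s';
preg_match_all($pattern, $thepage, $thearray);
$originalLinks = $thearray[4]; // group 4 holds the URL

// Assumed revision (a sketch, not a verified fix): "~" as delimiter so "#"
// needs no escaping, a backreference so the closing quote matches the opening
// one, and a query string that runs until a quote, whitespace, or ">".
$revised = '~<a href ?= ?(["\']?)(?!javascript|mailto)([^>]*?\.(?:html|htm|php|asp)[0-9]?(?:\?[^"\'>\s]*)?)\1 ?>~s';
preg_match_all($revised, $thepage, $revisedArray);
$revisedLinks = $revisedArray[2]; // group 2 holds the URL

print_r($originalLinks); // only index.html
print_r($revisedLinks);  // page.php?id=1&amp;cat=my-cat#top and index.html
```

On this sample the original pattern skips the first link because the match stops at "&" and can never reach the closing quote, while the mailto link is ignored by both patterns.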

Thanks in advance,
Garrett
