Thread: On-domain link matching regex
Jun 12, 2007, 09:43 #1
How can I write a regular expression that matches all on-domain links on a page but not off-domain links?
Here's the regex that matches any link in an <a> tag (the link is captured by the parenthesized group):
$link_regex = '#<a\b[^>]*\bhref=["]([^"]+)["][^>]*>#is';
Now suppose I know the domain name (say it's http://www.example.com) and I want to match all relative links on the page, plus only those absolute links that are on-domain (some pages use absolute links while others use relative ones).
How can I write the following in regex:
If the link starts with http://www.example.com OR if it does not start with http:// then match it?
Now if the link is on-domain and absolute, then the first part of the condition would be true, the second false, true OR false = true => a match.
If the link is on-domain relative, then the first part of the condition would be false, the second true, false OR true = true => a match.
If the link is off-domain, then the first part of the condition would be false and the second false as well (it has to start with http:// since it's off-domain), false OR false = false => no match.
The problem is how to write that condition as a regex.
P.S. I want to do this in ONE regex.
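For illustration, the condition described above could be sketched in a single pattern as an alternation plus a negative lookahead: either the href starts with the domain, or (?!http://) asserts that it does not start with http://. This is only a sketch under assumptions, not a tested solution: the $domain and $html variables and the sample links are made up for the example, and only double-quoted href values are handled, as in the original regex.

```php
<?php
// Hypothetical domain from the question; preg_quote() escapes the dots
// (and the '#' delimiter) so the domain is matched literally.
$domain = preg_quote('http://www.example.com', '#');

// Capture the href if it starts with the domain,
// OR if it does not start with http:// (negative lookahead).
$link_regex = '#<a\b[^>]*\bhref=["]((?:' . $domain . '[^"]*|(?!http://)[^"]+))["][^>]*>#is';

// Made-up sample page: one absolute on-domain link, one relative link,
// and one off-domain link.
$html = '<a href="http://www.example.com/a.html">abs</a>'
      . '<a href="/b.html">rel</a>'
      . '<a href="http://www.other.com/c.html">off</a>';

preg_match_all($link_regex, $html, $matches);

// $matches[1] holds the captured links.
print_r($matches[1]);
```

With this sample input, $matches[1] contains http://www.example.com/a.html and /b.html, and the off-domain link is skipped. Note that, taken literally, the condition also lets through anything else that does not start with http://, such as https:// or mailto: links, and a host like http://www.example.com.other.test would pass the first branch, so the pattern may need tightening for real pages.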