Hello,

How can I write a regular expression that matches all on-domain links on a page but not off-domain links?

Here's the regex that matches any link in an <a> tag (the URL is captured by the parenthesized group):

$link_regex = '#<a\b[^>]*\bhref=["]([^"]+)["][^>]*>#is';
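For reference, here is a minimal sketch of how that regex is typically applied with `preg_match_all`. The sample HTML is made up for illustration. Group 1 collects every `href` value, whether on-domain or not:

```php
<?php
// Hypothetical sample HTML: one absolute on-domain link,
// one relative link and one off-domain link.
$html = '<p><a href="http://www.example.com/about">About</a>'
      . '<a class="nav" href="/contact">Contact</a>'
      . '<a href="http://www.other.org/">Elsewhere</a></p>';

// The regex from the question, unchanged.
$link_regex = '#<a\b[^>]*\bhref=["]([^"]+)["][^>]*>#is';

// preg_match_all returns the number of full matches;
// $matches[1] holds the captured href values.
$count = preg_match_all($link_regex, $html, $matches);

print_r($matches[1]);
```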

But suppose I know the domain name (say it's http://www.example.com). I want to match all relative links on the page, plus only those absolute links that point to this domain (some pages use absolute links while others use relative ones).

How can I write the following in regex:
If the link starts with http://www.example.com OR does not start with http://, then match it.

If the link is on-domain and absolute, the first part of the condition is true and the second false: true OR false = true => a match.
If the link is on-domain and relative, the first part is false and the second true: false OR true = true => a match.
If the link is off-domain, the first part is false and the second is also false (an off-domain link has to start with http://): false OR false = false => no match.
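The same condition written as plain PHP can serve as a reference when testing a regex version. `isWanted()` below is a hypothetical helper, not part of any library. It is a literal transcription, so like the condition itself it would also accept a lookalike host such as `http://www.example.com.evil.net/`:

```php
<?php
// Hypothetical helper: the question's condition, written without a regex.
function isWanted(string $url, string $domain = 'http://www.example.com'): bool
{
    $startsWithDomain = strncmp($url, $domain, strlen($domain)) === 0;
    $startsWithHttp   = strncmp($url, 'http://', 7) === 0;
    // On-domain absolute OR "relative" (anything not starting with http://).
    return $startsWithDomain || !$startsWithHttp;
}

// The three cases from the truth table above.
foreach (['http://www.example.com/about', '/contact', 'http://www.other.org/'] as $url) {
    echo $url, ' => ', isWanted($url) ? 'match' : 'no match', "\n";
}
```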

So the problem is: how do I write that condition in regex?

P.S. I want to do this in ONE regex.
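One possible single-regex sketch, assuming double-quoted `href` values as in the original pattern: put the OR inside a lookahead at the start of the captured group. The first branch requires the domain followed by `/`, `?`, `#` or the closing quote, so lookalike hosts are rejected. The second branch, `(?!http://)`, is the "does not start with http://" part. The sample HTML and the variable name `$on_domain_regex` are illustrative only:

```php
<?php
// Assumed domain: http://www.example.com (from the question).
// Group 1 captures the href only if it starts with the domain (followed by
// /, ?, # or the closing quote) OR does not start with http:// at all.
// \# is escaped because # is also the pattern delimiter.
$on_domain_regex = '#<a\b[^>]*\bhref="((?=http://www\.example\.com(?=[/?\#"])|(?!http://))[^"]+)"[^>]*>#is';

// Hypothetical sample HTML covering the three cases plus a lookalike host.
$html = '<p><a href="http://www.example.com/about">About</a>'
      . '<a class="nav" href="/contact">Contact</a>'
      . '<a href="http://www.other.org/">Elsewhere</a>'
      . '<a href="http://www.example.com.evil.net/">Lookalike</a></p>';

$count = preg_match_all($on_domain_regex, $html, $matches);

print_r($matches[1]);
```

Because lookaheads consume no characters, `[^"]+` still captures the whole URL. Under the question's condition anything not starting with `http://` counts as relative, so `https://`, protocol-relative `//host/...` and `mailto:` links would also be matched.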

