  1. #1
    SitePoint Addict SwordsmanX's Avatar
    Join Date
    Sep 2005
    Posts
    211

    Good regex for Link Checking

    I'm looking for a good regex to use in a link exchange script that checks whether another website is linking to me.

    Searching through old posts, I found this:

    '@<a\s.*href\s*=(?:.*\.|\s*["\'])yoursite\.com[^>]*>.*(?<!

    But it leaves some cases out. Could anyone review it and solve the remaining problems, so that we can get a really good regex? I think it would be useful for everybody!

  2. #2
    SitePoint Guru aamonkey's Avatar
    Join Date
    Sep 2004
    Location
    kansas
    Posts
    953
    Maybe:

    "#<a\s[^>]*?href\s*=[^>]*yoursite\.com#is"

  3. #3
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    <a[^>]+href\s*=\s*["']?http:\/\/(?:www\.)?yoursite\.com["']?[^>]*>

    That should do it.
    It finds anything starting with "<a", then anything besides a ">" until it hits an "href", any number of spaces, an "=", any number of spaces, a single quote, a double quote, or no quote, the http URL of your site (with or without the www), followed by a single quote, a double quote, or no quote, any number of characters besides ">", and the closing ">".
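To sanity-check the pattern described in this post, here is a minimal sketch in Python's `re` module. The thread does not name a host language, so Python is an assumption, as is the helper name `has_link`. The optional "www." is written here as `(?:www\.)?` so that the bare domain matches too, and `yoursite.com` is only a placeholder.

```python
import re

# Pattern from this post, ported to Python's re module.
# "yoursite.com" is a placeholder; (?:www\.)? makes the "www." prefix optional.
LINK_RE = re.compile(
    r"""<a[^>]+href\s*=\s*["']?http://(?:www\.)?yoursite\.com["']?[^>]*>""",
    re.IGNORECASE,
)


def has_link(html: str) -> bool:
    """Return True if the HTML contains an <a> tag whose href points at the site."""
    return LINK_RE.search(html) is not None


if __name__ == "__main__":
    print(has_link('<a href="http://www.yoursite.com">my site</a>'))
    print(has_link('<a href="http://example.com">other site</a>'))
```

Note that this sketch only looks for `http://` links, as the original pattern does, so `https://` URLs would need an extra `s?` after `http`.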

  4. #4
    SitePoint Addict SwordsmanX's Avatar
    Join Date
    Sep 2005
    Posts
    211
    Thanks for helping!
    One last thing: as you know, rel="nofollow" basically makes a link worthless, so could you include a small check to see whether rel=nofollow is present?
    I'm not skilled in regex, but it should be something like [^rel="nofollow"]; I just don't know where to put it.

  5. #5
    SitePoint Guru aamonkey's Avatar
    Join Date
    Sep 2004
    Location
    kansas
    Posts
    953
    You could accomplish this with one regex, but it would be a big mess, since PCRE doesn't support variable-length lookbehinds.

    So I would use two regexes:

    First, to match the tag:

    "#<a\s[^>]*?href\s*=[^>]*yoursite\.com.*?>#is"

    Then, to see if it contains rel="nofollow":

    "#rel\s*=\s*['\"]?nofollow#is"

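The two-regex approach from this post could be wired together roughly as follows. This is a sketch in Python's `re` module, which is an assumption since the thread names no host language. The function name `has_followed_link` is hypothetical, and `yoursite.com` is a placeholder. The `#...#is` delimiters and flags become `re.I | re.S`.

```python
import re

# Step 1: match any <a ...> tag whose href mentions the site (placeholder domain).
TAG_RE = re.compile(r"<a\s[^>]*?href\s*=[^>]*yoursite\.com.*?>", re.I | re.S)

# Step 2: check whether a matched tag carries rel="nofollow".
NOFOLLOW_RE = re.compile(r"rel\s*=\s*['\"]?nofollow", re.I | re.S)


def has_followed_link(html: str) -> bool:
    """Return True if at least one link to the site has no rel="nofollow"."""
    for match in TAG_RE.finditer(html):
        tag = match.group(0)
        if not NOFOLLOW_RE.search(tag):
            return True
    return False


if __name__ == "__main__":
    page = (
        '<a href="http://yoursite.com" rel="nofollow">a</a> '
        '<a href="http://yoursite.com/about">b</a>'
    )
    print(has_followed_link(page))
```

One limitation of this sketch: the second pattern expects `nofollow` to be the first value in the attribute, so a tag with `rel="external nofollow"` would not be caught.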
