  1. #1
    SitePoint Wizard Dean C's Avatar
    Join Date
    Mar 2003
    Location
    England, UK
    Posts
    2,906
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Regex to match all anchors on the page that have a href attribute

    PHP Code:
    preg_match_all('/<a.*?href="(.*?)".*?>([^<>]*?)<\/a>/is', $output, $matches);
    This is what I have thus far. My problem is that if I have an anchor like this: <a name="top"></a>, it'll match from the start of that anchor all the way up to the closing tag of the next anchor that contains a href attribute.
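
    A minimal sketch of why this happens, assuming the cause is that the lazy `.*?` is still allowed to run past the first `>` until it finds `href="` in a later tag. Restricting the attribute area to `[^>]*` keeps the match inside a single opening tag. The sample `$html` and the variable names `$loose` and `$strict` are made up for illustration.

```php
<?php
// Hypothetical sample markup: a named anchor followed by a real link.
$html = '<a name="top"></a><p>Intro</p><a href="page.html">Page</a>';

// Original pattern: the lazy .*? may still expand past the first ">".
preg_match_all('/<a.*?href="(.*?)".*?>([^<>]*?)<\/a>/is', $html, $loose);

// Variant: [^>]* cannot leave the opening tag it started in.
preg_match_all('/<a\b[^>]*\bhref="([^"]*)"[^>]*>([^<>]*)<\/a>/is', $html, $strict);

echo $loose[0][0], "\n";  // begins at the <a name="top"> anchor
echo $strict[0][0], "\n"; // only the anchor that carries the href
```

    A lazy quantifier only prefers the shortest match; it does not forbid a longer one, which is why the first pattern swallows the named anchor.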

    I'm sure it's something simple! Also, while we're here, would it be easy enough to make sure the anchor text does not contain an img tag?

    Regards,
    - Dean

  2. #2
    SitePoint Addict Wildhoney's Avatar
    Join Date
    Apr 2006
    Location
    Nottingham
    Posts
    246
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I was unable to really replicate your issue. However, I've rewritten your regex to make it somewhat cleaner and more effective. I've also used the U (ungreedy) modifier, which you couldn't have added to yours: it would have flipped your lazy .*? quantifiers to greedy and returned only the one link.

    PHP Code:
    preg_match_all('/<a.*href="([^\"]+)".*>([^<]+)<\/a>/iUs', $output, $matches);
    I tested it on:

    Code:
    <a class="myLink" href="test.php">Liiiink</a>
    <table><tr><td>Meh</td></tr></table>
    <a name="top">Test</a>
    <b>Lol!</b>
    <a href="link.html">My Link</a>
    <a href="a.html">Bad Link</a>
    ...And it picks out the 2 links and discards the one without an href attribute. Your URLs and their corresponding link labels will then be in $matches[1] and $matches[2], respectively.
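
    As a rough illustration of how the results are laid out, assuming the default PREG_PATTERN_ORDER: each capture group gets its own array, so the first group (the href value) lands in $matches[1] and the second (the label) in $matches[2]. The two-link `$output` string below is a made-up sample, not the poster's page.

```php
<?php
// Hypothetical two-link sample input.
$output = '<a class="myLink" href="test.php">Liiiink</a> <a href="link.html">My Link</a>';

preg_match_all('/<a.*href="([^\"]+)".*>([^<]+)<\/a>/iUs', $output, $matches);

// Default PREG_PATTERN_ORDER: one array per capture group,
// so index $i pairs the URL in [1] with its label in [2].
foreach ($matches[1] as $i => $href) {
    echo $href, ' => ', $matches[2][$i], "\n";
}
```

    Passing PREG_SET_ORDER as the fourth argument would instead group the results per match, which can be easier to loop over.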
    TalkPHP.com - The Friendly PHP Community

    Watch Reaper Online - Watch Chuck Online

  3. #3
    SitePoint Wizard Dean C's Avatar
    Join Date
    Mar 2003
    Location
    England, UK
    Posts
    2,906
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It seems to work on some things, but not all. I have a massive page of markup (all valid) so it's hard to work out what's wrong heh.

  4. #4
    SitePoint Addict Wildhoney's Avatar
    Join Date
    Apr 2006
    Location
    Nottingham
    Posts
    246
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Could you possibly link me to the HTML page so I can test it, please? Also, you said it works on some things but not all. Are you referring to my regex there?

  5. #5
    SitePoint Wizard Dean C's Avatar
    Join Date
    Mar 2003
    Location
    England, UK
    Posts
    2,906
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    www.vbulletin.com/forum

    I can't link you to my test environment, unfortunately, but I'm using a stock version of vBulletin, which is almost identical to what they're using on the site linked above.

  6. #6
    . shoooo... silver trophy logic_earth's Avatar
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    Mentioned
    8 Post(s)
    Tagged
    0 Thread(s)
    PHP Code:
    preg_match_all('%(<a.*href="[^"]+"[^>]*>[^<]+</a>)%i', $subject, $result);
    print_r($result); 
    Logic without the fatal effects.
    All code snippets are licensed under WTFPL.


  7. #7
    SitePoint Wizard Dean C's Avatar
    Join Date
    Mar 2003
    Location
    England, UK
    Posts
    2,906
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by logic_earth View Post
    PHP Code:
    preg_match_all('%(<a.*href="[^"]+"[^>]*>[^<]+</a>)%i', $subject, $result);
    print_r($result); 
    I got it working using this. Thanks mate!

  8. #8
    SitePoint Wizard Dean C's Avatar
    Join Date
    Mar 2003
    Location
    England, UK
    Posts
    2,906
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sorry to double-post. I'm almost there. I want to allow any HTML to be within the anchor text except images. This is the regex as it stands:

    PHP Code:
    preg_match_all('%(<a.*href="([^"]+)"[^>]*>(.*?)</a>)%i', $html, $matches);
    The bit that logic_earth posted didn't allow for strong tags and the like within the anchor text. As much as I'd like to hope that kind of markup is never used, it's a realistic possibility, so I need to allow any HTML inside my anchor tags that isn't an image.
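
    One possible approach, sketched under assumptions rather than tested on a real vBulletin page, is a negative lookahead that walks the anchor body one character at a time and refuses to step onto `<img` or `</a>`. The sample `$html` and the `$pattern` variable are hypothetical, and the opening tag uses `[^>]*` in place of `.*` so the match cannot start in an earlier anchor.

```php
<?php
// Hypothetical sample: a plain link, a link with <strong>, and an image link.
$html = '<a href="a.html">Plain</a>'
      . '<a href="b.html"><strong>Bold</strong> text</a>'
      . '<a href="c.html"><img src="c.png" alt=""></a>';

// (?:(?!<img\b|</a>).)* consumes one character per step, but only while
// the text ahead is neither "<img" nor "</a>", so the body may contain
// any markup except an image and ends at the first closing anchor tag.
$pattern = '%<a\b[^>]*\bhref="([^"]+)"[^>]*>((?:(?!<img\b|</a>).)*)</a>%is';

preg_match_all($pattern, $html, $matches);

print_r($matches[1]); // href values of the anchors without an image
print_r($matches[2]); // their inner HTML
```

    The `s` modifier lets the dot cross newlines, so anchor text that wraps over several lines is still picked up.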

