  1. #1
    Serial Publisher silver trophy aspen's Avatar
    Join Date
    Aug 1999
    Location
    East Lansing, MI USA
    Posts
    12,937
    I need some help doing this.

    Basically I have a string that can have text formatting HTML tags in it but not link tags.

    [^<]*

    That will give me the string up to the first HTML tag, but how do I specify "up to the first <a"?

    [^<a]* does not work because that means "up to either the first < or the first a". I need something that means "up to the first <a".
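
    To make the difference concrete, here is a minimal PHP sketch. The sample string and variable names are made up for illustration; only the two patterns come from the post above.

    ```php
    <?php
    // Hypothetical sample input: a formatting tag first, then a link.
    $string = 'Plain text, <b>bold</b> text <a href="page.html">link</a>';

    // [^<]* runs up to the first "<" of any tag.
    preg_match('/^[^<]*/', $string, $m1);

    // [^<a]* is a character class, so it stops at the first "<" OR the first "a".
    preg_match('/^[^<a]*/', $string, $m2);

    echo $m1[0], "\n"; // Plain text, 
    echo $m2[0], "\n"; // Pl
    ```

    The second match ends inside the word "Plain", which is why a negated character class cannot express "up to the two-character sequence <a".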
    Chris Beasley - I publish content and ecommerce sites.
    Featured Article: Free Comprehensive SEO Guide
    My Guide to Building a Successful Website
    My Blog|My Webmaster Forums

  2. #2
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    Hi,

    "^<a"
    Last edited by 7stud; Apr 29, 2001 at 06:42.

  3. #3
    Serial Publisher silver trophy aspen's Avatar
    Join Date
    Aug 1999
    Location
    East Lansing, MI USA
    Posts
    12,937
    That wouldn't work.

    That wouldn't match any string with any tag other than a link tag in it. These strings have other tags in them and I want it to match up to the link tag.

  4. #4
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    Hi,

    Yep, you are right. The problem is that you want to match everything up to a certain sequence of characters, and a negated character class can't express that, because it only rules out single characters. I came up with a solution for someone else who wanted to remove all links in his file by using the split() regular expression function with the tags as the delimiter, which eliminates them from the text. I will see if I can find it for you.

  5. #5
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    Hi again,

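    I would use preg_match_all() with this regexp:

    preg_match_all('|(.*?)<a\b|is', $string, $matches);

    The full pattern matches will be stored in $matches[0][x], where x = 0, 1, 2, etc., and the matches for the first parenthesized subpattern will be stored in $matches[1][x]. Enclosing the part before the "<a" in parentheses lets you pick up the text without the "<a" itself. The ".*?" is lazy, so it stops at the first "<a", and the "\b" keeps the pattern from matching other tags that start with "a", such as <abbr>. The "s" modifier lets "." match newlines and "i" ignores case. The "|" on each end of the pattern is the delimiter required by the Perl-compatible functions. It can be any character that is not alphanumeric, a backslash, or whitespace.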
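
    As a rough sketch of how the capture approach behaves, assuming a lazy ".*?" in front of "<a\b" (the sample string and the $beforeFirstLink name are made up for illustration):

    ```php
    <?php
    // Hypothetical sample input, made up for illustration.
    $string = 'Intro <b>bold</b> text <a href="one.html">one</a> middle '
            . '<a href="two.html">two</a> end';

    // (.*?) is lazy, so each match stops at the next "<a".
    // \b keeps "<a" from matching tags such as <abbr>.
    // "i" ignores case, "s" lets "." match newlines as well.
    preg_match_all('|(.*?)<a\b|is', $string, $matches);

    // $matches[0] holds the full matches, $matches[1] the captured groups.
    $beforeFirstLink = $matches[1][0];

    echo $beforeFirstLink, "\n"; // Intro <b>bold</b> text 
    ```

    If only the text before the first link is needed, preg_match() with the same pattern should be enough, since it stops after the first match.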

    Or, just read the whole file into a string and split it on a regexp that mimics an anchor. The old split() function was removed in PHP 7, so use preg_split() with something like this:

    '|<a\b.*?</a>|is'

    and then add the substrings in the array back together. One thing to think about is whether the HTML might not be in perfect form:

    < a href="www.yahoo.com">Click me< / a >

    You can take care of that by adding a space followed by a * everywhere you think there could be a space:

    '|< *a\b.*?< */ *a *>|is'
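
    The split-and-rejoin idea can be sketched like this. It assumes preg_split() and a lazy ".*?" between the opening and closing anchor tags; the sample string and variable names are hypothetical.

    ```php
    <?php
    // Hypothetical sample input with one tidy link and one sloppy one.
    $html = 'Start <a href="one.html">one</a> middle '
          . '< a href="two.html">two< / a > end';

    // Whole anchors act as the delimiter, so preg_split() returns
    // only the text that sits between them.
    $pattern = '|< *a\b.*?< */ *a *>|is';
    $parts = preg_split($pattern, $html);

    // Join the remaining pieces back into one string.
    $withoutLinks = implode('', $parts);

    echo $withoutLinks, "\n"; // Start  middle  end
    ```

    Note that this drops the link text along with the tags. A regexp like this only copes with simple markup, so heavily malformed or nested HTML may still slip through.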
    Last edited by 7stud; Apr 29, 2001 at 07:15.

