Thread: Regular Expression Help
-
Apr 29, 2001, 07:03 #1
- Join Date
- Aug 1999
- Location
- East Lansing, MI USA
- Posts
- 12,937
I need some help doing this.
Basically I have a string that can have text formatting HTML tags in it but not link tags.
[^<]*
That will give me the string up to the first HTML tag, but how do I specify up to the first "<a"?
[^<a]* does not work, because that means up to either the first "<" or the first "a", whichever comes first. I need something that means up to the first "<a".
Chris Beasley - I publish content and ecommerce sites.
Featured Article: Free Comprehensive SEO Guide
My Guide to Building a Successful Website
My Blog|My Webmaster Forums
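To make the difference between the two character classes concrete, here is a minimal sketch using preg_match(). The sample string and the variable names are made up for illustration; they are not from the original post.

```php
<?php
// Hypothetical sample: a formatting tag first, then a link tag.
$string = 'Plain text, <b>bold</b>, then <a href="x.html">a link</a>';

// [^<]* runs until the first "<", no matter which tag it opens.
preg_match('/^[^<]*/', $string, $m1);

// [^<a]* is a set of single characters: it stops at the first "<"
// OR the first "a", so here it stops inside the word "Plain".
preg_match('/^[^<a]*/', $string, $m2);

echo '[', $m1[0], "]\n"; // [Plain text, ]
echo '[', $m2[0], "]\n"; // [Pl]
```

A character class never describes a sequence, which is why neither pattern can mean "up to the first <a".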
-
Apr 29, 2001, 07:30 #2
- Join Date
- Mar 2001
- Posts
- 3,537
Hi,
"^<a"
Last edited by 7stud; Apr 29, 2001 at 07:42.
-
Apr 29, 2001, 07:36 #3
- Join Date
- Aug 1999
- Location
- East Lansing, MI USA
- Posts
- 12,937
That wouldn't work.
It wouldn't match any string that has a tag other than a link tag in it. These strings have other tags in them, and I want to match everything up to the link tag.
Chris Beasley - I publish content and ecommerce sites.
-
Apr 29, 2001, 07:49 #4
- Join Date
- Mar 2001
- Posts
- 3,537
Hi,
Yep, you are right. The problem is that you want to match everything up to a two-character sequence, and a negated character class tests only one character at a time; it can't match the characters and then reverse direction and go back two spots. I came up with a solution for someone else who wanted to remove all the links in his file by using the split() regular expression function with the tags as the delimiter, which eliminates them from the text. I will see if I can find it for you.
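For what it's worth, the Perl-compatible functions offer one way around the "go back two spots" problem: a lookahead, which checks for "<a" without consuming it. This is only a sketch of that alternative, not something posted in this thread; the sample string and the $before variable are made up.

```php
<?php
// Hypothetical sample: formatting tags, then a link, then more text.
$string = 'Plain text, <b>bold</b>, then <a href="x.html">a link</a> and more';

// .*? is lazy: it grows one character at a time until the lookahead
// (?=<a\b) succeeds. The lookahead tests for "<a" but does not consume it.
// \b keeps tags such as <abbr> from counting as a link.
// Modifiers: s lets the dot match newlines, i also accepts "<A".
if (preg_match('/^.*?(?=<a\b)/si', $string, $m)) {
    $before = $m[0];
} else {
    $before = $string; // no link tag at all: keep the whole string
}

echo $before, "\n";
```

With the sample above, $before holds everything up to, but not including, the link tag, with the <b> tags left intact.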
-
Apr 29, 2001, 08:07 #5
- Join Date
- Mar 2001
- Posts
- 3,537
Hi again,
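I would use preg_match_all() with this regexp:
preg_match_all("|(.*?)<a|s", $string, $matches);
The full pattern matches will be stored in $matches[0][x], where x = 0, 1, 2, etc., and the matches for the first parenthesized substring of the pattern will be stored in $matches[1][x]. You can leave the "<a" out of the result just by enclosing the preceding part of the pattern in parentheses and reading $matches[1]. Note that the dot has to stay outside square brackets: [.] matches only a literal period. The "?" makes the match stop at the nearest "<a", and the "s" modifier lets the dot match newlines. The "|" on each end of the pattern is a delimiter, which the Perl-compatible functions require. It can be any character that is not alphanumeric, a backslash, or whitespace.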
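To show how the $matches array is laid out, here is a small sketch with a made-up string containing two links. It uses a lazy (.*?) group and adds \b so that a tag such as <abbr> is not mistaken for a link; both the sample data and that extra \b are assumptions for illustration.

```php
<?php
// Hypothetical sample with two links.
$string = 'one <a href="1.html">first</a> two <a href="2.html">second</a>';

// Default flag is PREG_PATTERN_ORDER:
//   $matches[0] holds the full matches, $matches[1] holds group 1.
$count = preg_match_all('|(.*?)<a\b|s', $string, $matches);

echo $count, "\n";                   // 2
echo '[', $matches[0][0], "]\n";     // [one <a]
echo '[', $matches[1][0], "]\n";     // [one ]
echo '[', $matches[1][1], "]\n";     // [ href="1.html">first</a> two ]
```

Only $matches[1][0] is the text before the first link. Each later match starts right after the previous "<a", so it also picks up the remainder of the previous link tag.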
Or just read the whole file into a string and use the split() regexp function with a regexp that mimics an anchor, something like this:
"<a[^>]*>[^<]*</a>"
and then join the substrings in the array back together. This assumes the link text itself contains no tags. One thing to think about is HTML that isn't in perfect form:
< a href="www.yahoo.com">Click me< / a >
You can take care of that by adding a space followed by a * everywhere you think there could be a space:
"< *a[^>]*>[^<]*< */ *a *>"
Last edited by 7stud; Apr 29, 2001 at 08:15.
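The split-and-rejoin idea can be sketched as follows. Since split() was deprecated in PHP 5.3 and removed in PHP 7, this version uses preg_split() instead. The sample string, the pattern details, and the variable names are assumptions for illustration, not the code from the earlier thread mentioned above.

```php
<?php
// Hypothetical sample: one sloppy anchor with stray spaces, one clean anchor.
$string = 'Read < a href="www.yahoo.com">Click me< / a > or <a href="b.html">this</a> now';

// " *" allows optional spaces wherever they might creep in.
// [^>]* covers the attributes, .*? lazily covers the link text,
// i accepts upper-case tags, s lets the dot match newlines.
$pattern = '~< *a\b[^>]*>.*?< */ *a *>~is';

// Each whole anchor acts as a delimiter and is dropped from the result.
$pieces = preg_split($pattern, $string);

// Join the remaining substrings back together.
$text = implode('', $pieces);

print_r($pieces);
echo $text, "\n";
```

Because each whole anchor is treated as the delimiter, the link text is dropped along with the tags. To keep the link text, the opening and closing tags would need to be matched as separate delimiters.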