Is there a way to match anything that isn’t a "?
Would this work? I’m trying to extract: website.com
[^"]+
Thanks,
Nick
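Here is a quick way to check what `[^"]+` does. It is sketched in Python because the thread never names a language. The input string `'"website.com"'` is an assumption, since the quotes around website.com in the question may have been lost.

```python
import re

# Assumed input: the target text wrapped in double quotes.
text = '"website.com"'

# [^"]+ matches one or more characters that are NOT a double quote;
# re.search returns the first such run.
match = re.search(r'[^"]+', text)
print(match.group())  # website.com

# Caveat: on a longer string it also matches everything outside the quotes.
print(re.findall(r'[^"]+', 'url: "website.com" end'))  # ['url: ', 'website.com', ' end']
```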
For something as limited as your example, that would work. But I suspect there’s more to it than you showed. There’s a lot of stuff that isn’t a quote mark.
If what you want really is that trivial, IMHO it would be better to use a string function instead of a regex: just str_replace() the quotes away.
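The string-function approach looks like this. It is a sketch in Python, where `str.replace` plays the role of PHP’s `str_replace()`, and it assumes the same quoted input as in the question.

```python
quoted = '"website.com"'             # assumed input, as in the question
unquoted = quoted.replace('"', '')  # remove every double quote, no regex needed
print(unquoted)  # website.com
```

If the quotes only ever appear at the ends, `quoted.strip('"')` does the same job while leaving any inner quotes alone.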
I’m extracting URLs from an HTML document. I’ve got a regex set up that works; it’s just a little ugly. I was just wondering whether this would work. For example:
href = "google.com"
My regex to match google.com would be:
href = "[^"]+
Nick
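One detail worth checking: as written, the whole match includes the `href = "` prefix, not just the URL. This Python sketch, using the sample line from the post, shows the difference a capturing group makes.

```python
import re

html = 'href = "google.com"'  # sample line from the post

# Pattern as posted: the match includes the literal prefix 'href = "'.
whole = re.search(r'href = "[^"]+', html).group()
print(whole)  # href = "google.com

# Wrapping the negated class in a capturing group isolates just the URL.
url = re.search(r'href = "([^"]+)"', html).group(1)
print(url)  # google.com
```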
Parsing URLs out of an HTML document is fairly complicated, because there are several syntax variations to take into account. My guess is that your existing “ugly” regex is close to the “prettiest” it can be while still working for the different kinds of link syntax. If you use it at all, the “not a quote” class [^"] would only be a small part of a larger regex pattern.
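Because of those variations, an HTML parser is a common alternative to a regex. This technique was not suggested in the thread. The sketch below uses Python’s standard `html.parser` module; the class name `HrefCollector` is hypothetical. The parser resolves the quoting style of each attribute, so all three forms come back the same way.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already removed
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


html = '<a href=google.com>A</a> <a href=\'yahoo.com\'>B</a> <a href="bing.com">C</a>'
collector = HrefCollector()
collector.feed(html)
print(collector.hrefs)  # ['google.com', 'yahoo.com', 'bing.com']
```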
URL types:
href=URL
href='URL'
href="URL"
Regex:
/href\s*=[\s]*["']?(.*?)[\s"']/g
That should work for most of your links (all the good ones, anyway).
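The regex can be tested against the three URL types in a short Python sketch. Python has no `/g` flag, so `re.findall` stands in for a global match, and the sample anchors are made up for illustration. The last line shows one case the pattern misses: an unquoted URL followed directly by `>`, with no space or quote to end the capture.

```python
import re

# Regex as posted, with single backslashes; findall returns group 1 of every match.
HREF_RE = re.compile(r"""href\s*=[\s]*["']?(.*?)[\s"']""")

html = ('<a href=google.com title="x">A</a> '
        "<a href='yahoo.com'>B</a> "
        '<a href="bing.com">C</a>')
print(HREF_RE.findall(html))  # ['google.com', 'yahoo.com', 'bing.com']

# Limitation: nothing after google.com is a space or quote, so there is no match.
print(HREF_RE.findall('<a href=google.com>A</a>'))  # []
```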
Hey Vali, great reply! That is some really nice regex! And inlife, regex looks hard, but it really isn’t that bad. You should take some time to read the tutorial here:
It helped me get started.
Here’s a breakdown of what it means:
/…/g - the slashes delimit a regular expression; the g flag makes it global, so it finds every match instead of stopping at the first
href - look for this text
\s* - followed by whitespace; the * means 0 or more of them
= - followed by an equals sign
[\s]* - then 0 or more whitespace characters
["']? - then an optional quote mark
(.*?) - capture 0 or more characters; the ? makes the * lazy (non-greedy), so the capture stops at the first space or quote rather than the last one (the * alone already allows an empty capture)
[\s"'] - finally, end at a space or quote
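The two points that most often trip people up, lazy versus greedy `*` and global versus single matching, can be demonstrated in a few lines of Python. The sample string is made up, and `re.search` is used as the analogue of leaving off `/g`.

```python
import re

text = 'a "x" b "y"'

# Greedy: .* runs to the end, then backtracks to the LAST quote.
print(re.findall(r'"(.*)"', text))   # ['x" b "y']

# Lazy: .*? stops at the FIRST quote that lets the match succeed.
print(re.findall(r'"(.*?)"', text))  # ['x', 'y']

# The * alone already allows an empty capture.
print(re.findall(r'"(.*)"', '""'))   # ['']

# Without a global search, only the first match is returned.
print(re.search(r'"(.*?)"', text).group(1))  # x
```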
As the other poster said, regular-expressions.info is a really good reference. For example, its material on grouping and backreferences covers topics that can take a lot of study to fully get your head around.
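As a small illustration of a backreference (this pattern is not from the thread), group 1 can capture whichever quote opens an href value, and `\1` then requires the same quote to close it. That way a `'` inside a double-quoted value does not end the match early. The sample HTML and the name `QUOTED_HREF` are made up for the sketch, which handles quoted values only.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close the value.
QUOTED_HREF = re.compile(r"""href\s*=\s*(["'])(.*?)\1""")

html = """<a href="it's.html">A</a> <a href='say "hi".html'>B</a>"""
urls = [m.group(2) for m in QUOTED_HREF.finditer(html)]
print(urls)  # ["it's.html", 'say "hi".html']
```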
Paul, your answer won’t be read.
This kind of poster never comes back to the same thread.
Regardless of that, the audience here is greater than one. Whether or not he benefits, others will too.
This tool, http://www.gskinner.com/RegExr/, is a great one when working with regex.
Paste a sample of the data you’re working on into the text area, then watch the matches update in real time as you build your regex.