Thread: regex help
-
Apr 13, 2008, 04:24 #1
- Join Date
- Jan 2005
- Location
- UK
- Posts
- 539
regex help
I'm trying to get both the link url and img url from this string:
Code:<a href="linkurl"><img src="imgurl" alt="alttext" width="346" height="260" border="1"></a>
-
Apr 13, 2008, 06:48 #2
- Join Date
- Dec 2004
- Posts
- 240
You can use preg_match_all() for this. Here is a very simple working example; you will probably need something a little more robust for real-world markup.
PHP Code:<?php
// Sample string:
$str = <<< TEXT
<a href="linkurl1"><img src="imgurl1" alt="alttext" width="346" height="260" border="1"></a>
<a href="linkurl2"><img src="imgurl2" alt="alttext" width="346" height="260" border="1"></a>
<a href="linkurl3"><img src="imgurl3" alt="alttext" width="346" height="260" border="1"></a>
TEXT;
// Extract the URLs into the array $m: $m[1] holds the link URLs, $m[2] the image URLs
preg_match_all("/<a href=\"([^\"]*)\"><img src=\"([^\"]*)\".*?><\/a>/si",$str,$m);
// Display the result
echo '<pre>' . htmlspecialchars(print_r($m[1],true)) . '</pre>'; // link URLs
echo '<pre>' . htmlspecialchars(print_r($m[2],true)) . '</pre>'; // image URLs
?>
-
Apr 13, 2008, 07:32 #3
Another simple example here:
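PHP Code:// $string holds the HTML to scan, e.g. the sample markup below.
// The URL is captured in group 1, so no str_replace() clean-up is needed
// (stripping "img" or "src" would also corrupt URLs such as imgurl.gif).
// [\w-] replaces [\w-_], which newer PCRE versions can reject as an invalid range.
$reg = "/(?:a +href|img +src) ?= ?[\"']?((?:http:\/\/)?(?:[\w-]*\.?)*[\/\w.?]*)/i";
preg_match_all($reg, $string, $match);
$urls = $match[1];
echo "<pre>".htmlspecialchars(print_r($urls, true))."</pre>";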
Code:<a href="http://www.someurl-website.com/subfolder/index.php"><img src="http://domain.something.com/images/imgurl.gif" alt="alttext" width="346" height="260" border="1"></a>
<a href="www.someurl-website.com/subfolder/index.php"><img src="http://domain.something.com/images/imgurl.gif" alt="alttext" width="346" height="260" border="1"></a>
<a href="someurl-website.com/subfolder/index.php"><img src="http://something.com/images/imgurl.gif" alt="alttext" width="346" height="260" border="1"></a>
<a href="/subfolder/index.php"><img src="something.com/images/imgurl.gif" alt="alttext" width="346" height="260" border="1"></a>
<a href="index.php"><img src="images/imgurl.gif" alt="alttext" width="346" height="260" border="1"></a>
Code:<a target = "_blank" href = "linkhere"><img width = "" alt = "" src="linkhere" /></a>
H u m o
Uncensored Forums for Intelligent People
-
Apr 13, 2008, 12:47 #4
- Join Date
- Jan 2005
- Location
- UK
- Posts
- 539
Thanks guys,
If I use this one, for example:
"/<a href=\"([^\"]*)\"><img src=\"([^\"]*)\".*?><\/a>/si"
how can I make it case-insensitive? Some of the links are written as <a HREF, for example.
-
Apr 13, 2008, 12:55 #5
- Join Date
- Dec 2004
- Posts
- 240
This regexp is already case-insensitive because of the i modifier at the end:
Code:"/<a href=\"([^\"]*)\"><img src=\"([^\"]*)\".*?><\/a>/si"
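To make the effect of the modifier concrete, here is a minimal sketch that runs the same pattern body with and without i. The sample string and the variable names ($body, $html, $withI, $withoutI) are made up for illustration; only the pattern comes from the thread.

```php
<?php
// Pattern body from the thread; only the trailing modifiers differ between the two calls.
$body = '<a href="([^"]*)"><img src="([^"]*)".*?><\/a>';

// Hypothetical input with upper-case tag and attribute names.
$html = '<a HREF="linkurl"><IMG SRC="imgurl" alt="alttext"></a>';

$withI    = preg_match('/' . $body . '/si', $html, $m1); // i: ignore case
$withoutI = preg_match('/' . $body . '/s',  $html, $m2); // case-sensitive

var_dump($withI);    // int(1): the pattern matched
var_dump($withoutI); // int(0): "HREF" does not match "href"
echo $m1[1], ' ', $m1[2], PHP_EOL; // linkurl imgurl
```

The s modifier is separate: it lets the dot in .*? match newlines as well, which matters once a tag is split over several lines.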
-
Apr 13, 2008, 13:16 #6
- Join Date
- Jan 2005
- Location
- UK
- Posts
- 539
Ah yes, actually the problem is when an example like this crops up:
<a HREF="imglink"><img name="" src="imgurl" width="140" height="140" border="0"></a>
Basically, I need the regex to ignore everything other than the href of the a tag and the src of the img tag nested inside it.
-
Apr 13, 2008, 13:56 #7
- Join Date
- Dec 2004
- Posts
- 240
Try this (not checked):
Code:"/<a [^>]*?href=\"([^\"]*)\"><img [^>]*?src=\"([^\"]*)\".*?><\/a>/si"
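A quick check of that pattern against the example from the previous post might look like the sketch below. The variable names are illustrative, and the second input ($extraAttr) is a hypothetical case added to show a limit of the pattern.

```php
<?php
// Pattern from the post above: [^>]*? skips any attributes that precede href or src.
$pattern = '/<a [^>]*?href="([^"]*)"><img [^>]*?src="([^"]*)".*?><\/a>/si';

// Example from the thread: an extra name="" attribute sits in front of src.
$html = '<a HREF="imglink"><img name="" src="imgurl" width="140" height="140" border="0"></a>';

$found = preg_match($pattern, $html, $m);
if ($found) {
    echo "link:  {$m[1]}\n"; // link:  imglink
    echo "image: {$m[2]}\n"; // image: imgurl
}

// Hypothetical input: an attribute AFTER href. The pattern still expects
// the href value to be followed directly by "> so this one is not matched.
$extraAttr = '<a href="x" target="_blank"><img src="y"></a>';
$foundExtra = preg_match($pattern, $extraAttr);
var_dump($foundExtra); // int(0)
```

So the pattern tolerates attributes before href and src, but href still has to be the last attribute of the a tag.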
-
Apr 15, 2008, 08:45 #8
- Join Date
- Jan 2005
- Location
- UK
- Posts
- 539
OK, that was working brilliantly until I hit some examples where there was other content before the closing a tag, e.g.:
Code:<a HREF="link"><img src="image" width="360" height="270" border="0"><br> </a>
So how can I make it so that anything else can appear after the end of the img tag but before the closing a tag?
-
Apr 15, 2008, 12:38 #9
- Join Date
- Dec 2004
- Posts
- 240
For example, you could try this, which adds a lazy .*? between the end of the img tag and the closing a tag:
Code:"/<a [^>]*?href=\"([^\"]*)\"><img [^>]*?src=\"([^\"]*)\".*?>.*?<\/a>/si"
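As a rough comparison, the sketch below runs the previous pattern and this one against the <br> example from the question. $old, $new and $html are illustrative names, not something from the thread.

```php
<?php
// Previous pattern: </a> must follow the img tag immediately.
$old = '/<a [^>]*?href="([^"]*)"><img [^>]*?src="([^"]*)".*?><\/a>/si';
// Updated pattern: the extra lazy .*? allows any content before </a>.
$new = '/<a [^>]*?href="([^"]*)"><img [^>]*?src="([^"]*)".*?>.*?<\/a>/si';

// Example from the thread, with <br> and a space before the closing tag.
$html = '<a HREF="link"><img src="image" width="360" height="270" border="0"><br> </a>';

$oldResult = preg_match($old, $html);
$newResult = preg_match($new, $html, $m);

var_dump($oldResult); // int(0)
var_dump($newResult); // int(1)
echo $m[1], ' ', $m[2], PHP_EOL; // link image
```

Because .*? is lazy, the match stops at the first closing a tag it can reach rather than running on to the last one in the input.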
-
Apr 15, 2008, 12:40 #10
- Join Date
- Jan 2005
- Location
- UK
- Posts
- 539
ok thanks
-
Apr 15, 2008, 12:45 #11
- Join Date
- May 2006
- Location
- Lancaster University, UK
- Posts
- 7,062
To be honest, when it comes to something like this where attributes can be in different orders, and tags can vary on the inside, shouldn't you be taking advantage of PHP's DOM capabilities?
Jake Arkinstall
"Sometimes you don't need to reinvent the wheel;
Sometimes its enough to make that wheel more rounded"-Molona
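For anyone wanting to try the DOM route suggested above, a minimal sketch could look like this. It assumes PHP's DOM extension is enabled; the sample markup is put together from snippets in this thread with made-up values (link1, image1, ...), and $pairs is just an illustrative name.

```php
<?php
// Hypothetical sample input: mixed case, unquoted attributes, extra
// attributes in any order, and whitespace between the tags.
$html = <<<HTML
<a HREF="link1"><img name="" src="image1" width="140" height="140" border="0"></a>
<a target="_blank" href="link2">
  <IMG WIDTH=163 ALT="" SRC="image2"><br> </A>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Pair every link with each image nested inside it.
$pairs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    foreach ($a->getElementsByTagName('img') as $img) {
        $pairs[] = [
            'link'  => $a->getAttribute('href'),
            'image' => $img->getAttribute('src'),
        ];
    }
}
print_r($pairs);
```

The parser deals with attribute order, letter case and whitespace, so none of those cases need special handling in the code.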
-
Apr 15, 2008, 12:50 #12
- Join Date
- Jan 2005
- Location
- UK
- Posts
- 539
-
Apr 15, 2008, 13:00 #13
- Join Date
- May 2006
- Location
- Lancaster University, UK
- Posts
- 7,062
Yeah, however if you do come across a different URL, you may run into problems.
-
Apr 17, 2008, 05:12 #14
- Join Date
- Jan 2005
- Location
- UK
- Posts
- 539
Another issue:
sometimes there are spaces and carriage returns between the <a> tag and the <img> tag, e.g.:
Code:<A HREF="link"> <IMG SRC="image" WIDTH=163 HEIGHT=62 BORDER=0 ALT=""></A>
-
Apr 17, 2008, 11:53 #15
- Join Date
- Dec 2004
- Posts
- 240
I think the simplest fix would be to put \s* between the two tags; it matches any run of whitespace, including line breaks:
Code:"/<a [^>]*?href=\"([^\"]*)\">\s*<img [^>]*?src=\"([^\"]*)\".*?>.*?<\/a>/si"
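To see how this last pattern copes with the cases raised in the thread, here is a small sketch that runs it over all of them at once. The markup is adapted from the examples posted above, with the URL values renamed (imgurl2, link3, ...) so the pairs are easy to tell apart; PREG_SET_ORDER is used so that each match carries its own link/image pair.

```php
<?php
// Final pattern from the post above.
$pattern = '/<a [^>]*?href="([^"]*)">\s*<img [^>]*?src="([^"]*)".*?>.*?<\/a>/si';

// Samples adapted from the thread (values renamed for illustration).
$html = <<<HTML
<a href="linkurl"><img src="imgurl" alt="alttext" width="346" height="260" border="1"></a>
<a HREF="imglink"><img name="" src="imgurl2" width="140" height="140" border="0"></a>
<a HREF="link3"><img src="image3" width="360" height="270" border="0"><br> </a>
<A HREF="link4">
  <IMG SRC="image4" WIDTH=163 HEIGHT=62 BORDER=0 ALT=""></A>
HTML;

// PREG_SET_ORDER groups the captures per match: $match[1] is the link, $match[2] the image.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    echo $match[1], ' => ', $match[2], PHP_EOL;
}
```

All four samples should produce a pair. Markup where another attribute follows href, or where the attribute values are unquoted, would still be missed by this pattern.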