Hey guys,
I’ve got the following line that finds links in a string and stores them in the $urls array:
if (preg_match_all('/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si', $text, $urls))
I don’t have much understanding of regex. The above line works only with links starting with http; what do I need to add to make it also work for links starting with www?
Sorry, I think there’s a small misunderstanding: the links are not contained within <a> tags in the string.
The string may look like this:
$text = "this string has a link starts with www.example.com and also a link starts with http://example.com or http://www.example.com, all these 3 links should be put into an array named urls."
Sorry, the above solution didn’t work perfectly: it works with links that start with ‘http://www’ and ‘www’ but not with ‘http://’, for example the following link:
Well, I ran into another problem on this matter. I use the following function to replace all links in a string with <a> tags, and to shorten them if they are longer than 35 characters.
I doubt I need all of that just for such a (simple) task. I managed to fix the above problem by using preg_match instead of str_replace, so that only exact links are replaced.
However, now a new problem! (It just never stops.)
Links with parameters are not getting converted (like www.example.com/page.php?param=1).
~\b([a-z]+://)?([a-z0-9-]+\.)+[^\s]+\b~si
What do I need to add to the above pattern to solve this?
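For what it’s worth, a quick check suggests the pattern above already captures the query string, since [^\s]+ runs up to the next whitespace. If the parameters go missing, the cause is more likely in a later replace step than in the match itself. A minimal sketch; the sample $text and the variable names are assumptions for illustration only:

```php
<?php
// Sample input (an assumption for illustration, not taken from the post).
$text = 'see www.example.com/page.php?param=1 and http://example.com, ok';

// Same pattern as above: optional scheme, one or more dotted labels,
// then everything up to the next whitespace, ending on a word boundary.
$pattern = '~\b([a-z]+://)?([a-z0-9-]+\.)+[^\s]+\b~si';

$count = preg_match_all($pattern, $text, $urls);

// $urls[0] holds the full matches; the trailing \b backs off the comma.
print_r($urls[0]);
```

Here the first match includes ?param=1, and the comma after the second link is left out because the pattern must end on a word boundary.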
#============================
class string_to_urls
{
#============================
#
#============================
private function url_maker($text)
{
$result = '';
# remove http://
$text = str_replace('http://', '', $text);
# split into separate words
$words = explode(' ', $text);
$item = array(); #required result
foreach( $words as $word ):
#assume URL if and only if has period - should trap tailing . here
if( strpos( $word, '.' ) )
{
$urltext = strlen($word) > 20 ? substr($word, 0, 17) .'...' : $word;
$item[] = '<a href="http://'
. $word
. '" target="_blank" rel="nofollow">'
. $urltext
. '</a>';
}
else
{
$item[] = $word; # plain text
}
endforeach;
#DEBUG
echo '<pre>';
#print_r($item);
echo '</pre>';
# separator goes first; the legacy argument order was removed in PHP 8
$result = implode(' ', $item);
return $result;
}
#============================
#
#============================
function index()
{
$text = "this string http://www.example.com/page.php?param=1 has a link starts with www.example.com and also a link starts with http://example.com or http://www.example.com, all these 3 links should be put into an array named urls.";
echo '<dl style="width:42em; margin:0 auto; border:solid 1px #f00">';
echo '<dt>Original $text</dt>';
echo '<dd>' .$text .'<br /><br /></dd>';
echo '<dt>function url_maker($text)</dt>';
echo '<dd>' .$this->url_maker($text) .'<br /><br /></dd>';
echo '</dl>';
}
}
Output:
Original $text
this string http://www.example.com/page.php?param=1 has a link starts with
www.example.com and also a link starts with http://example.com or
http://www.example.com, all these 3 links should be put into an array named
urls.
function url_maker($text)
this string www.example.com/p... has a link starts with www.example.com and also a link starts with example.com or www.example.com, all these 3 links should be put into an array named urls.
Only the last trailing period requires some attention.
Hey John, thanks for your solution. It looks like a nice way of solving this; however, I can’t rely on checking only for dots, as many words end with a dot (like the end of a sentence).
How can we just check whether a certain word in a string starts with http or www, OR has one of the following strings in it (.co, .org, .net, .gov)? Then it’s a link for sure, I’d say… (Unless there is something I don’t know: if we check for the domain extension, there’s not even a need to check for www|http.)
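One way to sketch that per-word check. The function name looks_like_url is hypothetical, and the extension list (com, org, net, gov) is only an example that would need extending; it is a heuristic, not a full URL validator:

```php
<?php
// Hypothetical helper: decides whether a single word looks like a link.
function looks_like_url(string $word): bool
{
    // Drop sentence punctuation stuck to the end of the word.
    $word = rtrim($word, '.,;:!?');
    if ($word === '') {
        return false;
    }

    // Case 1: the word starts with http://, https:// or www.
    if (preg_match('~^(?:https?://|www\.)\S~i', $word) === 1) {
        return true;
    }

    // Case 2: a bare host name ending in one of the listed extensions,
    // optionally followed by a path, query string, fragment or port.
    $bare = '~^[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|org|net|gov)(?:[/?#:]\S*)?$~i';

    return preg_match($bare, $word) === 1;
}
```

With this, a word such as "sentence." is rejected because nothing remains after the dot once the trailing punctuation is trimmed, while "tests.com" and "example.org/page" are accepted.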
Well, I tested your method a little bit deeper, and it seems to screw up line breaks. For example, with the following input:
this string
www.example.com
has links
example.com
with line breaks www.tests.com
in it tests.com and before it
I’m getting such an html result:
this <a href="http://string<br />
www.example.com<br />
has" target="_blank" rel="nofollow">string<br />
www.example.com<br />
has</a> <a href="http://links<br />
example.com<br />
with" target="_blank" rel="nofollow">links<br />
example.com<br />
with</a> line breaks <a href="http://www.tests.com<br />
in" target="_blank" rel="nofollow">www.tests.com<br />
in</a> it <a href="http://tests.com" target="_blank" rel="nofollow">tests.com</a> and before it
I guess it doesn’t treat newlines as spaces and therefore isn’t splitting on them…
Note: same issue even without nl2br getting involved.
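That guess looks right: explode(' ', $text) splits only on a literal space, so words separated by a newline stay glued together. A possible workaround is to split on any whitespace with preg_split and keep the whitespace tokens, so line breaks survive when the text is put back together. A rough sketch; map_words is a hypothetical helper name:

```php
<?php
// Hypothetical helper: applies $fn to every word, leaving all whitespace
// (spaces, tabs, newlines) exactly as it was in the input.
function map_words(string $text, callable $fn): string
{
    // PREG_SPLIT_DELIM_CAPTURE returns the whitespace runs as tokens too,
    // so the result alternates word, whitespace, word, ...
    $tokens = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($tokens as $i => $token) {
        // Even indexes are words, odd indexes are captured whitespace.
        if ($i % 2 === 0 && $token !== '') {
            $tokens[$i] = $fn($token);
        }
    }

    return implode('', $tokens);
}
```

The callback would be whatever turns a single word into a link, for example the body of the foreach loop in url_maker.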
And I’ve made some progress with my regex attempt as well:
~\b([a-z0-9-]+\.)+[^\?\s]+\b~si
(Just a reminder: it converts all links correctly except links with parameters.)
So for a link like: test.com/index.php?param=1
It’ll return test.com/index.php (all of this as a link) and then ?param=1 as normal text… anyone?
Sorry if it wasn’t clear that the text could also contain line breaks.
Anyway, this thing is just driving me nuts! My aim is to get this done with preg_match, as the code looks much cleaner that way. I’m so close to the solution; the only problem is when there are parameters in the URL, and I bet it is because preg_match sees ‘?’ as a “reserved character” that somehow needs to be escaped when used in preg_match_all… anyone with any ideas?
full code:
function make_clickable($text)
{
// $text = str_replace('?', "\?", $text); lol..., nah that didn't work ;)
$text = str_replace('http://', '', $text);
if (preg_match_all('~\b([a-z0-9-]+\.)+[^\s]+\b~si', $text, $urls))
{
foreach (array_unique($urls[0]) AS $url)
{
$urltext = strlen($url) > 35 ? substr($url, 0, 21).'...'.substr($url, -10) : $url;
$text = preg_replace('~^'.$url.'~m',"<a href=\"http://$url\" target=\"_blank\" rel=\"nofollow\">$urltext</a>",$text);
}
}
return $text;
}
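The suspicion about ‘?’ seems to be on the right track, though the place it bites is the preg_replace call rather than preg_match_all: $url is pasted into a new pattern unescaped, so ‘?’ and ‘.’ act as regex operators there (preg_quote($url, '~') would escape them). The ^ anchor also limits the replacement to links at the start of a line. One way around both is a single pass with preg_replace_callback, so no second pattern is built. A minimal sketch, not a drop-in replacement; the name make_clickable_sketch is hypothetical and the 35/21/10 lengths are copied from the function above:

```php
<?php
// Hypothetical single-pass variant of make_clickable().
function make_clickable_sketch(string $text): string
{
    // Optional scheme, dotted labels, then everything up to whitespace.
    $pattern = '~\b(?:[a-z]+://)?(?:[a-z0-9-]+\.)+[^\s]+\b~i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];

        // Show the link without its scheme, as the original function does.
        $bare = preg_replace('~^[a-z]+://~i', '', $url);

        // Add http:// only when the match does not already carry a scheme.
        $href = ($bare === $url) ? 'http://' . $url : $url;

        // Shorten long labels: first 21 characters, "...", last 10.
        $label = strlen($bare) > 35
            ? substr($bare, 0, 21) . '...' . substr($bare, -10)
            : $bare;

        return '<a href="' . htmlspecialchars($href)
            . '" target="_blank" rel="nofollow">'
            . htmlspecialchars($label) . '</a>';
    }, $text);
}
```

Because each match is replaced where it was found, the query string stays inside the link, and the same URL appearing twice is simply linked twice without needing array_unique.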