
Thread: finding links in a string

  1. #1
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511

    finding links in a string

    Hey guys,
    I've got the following line, which finds links in a string and stores them in the $urls array:
    Code:
    if (preg_match_all('/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si', $text, $urls))
    I don't know much about regex. The line above only works with links that start with a scheme such as http://. What do I need to add to make it also work for links starting with www?

    Thanks.
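One way to answer the question is to accept a bare `www.` as an alternative to the scheme. Below is a minimal sketch that keeps the rest of the original pattern as it is, rewritten in PCRE extended mode (the `x` flag) so each part can carry a comment. The function name `find_links` is hypothetical.

```php
<?php
// Hypothetical helper: returns every link found in $text.
// Same pattern as above, except "www." is accepted as an alternative to a scheme.
function find_links(string $text): array
{
    $pattern = '~
        (?:(?:ht|f)tps?://|www\.)  # scheme (http, https, ftp, ftps) OR a bare "www."
        ([\w.]+\.)?                # optional subdomain part, e.g. "mail."
        [\w-]+                     # domain name
        (\.[a-zA-Z]{2,4})?         # optional extension such as ".com"
        [^\s()"\'<>,!]+            # rest of the link, up to whitespace or punctuation
    ~xi';

    preg_match_all($pattern, $text, $urls);
    return $urls[0]; // $urls[0] holds the complete matches
}

print_r(find_links('Visit www.example.com or http://example.org, thanks'));
```

As with the original pattern, a period at the end of a sentence is not excluded by the last character class, so it can end up inside a match.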

  2. #2
    SitePoint Enthusiast
    Join Date
    Sep 2011
    Posts
    69
    Try this:
    Code:
    $s = '<a href="http://www.example.com">example.com</a> test <a href="/foo">foo</a> test <a href="../bar">bar</a>';
    $pattern = '#href="((?:(?:http|ftp)s?://)?[^"]+)"#si';
    if (preg_match_all($pattern, $s, $m))
    {
            $links = $m[1];
            print_r($links);
    }
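When the input really is HTML, a parser tends to be sturdier than a regex: the `href="..."` pattern above misses single-quoted or unquoted values and spaces around `=`. A minimal sketch using PHP's DOM extension, assuming it is enabled in your build; `extract_hrefs` is a hypothetical name.

```php
<?php
// Hypothetical helper: collect href values with DOMDocument instead of a regex.
function extract_hrefs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

$s = '<a href="http://www.example.com">example.com</a> test <a href="/foo">foo</a> test <a href="../bar">bar</a>';
print_r(extract_hrefs($s));
```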

  3. #3
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Sorry, I think there's a small misunderstanding: the links in the string are not contained within <a> tags.
    The string may look like this:

    Code:
    $text = "this string has a link starts with www.example.com and also a link starts with http://example.com or http://www.example.com, all these 3 links should be put into an array named urls.";

  4. #4
    SitePoint Enthusiast
    Join Date
    Sep 2011
    Posts
    69
    OK. Try this:
    Code:
    $s = 'http://www.example.com test www.foo.com test test www.bar.com';
    $pattern = '#\b((?:(?:http|ftp)s?://)?www\.[^\s]+)\b#si';
    if (preg_match_all($pattern, $s, $m))
    {
            $links = $m[1];
            print_r($links);
    }

  5. #5
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Thanks, that works.

  6. #6
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Sorry, the above solution didn't work perfectly :P It works with links that start with 'http://www' and 'www', but not with links that start with just 'http://', for example the following link:
    Code:
    http://bar.com

  7. #7
    SitePoint Wizard
    wonshikee
    Join Date
    Jan 2007
    Posts
    1,223
    Try this one:

    $pattern = '~\b([a-z]+://)?([a-z-]+\.)+[^\s]+\b~si';

    This one should work on ANY string that resembles a URI
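How broad "ANY string that resembles a URI" is can be checked quickly. This sketch runs the pattern above on a made-up sentence. Here an abbreviation and a file name are matched alongside the real link.

```php
<?php
// Runs the pattern from the post above on a made-up sample sentence.
$pattern = '~\b([a-z]+://)?([a-z-]+\.)+[^\s]+\b~si';
$s = 'see e.g. notes.txt or www.example.com';

preg_match_all($pattern, $s, $m);
print_r($m[0]); // full matches: "e.g", "notes.txt" and "www.example.com"
```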

  8. #8
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    It doesn't seem to work for me. For the following string:
    Code:
    $s = 'text http://www.example.com test www.foo.com test test http://bar.com text ';
    it returns:
    Code:
    Array ( [0] => http:// [1] => [2] => http:// )

  9. #9
    SitePoint Wizard
    wonshikee
    Join Date
    Jan 2007
    Posts
    1,223
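    You need to print_r($m); instead to see where the full strings you want are: $m[0] holds the complete matches, while $m[1] only holds the first capturing group (the optional scheme).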

    Also it should be:

    $pattern = '~\b([a-z]+://)?([a-z0-9-]+\.)+[^\s]+\b~si';
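The difference between `$m[0]` and `$m[1]` is easy to see with the string from the previous post. In the sketch below, the second pattern is an illustrative variation, not from the thread. It uses non-capturing `(?:...)` groups so that only index 0 is produced.

```php
<?php
$s = 'text http://www.example.com test www.foo.com test test http://bar.com text ';

// Pattern from the post above: two capturing groups.
preg_match_all('~\b([a-z]+://)?([a-z0-9-]+\.)+[^\s]+\b~si', $s, $m);
print_r($m[0]); // complete matches: the three links
print_r($m[1]); // group 1 only: "http://", "" (no scheme), "http://"

// Variation: non-capturing groups, so the result has only index 0.
preg_match_all('~\b(?:[a-z]+://)?(?:[a-z0-9-]+\.)+[^\s]+\b~si', $s, $n);
var_dump(count($n)); // int(1)
```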

  10. #10
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Right... it works, thanks.

  11. #11
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Well, I ran into another problem regarding this. I use the following function to replace all links in a string with <a> tags and to shorten the link text when it is longer than 35 characters.
    Code:
    function make_clickable($text)
    {
    	if (preg_match_all('~\b([a-z]+://)?([a-z0-9-]+\.)+[^\s]+\b~si', $text, $urls))
    	{
    		foreach (array_unique($urls[0]) AS $url)
    		{
    			$urltext = strlen($url) > 35 ? substr($url, 0, 21).'...'.substr($url, -10) : $url;
    			$text = $url[0]!='h' ? str_replace($url, '<a href="http://'.$url.'" target="_blank" rel="nofollow">'.$urltext.'</a>', $text) : str_replace($url, '<a href="'.$url.'" target="_blank" rel="nofollow">'.$urltext.'</a>', $text);
    		}
    	}
    	return $text;
    }
    However, when the text contains more than one link, it can get messed up, like so:
    http://www.example.com
    www.example.com

    It finds both links, and since www.example.com is also part of http://www.example.com, the second str_replace() also rewrites the text inside the <a> tag created by the first one. That produces nested <a> tags, which mess up the string. Any idea how I can solve that?
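A common way to avoid the double replacement is to do the matching and the wrapping in a single `preg_replace_callback()` pass. Each part of the text is visited once, so text that has already been wrapped is never matched again. This is a sketch with several assumptions. The URL pattern is a simplified, hypothetical one: dotted labels ending in a letters-only extension, optionally followed by a path or query string. The shortening rule (35/21/10 characters) is copied from the function above, and the name `make_clickable_once` is hypothetical.

```php
<?php
// Sketch: wrap every URL in a single pass, so no link is processed twice.
function make_clickable_once(string $text): string
{
    // Simplified URL pattern (an assumption): optional scheme, dotted host with a
    // letters-only extension, then an optional path/query up to whitespace or "<".
    $pattern = '~\b(?:(?:https?|ftp)://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s<]*)?~i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url  = $m[0];
        // Add a scheme only when the match does not already have one.
        $href = preg_match('~^[a-z]+://~i', $url) ? $url : 'http://' . $url;
        // Same shortening rule as the function above.
        $label = strlen($url) > 35 ? substr($url, 0, 21) . '...' . substr($url, -10) : $url;
        return '<a href="' . htmlspecialchars($href) . '" target="_blank" rel="nofollow">'
            . htmlspecialchars($label) . '</a>';
    }, $text);
}

echo make_clickable_once('http://www.example.com www.example.com'), "\n";
```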

  12. #12
    Twitter: @AnthonySterling
    AnthonySterling
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,109
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  13. #13
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    I doubt I need all of that just for such a (simple) task. I managed to fix the above problem by using preg_replace instead of str_replace, so that only exact links are replaced.
    However, now there's a new problem! (It just never stops.)
    Links with parameters are not converted (like www.example.com/page.php?param=1):
    Code:
    ~\b([a-z]+://)?([a-z0-9-]+\.)+[^\s]+\b~si
    What do I need to add to the above pattern to solve this?
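It may help to check which step actually drops the query string. This quick sketch runs the pattern above on its own, with a made-up sample text. The match already contains `?param=1`, so the problem is more likely in the replacement step that reuses the matched URL.

```php
<?php
// Run the pattern from the post above on its own, without any replacement step.
$pattern = '~\b([a-z]+://)?([a-z0-9-]+\.)+[^\s]+\b~si';
$text = 'see www.example.com/page.php?param=1 here';

preg_match_all($pattern, $text, $urls);
print_r($urls[0]); // the full match includes the query string
```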

  14. #14
    Programming Team
    Mittineague
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    14,356
    Quote (originally posted by ulthane):
    I doubt I need all of that just for such a (simple) task,
    ....
    However now a new problem! (it just never stops)
    As you are finding out, it's not as simple as it may seem at first glance. IMHO you should try AnthonySterling's suggestion.

  15. #15
    Community Advisor
    John_Betong
    Join Date
    Aug 2005
    Location
    City of Angels
    Posts
    1,138
    Here's my New Wheel:

    PHP Code:

    #============================
    class string_to_urls
    {

    #============================
    #
    #============================
    private function url_maker($text)
    {
      $result = '';

      # remove http://
      $text = str_replace('http://', '', $text);

      # split into separate words
      $words = explode(' ', $text);

      $item = array(); # required result
      foreach( $words as $word ):

        # assume URL if and only if has period - should trap trailing . here
        if( strpos( $word, '.' ) )
        {
          $urltext = strlen($word) > 20 ? substr($word, 0, 17) .'...' : $word;
          $item[] = '<a href="http://'
                  .   $word
                  .   '" target="_blank" rel="nofollow">'
                  .   $urltext
                  .   '</a>';
        }
        else
        {
          $item[] = $word; # plain text
        }
      endforeach;

      #DEBUG
        echo '<pre>';
          #print_r($item);
        echo '</pre>';

      # glue comes first: the old implode($item, ' ') order is rejected by PHP 8
      $result = implode(' ', $item);
      return $result;
    }

    #============================
    #
    #============================
    function index()
    {
      $text = "this string  http://www.example.com/page.php?param=1  has a link starts with www.example.com and also a link starts with http://example.com or http://www.example.com, all these 3 links should be put into an array named urls.";

      echo '<dl style="width:42em; margin:0 auto; border:solid 1px #f00">';
        echo '<dt>Original $text</dt>';
        echo '<dd>' .$text .'<br /><br /></dd>';

        echo '<dt>function url_maker($text)</dt>';
        echo '<dd>' .$this->url_maker($text) .'<br /><br /></dd>';
      echo '</dl>';
    }
    }

    Output:
    Code:
    Original $text
        this string http://www.example.com/page.php?param=1 has a link starts with
        www.example.com and also a link starts with http://example.com or
        http://www.example.com, all these 3 links should be put into an array named
        urls.
    
    
    function url_maker($text)
        this string www.example.com/p... has a link starts with www.example.com and also a link starts with example.com or www.example.com, all these 3 links should be put into an array named urls.
    Only the trailing period at the very end (in "urls.") still needs some attention.
    Last edited by John_Betong; Apr 2, 2012 at 00:34. Reason: formatting and spelling: not my fortay

  16. #16
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Hey John, thanks for your solution. It looks like a nice way of solving this; however, I can't rely only on checking for dots, as many words end with a dot (like the end of a sentence).
    How can we just check whether a word in the string starts with http or www, OR contains one of the following strings (.co, .org, .net, .gov)? Then it's a link for sure, I'd say... (unless there is something I don't know; if we check for the domain extension, there's not even a need to check for www|http)
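The check described above can be sketched as a small predicate: a scheme or `www.` prefix, or a known extension. Several assumptions apply. The function name is hypothetical, and the extension list is only the one suggested in the post, with `.co` also allowing a two-letter country suffix such as `.co.uk`. Sentence punctuation is trimmed from the end of the word before testing.

```php
<?php
// Hypothetical predicate: does a single word look like a link?
function looks_like_link(string $word): bool
{
    // Drop sentence punctuation such as a final "." or "," first.
    $word = rtrim($word, '.,!?;:');

    // Starts with http://, https:// or www.
    if (preg_match('~^(?:https?://|www\.)~i', $word)) {
        return true;
    }

    // Otherwise look only at the host part (before any "/", "?" or "#")
    // and require one of the suggested extensions at its end.
    $host = strtolower(preg_split('~[/?#]~', $word)[0]);
    return preg_match('~\.(?:com|org|net|gov|co(?:\.[a-z]{2})?)$~', $host) === 1;
}

foreach (['example.com', 'test.com/index.php?param=1', 'sentence.', 'index.php'] as $w) {
    echo $w, ' => ', looks_like_link($w) ? 'link' : 'text', "\n";
}
```

With this list, a link on another extension (for example `.io`) is only recognised when it starts with `http` or `www.`.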

  17. #17
    Community Advisor
    John_Betong
    Join Date
    Aug 2005
    Location
    City of Angels
    Posts
    1,138
    Quote (originally posted by ulthane):
    Hey John, thanks for your solution. It looks like a nice way of solving this; however, I can't rely only on checking for dots, as many words end with a dot (like the end of a sentence).
    How can we just check whether a word in the string starts with http or www, OR contains one of the following strings (.co, .org, .net, .gov)? Then it's a link for sure, I'd say... (unless there is something I don't know; if we check for the domain extension, there's not even a need to check for www|http)
    @ulthane,


    Try this:

    PHP Code:

        
    # Old line
    # if( strpos($item, '.') )

    # replace with this line to eliminate .' and ." and ..
    if( strpos($item, '.') && ( ! strpos($item, '."') ) && ( ! strpos($item, ".'") ) && ( ! strpos($item, "..") ) )
    {
      ...
      ...
    }

    I cannot think of any other uses of the period besides those eliminated; if you think of any, let me know.
    Last edited by John_Betong; Apr 2, 2012 at 06:35. Reason: spelling: not my fotay
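This is a general PHP point about conditions like the one above. `strpos()` returns the position of the match, which can be `0`, or `false` when there is none. As a result, `! strpos(...)` treats a match at the very start of the word the same as no match at all. Here is a small sketch with a made-up sample word:

```php
<?php
$item = '."quoted';

var_dump(strpos($item, '."'));           // int(0): found, at position 0
var_dump(! strpos($item, '."'));         // bool(true): 0 is falsy, so this reads as "not found"
var_dump(strpos($item, '."') !== false); // bool(true): the explicit comparison is correct
```

On PHP 8 and later, `str_contains()` expresses the same test directly.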

  18. #18
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Well, I tested your method a bit more thoroughly, and it seems to break on line breaks. For example, with the following input:
    Code:
    this string
    www.example.com
    has links
    example.com
    with line breaks www.tests.com
    in it tests.com and before it
    I'm getting this HTML result:
    Code:
    this <a href="http://string<br />
    www.example.com<br />
    has" target="_blank" rel="nofollow">string<br />
    www.example.com<br />
    has</a> <a href="http://links<br />
    example.com<br />
    with" target="_blank" rel="nofollow">links<br />
    example.com<br />
    with</a> line breaks <a href="http://www.tests.com<br />
    in" target="_blank" rel="nofollow">www.tests.com<br />
    in</a> it <a href="http://tests.com" target="_blank" rel="nofollow">tests.com</a> and before it
    I guess explode(' ', ...) doesn't treat newlines as spaces, so those words are never split apart...
    Note: the same issue occurs even without nl2br getting involved.
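One way to keep line breaks intact in the word-by-word approach is to split on any run of whitespace (spaces, tabs, newlines) instead of a single space. The separators are kept, so the text can be put back together exactly. In this sketch, the function name is hypothetical. The "is it a link" test (a dot between two word characters) is a simplified placeholder, not the check from the posts above.

```php
<?php
// Hypothetical helper: links words while preserving the original whitespace.
function linkify_words(string $text): string
{
    // The capturing group makes preg_split() keep each whitespace run as its own element.
    $parts = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // Placeholder test: a dot between two word characters, e.g. "example.com".
        if (preg_match('/\w\.\w/', $part)) {
            $parts[$i] = '<a href="http://' . $part . '">' . $part . '</a>';
        }
    }
    return implode('', $parts);
}

echo linkify_words("this string\nwww.example.com\nhas links");
```

Because each whitespace run is kept as a separate array element, `implode('', ...)` restores the original spacing and line breaks.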

    I've also made a little progress with my regex attempt:
    Code:
    ~\b([a-z0-9-]+\.)+[^\?\s]+\b~si
    (Just to remind you: it converts all links correctly except links with parameters.)
    So for a link like:
    test.com/index.php?param=1
    it returns
    test.com/index.php (all of this as the link), followed by ?param=1 as plain text... anyone?

  19. #19
    Community Advisor
    John_Betong
    Join Date
    Aug 2005
    Location
    City of Angels
    Posts
    1,138
    @ulthane,

    Re post #3:

    $text = "this string has a link starts with www.example.com and also a link starts with http://example.com or http://www.example.com, all these 3 links should be put into an array named urls.";
    The code supplied in posts #15 and #17 extracts the relevant text from your original $text and makes the correct HTML links.

    By adding line breaks, the original specification has changed.

    It is now quite late, and if you or other posters are unable to offer a solution, then tomorrow I will endeavour to create a new script.

    PS Any chance of a later version having images?

  20. #20
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Sorry if it wasn't clear that the text could also contain line breaks :P

    Anyway, this thing is driving me nuts! My aim is to get this done with preg_match, as the code looks much cleaner that way. I'm so close to the solution; the only problem is when parameters are in the URL! I bet it's because preg_match sees '?' as a "reserved character" that somehow needs to be escaped when used in preg_match_all... anyone with any ideas?
    Full code:
    PHP Code:
    function make_clickable($text)
    {
        // $text = str_replace('?', "\?", $text); lol..., nah that didn't work ;)
        $text = str_replace('http://', '', $text);
        if (preg_match_all('~\b([a-z0-9-]+\.)+[^\s]+\b~si', $text, $urls))
        {
            foreach (array_unique($urls[0]) AS $url)
            {
                $urltext = strlen($url) > 35 ? substr($url, 0, 21).'...'.substr($url, -10) : $url;
                $text = preg_replace('~^'.$url.'~m', "<a href=\"http://$url\" target=\"_blank\" rel=\"nofollow\">$urltext</a>", $text);
            }
        }
        return $text;
    }
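The guess in the post above is close. In `preg_replace('~^'.$url.'~m', ...)` the matched URL becomes part of the pattern, so its `?` is read as a quantifier and each `.` as "any character". PHP's `preg_quote()` escapes such characters. This small sketch uses a made-up sample text and leaves out the `^` anchor to isolate the escaping issue:

```php
<?php
$text = 'see www.example.com/page.php?param=1 here';
$url  = 'www.example.com/page.php?param=1';

// Unescaped: "p?" means "an optional p", so the literal "?" in $text is never matched.
var_dump(preg_match('~' . $url . '~', $text));                  // int(0)

// Escaped: preg_quote() backslash-escapes ".", "?" and the "~" delimiter.
var_dump(preg_match('~' . preg_quote($url, '~') . '~', $text)); // int(1)
```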


  21. #21
    SitePoint Evangelist
    Join Date
    Jun 2010
    Location
    Israel
    Posts
    511
    Success!!!
    If anyone is interested, for the moment it works in ANY case!
    PHP Code:
    function make_clickable($text)
    {
        // drop http:// and hide '?' so it is not read as a regex quantifier later
        $text = str_replace(array('http://','?'), array('','qmark'), $text);
        if (preg_match_all('~\b([a-z0-9-]+\.)+[^\s]+\b~si', $text, $urls))
        {
            foreach (array_unique($urls[0]) AS $url)
            {
                $urltext = strlen($url) > 35 ? substr($url, 0, 21).'...'.substr($url, -10) : $url;
                // '^' with the m modifier: only links at the start of a line are replaced
                $text = preg_replace('~^'.$url.'~m', "<a href=\"http://$url\" target=\"_blank\" rel=\"nofollow\">$urltext</a>", $text);
            }
        }
        // restore the question marks
        $text = str_replace('qmark', '?', $text);
        return $text;
    }


  22. #22
    Community Advisor
    John_Betong
    Join Date
    Aug 2005
    Location
    City of Angels
    Posts
    1,138
    @ulthane,

    Congratulations - treat yourself to a Special Mocha Coffee with Double Cream

    Here is my lengthy, revised, more readable version:

    PHP Code:

    function url_maker($text)
    {
      $result = array();

      # KLUDGE to replace and finally restore line-feeds - // ordinary-space ALT 400 ordinary-space
      $x400 = " É ";

      $text = str_replace('http://', '', $text);
      $text = str_replace('<br />', $x400, $text);

      $items = explode(' ', $text);
      foreach( $items as $item ):
          if( strpos($item, '.') && ( ! strpos($item, '."') ) )
          {
              $urltext = strlen($item) > 99 ? substr($item, 0, 17) .'...' : $item;
              $result[] = '<a href="http://'
                   .   $item
                   .   '" target="_blank" rel="nofollow">'
                   .   $urltext
                   .   '</a>';
          }
          else
          {
            $result[] = $item;
          }
      endforeach;

      #DEBUG
        echo '<pre>';
          #print_r($result);
        echo '</pre>';

      # Beware: $result array is changed to string (glue first, as PHP 8 requires)
      $result = implode(' ', $result);
      $result = str_replace($x400, '<br />', $result);
      return $result;
    }

