Regular expression problem converting URLs to hyperlinks

I have written this function with help from patterns I've found online, but I've run into a problem: the link text is output correctly, but the actual link target ends up as something like:

http://localhost:8081/myfolder/www.sitepointforums.com/grab?foo=bar

which is obviously wrong. I am simply trying to match URLs in various formats (strict or lazy) and convert them to hyperlinks.

Here’s my function:


	/*******************************
	 * Make URL's In Text <a> links	       
	 * ------------------------------------ 
	 * @params	<str> $text			       
	 *		<str> $link_params      
	 * @returns	<str> $text			       
	 *						       
	 * Returns the text string supplied		
	 * with URL's converted to working <a>	
	 * links. The optional $link_params		
	 * parameter can be used to add "rel"	
	 * or "class", "id", etc to the actual	
	 * <a> link.							
	 */									

		function Make_URLS_HTML_Links($text, $link_params = false)
		{
			/* OLD */
			/*
			$regexp = '#([\\n ])www\\.([a-z0-9\\-]+)\\.([a-z0-9\\-.\\~]+)((?:/[^,\\t \\n\\r]*)?)#i';
			$link_format = '\\\\1<a '.$link_params.' href="http://www.\\\\2.\\\\3\\\\4" target="_blank">www.\\\\2.\\\\3\\\\4</a>';
			return preg_replace($regexp, $link_format, $text);
			*/
			/* NEW */
			$pattern = "@\\b(https?://)?(([0-9a-zA-Z_!~*'().&=+$%-]+:)?[0-9a-zA-Z_!~*'().&=+$%-]+\\@)?(([0-9]{1,3}\\.){3}[0-9]{1,3}|([0-9a-zA-Z_!~*'()-]+\\.)*([0-9a-zA-Z][0-9a-zA-Z-]{0,61})?[0-9a-zA-Z]\\.[a-zA-Z]{2,6})(:[0-9]{1,4})?((/[0-9a-zA-Z_!~*'().;?:\\@&=+$,%#-]+)*/?)@";

			$link_params = ( $link_params != false && $link_params != '' ) ? ' '.$link_params : $link_params;
			$link = preg_replace($pattern, '<a href="\\0"'.$link_params.'>\\0</a>', $text);

			# Here I am trying to find http or https, but this doesn't seem to work either
			if ( preg_match("#(http|https)#", $link) == false ):
				$link = 'http://'.$link;
			endif;

			return $link;
		}

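This code will get confused if the input contains more than one link: the http/https check runs once against the whole output string, not against each URL.
Use preg_replace_callback instead of preg_replace, and test for http/https inside the callback function, which sees one match at a time.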


<?php

define( 'LINK_LIMIT_LENGTH', 30 );
define( 'LINK_FORMAT', '<a href="%s" rel="ext">%s</a>' );

function prase_links  ( $m )
{
    $href = $name = html_entity_decode( $m[0] );

    if ( strpos( $href, '://' ) === false )
        $href = 'http://' . $href;

    if( strlen($name) > LINK_LIMIT_LENGTH ) {
        $k = ( LINK_LIMIT_LENGTH - 3 ) >> 1;
        $name = substr( $name, 0, $k ) . '...' . substr( $name, -$k );
    }

    return sprintf( LINK_FORMAT, htmlspecialchars( $href ), htmlspecialchars( $name ) );
}

$s = '...lots of text...';

$reg = '~((?:https?://|www\\d*\\.)\\S+[-\\w+&@#/%=\\~|])~';
print preg_replace_callback( $reg, 'prase_links', $s );
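
To make the per-match check concrete, here is a minimal sketch of the callback approach run on text with two links, one of them without a scheme. The names link_callback and linkify and the example.com / example.org URLs are made up for illustration, and the length limit and html_entity_decode step from the code above are left out to keep it short.

```php
<?php
// Hypothetical callback: receives one regex match at a time.
function link_callback(array $m): string
{
    $href = $m[0];
    // Only this single match is inspected, so other links in the text cannot affect it.
    if (strpos($href, '://') === false) {
        $href = 'http://' . $href;
    }
    return sprintf('<a href="%s">%s</a>', htmlspecialchars($href), htmlspecialchars($m[0]));
}

// Hypothetical wrapper around the pattern posted above.
function linkify(string $text): string
{
    $reg = '~((?:https?://|www\\d*\\.)\\S+[-\\w+&@#/%=\\~|])~';
    return preg_replace_callback($reg, 'link_callback', $text);
}

$text = 'See www.example.com/grab?foo=bar and https://example.org/x today.';

// A whole-text check finds a scheme somewhere in the string,
// so it cannot tell that the first link is missing one.
var_dump(preg_match('#https?#', $text) === 1);

echo linkify($text), "\n";
```

With this input the first link should get an http:// prefix in its href while the second is left alone, which is the case the single preg_match test on the whole string cannot handle.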

That matters only if you want or need the length to be exactly 30, or whatever value you put in.
Either way you will have to deal with rounding.

But the point of the length limit is to cut long URLs so they don’t cause layout problems.
Nothing more.
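
If an exact length were ever needed, one way to handle the rounding is to give the left half the extra character. This is only a sketch of that idea, not part of the posted code; the function name shorten_exact and the guard for very small limits are assumptions made for the example.

```php
<?php
// Hypothetical variant: shortened names come out exactly $limit characters long.
function shorten_exact(string $name, int $limit): string
{
    // Leave short names alone; limits below 5 leave no room for both halves.
    if (strlen($name) <= $limit || $limit < 5) {
        return $name;
    }
    $room  = $limit - 3;        // space left after the three dots
    $right = $room >> 1;        // rounded down
    $left  = $room - $right;    // takes the extra character when $room is odd
    return substr($name, 0, $left) . '...' . substr($name, -$right);
}

$url = str_repeat('a', 40);
printf("limit 30 -> length %d\n", strlen(shorten_exact($url, 30)));
printf("limit 31 -> length %d\n", strlen(shorten_exact($url, 31)));
```

For layout purposes the simpler version is just as good, since being one character short does no harm.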

Off Topic:

I just borrowed it from Stereofrog. (:

If LLL (LINK_LIMIT_LENGTH) = 30
LLL - 3 = 27
27 >> 1 = 13
So you pull in 13 from the left, 13 from the right, stick ... in the middle, and end up with 29 characters, 1 short of 30.

Does this mean that you should always use an odd value of LLL?
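
The arithmetic above can be checked with a short sketch that applies the same truncation to an even and an odd limit. The function name shorten_middle and the 40-character test string are made up for the example; the calculation of $k is the one from the posted callback.

```php
<?php
// Same middle-truncation arithmetic as the posted callback, with the limit as a parameter.
function shorten_middle(string $name, int $limit): string
{
    if (strlen($name) <= $limit) {
        return $name;
    }
    $k = ($limit - 3) >> 1;   // integer half of the space left after '...'
    return substr($name, 0, $k) . '...' . substr($name, -$k);
}

$url = str_repeat('a', 40);
foreach ([30, 31] as $limit) {
    printf("limit %d -> length %d\n", $limit, strlen(shorten_middle($url, $limit)));
}
// prints:
// limit 30 -> length 29
// limit 31 -> length 31
```

So an odd limit is hit exactly, while an even limit comes out one character short, because the shift drops the odd remainder.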

Off Topic:

That's a nice trick! :)