URL validation with preg_match

Hi there,

I have the following function to check if a string is a valid URL:


function is_url($uri){
	if(preg_match( '/^(http|https):\\/\\/[a-z0-9]+([\\-\\.]{1}[a-z0-9]+)*\\.[a-z]{2,5}'.'((:[0-9]{1,5})?\\/.*)?$/i' ,$uri)){
		return $uri;
	}
	else{
		return false;
	}
}

This works, with one exception: when the (valid) input URL contains an underscore, it returns false. Obviously it has something to do with the regex pattern, which is not my forte, to put it mildly!

Could someone have a look and help me out?
Thanks!

Change [a-z0-9] to [a-z0-9_]? You might also want to consider checking for other valid URL characters, such as hyphens.


function is_url($uri){
    if(preg_match( '/^(http|https):\\/\\/[a-z0-9_]+([\\-\\.]{1}[a-z_0-9]+)*\\.[_a-z]{2,5}'.'((:[0-9]{1,5})?\\/.*)?$/i' ,$uri)){
        return $uri;
    }
    else{
        return false;
    }
}

Should do it

Edit:

Which is pretty much what Bill said…!

Your specific problem can be solved by adding an underscore to the second character class:
[a-z0-9_]+

But your regex probably won’t work as expected; for example, it would also accept http://0-4.pkz
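
To make that concrete, here is a small check you could run. It is only a sketch: the pattern is the one from the first post, copied unchanged, and the sample URLs other than http://0-4.pkz are my own illustrative inputs, not something from this thread.

```php
<?php
// Pattern from the original question, copied as posted.
$pattern = '/^(http|https):\\/\\/[a-z0-9]+([\\-\\.]{1}[a-z0-9]+)*\\.[a-z]{2,5}'
         . '((:[0-9]{1,5})?\\/.*)?$/i';

// Illustrative inputs (assumptions, apart from the first one).
$samples = array(
    'http://0-4.pkz',               // not a real site, yet the pattern accepts it
    'http://example.com/',          // ordinary URL
    'http://my_site.example.com/',  // underscore in the host name
);

foreach ($samples as $url) {
    echo $url, ' => ', preg_match($pattern, $url) ? 'match' : 'no match', "\n";
}
```

The pattern only checks the shape of the string, so anything that looks like "label.letters" passes, while the underscore case fails until the character classes are extended.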

If you have to match many slashes, then I would recommend not using a slash as your regex delimiter. If you use another character, you don’t have to escape the slashes within the regex, e.g.

'~http://[a-z]+~i'
instead of
'/http:\/\/[a-z]+/i'

which makes the regex more readable.
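
Applied to the pattern from this thread, the delimiter swap might look like the sketch below. The variable names and sample URLs are made up for illustration, and I have assumed the underscore fix in the host part while leaving the TLD class as [a-z].

```php
<?php
// "/" as delimiter: every literal slash in the pattern has to be escaped.
$withSlash = '/^(http|https):\/\/[a-z0-9_]+([\-\.]{1}[a-z0-9_]+)*\.[a-z]{2,5}'
           . '((:[0-9]{1,5})?\/.*)?$/i';

// "~" as delimiter: slashes can be written as they are.
$withTilde = '~^(http|https)://[a-z0-9_]+([\-\.]{1}[a-z0-9_]+)*\.[a-z]{2,5}'
           . '((:[0-9]{1,5})?/.*)?$~i';

// Illustrative inputs (assumptions, not from the thread).
$samples = array(
    'http://my_site.example.com/a/b',
    'https://example.com:8080/',
    'ftp://example.com/',
);

foreach ($samples as $url) {
    // Both patterns describe the same regex, so the two results should agree.
    printf("%-32s slash=%d tilde=%d\n", $url,
        preg_match($withSlash, $url), preg_match($withTilde, $url));
}
```

Any character that does not occur in the pattern can serve as the delimiter; "~" and "#" are common picks for URL patterns.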

Also, within a character class, you don’t have to escape the dot or the hyphen (the hyphen only if it’s the first or last character in the class):
[-.]
instead of
[\-\.]
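
A quick way to convince yourself, as a sketch with made-up variable names: compare both spellings on a few single characters, then look at what happens when the hyphen sits between two other characters.

```php
<?php
$escaped   = '/^[\-\.]$/';  // escaped form, as in the original pattern
$unescaped = '/^[-.]$/';    // hyphen comes first, so it is taken literally

foreach (array('-', '.', 'a', '_') as $ch) {
    // Both patterns should give the same answer for every character.
    printf("'%s': escaped=%d unescaped=%d\n", $ch,
        preg_match($escaped, $ch), preg_match($unescaped, $ch));
}

// Between two characters the hyphen defines a range instead:
// [+-.] covers "+" through "." in ASCII, which includes the comma.
$range = '/^[+-.]$/';
var_dump(preg_match($range, ',') === 1);
```

So the position of the hyphen matters: at the start or end of the class it is a plain character, in the middle it is read as a range.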

Edit:

my typing is much too slow

Edit:

Never mind, it works now, thanks!

kleineme: how would I check for a valid TLD?

Have a look again at the snippet from my post…


if(preg_match( '/^(http|https):\\/\\/[a-z0-9_]+([\\-\\.]{1}[a-z_0-9]+)*\\.[_a-z]{2,5}'.'((:[0-9]{1,5})?\\/.*)?$/i' ,$uri)){

There is an underscore in the second atom: [a-z_0-9]

Aaah, missed that one Spike… thanks! :slight_smile: