Using Regular Expressions in PHP


When I first started programming in PHP, I found regular expressions very difficult. They were complicated, looked ugly, were hard to figure out, and there seemed to be a real lack of documentation in this area. This article will provide you with an insight as to what they are, how they are useful, and how to apply them.

What are Regular Expressions?

Regular expressions became widely known through early Unix tools such as ed and grep. They were designed to make it easier to find, replace and work with strings, and they have since been in wide use across Unix-based operating systems. They became a core feature of Perl, and PHP later adopted them.

What could I use them for?

There are a few common uses for regular expressions. Perhaps the most useful is form validation. For example, you could use regular expressions to check that an email address entered into a form uses the correct syntax. We’ll consider this specific example later on in this article.

You could also use them to complete complex search and replace operations within a given body of text that would not be possible with PHP’s standard str_replace function. Yes, the possibilities are endless!
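As a rough illustration of a replacement that str_replace cannot express, the sketch below rewrites every date in DD/MM/YYYY form as YYYY-MM-DD. It uses preg_replace from the Perl-compatible family; the sample text and variable names are invented for the example.

```php
<?php
// Hypothetical input: dates in DD/MM/YYYY form scattered through the text.
$text = 'Invoices are due on 05/11/2023 and 17/01/2024.';

// Capture day, month and year, then reorder them in the replacement.
// str_replace cannot do this, because each date is a different literal string.
// '#' is used as the delimiter so the slashes need no escaping.
$result = preg_replace('#(\d{2})/(\d{2})/(\d{4})#', '$3-$2-$1', $text);

echo $result, "\n";
```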

How do I use them?

Let’s look at how we might use a regular expression to check the syntax of an email address entered into a form that’s submitted to a PHP script.

There are two types of regular expression functions included in PHP:

  • the ereg functions, which use the POSIX extended regular expression syntax
  • the preg functions, which use a Perl-compatible regular expression syntax

    For this article we’ll use the eregi function, which matches a string against a regular expression. The ‘i’ in the function name means ‘case insensitive’; use ereg if you want the match to be case sensitive. Note that the ereg family was deprecated in PHP 5.3 and removed in PHP 7, where preg_match with the i modifier does the same job.

    The PHP Manual has a reference page for the eregi function.

    Now, as you know, email addresses are always in a particular format:

    username @ domain . extension

    That makes them an ideal candidate to be tested with a regular expression. So let’s take a look at an expression I wrote to check the validity of an email address. We’ll look at each section of the expression individually, and then I’ll include a syntax reference at the end of the article. But first, here’s the expression itself:

    eregi('^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]{2,4}$', $email)

    If you’re anything like I was when I first used regular expressions, that example probably looks very confusing! Let’s split it into sections and make sense of each part individually:

    ^[a-zA-Z0-9._-]+@

    This part of the expression validates the ‘username’ section of the email address. The hat sign (^) at the beginning of the expression represents the start of the string. If we didn’t include this, then someone could key in anything they wanted before the email address and it would still validate.

    Contained in the square brackets are the characters we want to allow in this part of the address. Here, we are allowing the letters a-z, A-Z, the numbers 0-9, and the symbols underscore (_), period (.), and dash (-). As you’ve probably noticed, I’ve included letters both in capitals and lower case. In this instance, this isn’t strictly necessary, as we’re using the eregi (case insensitive) function. But I’ve included them here for completeness, and to show you how the functions work. The order of the character pairs within the brackets doesn’t matter.

    The plus (+) sign after the square brackets indicates ‘one or more of the contents of the previous brackets’. So, in this case, we require one or more of any of the characters in the square brackets to be included in the address in order for it to validate. Finally, there is the ‘@’ sign, which means that we require the presence of one @ sign immediately following the username.
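The effect of the start anchor and the plus sign can be checked with a short sketch. It uses preg_match with the i modifier as a stand-in for eregi, an assumption made so that it runs on PHP 7 and later; the sample strings are invented.

```php
<?php
// The username part from the article, written with PCRE delimiters.
$anchored   = '/^[a-zA-Z0-9._-]+@/i';
// The same pattern without the start anchor, for comparison.
$unanchored = '/[a-zA-Z0-9._-]+@/i';

// '!' and the space are not in the character class.
$input = '!!! john.smith@';

$withAnchor    = preg_match($anchored, $input);         // 0: junk before the username
$withoutAnchor = preg_match($unanchored, $input);       // 1: finds 'john.smith@' further in
$plain         = preg_match($anchored, 'john.smith@');  // 1
$empty         = preg_match($anchored, '@');            // 0: '+' needs at least one character

var_dump($withAnchor, $withoutAnchor, $plain, $empty);
```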

    [a-zA-Z0-9._-]+\.

    This part of the expression is very similar to the section we just looked at. It validates the domain name in the email address. As before, we have a series of characters in square brackets that we’ll allow in this part of the address, followed by a plus (+) sign, requiring one or more of those characters.

    At the end of this section, there is a backslash, then a period sign. This tells the expression that a period is required at this point in the expression (i.e. between the domain and extension). However, the backslash is slightly more complicated. In a regular expression, a period actually means ‘any character’. In order to make this expression take the period’s literal value rather than use it as a wildcard for any character, we need to ‘escape’ it by preceding the period with a backslash. You may have come across this before if you use databases such as MySQL, as escaping characters is very important there too.
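The difference between an escaped and an unescaped period is easy to see in a small test. This is a sketch using preg_match (assumed here in place of eregi so it runs on current PHP); the sample strings are made up.

```php
<?php
// Unescaped: the dot matches any single character.
$wildcard = '/^example.com$/';
// Escaped: the dot must be a literal period.
$literal  = '/^example\.com$/';

$a = preg_match($wildcard, 'example.com'); // 1
$b = preg_match($wildcard, 'exampleXcom'); // 1, because '.' accepted the 'X'
$c = preg_match($literal, 'example.com');  // 1
$d = preg_match($literal, 'exampleXcom');  // 0

var_dump($a, $b, $c, $d);
```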

    [a-zA-Z]{2,4}$

    This is the final part of the expression. At the beginning is another set of characters enclosed in square brackets. This time, I have simply allowed the letters a-z and A-Z, because numbers and other characters are not valid in domain extensions.

    Instead of the + sign we used before, here we have ‘{2,4}’ immediately following the square brackets. This means that we require between 2 and 4 of the characters from the square brackets to be included in the email address. So com, net, org, uk, au, etc. are all valid, but anything longer than these will not be accepted. Some valid extensions, such as museum, are longer than four letters, so you may want to raise the upper limit for your own site.

    Finally, the $ sign at the end of the expression signifies the end of the string. If we didn’t include this, then a user could type anything after the end of the email address and it would still validate.
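A quick way to see the length limit and the end anchor at work is to test the extension part on its own. The sketch below assumes preg_match in place of eregi, and the domain names are examples only.

```php
<?php
// The extension part from the article: a period, 2 to 4 letters, end of string.
$extension = '/\.[a-zA-Z]{2,4}$/';

$com      = preg_match($extension, 'example.com');       // 1: three letters
$uk       = preg_match($extension, 'example.co.uk');     // 1: two letters
$tooLong  = preg_match($extension, 'example.museum');    // 0: six letters
$trailing = preg_match($extension, 'example.com oops');  // 0: text after the extension

var_dump($com, $uk, $tooLong, $trailing);
```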

    Here’s the source code of a script you can use to test this regular expression — and any others you want to play with:

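    <?php
    // Read the submitted action, if any (avoids an undefined-index notice).
    $action = isset($_REQUEST['action']) ? $_REQUEST['action'] : '';

    if ($action == '') {
        // Nothing submitted yet: show the form.
    ?>
    <form action='<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>' method='POST'>
    Email Address: <input type='text' name='email'>
    <input type='hidden' name='action' value='validate'>
    <p>
    <input type='submit' value='Submit'>
    </p>
    </form>
    <?php
    }

    if ($action == 'validate') {
        $email = isset($_REQUEST['email']) ? $_REQUEST['email'] : '';
        // Test the submitted address against the expression built above.
        // The backslash makes the period before the extension literal.
        if (eregi('^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.([a-zA-Z]{2,4})$', $email)) {
            echo 'Valid';
        } else {
            echo 'Invalid';
        }
    }
    ?>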

    Feel free to use the regular expression we made above on your own site to validate email addresses, or modify it for your own purposes.
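Because the ereg functions are not available in PHP 7 and later, here is a minimal sketch of the same check written with preg_match. The function name looks_like_email is hypothetical and not part of the original script; the pattern is the one built above, wrapped in delimiters with the i modifier added.

```php
<?php
/**
 * Hypothetical helper (not from the article): checks an address against
 * the article's pattern using preg_match instead of eregi.
 */
function looks_like_email(string $email): bool
{
    // Same expression as above, wrapped in '/' delimiters; the trailing
    // 'i' modifier gives the case-insensitive behaviour of eregi.
    $pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]{2,4}$/i';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $email) === 1;
}

var_dump(looks_like_email('john.smith@example.com')); // bool(true)
var_dump(looks_like_email('john.smith@example'));     // bool(false)
var_dump(looks_like_email('not an email'));           // bool(false)
```

PHP also ships filter_var with FILTER_VALIDATE_EMAIL, which may be a better fit when stricter validation is needed.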

    Syntax Reference

    This is a quick reference to some of the basic syntax. We’ve already seen much of it earlier on, but there are a few new things here that you may find useful.

    ^        start of string
    $        end of string
    [a-z]    letters a-z inclusive, in lower case
    [A-Z]    letters A-Z inclusive, in upper case
    [0-9]    numbers 0-9 inclusive
    [^0-9]   any character that is not a number 0-9
    ?        zero or one of the preceding character(s)
    *        zero or more of the preceding character(s)
    +        one or more of the preceding character(s)
    {2}      exactly 2 of the preceding character(s)
    {2,}     2 or more of the preceding character(s)
    {2,4}    between 2 and 4 of the preceding character(s)
    .        any character
    (a|b)    a OR b
    \s       whitespace (preg functions; in ereg patterns use [[:space:]])
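A few of the entries above that were not used in the email example can be tried out as follows. This is only a sketch: it uses preg_match rather than eregi, and the test words are arbitrary.

```php
<?php
// (a|b): alternation
$alt = preg_match('/^(cat|dog)$/', 'dog');     // 1

// ?: zero or one of the preceding character
$opt1 = preg_match('/^colou?r$/', 'color');    // 1
$opt2 = preg_match('/^colou?r$/', 'colour');   // 1

// [^0-9]: any character that is not a digit
$neg = preg_match('/[^0-9]/', '12345');        // 0: the string holds digits only

// {2,}: two or more of the preceding character
$rep = preg_match('/^a{2,}$/', 'a');           // 0: only one 'a'

var_dump($alt, $opt1, $opt2, $neg, $rep);
```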

    Frequently Asked Questions about PHP Regular Expressions

    What are the basic components of a PHP regular expression?

    A PHP regular expression is a sequence of characters that forms a search pattern. It’s used for pattern matching with strings, or string matching, i.e. “find this pattern within this string.” The basic components of a PHP regular expression include delimiters, a pattern, and optional modifiers. Delimiters are any non-alphanumeric, non-backslash, non-whitespace character. The pattern is the character or sequence of characters you want to find. Modifiers are optional and change the search pattern.

    How do I use regular expressions in PHP?

    In PHP, regular expressions are used with the preg_match(), preg_match_all() and preg_replace() functions. The preg_match() function searches a string for a pattern, returning 1 if the pattern matches, 0 if it does not, and false if an error occurs. The preg_match_all() function finds every match in the string and returns how many it found. The preg_replace() function searches a string for a pattern and replaces each match with the specified text.

    What are the different types of PHP regular expressions?

    PHP has shipped two types of regular expressions: POSIX (the ereg functions) and PCRE (Perl Compatible Regular Expressions, the preg functions). POSIX is the older type and is not as powerful or flexible as PCRE; its functions were deprecated in PHP 5.3 and removed in PHP 7. PCRE is the preferred type and the only one available in current PHP versions.

    How do I create a regular expression pattern in PHP?

    A regular expression pattern in PHP is created by enclosing the pattern in delimiters. The pattern can be any sequence of characters, and the delimiters can be any non-alphanumeric, non-backslash, non-whitespace character. For example, ‘/abc/’ is a regular expression pattern that matches the string ‘abc’.
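The choice of delimiter matters most when the pattern itself contains slashes. The sketch below compares two delimiters on an invented URL; both patterns are meant to be equivalent.

```php
<?php
$url = 'https://example.com/page';

// With '/' as the delimiter, every slash inside the pattern must be escaped.
$slashes = preg_match('/^https?:\/\/example\.com\//', $url);

// Choosing '#' as the delimiter avoids the escaping.
$hashes = preg_match('#^https?://example\.com/#', $url);

var_dump($slashes, $hashes);
```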

    What are some common modifiers used in PHP regular expressions?

    Some common modifiers used in PHP regular expressions include ‘i’ (case-insensitive), ‘m’ (multiline mode), ‘s’ (dot matches newline), and ‘x’ (extended mode). These modifiers change the behavior of the regular expression.
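Each of these modifiers can be seen in a one-line test. The following is a sketch with made-up subject strings; modifiers go after the closing delimiter.

```php
<?php
// i: ignore case
$i = preg_match('/hello/i', 'HELLO world');        // 1

// m: ^ and $ also match at line boundaries
$m   = preg_match('/^second$/m', "first\nsecond"); // 1
$noM = preg_match('/^second$/', "first\nsecond");  // 0

// s: the dot also matches a newline
$s   = preg_match('/a.b/s', "a\nb");               // 1
$noS = preg_match('/a.b/', "a\nb");                // 0

// x: whitespace and # comments inside the pattern are ignored
$x = preg_match('/
    \d{3}  # three digits
    -      # a dash
    \d{4}  # four digits
/x', '555-1234');                                  // 1

var_dump($i, $m, $noM, $s, $noS, $x);
```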

    How do I use the preg_match() function in PHP?

    The preg_match() function in PHP is used to search a string for a pattern. It takes two required arguments: the pattern and the string to search. An optional third argument receives the matched text. If the pattern is found, the function returns 1; if not, it returns 0. It returns false if an error occurs.
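As a small sketch of a typical call, the example below also passes the optional third argument, which preg_match fills with the matched text and any captured groups. The subject string is invented.

```php
<?php
$subject = 'Order #4821 shipped';

// The third argument receives the full match and each parenthesised group.
$found = preg_match('/#(\d+)/', $subject, $matches);

// $found is 1; $matches[0] is '#4821' and $matches[1] is '4821'.
var_dump($found, $matches);
```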

    How do I use the preg_replace() function in PHP?

    The preg_replace() function in PHP is used to search a string for a pattern and replace it with specified text. It takes three arguments: the pattern, the replacement text, and the string to search.
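For instance, a call along these lines collapses every run of whitespace into a single space. It is a minimal sketch and the input string is made up.

```php
<?php
// Arguments: pattern, replacement text, string to search.
$clean = preg_replace('/\s+/', ' ', "too    many \t spaces");

echo $clean, "\n";
```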

    What is the difference between the preg_match() and preg_match_all() functions in PHP?

    The main difference between the preg_match() and preg_match_all() functions in PHP is that preg_match() stops searching after it finds the first match, while preg_match_all() continues searching and returns all matches in the string.
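The difference shows up clearly when both functions are run on the same subject. This is a sketch with an arbitrary input string.

```php
<?php
$subject = 'a1 b22 c333';

$first = preg_match('/\d+/', $subject, $one);      // stops at the first match
$count = preg_match_all('/\d+/', $subject, $all);  // finds every match

// $first is 1 and $one[0] is '1'.
// $count is 3 and $all[0] is ['1', '22', '333'].
var_dump($first, $one, $count, $all);
```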

    How do I use regular expressions to validate user input in PHP?

    Regular expressions can be used to validate user input in PHP by matching the input against a pattern. If the input matches the pattern, it is valid; if not, it is invalid. This can be done using the preg_match() function.

    Can I use regular expressions to split a string in PHP?

    Yes, you can use regular expressions to split a string in PHP. The preg_split() function splits a string into an array by a regular expression. It takes two required arguments: the pattern and the string to split.
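A short sketch of preg_split, assuming a list whose items are separated by commas or semicolons with uneven spacing (the input is invented):

```php
<?php
// Split on a comma or semicolon, swallowing any whitespace around it.
$parts = preg_split('/\s*[,;]\s*/', 'red, green;blue ,  yellow');

// $parts is ['red', 'green', 'blue', 'yellow'].
var_dump($parts);
```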

    James Ussher-Smith

    James is a student and freelance Web developer, specialising in database driven Websites. He is also developing a powerful link directory script, SideLinks.
