Simple Captchas with PHP and GD

Mehul Jain

sample image

By now, we’ve all encountered captcha images in online forms. Captchas are a necessary evil, and this article will teach you how they’re made.

Please note that while there are better, automatic third party solutions for captchas out there such as ReCaptcha, this tutorial aims merely to explain and demonstrate how such technology actually works. We won’t be explaining what captchas actually are, as it’s assumed to be common knowledge and already covered in greater detail elsewhere.

Drawing captchas

You must have the GD (Graphics Draw) library installed before proceeding. This library enables drawing of graphics and images through built-in PHP functions. To install it on Ubuntu-based systems, run sudo apt-get install php5-gd (the package name depends on your PHP version); on other operating systems, follow the installation instructions for your platform.

Captchas are usually made up of 3 things – shape, distortion, and the text.
We’ll follow the steps mentioned below:

  1. Display an empty image on the browser.
  2. Create a shape.
  3. Generate random lines.
  4. Generate random dots.
  5. Generate random text.

The procedural style used in this article is present only because this is a proof of concept, and to keep the final file as simple as possible. In a real project, you would go OOP.

Display an empty image

The image will be treated by HTML as if an external image is being displayed using the “img” tag. Two functions are used – one for creating the image and another for displaying.

<?php
session_start();
?>

    <title>demo.php</title>
    <body style="background-color:#ddd; ">

    <?php
    create_image();
    display();
    /***** definition of functions *****/
    function display()
    {
        ?>

        <div style="text-align:center;">
            <h3>TYPE THE TEXT YOU SEE IN THE IMAGE</h3>
            <b>This is just to check if you are a robot</b>

            <div style="display:block;margin-bottom:20px;margin-top:20px;">
                <img src="image.png">
            </div>
            <!-- div1 ends -->
        </div>                          <!-- div2 ends -->

    <?php
    }

    function  create_image()
    {
        $image = imagecreatetruecolor(200, 50);
        imagepng($image, "image.png");
    }

    ?>
    </body>
<?php
?>

The first line indicates the start of the user’s session on our page.

The display() function has nothing other than a normal HTML code that displays an image in the browser. Other than that, only styling is done for the output to look presentable.

Inside the create_image() function, a variable is used to refer to the image returned by the imagecreatetruecolor() function, which takes the width and height of the image as its arguments. imagepng() writes the image as a PNG file with the specified name and path (here, in the same directory as the script).

A black image will be the output after our first step.

scrn1

Note that the call to imagepng() must remain the last line of create_image(). All the following steps are to be inserted in create_image() before this call, otherwise they will not show up in the saved image.

Create a shape

Any shape can be chosen for the captcha. We’ll choose a rectangle by using the function imagefilledrectangle(). It takes six arguments: the image reference, starting x-pos, starting y-pos, ending x-pos, ending y-pos, and the fill color. For an elliptical captcha, you can use the corresponding ellipse function, imagefilledellipse().

The imagecolorallocate() function takes the image reference and the red, green, and blue components of a color as arguments, and returns a color identifier that we store in a variable. The following code is to be appended in the create_image() function.

$background_color = imagecolorallocate($image, 255, 255, 255);  
imagefilledrectangle($image,0,0,200,50,$background_color);

The previous image will be white after this step.

sample image

Generate random lines.

Now we start on the distortion part of the captcha. In PHP, a line is drawn from a starting point (x1,y1) to an end point (x2,y2). As we want our lines to touch both ends of the box, we will keep the <x1,x2> coordinates as <0,200>, i.e., the complete width of our box. The <y1,y2> coordinates will be randomly generated. This creates just one random line, so we will generate multiple lines by putting this functionality inside a for loop.

$line_color = imagecolorallocate($image, 64,64,64); 
for($i=0;$i<10;$i++) {
    imageline($image,0,rand()%50,200,rand()%50,$line_color);
}

Apart from the image reference and the color of the line, the imageline() function takes the coordinates as arguments in the order x1, y1, x2, y2. The line color has been allocated just as the background color was allocated in the previous step.

The y-coordinate is given as rand() % 50 because 50 is the height of our box, so the result is always between 0 and 49. You may alternatively use rand(0, 49), which yields the same output range; rand(0, 50) would also include 50, one pixel below the bottom edge.
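A quick way to check which values a random expression can produce is to sample it many times and record the extremes. The sketch below does that for the two ways of picking a y-coordinate; the helper name observed_range() is made up for this illustration and is not part of the captcha script.

```php
<?php
// Hypothetical helper: call $generator $samples times and
// return the smallest and largest values it produced.
function observed_range(callable $generator, $samples = 100000)
{
    $min = PHP_INT_MAX;
    $max = PHP_INT_MIN;
    for ($i = 0; $i < $samples; $i++) {
        $value = $generator();
        $min = min($min, $value);
        $max = max($max, $value);
    }
    return array($min, $max);
}

list($modMin, $modMax) = observed_range(function () { return rand() % 50; });
list($argMin, $argMax) = observed_range(function () { return rand(0, 49); });

// Both stay within 0..49, the valid y-coordinates of a 50-pixel-high image.
echo "rand() % 50 : $modMin..$modMax\n";
echo "rand(0, 49) : $argMin..$argMax\n";
```

Note that the second argument of rand() is inclusive, which is why the upper bound is written as 49 rather than 50.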

sample image

Generate random dots.

Random dots will be generated in the same way as random lines. The function used is imagesetpixel(). This function takes the value of coordinates where the dot will be placed in the box.

$pixel_color = imagecolorallocate($image, 0,0,255);
for($i=0;$i<1000;$i++) {
    imagesetpixel($image,rand()%200,rand()%50,$pixel_color);
}  

The x-coordinate is randomly generated by using rand() % 200 because 200 is the width of our box, so the result is always between 0 and 199. You may alternatively use rand(0, 199), which yields the same output range. The y-coordinate is generated as in the lines step.

sample image

Generate random text

We will randomly point to a position in the string (which contains the alphabet in both lower and upper case) and assign it to the variable $letter

$letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
$len = strlen($letters);
$letter = $letters[rand(0, $len-1)];

$text_color = imagecolorallocate($image, 0,0,0);

When put inside a loop, it looks like this:

$word = ''; // start with an empty string so .= has something to append to
for ($i = 0; $i< 6;$i++) {
    $letter = $letters[rand(0, $len-1)];
    imagestring($image, 5,  5+($i*30), 20, $letter, $text_color);
    $word.=$letter;
}
$_SESSION['captcha_string'] = $word;

We will explain the lines

$word.=$letter;
$_SESSION['captcha_string'] = $word;   

in the next section.

The function imagestring() writes the text in our image. It has 6 arguments:

  1. Image reference.
  2. Font size of the text (the built-in fonts go from 1 to 5, so it can be 5 at most).
  3. x-coordinate (increasing by a fixed step for every letter).
  4. y-coordinate (kept the same, although we could change this randomly too).
  5. Actual string to be written.
  6. Font-color of the text.

You can also use the function imagettftext() if you wish to have a bigger font and a different font style. It takes 2 additional arguments: the angle of the text and the path to a TrueType font file.

The x-coordinate is chosen by inspection. The first letter starts 5 pixels from the left edge and each following letter is placed 30 pixels further right (5+($i*30)), where $i = 0, 1, 2, 3, 4, 5. If we had kept this spacing around 15-20px, there would have been a possibility of two letters overlapping. If the spacing had been more than 40px, the letters would not all have fit into the box.

This will generate a 6-letter captcha text. We can always create more randomness by changing the aspects that have been kept constant for simplicity, like color, y-coordinates etc.
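To see where each letter lands, the expression 5+($i*30) can be evaluated on its own. This is plain arithmetic and needs no GD functions; the variable names are chosen for this illustration only.

```php
<?php
$imageWidth  = 200; // width passed to imagecreatetruecolor()
$startX      = 5;   // left margin before the first letter
$step        = 30;  // distance between the left edges of two letters
$letterCount = 6;

$positions = array();
for ($i = 0; $i < $letterCount; $i++) {
    $positions[] = $startX + ($i * $step);
}

echo implode(', ', $positions), "\n"; // 5, 35, 65, 95, 125, 155

// Space left between the last letter's left edge and the right border.
$remaining = $imageWidth - end($positions);
echo "Room for the last letter: {$remaining}px\n"; // 45px
```

With a 40px step the last letter would start at 5 + 5*40 = 205, which is already outside the 200px box.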
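One possible variation, sketched under assumptions rather than taken from the script above: the text generation can be moved into a small function that uses random_int() (available from PHP 7) and an alphabet without look-alike characters. The function name random_captcha_text() and the reduced alphabet are illustrative choices.

```php
<?php
// Hypothetical helper, not part of the original script.
// Builds a random string from an alphabet that leaves out characters
// that are easy to confuse in small fonts (I, l, O, o).
function random_captcha_text($length = 6)
{
    $letters = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz';
    $max = strlen($letters) - 1;
    $word = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() (PHP 7+) draws from a cryptographically secure source.
        $word .= $letters[random_int(0, $max)];
    }
    return $word;
}

$word = random_captcha_text();
echo $word, "\n"; // six random letters, different on every run
```

Because the alphabet length is computed with strlen(), letters can be added or removed without touching the loop.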

The final captcha will look like this

sample image

The text written in the captcha will change every time you refresh the page.
More randomness can be achieved by creating designs with the pixels or by changing the color or size.
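Putting the five steps together, the drawing code can be sketched as a single function. This is a variant, not the article's exact create_image(): the name build_captcha_png() is hypothetical, and it returns the text and the PNG data instead of writing image.png and setting the session variable, which makes it easier to try out in isolation. It needs the GD extension.

```php
<?php
// Hypothetical variant of create_image(): returns the captcha text and the
// PNG data as a string instead of writing a file. Requires the GD extension.
function build_captcha_png($width = 200, $height = 50, $length = 6)
{
    $image = imagecreatetruecolor($width, $height);

    // Step 2: white background rectangle.
    $background = imagecolorallocate($image, 255, 255, 255);
    imagefilledrectangle($image, 0, 0, $width - 1, $height - 1, $background);

    // Step 3: ten lines spanning the full width at random heights.
    $lineColor = imagecolorallocate($image, 64, 64, 64);
    for ($i = 0; $i < 10; $i++) {
        imageline($image, 0, rand(0, $height - 1), $width - 1, rand(0, $height - 1), $lineColor);
    }

    // Step 4: a thousand random dots.
    $pixelColor = imagecolorallocate($image, 0, 0, 255);
    for ($i = 0; $i < 1000; $i++) {
        imagesetpixel($image, rand(0, $width - 1), rand(0, $height - 1), $pixelColor);
    }

    // Step 5: random letters, 30 pixels apart, in the largest built-in font.
    $letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $textColor = imagecolorallocate($image, 0, 0, 0);
    $word = '';
    for ($i = 0; $i < $length; $i++) {
        $letter = $letters[rand(0, strlen($letters) - 1)];
        imagestring($image, 5, 5 + ($i * 30), 20, $letter, $textColor);
        $word .= $letter;
    }

    // Capture the PNG output in a buffer instead of sending it to the browser.
    ob_start();
    imagepng($image);
    $png = ob_get_clean();

    return array($word, $png);
}

if (extension_loaded('gd')) {
    list($word, $png) = build_captcha_png();
    echo strlen($word), " letters, ", strlen($png), " bytes of PNG data\n";
}
```

In the page itself, the returned text would be stored in $_SESSION['captcha_string'] and the PNG data written to the image file, exactly as the step-by-step version does.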

Validating

It is here that the user’s response is taken and, after processing it, they receive a reply. At first, a simple form is made with an input textbox and a submit button. There can be many ways of processing a captcha, depending on the requirements of the web application. To keep this example simple, we’ll process it on the same page.

The two lines left unexplained in the previous code snippets come into play now:

  1. $word.=$letter; – the concatenation operator . is used to append all the individual letters one after another, generating the 6-letter word.
  2. $_SESSION['captcha_string'] = $word; Our captcha string is stored in a session variable which will be used for validation purposes.

We’ll change the definition of display() to add a form-like structure.

Two submit buttons will be used, one to submit the string and another to refresh the page.

The following lines will be added in between the two closing div tags (see comments in the previous display() function)

function display()
{
    ?>

    <div style="text-align:center;">
        <h3>TYPE THE TEXT YOU SEE IN THE IMAGE</h3>
        <b>This is just to check if you are a robot</b>

        <div style="display:block;margin-bottom:20px;margin-top:20px;">
            <img src="image.png">
        </div>
        <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="POST">
            <input type="text" name="input"/>
            <input type="hidden" name="flag" value="1"/>
            <input type="submit" value="submit" name="submit"/>
        </form>

        <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="POST">
            <input type="submit" value="refresh the page">
        </form>
    </div>

<?php
}

Before moving further we must know when to display and when not to display the input box. It will be displayed only

  1. if the page has just loaded.
  2. if the user’s answer was incorrect.

The first condition is met by using a $flag which is set to ‘1’ each time the submit button is clicked. Initially, it has been set to any other value. The second condition is achieved by checking if the value stored in our session variable is the same as user input (see the code below).

To achieve this, we will replace the following lines of our starting step at the beginning of the article:

    create_image();
    display();

with:

$flag = 5;
$input = '';

if (isset($_POST["flag"])) // the form has been submitted
{
    $input = isset($_POST["input"]) ? $_POST["input"] : '';
    $flag = $_POST["flag"];
}

if ($flag == 1) // submit has been clicked
{
    if (isset($_SESSION['captcha_string']) && $input == $_SESSION['captcha_string']) // user input and captcha string are same
    {

        ?>

        <div style="text-align:center;">
            <h1>Your answer is correct!</h1>

            <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="POST"> <!-- refresh the page -->
                <input type="submit" value="refresh the page">
            </form>
        </div>

    <?php

    } else // incorrect answer, captcha shown again
    {

        ?>
        <div style="text-align:center;">
            <h1>Your answer is incorrect!<br>please try again </h1>
        </div>
        <?php
        create_image();
        display();
    }

} else // page has just been loaded
{
    create_image();
    display();
}

Note that the functions create_image() and display() are called only as per the 2 conditions discussed above.
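The comparison itself can be pulled out into a small function so that the edge cases are easy to test. The following is a minimal sketch, assuming a hypothetical helper named captcha_matches(); it treats a missing or empty captcha string as a failure and uses hash_equals() (PHP 5.6+) for the comparison.

```php
<?php
// Hypothetical helper: true only when a captcha was actually issued
// and the submitted text matches it exactly.
function captcha_matches($input, $expected)
{
    // Reject missing or empty values on either side, so an unset session
    // value can never "match" an empty submission.
    if (!is_string($input) || !is_string($expected) || $expected === '') {
        return false;
    }
    // hash_equals() compares strings in constant time (PHP 5.6+).
    return hash_equals($expected, $input);
}

var_dump(captcha_matches('AbCdEf', 'AbCdEf')); // bool(true)
var_dump(captcha_matches('abcdef', 'AbCdEf')); // bool(false), case matters
var_dump(captcha_matches(null, null));         // bool(false)
var_dump(captcha_matches('', ''));             // bool(false)
```

On the page, the second argument would be the value of $_SESSION['captcha_string'] if it is set and null otherwise. Unsetting the session value after each check would additionally prevent one solved captcha from being submitted repeatedly.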

We need the session variable that was set during the previous request, so the session is not destroyed here. By default, PHP’s session cookie expires once the browser is closed.

The captcha will look like this:

sample image

The user is prompted again only if the input is incorrect.

sample image

If the input is correct, the user is shown a success message.

sample image

There is a minor caveat – when the user presses the back button, any image already present in the cache of the browser will not reload, while the page does. In a POST request, browser back button will show a “document expired” page, but when the request is GET, the image does not regenerate.

The solution is simple: give the image a different name every time, so that the browser doesn’t find it in its cache. We will append the value returned by the built-in time() function (the current Unix timestamp, which changes every second) to the image name, both when creating the image and when displaying it in the browser.

Add this line just below where you started your session:

$_SESSION['count']=time(); // unique string stored

Replace the img src tag in the display() function with

<img src="image<?php echo $_SESSION['count']?>.png">

And the part where we created the png image in create_image() function will also be replaced with

imagepng($image,"image".$_SESSION['count'].".png");

The images will now be called something like image39342015.png. This procedure will create images as many times as the page is refreshed which can waste huge amounts of disk space, so we’ll make sure that before creating an image, all other images of the png extension are deleted. Add the following just before the imagepng() function is called.

$images = glob("*.png");
foreach($images as $image_to_delete)
{
    unlink($image_to_delete);      
}

In a production app, make sure that you isolate the folder where captcha images are stored; otherwise, other useful images may get deleted too.
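A gentler alternative to deleting every PNG on each request, along the lines suggested by a reader in the discussion below, is to remove only images that are older than a few seconds. The sketch below assumes a hypothetical helper delete_stale_captchas() and a dedicated directory; the 30-second default is an arbitrary illustrative value.

```php
<?php
// Hypothetical helper: delete only captcha images older than $maxAge seconds
// from a dedicated directory, and return how many files were removed.
function delete_stale_captchas($dir, $maxAge = 30)
{
    $deleted = 0;
    $files = glob(rtrim($dir, '/') . '/image*.png');
    if ($files === false) {
        return 0;
    }
    $now = time();
    foreach ($files as $file) {
        $modified = filemtime($file);
        // Skip files whose age cannot be read or that are still fresh.
        if ($modified !== false && ($now - $modified) > $maxAge) {
            if (unlink($file)) {
                $deleted++;
            }
        }
    }
    return $deleted;
}

// Demo in a throwaway directory: one old image, one fresh image.
$dir = sys_get_temp_dir() . '/captcha_demo_' . uniqid();
mkdir($dir);
touch($dir . '/image1.png', time() - 120); // pretend it is two minutes old
touch($dir . '/image2.png');               // just created

echo delete_stale_captchas($dir), "\n";    // 1
```

Because fresh files are left alone, an image generated for one visitor is not removed by another visitor's request a moment later.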

Download the full code here.

Conclusion

Making various types of captchas in PHP is very easy. This article covered the three basic things used for creating a standard captcha: shape, distortion, and text. This article was a proof of concept, and the code presented herein should not be used in production, especially since excellent alternatives such as ReCaptcha exist, which also supports audio challenges to help people with visual impairments. We hope you found this article interesting. Leave your comments and feedback below!


  • ccornutt

    One thing you probably want to add is an “is_null” check on the $_SESSION['captcha_string'] if check when you’re comparing it to input.

    If someone was feeling malicious, they’d see that the form has a field for “flag” and wonder what it does. Since the value is “1″, they’d pass that first, maybe without an “input” value. In this case the “$_SESSION['captcha_string']” wouldn’t exist yet (NULL) and “input” wouldn’t have a value (NULL) so the statement:

    if ($input == $_SESSION['captcha_string']) {

    would evaluate to true and tell them the answer was correct. Since using sessions like this is a pretty common PHP way to do CAPTCHAs, it’s not too big of a leap to figure it out. I’d recommend:

    if (isset($_SESSION['captcha_string']) && $input == $_SESSION['captcha_string']) {

    • http://www.bitfalls.com/ Bruno Skvorc

      Thank you, fixed! Obviously, this is no production-value example, but that still doesn’t excuse insecure code.

      • ccornutt

        Any code posted in a tutorial should be treated as production-value :) People copy and paste all the time…

        • http://www.bitfalls.com/ Bruno Skvorc

          I generally agree and hold the channel to this standard. I’ll let the occasional newbie article through with little to no production value, when it’s a proof of concept or an elaboration of something that has better alternatives anyway (like this one), and in those cases I’ll put in warnings about the code quality all over the piece. Though, if the community will react in a less than a favorable manner this time, I’ll definitely consider “stricting” things up.

          • ccornutt

            Easy there, I didn’t mean any hostility by the comment. I just know the habits of developers, especially those new to PHP, and want them to stay safe. The quality of the code in these posts is normally good. This was just one spot where I saw room for improvement.

          • http://www.bitfalls.com/ Bruno Skvorc

            Didn’t mean to sound overly defensive, your comments are much appreciated :)

  • http://carbonize.co.uk/ Carbonize

    This is one incredibly overly complicated and convoluted tutorial.
    For a start not everything requires going OOP and it annoys me when people seem to think not using OOP means your code is not up to scratch.
    You say that captcha uses shape then go on to say you will use a rectangle. Ever seen a captcha that was not a rectangle?
    You refer to the lines as distortion when they are not distortion, they are obfuscation at best.
    Next, mixing upper and lowercase characters will end up with confusion between a capital i and a lowercase L depending on the font used.
    You only use the built in GD font which, even at its largest, is very very small.
    Why are you creating a variable to hold the strlen of the characters and then taking 1 from it when you already know what this number is going to be anyway, so you are just wasting resources?
    There is absolutely no reason to require a function to create the html for the form.
    You weirdly decide to create a random name for the image file to prevent caching issues when the same is achieved by not only the hash that will be part of a variable in the get request for the image but by simply adding a second variable containing a random string.
    You use 3 different colours for the lines, dots and characters making it easy for a spammer to mess with the image to remove the distortions.

    Oh and finally, and most importantly, you are not sanitising the users input meaning the captcha can be easily bypassed.

    • http://www.bitfalls.com/ Bruno Skvorc

      There’s some valid feedback here, but please don’t nitpick on terminology. Whether you call lines obfuscation or distortion is irrelevant, and a rectangle is a shape.

      Adding a variable to hold the string length is hardly wasting resources, and makes the code more extensible, and there’s no OOP to be found. In fact, it was one of the gripes I had with the code of this article.

      The default font was used because of better multi-platform support of this example.

      • http://carbonize.co.uk/ Carbonize

        Because distortion and obfuscation are two entirely different things. Distortion would be altering the text somehow as in the example image at the start.

        Yes a rectangle is a shape but the author refers to shape as if it is an important part of the captcha and is different in various captchas.

        Yes the variable may make it extensible but in this form there is no room for extensibility as the string of characters is set in stone.

        Your comment about OOP proves my point about it being stupid to have mentioned it. Too many people, not you but many others, go on about OOP as if it is the only way to write code and those who don’t are inferior. PHP Classes is a good example of this.

        • http://www.bitfalls.com/ Bruno Skvorc

          PHP Classes has some horrible code, granted, but the OOP mention in the article was my own addition to it – I’m a strong proponent of OOP and strongly against spaghetti code. Ultimately, it’s personal preference in many cases, but I do think I’m with the professional majority in that specific regard.

    • Mehul Jain

      The design issues (shapes, colour, kind of distortion) are completely up to the programmer. Moreover, there can be a thousand other kinds of distortions, but not all can be covered in a single tutorial :).
      The GD library is used because most readers can relate to simple GD functions. And again, there can be more ways of naming random images, but using a timestamp always ensures full uniqueness.

      • http://carbonize.co.uk/ Carbonize

        My point is all captchas are rectangles so referring to shape is moot. Colour is important if you don’t want to make it too easy for OCR. As to distortion there is none at all with the text just being in a straight line.

        I made no reference to using GD so why mention it?

        There is no need to using a random image name for the captcha as this serves no purpose since the bot would have to request the form in the first place to ensure a captcha string has been generated at which point it would grab the image name. Every little bit of extra overhead soon adds up on a big script run on a popular site.

        • Mehul Jain

          By GD libraries, what I meant was to answer your query about using only the GD font, so no issues. Random image names are very important because if you manually refresh the page and the image names are not unique, a new image never loads in the browser. If there is any other possible, simpler way, your feedback is always welcome.

          • http://carbonize.co.uk/ Carbonize

            I already did offer an alternative to a random file name

            “weirdly decide to create a random name for the image file to prevent
            caching issues when the same is achieved by not only the hash that will
            be part of a variable in the get request for they image but by simply
            adding a second variable containing a random string.”

          • Mehul Jain

            Yeah absolutely its a convenient way but there were doubts about this method as there is a greater probability of bots tracking it down ( may be I’m wrong). Thanks for your suggestion. :)

  • Vladislav Gafurov

    Thank you for the article!

    Why exactly do you need to store image as a file? You could use script for image rendering and refer to it as an image with a proper header.

    I mean something like that:

    /* captcha-image.php */
    header("Content-type: image/png");
    imagepng($image);

    /* form.php */

    Since you already are using sessions, why bother with creating/deleting files and providing extra permissions to some directory.

    • http://www.bitfalls.com/ Bruno Skvorc

      That was just the choice the writer made. Thanks for the feedback!

  • Taylor Ren

    The steps are straightforward and a nice one.

    Though a little bit disappointed with the distortion part. If the chars are displayed like this, they are not distorted after all. Skewing, stretching, etc. will make it more difficult for a bot to recognize them.

    • http://www.bitfalls.com/ Bruno Skvorc

      What do you mean by “in the HTML”?

      • Taylor Ren

        @brunoskvorc:disqus

        I mean, in the HTML form, we can embed a hidden field with its value set to a reference given by the server, say “abc”. In the back end db, “abc” will be stored together with the real captcha “ABC”.

        In the validation process, PHP will use “abc” to get “ABC” and then check if the user input is correct. By doing so, we eliminate the session and/or store the plain text of “ABC” in the client side.

        It adds the server load, of course. But should be more secure.

        An alternative is to store the hash of “ABC” in a hidden field and then at the server side, verify the hash of user input against the hash of “ABC” should also improve the security.

        • http://www.bitfalls.com/ Bruno Skvorc

          Bots can read hidden fields, rendering the captcha useless.

          • Taylor Ren

            I know. That is why I suggested to use a hash or reference.

          • http://www.bitfalls.com/ Bruno Skvorc

            I’m not sure I follow, can you whip up a gist demonstration of your alternative?

          • http://carbonize.co.uk/ Carbonize

            I think he means about storing the session ID in a hidden field which is not really necessary if you are using PHP sessions.

      • Taylor Ren

        @brunoskvorc:disqus @carbonize:disqus

        I won’t use session. The pseudo-code below:

        <?php

        pickup_some_random_chars(); // say "abc123"
        generate_a_unique_reference(); // say "123456"

        store_string_and_reference_into_db(); // create a record in db, which contains two fields: 123456, abc123

        create_the_input_form_with_hidden_field(); // hidden field = 123456


        <?php

        get_user_input(); //which contains a user input string and the hidden field
        get_real_string(); //select real_string from mapping_table where reference=:user_input
        if(real_string==user_input)
        {…
        }

        • http://www.bitfalls.com/ Bruno Skvorc

          I’m not sure I agree with this approach. Making a captcha depend on a database doesn’t seem very efficient or extensible.

          • Taylor Ren

            Agreed to the disagree. So another alternative is to use the hash of the real string. It can avoid the db backend support.

            On the other hand, to use a db is extensible. For example, it is easier to add in another field indicating the valid period for that captcha.

  • omnichad

    Do NOT delete all the images on every call. That is not thinking asynchronously. If two people load the page at the same time, the PNG could be deleted by the other user before the first user downloads. Scale that up from two people and it can get to be a real mess.

    Plus, enumerating the files and deleting them takes some serious time when you scale up. How about some randomized garbage collection? Generate a random number between 1 and 5000 – if it matches a predetermined value, delete all PNG files older than 30 seconds.

    • http://www.bitfalls.com/ Bruno Skvorc

      Excellent points, thanks

  • http://www.coolfields.co.uk/ Graham Armfield

    “Captchas are a necessary evil…” – no, they’re not. Captchas are a usability and accessibility nightmare. And websites that use them risk losing the goodwill (and potential trade) of their visitors.

    A better article perhaps would be to cover how to create an effective anti-spam function – something along the lines of the way the Akismet plugin in WordPress works. Things like scanning the submission for multiple links, or dodgy words or phrases. Or maybe how to link to available anti-spam APIs.

    Please forgive me if there is already one of those on Sitepoint.

    • http://www.bitfalls.com/ Bruno Skvorc

      Captchas aren’t used that much in comments any more, though – I mostly see them on account creation forms, and Akismet is useless there.