Detect Hacked Files via CRON/PHP


As a Certified Ethical Hacker, I searched for a script that would help me detect unauthorized file changes. I found one (probably in the User Contributed Notes at php.net) and modified it until it worked very nicely on both my “test server” (Windows) and my “production” server (Linux).

The logic is simple: “Build a database of hashed values for vulnerable files (those which hackers will modify to execute code on your server) and compare those values to the actual hashes on a regular basis and report added, changed and deleted files.”

Obviously, the code to traverse a server’s directory structure and provide hash values is far more complex than the statement above. I will go through the code for the production server.

Database Setup

For security, use a separate database for this, one that does not share access credentials with any other database. Use cPanel to create the new database and a new user with a strong password (I recommend a 16-character password generated by strongpasswordgenerator.com), and give them an innocuous name like baseline. Then use the SQL tab in phpMyAdmin to create two tables:

    CREATE TABLE baseline (
        file_path VARCHAR(200) NOT NULL,
        file_hash CHAR(40) NOT NULL,
        acct VARCHAR(40) NOT NULL,
        PRIMARY KEY (file_path)
    );
 
    CREATE TABLE tested (
        tested DATETIME NOT NULL,
        acct VARCHAR(40) NOT NULL,
        PRIMARY KEY (tested)
    );

The first table, “baseline,” contains a variable-length field for your path/to/filenames, a fixed-length file_hash field (a SHA1 hash is 40 hexadecimal characters) and acct, which allows me to monitor accounts or domains separately. Set file_path as the primary key.

The “tested” table holds the DATETIME of every scan. Its acct field matches baseline’s acct field, so you can scan various accounts or domains and keep their data separate.

Initialize the PHP File

First, define() several constants:

  • PATH is the physical path to the start of your scan, which is usually the DocumentRoot. Just remember not to use Windows’ backslashes because both Apache and PHP will be looking for forward slashes.
  • Database access constants SERVER ('localhost'), USER, PASSWORD and DATABASE.

and initialize several variables:

  • $ext, an array of the file extensions to examine. Because not all files are executable on the server, I only scan .php, .htm, .html and .js files, and these need to be listed in the array. Note that an empty array forces ALL files to be scanned (best for security, but it uses the most server resources).
  • $skip, an array of directories to exclude. If you have a directory containing malware, shame on you! In any event, if you need to exclude a directory for any reason, list it here. Each entry is compared with getSubPath(), so it must be the path relative to PATH (e.g. images/icons), and only files directly inside that directory are skipped, not those in its subdirectories. Don’t omit any directories just because you only store images or PDF files there, though, because a hacker can put his files in there, too!
  • The $files array as an empty array(), the $report string as an empty string and the $acct string (use the account/acct name from your database tables).
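The settings listed above are not shown in the code that follows, and the author's reply to sam in the comments confirms they were left out. Here is a minimal sketch of the initialization step. Every value in it (the path, the credentials, the account label and the helper open_baseline_db()) is a hypothetical placeholder for you to replace. It connects with mysqli_connect(), as the author suggests in that reply.

```php
<?php
// Initialization sketch: all values are placeholders (assumptions), not the author's settings.

// Start point of the scan, normally DocumentRoot. Forward slashes only.
define('PATH', '/home/your_account_name/public_html/');

// Credentials for the dedicated "baseline" database (placeholders).
define('SERVER', 'localhost');
define('USER', 'baseline_user');
define('PASSWORD', 'replace-with-a-strong-password');
define('DATABASE', 'baseline');

// Extensions to hash; array() would hash every file.
$ext = array('php', 'htm', 'html', 'js');

// Sub-paths relative to PATH to exclude, as returned by getSubPath().
$skip = array();

$files  = array();  // path => SHA1 hash, filled by the scan
$report = '';       // body of the emailed report
$acct   = 'main';   // hypothetical account label stored in the acct columns

// Hypothetical helper: opens the connection when the scan runs.
function open_baseline_db()
{
    // Before PHP 8.1 mysqli_connect() returns false on failure;
    // from 8.1 it throws a mysqli_sql_exception by default instead.
    $db = mysqli_connect(SERVER, USER, PASSWORD, DATABASE);
    if (!$db) {
        // STDERR exists under the CLI, which is how cron runs the script.
        fwrite(STDERR, 'Connection failed: ' . mysqli_connect_error() . PHP_EOL);
        exit(1);
    }
    return $db;
}
```

In the main script, call $db = open_baseline_db(); before the first query.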

 

Let’s get started!

<?php

//          initialize (constants, $ext, $skip, $files, $report, $acct and $db as described above)

$dir = new RecursiveDirectoryIterator(PATH);
$iter = new RecursiveIteratorIterator($dir);
$iter->rewind();

while ($iter->valid())
{
    //          skip "." and ".." entries and unwanted directories
    if (!$iter->isDot() && !in_array($iter->getSubPath(), $skip))
    {
        if (!empty($ext))
        {
            //          hash only the specified file extensions
            //          PHP 5.3.6+: if (in_array($iter->getExtension(), $ext))
            if (in_array(pathinfo($iter->key(), PATHINFO_EXTENSION), $ext))
            {
                $files[$iter->key()] = hash_file("sha1", $iter->key());
            }
        } else {
            //          no extensions specified: hash every file
            $files[$iter->key()] = hash_file("sha1", $iter->key());
        }
    }
    $iter->next();
}

We have just wrapped the directory iterator ($dir) in a RecursiveIteratorIterator, an SPL class that walks recursive iterators, so the loop visits every file in the directory structure below PATH. For each entry, the loop first skips the "." and ".." entries and any directory listed in $skip. It then branches depending on whether file extensions have been specified. The result is an associative array ($files) with path/name.ext as the key and the corresponding SHA1 hash as the value.
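The same walk can also be written as a reusable function, which makes it easy to try against a small test directory before pointing it at DocumentRoot. This sketch is not the author's code, and scan_hashes() is a hypothetical name. It differs from the loop above in four ways:

  • It uses the FilesystemIterator::SKIP_DOTS flag instead of isDot().
  • It uses a foreach loop instead of valid()/next().
  • It calls SplFileInfo::getExtension(), which requires PHP 5.3.6+.
  • It lower-cases extensions, so INDEX.PHP is treated like index.php.

```php
<?php
// Hypothetical helper: returns an array of full path => SHA1 hash for files under $path.
function scan_hashes($path, array $ext = array(), array $skip = array())
{
    $files = array();
    // SKIP_DOTS drops "." and ".." so no isDot() check is needed.
    $dir  = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
    // Default LEAVES_ONLY mode yields files, not the directories themselves.
    $iter = new RecursiveIteratorIterator($dir);

    foreach ($iter as $fullPath => $info) {
        // getSubPath() is forwarded to the current RecursiveDirectoryIterator.
        if (in_array($iter->getSubPath(), $skip, true)) {
            continue;
        }
        // An empty $ext array means every file is hashed.
        if (!empty($ext) && !in_array(strtolower($info->getExtension()), $ext, true)) {
            continue;
        }
        $files[$fullPath] = hash_file('sha1', $fullPath);
    }
    return $files;
}
```

For example, $files = scan_hashes(PATH, $ext, $skip); produces the same path => hash array as the loop above.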

On my Windows test server, which is not linked to an SMTP server, I used echo statements to display these values instead of emailing them. Add your own echo statements if you need to verify that the script is working correctly.

The file count can be provided immediately by the files array:

$report .= "Hashed " . count($files) . " files.\r\n";

The output, whether to your test monitor or email, has just been given its first non-empty value: the hashed file count.

 

Last Hash Scan

The next thing to do is fetch the date/time of this account’s last hash scan. The stored path/file and hash set is read from the database in the following step.

$tested = "never (first scan)";
$results = mysqli_query($db, "SELECT tested FROM tested WHERE acct = '$acct'
    ORDER BY tested DESC LIMIT 1");

if ($results && ($result = mysqli_fetch_assoc($results)))
{
    $tested = $result['tested'];
}

$report .= "Last tested $tested.\r\n";

 

Compare Hashed Files with Database Records

So far, we’ve only learned the current file count and the date/time of the last scan. What we really want is to identify the files that have been added, changed or deleted. Let’s create an array of the differences.

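//          identify differences (quick check: new or changed files;
//          $diffs is rebuilt, sorted by status, in the next block)

if (!empty($files))
{
    $baseline = array();
    $result = mysqli_query($db, "SELECT file_path, file_hash FROM baseline WHERE acct = '$acct'");
    if ($result)
    {
        foreach ($result as $value)      //          mysqli_result is traversable (PHP 5.4+)
        {
            $baseline[$value["file_path"]] = $value["file_hash"];
        }
        $diffs = array_diff_assoc($files, $baseline);
        unset($baseline);
    }
}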

 

//          sort differences into Deleted, Altered and Added arrays

if (!empty($files))
{
    $results = mysqli_query($db, "SELECT file_path, file_hash FROM baseline WHERE acct = '$acct'");
    if ($results)
    {
        $baseline = array();      //          from database
        $diffs = array();         //          differences between $files and $baseline
                                  //          $files is current array of file_path => file_hash
        while ($value = mysqli_fetch_assoc($results))
        {
            $path = $value["file_path"];
            if (!array_key_exists($path, $files))
            {
                //          Deleted files (keep the stored hash)
                $diffs["Deleted"][$path] = $value["file_hash"];
            } elseif ($files[$path] !== $value["file_hash"]) {
                //          Altered files (keep the new hash)
                $diffs["Altered"][$path] = $files[$path];
            }
            //          every stored path (deleted, altered or unchanged) goes into $baseline
            $baseline[$path] = $value["file_hash"];
        }
        //          Added files: current paths not in the baseline at all. Compare keys
        //          only, so altered files are not listed again, and don't gate this on
        //          count(), which misses additions when files were also deleted.
        $added = array_diff_key($files, $baseline);
        if (!empty($added))
        {
            $diffs["Added"] = $added;
        }
        unset($baseline, $added);
    }
}

When this completes, the $diffs array will either be empty or hold the discrepancies found. These are stored in a multi-dimensional array, sorted into Deleted, Altered and Added, each containing path/file => hash pairs.
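The sorting logic is easier to check in isolation. The sketch below is a hypothetical pure function, classify_changes(). It takes the current scan and the stored baseline as path => hash arrays and returns the same Deleted/Altered/Added structure without touching the database. Deleted entries keep the stored hash, while Altered and Added entries carry the new hash. Because it compares keys with array_diff_key() rather than comparing counts, a deletion and an addition on the same night cannot cancel each other out.

```php
<?php
// Hypothetical helper: classify differences between the current scan and the baseline.
// Both arguments map path => SHA1 hash.
function classify_changes(array $files, array $baseline)
{
    $diffs = array();

    // Stored paths that no longer exist on disk (values: stored hashes).
    $deleted = array_diff_key($baseline, $files);
    // Paths present in both whose hash differs (values: new hashes).
    $altered = array_diff_assoc(array_intersect_key($files, $baseline), $baseline);
    // Paths on disk that the baseline has never seen (values: new hashes).
    $added = array_diff_key($files, $baseline);

    if ($deleted) { $diffs['Deleted'] = $deleted; }
    if ($altered) { $diffs['Altered'] = $altered; }
    if ($added)   { $diffs['Added']   = $added; }

    return $diffs;   // an empty array means the file structure is intact
}
```

On a first run the baseline is empty, so every file lands in Added.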

 

Email Results

You will need to add the discrepancies to the report and then email it.

 

//          display discrepancies

if (!empty($diffs))
{
    $report .= "The following discrepancies were found:\r\n\r\n";
    foreach ($diffs as $status => $affected)
    {
        if (is_array($affected) && !empty($affected))
        {
            $report .= "* $status *\r\n\r\n";
            foreach ($affected as $path => $hash)
            {
                $report .= " • $path\r\n";
            }
        }
    }
} else {
    $report .= "File structure is intact.\r\n";
}

 

$mailed = mail('you@example.com', $acct . ' Integrity Monitor Report',$report);
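In reply to linklince in the comments, the author suggests including the hash values in the emailed report. One way to do that is to build the report in a small function. This is a sketch with a hypothetical name, build_report(). It assumes $diffs has the Deleted/Altered/Added shape described above and that $tested holds the last scan time. It uses an ASCII dash instead of the bullet to avoid character-set issues in plain-text mail.

```php
<?php
// Hypothetical helper: format the scan results as a plain-text email body.
function build_report(array $files, $tested, array $diffs)
{
    $report  = 'Hashed ' . count($files) . " files.\r\n";
    $report .= "Last tested $tested.\r\n";

    if (empty($diffs)) {
        return $report . "File structure is intact.\r\n";
    }

    $report .= "The following discrepancies were found:\r\n\r\n";
    foreach ($diffs as $status => $affected) {
        if (!is_array($affected) || empty($affected)) {
            continue;
        }
        $report .= "* $status *\r\n\r\n";
        foreach ($affected as $path => $hash) {
            // List the hash next to each path for later comparison.
            $report .= " - $path => $hash\r\n";
        }
        $report .= "\r\n";
    }
    return $report;
}
```

The result can be passed straight to mail() as the message body.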

 

Update the Database

You’re not finished yet!

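//          update database

//          clear old records
mysqli_query($db, "DELETE FROM baseline WHERE acct = '$acct'");

//          insert updated records with a prepared statement, so a file name
//          containing a quote (or crafted by an attacker) cannot break or inject SQL
$stmt = mysqli_prepare($db, "INSERT INTO baseline (file_path, file_hash, acct) VALUES (?, ?, ?)");
foreach ($files as $path => $hash)
{
    mysqli_stmt_bind_param($stmt, "sss", $path, $hash, $acct);
    mysqli_stmt_execute($stmt);
}
mysqli_stmt_close($stmt);

mysqli_query($db, "INSERT INTO tested (tested, acct) VALUES (NOW(), '$acct')");

mysqli_close($db);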

?>

On the first pass, there will be nothing in the database’s baseline table and ALL files will display as Added, so don’t be alarmed.

Now that you have the code, where do you upload it? Don’t even consider placing this code in your webspace (under the DocumentRoot). If you do, anyone can request it and run it, which deletes the saved baseline and replaces it with the current (possibly hacked) files, invalidating your hash scans. For simplicity, put it in your account’s home directory, the one that holds public_html (or similar).
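As an extra precaution on top of keeping the file outside the webspace, the script can refuse to run when it is requested over HTTP. This is an addition, not part of the original script, and invoked_outside_web() is a hypothetical helper. The sketch assumes that a cron run has no REQUEST_METHOD in $_SERVER, and it also accepts the CLI SAPI explicitly. It does not rely on PHP_SAPI alone because on some hosts /usr/local/bin/php is the CGI binary. The -q flag in the cron command below is a php-cgi option, and that binary's SAPI name is not "cli". Place the check at the very top of hashscan.php.

```php
<?php
// Hypothetical guard (not in the original script): run only from cron/CLI.
function invoked_outside_web()
{
    // CLI SAPI, or any SAPI (e.g. php-cgi started by cron) with no HTTP request method.
    return PHP_SAPI === 'cli' || empty($_SERVER['REQUEST_METHOD']);
}

if (!invoked_outside_web()) {
    // Reveal nothing to a browser and leave the baseline untouched.
    http_response_code(404);
    exit;
}
```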

 

Activate

Now that you have the code, you need to have it activated on a regular basis. That’s where the CRON function of the server excels! Simply use your cPanel to create a new CRON job, set the time in the middle of the night when your server should be nearly idle (you don’t want to interfere with or delay visitors’ activities, which also means you should limit yourself to a single scan per day) and use the following directive:

/usr/local/bin/php -q /home/account/hashscan.php

where /usr/local/bin/php is the location of the server’s PHP executable and /home/account/hashscan.php is the path to your hashscan.php script (or whatever name you gave it).

 

Wrap-Up

We have created a new database with two tables, one to hold the dates and one to hold the baseline hashes. We have initiated every scan by identifying the file types (by extension) that we need to track and identified the start point (DocumentRoot) for our scan.

We’ve scanned the files avoiding the unwanted directories and compared the hashes against the baseline in the database. Closing the process, we’ve updated the database tables and either displayed (on a test server) or emailed (from the production server) the results. Our CRON job will then activate your hash scan on a regular basis.

This ZIP file contains the above CreateTable.sql, hashscan.php and CRON.txt files.

This is but one part of securing your website, though, as it will only inform you of changes to the types of files you’ve specified. Before you get this far, you must ensure that your files are malware free (maldet scans established by your host can do this but be sure that you keep a clean master copy off-line), ensure that no one but you can upload via FTP (by using VERY strong passwords) and keep “canned apps” up to date (because their patches are closing vulnerabilities found and exploited by hackers and their legions of “script kiddies”).

In summary, BE PARANOID! There may be no-one out to get you but there are those out for “kicks” who are looking for easy prey. Your objective is to avoid that classification.


  • http://www.martinbean.co.uk/ Martin Bean

    How does this work in a set-up with version control, or continuous integration in place? Surely if a file is changed by way of syncing with a code repository this method will report false positives?

    • Patrick

      Obviously if a file is modified by any means it will cause false positives. But you will know about any changes you’ve made, so you can simply disregard them.

      You could easily repopulate the database whenever you make changes to prevent this.

      • http://datakoncepts.com DK Lynn

        Patrick,

        Too true … but that also gives a good indication that the hackscan is working for you, too. I just love those “warm, fuzzy feelings!”

  • http://marcelocustodio.net Marcelo Custodio

    Such a wonderful approach to security! Congratulations. I’d like to know whether it adds some overhead to app bootstrapping, since there’s a new security check task?

  • http://datakoncepts.com DK Lynn

    Marcelo,

    Thank you! There is nothing to attach to bootstrapping since you schedule the CRON to activate at the time of least server activity (while you’re sleeping). That’s part of the beauty of this.

    Martin,

    Keep your own version control. Yes, I get “confirmations” (rather than false positives) of changes made to my scripts. Because I made the changes, I’m expecting these confirmations but, if you’re synching automatically, it will provide verification that your synch process is working.

    Both,

    As a Certified Ethical Hacker, I’m well aware of the insidious activities of hackers and wrote a checklist of response activities and posted it in SitePoint’s Web Security forum board. The one measure which brought the most questions, though, was the regular verification that files had not been changed. That’s precisely what this does.

    I’d been hacked a while back with JavaScript inserted at the beginning of both PHP and HTML scripts. It took maldet a while to find all the instances of the hack attack, but this CRON has verified all changes (including an attack within one account I don’t manage) for months.

    On a final note, because my server is Apache and I’m using PHP, I showed only the validation of PHP, HTML and JS files. Should any ASP files (or other server file types) show up, they’d not be validated by this script unless the extensions array is left empty. Therefore, please let me recommend now that you leave the extensions array empty. That will waste more time on images and CSS but the security will be well worth the little extra time.

  • http://dtaborda.goinov.com linklince

    upgrade for the script:
    Add a function that stores the content of each file while it is in a known-good state, so that if your cron job later detects a hack you can restore it faster and automatically…
    It seems to be a nice upgrade. Nice job.

    Daniel

    • http://datakoncepts.com DK Lynn

      linklince,

      Thank you for the comment.

      As for the function to store in an actual file:

      1. You get the entire list e-mailed to you the first time you run the CRON.

      2. Saving a flat file is simply duplicating the database. However, when you save a text file, you’d probably be saving to the same file name as previous which only duplicates the database while, if you save it with a date in the filename, you run the risk of filling your allotted space on the server with an endless series of duplicated files.

      3. If you’re adamant about saving the info in a file in addition to the database, you could e-mail the changes to yourself for every scan that is run (be sure to include the hash values, too, for comparison). That would simply be changing $report .= " • $path\r\n"; to $report .= " • $path => " . $baseline[$path]['file_hash'] . "\r\n";

  • sam

    Hi DK Lynn,

    First off, very nice article – I’m only getting started with PHP, SQL etc. – but I know my way around the basics. Figured it would be cool to try this out and was hoping you could help me clear up a few things.

    1. Where do I define my DB in your script? Is it correctly assumed, that I need to setup this myself within your script? I do see your $db reference in all queries.

    2. “An array of the file extensions to examine.” <- you mention this as a reference of which files to scan – but i'm a bit unsure where I actually define this in your script?

    3. "PATH is the physical path to the start of your scan" <- I see where to define this in your script – would you be able to give an example of a PATH?

    Thanks a lot! :)

    • http://datakoncepts.com DK Lynn

      sam,

      Thank you.

      A1. I use a PHP script outside the webspace and link to it to open the database and get the $db file handler. Because this script, too, should be outside the webspace, you can simply use $db = mysqli_connect('localhost',USERNAME,PASSWORD,DATABASE); where you can define these constants elsewhere or simply replace with appropriate single quoted values.

      A2. They were defined in the initialization process – but I see that $ext = array("php","html","js");
      didn’t make the version online. As with my comment above, I’d recommend allowing the scan to create hashes of ALL files (because a hacker may add a different file type which would not be scanned) which would be $ext = array();.

      A3. Example: define("PATH", "/home/your_account_name/public_html/");

      You’re very welcome!

  • http://www.corax.org Greg Raven

    I find it easier to set up a cron job to back up the site to Rsync.net, and then scan the resulting e-mail each morning. Even on a WordPress site with 25,000 pages, it takes only a moment to see if anything untoward has happened in the last 24 hours.

  • pineyscriper

    The wonders of the web always confuse me. How does a ‘hacker’ get access to my web space in order to put or alter an HTML or PHP file?

    Thanks,
    Pineyscripter

    • http://datakoncepts.com DK Lynn

      Piney,

      That opens a whole new can of worms. However, social engineering is the easiest (how many people know your username/password?), then cracking passwords, or another person on your server who shares resources, so it would be easier for them than most, … No, this is not a tutorial in hacking but a way to detect when you’ve been hacked; it’s but one step to help you respond before your account is blacklisted around the world.

  • http://www.jimmyweb.net James Beattie

    Thank you, a very clever idea which we shall add to our toolbelt immediately!

  • Sebastiaan Stok

    For searching this many files, it’s better to use http://symfony.com/doc/master/components/finder.html, as it supports using the native find command, which will probably speed things up a lot.

  • Keith S.

    This is a great looking script, however I have encountered an issue. When I try to execute the check, I get the following (trying to run it from the cli to set up the initial entries):

    PHP Parse error: syntax error, unexpected T_ECHO in /xxx/hashscan.php on line 105

    I’ve looked at that line a hundred times, and I don’t understand why it’s throwing the error. Got any pointers?

    • http://datakoncepts.com DK Lynn

      Keith,

      You’ve discovered a remnant of my test code which should have been removed. Please change that line to $report .= "* $status *\r\n\r\n"; AND remove the next line, with my apologies.

      I have updated the article and code (zip file) and requested that SitePoint update to “cover my tracks” on this (and showing the $ext and $skip array examples and mysqli_connect point).

      Thank you for pointing that out!

  • http://notenabeld Richard

    Do you have a blog or do you tweet? I want to read and do more by you.
    Thank you for this post!!!!!!

    • http://datakoncepts.com DK Lynn

      That’s high praise, indeed! Thank you for that!

      No blogs and no tweets although I do have my web development and hosting website at dk.co.nz and an Internet security-related website at talonz.co.nz.

  • Les

    This is great advice and better that you’ve given working code too, will definitely be looking into this now, not that any website I do is insecure, it’s just good to be paranoid, as you say :)

  • http://datakoncepts.com DK Lynn

    With the comments below, I have updated the article and its code and uploaded a revised ZIP file to http://dk.co.nz/HashAlert2.zip (with thanks to sam and Keith S. for pointing out that I hadn’t shown where I initialized the $ext and $skip arrays, I hadn’t shown the creation of the $db MySQL handler and I’d left a development remnant – $test – in the code). Thank you, everyone, for the comments as well as for suggesting other options.

    This article was created because of the questions generated about item #4 of my posted Hack Attack Recovery Checklist in SitePoint’s Web Security forum board:

    1. Immediately delete all FTP access except one (master for the account).

    2. Change the master password (cPanel/WHM and FTP) to a VERY STRONG one using an http://strongpasswordgenerator.com password of sufficient length.

    3. Use maldet scans (on an Apache server) which find and report all forms of malware. This will identify scripts which can be embedded in html, php and js scripts. Repeat the maldet scans until there are no files detected then add a CRON to run maldet scans on a regular basis. Be aware that recovery will primarily consist of deleting ALL html, php and js files and replacing them with originals (from your master copies).

    4. Additionally, I run a script via CRON to verify that files have remained unchanged over the last xx hours for “peace of mind.”

    5. Database: If you are running WordPress or the like (using database verification for admin accounts), create a new admin and delete all other admin records.

    6. Update all “canned scripts” (e.g., WordPress, Zencart, etc.) and be sure that they’re kept updated in order to prevent further attacks via security problems discovered in those scripts. This includes their third party plug-ins, too.

    7. Uploaded files: Be sure to do a thorough check of any file uploaded to your website (I limit uploaded files to images and they are recreated and resized by GD before being saved to my “webspace”).

    The message is that it’s easier to be paranoid than to recover from a hack attack.

  • Dave

    ….or type:
    yum install rkhunter