By DK Lynn

Quickly Detect Hacked Files via CRON/PHP: SuperScan



As a Certified Ethical Hacker, I'm fully aware that prevention is the best defense against hackers but, should one break through, the sooner you know about it, the quicker you can act to limit the damage.

A while back, I presented a script called hashscan, designed to track site changes. Executed via a daily CRON, the script reads the files for a specified directory (e.g., an account’s public_html directory on a server), generates hashes (for files with specific file extensions), and compares them with the previous scan’s hashes stored in a database. It's a great way for site owners to be alerted to files that have been added, altered or deleted by a hacker.

In this article, I'll present an updated version of the script, called SuperScan.

Benefits of SuperScan

The primary benefit is that SuperScan will report any change to the files in an account, whether that change is an addition, alteration or deletion. SuperScan was designed not to overwhelm the webmaster: it only sends a report of changes detected since the last scan (hourly by default, configurable via CRON) and a summary report (daily by default, also configurable via CRON).

Because the scan of a 1500 file account takes ~0.75 seconds, SuperScan can be run frequently without affecting server performance.

To support forensic investigation, the file last modified date and time are held in the database, along with the hash value of the most recent scan (and prior scan for altered files).

The scanner file need not be changed, as all variables are set within a required configure script. It's in the configure script where you can select specific (or ALL) file extensions to be scanned or, if ALL, the file extensions to omit. Additionally, you may specify directories which the scanner will not scan.
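The include/exclude choice described above can be sketched roughly as follows. This is an illustration rather than the code in scanner.php: the function name shouldScan() is hypothetical, and the two parameters stand in for the include and exclude lists set in the configure script. It uses pathinfo() so that extensionless files such as error_log yield an empty extension.

```php
<?php
// Sketch only: decide whether a file should be scanned, based on its extension.
// shouldScan() is a hypothetical helper, not part of SuperScan.

function shouldScan(string $filePath, array $extArray, array $exclArray): bool
{
    // pathinfo() returns '' for extensionless files such as error_log.
    $ext = strtolower(pathinfo($filePath, PATHINFO_EXTENSION));

    if (empty($extArray)) {
        // "ALL" mode: scan everything except the excluded extensions.
        return !in_array($ext, $exclArray, true);
    }

    // Include mode: scan only the listed extensions.
    return in_array($ext, $extArray, true);
}
```

With an empty include list, every file is examined unless its extension appears in the exclude list; with a non-empty include list, the exclude list is ignored.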

While the SuperScan files can be tested within a webspace, I recommend that they be moved outside the webspace for production use via CRON to protect against casual hackers.

Finally, a curious additional benefit is that changes in (extensionless) error_log files are captured and can direct the webmaster’s attention to coding problems that have slipped through the testing procedures.

SuperScan Logic

The logic flow of SuperScan is:

  • Read the baseline information about the files in the database
  • Scan the system’s files and compute their hashes
  • Compare the baseline files against the current files to determine the changed files to generate:
    • A list of added files
    • A list of altered files and
    • A list of deleted files
  • Handle each of the changed files lists (update the database)
  • Prepare and send a report (if required).
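As a rough illustration of the scanning step in the flow above, the sketch below walks a directory tree and records a SHA-1 hash and last modified time for each file. It is a minimal sketch, not the code shipped in scanner.php: scanDirectory() is a hypothetical helper, and the array keys simply follow the 'file_hash' and 'file_last_mod' names used by the scanner.

```php
<?php
// Sketch only: walk a directory tree and hash every file.
// scanDirectory() is a hypothetical helper, not part of SuperScan.

function scanDirectory(string $root): array
{
    $current = [];
    $iter = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iter as $file) {
        if (!$file->isFile()) {
            continue; // ignore directories and other non-file entries
        }
        // Normalise Windows backslashes so the path is a stable array key.
        $path = str_replace('\\', '/', $file->getPathname());
        $current[$path] = [
            'file_hash'     => hash_file('sha1', $path),
            'file_last_mod' => date('Y-m-d H:i:s', $file->getMTime()),
        ];
    }

    return $current;
}
```

The returned array is keyed by file path, which is what makes the later comparison against the baseline a matter of comparing array keys.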

Database, Variables and the Working Arrays

Rather than bore you with the details here, I've inserted comments in all the scripts.

Thus, in short, there are three database tables:

  • baseline: this contains the $file_path, the file’s hash and the file’s last modified date and time. (I also added the account so multiple accounts could use a single database.)
  • history: this records every detected change—or lack thereof—in each scan.
  • scanned: this records scan summary date and time, as well as the number of changes and associated account.

Warning #1:
I can’t stress enough that the $testing variable set by configure.php will trigger an immense amount of output, so it must only be used for testing and never during a CRON job!

Warning #2:
Because the path/to/file is used as a key, it must be unique. That means that multiple accounts can never scan the same files.

Warning #3:
In addition, Windows servers use backslashes in paths. These are immediately changed to forward slashes, because backslashes cause characters to go missing in the database. Also, an apostrophe in a file name will cause problems with database queries.
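One possible way to handle both issues from Warning #3 is sketched below. Neither helper is taken from SuperScan: normalizePath() and findBaselineRow() are hypothetical names, and the column names file_path and account are assumptions about the baseline table (check CreateTables.sql for the real ones). The second helper binds the path as a query parameter, which is a common way to keep an apostrophe in a file name from breaking the SQL.

```php
<?php
// Sketch only: two hypothetical helpers, not taken from scanner.php.

// Convert Windows backslashes to forward slashes before the path is used as a key.
function normalizePath(string $filePath): string
{
    return str_replace('\\', '/', $filePath);
}

// Bind the path as a parameter so an apostrophe in a file name cannot break the query.
// The column names file_path and account are assumptions about the baseline table.
function findBaselineRow(mysqli $scandb, string $filePath, string $account): ?array
{
    $stmt = $scandb->prepare(
        'SELECT * FROM baseline WHERE file_path = ? AND account = ?'
    );
    $path = normalizePath($filePath);
    $stmt->bind_param('ss', $path, $account);
    $stmt->execute();
    $row = $stmt->get_result()->fetch_assoc();
    $stmt->close();

    return $row ?: null; // null when no baseline row exists
}
```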

The working arrays are designed to make use of PHP’s functions, which access the key ($file_path; this is also the file structure iterator, so never alter $iter->key()).

$baseline is read before starting the scan, $current is the result of the scan, and the $added, $altered and $deleted arrays accumulate the changes from the $baseline and are used to update the $baseline for the next scan.
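The relationship between these arrays can be illustrated with PHP's key-based array functions. The following is a minimal sketch under the assumption that both arrays are keyed by file path and carry 'file_hash' and 'file_last_mod' entries; compareScans() is a hypothetical name and this is not the code in scanner.php.

```php
<?php
// Sketch only: derive the three change lists from two arrays keyed by file path.
// compareScans() is a hypothetical helper, not part of SuperScan.

function compareScans(array $baseline, array $current): array
{
    // Keys present only in the new scan are additions.
    $added = array_diff_key($current, $baseline);

    // Keys present only in the baseline are deletions.
    $deleted = array_diff_key($baseline, $current);

    // Keys in both are altered when the hash or the modification time differs.
    $altered = [];
    foreach (array_intersect_key($current, $baseline) as $path => $info) {
        if ($info['file_hash'] !== $baseline[$path]['file_hash']
            || $info['file_last_mod'] !== $baseline[$path]['file_last_mod']) {
            $altered[$path] = $info;
        }
    }

    return ['added' => $added, 'altered' => $altered, 'deleted' => $deleted];
}
```

Because the comparison works on keys, the file path must stay unique and unmodified, which is the reason for the earlier warnings about paths.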


The superscan.zip file contains 7 files:

  • CreateTables.sql, which can be used to set up your tables
  • ReadMe.txt, which provides an overview of the SuperScan script
  • scanner.php, the scanning script that requires configure.php and scandb.php (which connects to your MySQL server and returns the $scandb handle)
  • reporter.php, which will provide a summary of recent scans via CRON
  • CRON.txt, which provides sample CRON instructions for both scanner.php and reporter.php


The $report is created as the file changes are detected, and is stored and emailed if not a “negative report.” The summary report is used for the “warm, fuzzy feeling” when you’re not receiving change reports.
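The "negative report" idea can be sketched as a function that returns nothing when no changes were found. This is an illustration only: buildReport() and the line format it produces are hypothetical, not the report layout SuperScan actually uses.

```php
<?php
// Sketch only: build a plain-text change report, or null for a "negative report".
// buildReport() and its output format are hypothetical.

function buildReport(array $added, array $altered, array $deleted): ?string
{
    if (!$added && !$altered && !$deleted) {
        return null; // nothing changed, so there is nothing to send
    }

    $report = '';
    $lists = ['Added' => $added, 'Altered' => $altered, 'Deleted' => $deleted];
    foreach ($lists as $label => $files) {
        // Each list is keyed by file path.
        foreach (array_keys($files) as $path) {
            $report .= $label . ': ' . $path . "\n";
        }
    }

    return $report;
}
```

A caller would then store and email the text only when the return value is not null, for example by passing it as the message argument of PHP's mail() function.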

During the cleanup, records older than 30 days are auto-purged from the history and scanned tables to prevent unlimited growth of the database, the large arrays are destroyed (reset to empty) and the database is closed.
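A purge of this kind might look like the sketch below. It is an assumption-laden illustration, not the cleanup code from scanner.php: both function names are hypothetical, and so is the timestamp column name scan_time (check CreateTables.sql for the real column names before using anything like this).

```php
<?php
// Sketch only. The column name "scan_time" is hypothetical;
// check CreateTables.sql for the real column names.

// Compute the cutoff timestamp: anything older than this is purged.
function purgeCutoff(int $days, DateTimeImmutable $now): string
{
    return $now->sub(new DateInterval('P' . $days . 'D'))->format('Y-m-d H:i:s');
}

// Delete old rows from the two tables that grow with every scan.
function purgeOldRecords(mysqli $scandb, int $days = 30): void
{
    $cutoff = purgeCutoff($days, new DateTimeImmutable());
    foreach (['history', 'scanned'] as $table) {
        $stmt = $scandb->prepare("DELETE FROM {$table} WHERE scan_time < ?");
        $stmt->bind_param('s', $cutoff);
        $stmt->execute();
        $stmt->close();
    }
}
```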


I believe that SuperScan is a massive improvement over my prior effort, and is a worthy upgrade. It provides frequent notice of changed files, while “negative reports” won't overwhelm the webmaster with unnecessary “Unchanged” notices.

Download the SuperScan code from GitHub


SuperScan was suggested by Han Wechgelaer (NL), who emailed the suggestion that my earlier hashscan script be extended to capture a history of the changes to an account’s files, as well as making more frequent assessments and adding a daily summary.

Han was kind enough to provide a copy of his start on this project and, between us, this evolved into SuperScan. Without Han’s gentle prodding and assistance, SuperScan would never have gotten off the ground and would certainly not be the exceptional tool it is today.

I'd love to know how you find this script, or if you have any questions about it or feedback.


  • Aliasghar

    after setup and running the scanner i got this error:

    Notice: Undefined variable: count_baseline in /home/site/public_html/superscan/scanner.php on line 238

  • Ralph Mason

    The latest zip file has been uploaded now. :-)


  • There is an intrusion detection (file integrity check) tool named AIDE.

  • There is at least one issue that needs fixing in the provided code. scandb.php does not return a database object, scanner.php doesn’t have a mysqli_connect statement (apparently expecting the object to be returned from scandb.php) and the mysqli_connect statement in reporter.php is commented out. The end result is that there’s no activity in the database until a mysqli_connect statement is added to scandb.php. Once that’s done and the rest of the parameters are set correctly, it works very well. I just downloaded the zip file again to check and the problem still exists.

    • DK Lynn


      Too true (with apologies to everyone). I’ve sent you the corrected Zip file and posted the update online (at my website). SitePoint will get the update linked soon.



  • Roman

    I think that switching from MySQL to SQLite would be a very large improvement. That would make setting up and testing much easier.

  • Roman

    DK, I will tell you on the basis of my multi-year experience that you are grossly mistaken about SQLite. SQLite is NOT an fopen()-type flat file database, but a REPLACEMENT for it.

    SQLite is a transactional, SQL-based relational database system with a LOT of features. It works GREAT in a web environment. SQLite is fully ACID-compliant.

    It outperforms MySQL in shared hosting environments because SQLite DB files aren’t shared by multiple websites as is the case with MySQL servers. In other words, if another website is using a MySQL server while your script is trying to use it, your script will have to wait for the server to become available for it. This doesn’t happen with SQLite DB files. They are always available to scripts trying to access them.

  • Hi Matt!

    Thank you for your in depth review and comments on my code. I am currently in the process of an update to cover some “minor” issues and will certainly include some of those you suggested.

    In my introduction, I did not rail on about the strange quirks discovered in the initial coding effort:

    • The backslash was problematic because the $file_path was used immediately but the problem was discovered by looking at the $file_path values which made it into the testing database. The “missing characters” were a surprise which was also resolved by replacing the backslashes.

    • The `date_last_modified` field began life as a datetime field but that caused some weird problem so it was quickly changed to a varchar field.

    • In defining the data types and indexes, I was driven by the PHP array functions, specifically those which deal with indexes.

    • Without making timing measurements, I relied upon the file_path as an index to ensure speed when UPDATEing `baseline` records.

    I have read that a hash of a file is virtually impossible to duplicate (1 in 2^160) and is a very reliable method to ensure that a file has not been changed. To reinforce that, the `date_last_modified` field was also checked. IMHO, just duplicating the hash would be too much work for a hacker (easier to empty the database tables). Adding the file size would be overkill and `date_last_modified` is preferred as it actually provides useful (forensic) information.

    I took the approach that SuperScan would be run in the production environment ONLY by CRON and from a location outside the webspace. Therefore, if a hacker was able to get into the account (not just the website), there were bigger problems than someone attacking SuperScan or its database. With all the input coming from scanner reading the file structure, I was not concerned about bad input (I am; one of my “minor” issues is the handling of extensionless files).

    I completely agree that “…most attackers don’t spend that much time on analyzing the whole webspace when mass-infecting websites.” It’s just an example of “The Law of Diminishing Returns” applying on the hackers’ side.

    While it may look like I’ve been lazy, I subscribe to the concept that Security is a trade-off between Protection, Convenience and Cost.

    Thank you again for the great review!



  • Guindillas

    Sorry for my english.

    I’m testing SuperScan in a large site and gives me a error:

    Fatal error: Allowed memory size of 100663296 bytes exhausted (tried to allocate 8192 bytes) in /usr/home/xxxx/www/ss/scanner.php on line 136

    Line 136: $current[$file_path] = array('file_hash' => hash_file("sha1", $file_path), 'file_last_mod' => date("Y-m-d H:i:s", filemtime($file_path)));

    In large sites can be memory problems. How to resolve this?

    Thanks in advance.

  • Guindillas,

    Don’t worry about your English … it’s far better than my (Dutch? Okay, ANY other language!).

    There are two obvious ways to handle very large websites:

    1. Send PHP a directive to allocate more memory (128Mb?). I would NOT do that as it may impact your visitors and, possibly, exceed your limit with your host.

    2. Segregate the scan into manageable size parcels. I had given SuperScan the ability to ignore specified directories so you could use multiple scans (be sure to change the scan’s name!) and set ignored directories for each scan so that the entire website (or account) is eventually scanned. No need to do it all at once as the scan names (for me, they were to separate different accounts in the same database) can get your entire directory structure covered.

    I recommend the second option albeit your directory structure may not lend itself to this type of directory segregation (all files in the DocumentRoot?!? Account’s root?!?).

    I hope that helped!

  • Guindillas

    Thanks Lynn:

    Segregate the scan can solve the problem.

    But there is another problem with script. Is with the $skip array. Only work fine if the folder to skip is the last on the path string.

    Path example: /usr/home/www/forum/cache/folder1/folder2/files…

    If i like skip all the files in “/usr/home/www/forum/cache” and add this string to $skip array in configuration.php don’t work properly, it works if the path is “/usr/home/www/forum/cache/folder1/folder2” (the last folder) because in scanner.php line 109:

    if (!$iter->isDot() && !(in_array($iter->getSubPath(), $skip)))

    $iter->getSubPath() return path complete and in_array function returns false if $skip array is not full path.

    Sorry again for my english. I’ve explained?


    • Guindillas,

      If you can dedicate more memory, then that resolves your problem. Otherwise, you’ll need to “slice off” a directory from the top level and create a new account for that slice. If that slice is too large, too, then repeat the process (skip a large subdirectory and use it for a third/fourth account … until all your directories are scanned). One caution, though, is to run your scans at different times so they don’t occupy your server’s CPU and ignore your website visitors.

      From your path, I believe that www is your DocumentRoot and forum is a very large directory. Skip forum at the top level and create a new account for …/www/forum. If cache and its subdirectories are too large, then handle all the forum but skip cache and create another scanning account for cache. If cache has too many large folders, do the same to it, too, until your entire website (or account) is being scanned.

      WARNING: Because a duplication of the path WILL be detected and be rejected by the database, be sure NOT to scan the same directories in multiple scanning accounts!



      • Guindillas

        Hello again Lynn:

        I think my bad English has confused this new issue.

        I mean the array $skip in configuration.php. This array is for exclude paths on scanner. If the path refers to the last folder, then it is excluded from the search, but if it is not the last folder, path is not excluded.

        Line 109 in scanner.php:

        if (!$iter->isDot() && !(in_array($iter->getSubPath(), $skip)))

        compare the full file path with $skip array. If path is not complete in $skip array return false and is included in the search.

        You know what I mean now?

        Thanks again.

        • Again, your English is fine!

          I have looked at my code again and see that the $skip array contains the names of the subdirectories EXCLUDING THEIR PATH. This is liable to cause problems if you have duplicated subdirectory names across higher directories where only one is to be skipped but that was clearly not anticipated (and could be worked around by renaming one or the other).

          Please remember that the $iter steps down through each branch of the directory structure (from the directory you specify) to get all files in all subdirectories (unless a subdirectory, NOT including its path, is excluded via the $skip array) so each pass is relative to the current directory (examining its files and subdirectories).

          I hope that resolves your problem with parsing your directory structure into manageable segments.
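For readers who want a skipped directory to take all of its subfolders with it, one possible approach is a prefix match on the sub path instead of an exact in_array() match. The sketch below is not the author's code: isSkipped() is a hypothetical helper, and it assumes that each $skip entry is a path relative to the scan root (for example "forum/cache"), which is the same form that $iter->getSubPath() returns.

```php
<?php
// Sketch only: isSkipped() is a hypothetical replacement for the in_array() test.
// $subPath is what $iter->getSubPath() returns: the directory of the current
// entry, relative to the scan root. Entries in $skip are assumed to be
// relative to that root as well.

function isSkipped(string $subPath, array $skip): bool
{
    $subPath = trim(str_replace('\\', '/', $subPath), '/');

    foreach ($skip as $dir) {
        $dir = trim(str_replace('\\', '/', $dir), '/');
        if ($dir === '') {
            continue; // ignore empty entries
        }
        // Match the directory itself or anything beneath it.
        if ($subPath === $dir || strpos($subPath, $dir . '/') === 0) {
            return true;
        }
    }

    return false;
}
```

Under these assumptions, the test quoted from scanner.php would become `if (!$iter->isDot() && !isSkipped($iter->getSubPath(), $skip))`. Appending a slash before the prefix comparison keeps a directory such as "forum/cachet" from being skipped by accident.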



          • Hi Sorry to bring this up yet again…

            1. He’s right, I want to skip scanning a folder (AND all its subfolders) – for example the “cache” folder. It will only skip literally the “cache” folder. BUT I want it to skip the “cache” folder AND all its subfolders….

            Is this possible…? WOULD BE BRILLIANT IF IT COULD as each run produces about 500 cached files that have changed and it’s a nightmare to wade through them.

            2. Could we define an array of ‘search patterns’ as well? This would be great because if there is a specific attack we could search the file to find the patterns defined. This would make this a fantastic product for both file changes AND signature scanning at the same time.

            I hope you get this message as these are 2 very important things that would make a huge difference to this product.

          • Hi Micro,

            The code is an iterator so it should skip any directory (and its subdirectory) if in the list of directories to skip. I’m away right now and unable to look into the iterator function but should be able to next week.

            As for a search pattern within a file, that should be as simple as adding a block of code to search for a match within an array of patterns. As that was not my intention (and speed was a major consideration), I have not done that. If you implement such and can determine that it does not expand the timeline inordinately, I would be happy to add it.
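A block of that kind could look roughly like the following. It is a sketch only and not part of SuperScan: findSignatures() is a hypothetical helper, the pattern in the usage note is a made-up example, and reading every file's contents will add to the scan time.

```php
<?php
// Sketch only: findSignatures() is a hypothetical helper, not part of SuperScan.
// $patterns maps a label to a PCRE regular expression.

function findSignatures(string $contents, array $patterns): array
{
    $hits = [];
    foreach ($patterns as $name => $regex) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($regex, $contents) === 1) {
            $hits[] = $name;
        }
    }

    return $hits; // labels of every pattern found in the contents
}
```

A caller might pass file_get_contents($file_path) as the first argument and an array such as `['eval_base64' => '/eval\s*\(\s*base64_decode\s*\(/i']` as the second, restricting the check to added or altered files to limit the extra time.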



  • Eric S.

    Hi Dk Lynn

    I’m trying to setup your script on one of my servers.

    I have a problem with the $skip variable: no matter what I type in there, it scans it anyway.

    Here are my configs:

    define("SCAN_PATH", "/home/");
    $skip = array("virtfs");

    I’ve tried “virtfs”, “/virtfs”, “virtfs/”, “/home/virtfs”, etc.. The scanner always scans /home/virtfs

    Could you tell me what I’m doing wrong ?

    Running on CentOS 7.2 and PHP 5.5.32

    Thanks a lot!

  • Kisten Stern


    Thanks for a great script.
    I’m using SuperScan v2. Two questions:
    1. My extension array is as follows – $ext_array = array('php', 'html', 'htm', 'js');

    However, other extensions are being scanned e.g .json, .csv, png etc.

    2. Is it possible to filter a subfolder? I have added the following but it does not work. It only works when I add a folder name without a path:
    public static $FILTERS = array(
    '/media/data/' // <= Edit THIS line

    • Hi Kisten,

      Thank you for your comments and questions.

      Q1. The config.php line 42 example was

      $ext_array = array('php', 'html', 'htm', 'js');

      IMHO, it’s better to declare $ext_array as an empty array ($ext_array = array();) to allow all files to be examined (except those with extensions included in the $excl_array). This has yielded (for me) changes to error_log, .ftpquota, .htaccess … and any other file types (remember that .htaccess can alter the file handlers for extensions AND that jpg and pdf files are known to be able to carry a malware payload).

      The scanner is supposed to be given EITHER a list of file extensions to examine OR a list of file extensions to exclude. Thus, scanner.php line 97

      if ( (empty($ext_array) && !in_array($ext,$excl_array)) || in_array($ext,$ext_array) )

      compares the file_path extension captured in line 94

      $ext = strtolower(substr($file_path,strrpos($file_path,'.')+1));

      against the $ext_array OR $excl_array. Using pseudo code:

      IF (the $ext_array is empty, so all files should be examined,
          AND the $ext is NOT in $excl_array)
      OR the $ext is in the (non-empty) $ext_array
      THEN proceed with the scan (processing the current $file_path).

      If neither condition is met, the scan should “fall through” to line 141 for the next $iter/$file_path:

      } // End of accepted file extension

      I have re-checked that code and it should work as intended.

      Q2. public static $FILTERS = array(

      I believe that the error you’re seeing is that $file_path is ALWAYS a file or subdirectory of the current path ($dir which uses $iter to examine all members of the current $dir). A less convoluted way to say that is that you MUST name your subdirectories carefully because a member of the $FILTERS array can only be matched against the $file_path (file or directory). If a subdirectory shares its name with other subdirectories (at ANY level of your file structure), ALL will be filtered out of the scan (note that a $file_path cannot include a / for the obvious reason that it would represent a subdirectory PLUS its subdirectory). Unfortunately, I had not made this clear in config.php’s comments although my examples only showed a subdirectory name (not path).

      If those responses do not resolve your problems, please clarify here or PM me (not sure that will work from here – use dklynn@dk.co.nz for e-mail).