What’s your plan for __autoload()?

Of all the magic in PHP, I probably like the __autoload() hook the most. It saves a good deal of tedious script inclusion calls and may drastically speed up your application by saving the parser from doing unnecessary work. Although it has been around since the release of PHP5, I haven’t found any convincing applications for it yet. Most of them follow the same scheme: whenever an undefined class is being instantiated, a little __autoload() function tries to include a PHP file, which has to be named after its class:


  function __autoload($name) {
    require_once('classes/'.$name.'.php'); 
  }

However, this solution is inflexible and has some drawbacks. The most obvious one is the class name -> filename constraint. Furthermore, it implies that all class files are stored in one single folder. That is not an option for projects with several hundred classes, naturally ordered in package directories. Overall, these implementations seem not quite mature, but rather a proof of concept for __autoload(). This calls for a better solution.

Loading smartly

What if we had a little ‘class finder’ which searched directories recursively for PHP scripts and parsed each one of them for class definitions? One that would be aware of all classes in those folders and tell us which file to find them in. We could combine this with __autoload() to help it find any required class by itself. “That’s silly, it would produce way too much overhead”, you might say. That’s correct! So what if we cached the results after each search, knowing that the file structure rarely changes unless a developer is working on it? Of course, I’m talking about the best way of caching: generating the class list as PHP code and saving it for later use.

No sooner said than done. I wrote a class which implements this idea and pretentiously called it “SmartLoader”, as it’s smart enough to find any class of your PHP application without any help. You can download it here under the Lesser General Public License. Now let’s take a closer look at how it works:

Behind the scenes

Step 1: SmartLoader recursively searches for PHP scripts and parses them for class and interface definitions with the following regular expression:

(interface|class)\s+(\w+)\s+(extends\s+(\w+)\s+)?(implements\s+\w+\s*(,\s*\w+\s*)*)?{
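
Step 1 could be sketched roughly as follows. This is not SmartLoader’s actual code: the function names find_class_names() and scan_for_classes() are made up for illustration, and the sketch assumes a single file ending and always skips hidden entries.

```php
<?php
/* Hypothetical helper (not part of SmartLoader): returns the names of all
   classes and interfaces declared in a string of PHP source. */
function find_class_names($source) {
  $pattern = '%(interface|class)\s+(\w+)\s+(extends\s+(\w+)\s+)?(implements\s+\w+\s*(,\s*\w+\s*)*)?{%';
  $matches = array();
  if (!preg_match_all($pattern, $source, $matches)) {
    return array();
  }
  return $matches[2]; /* the second capture group holds the class name */
}

/* Hypothetical helper: walks $dir recursively and returns an array that
   maps each class name to the file it was found in. */
function scan_for_classes($dir, $ending = '.php') {
  $classes = array();
  $entries = scandir($dir);
  if ($entries === false) {
    return $classes;
  }
  foreach ($entries as $entry) {
    if ($entry[0] === '.') {
      continue; /* skips ".", ".." and hidden files/dirs such as .svn */
    }
    $path = $dir . '/' . $entry;
    if (is_dir($path)) {
      $classes = array_merge($classes, scan_for_classes($path, $ending));
    } elseif (substr($entry, -strlen($ending)) === $ending) {
      foreach (find_class_names(file_get_contents($path)) as $name) {
        $classes[$name] = $path;
      }
    }
  }
  return $classes;
}
```

Note that the expression requires whitespace before the opening brace, so a declaration written as “class Foo{” would not be picked up by this sketch.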

Step 2: We now have a list of all available classes and where to find them. This list will be converted to PHP code and written to a cache file. Its contents will look similar to this:


// this is an automatically generated cache file.
$GLOBALS['smartloader_classes']['Main'] = 'classes/main.class.php';
$GLOBALS['smartloader_classes']['Iterable'] = 'classes/containers/iterable.class.php';
$GLOBALS['smartloader_classes']['ActiveRecord'] = 'classes/database/activerecord.class.php';
/* etc. */
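
As a rough sketch of how such a cache file might be produced from a class list: the helper names render_class_cache() and write_class_cache() are hypothetical and not taken from SmartLoader. var_export() is used so that names and paths are quoted safely.

```php
<?php
/* Hypothetical helper: renders a class name => file map as PHP source in
   the cache format shown above. */
function render_class_cache(array $classes) {
  $code = "<?php\n// this is an automatically generated cache file.\n";
  foreach ($classes as $name => $file) {
    $code .= '$GLOBALS[\'smartloader_classes\'][' . var_export($name, true) . '] = '
           . var_export($file, true) . ";\n";
  }
  return $code;
}

/* Hypothetical helper: writes the cache file; returns false if the file
   could not be written (e.g. missing write permissions). */
function write_class_cache($cache_file, array $classes) {
  return file_put_contents($cache_file, render_class_cache($classes)) !== false;
}
```

Including the generated file afterwards fills $GLOBALS['smartloader_classes'] without any further parsing of the class files.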

Step 3: Whenever a class needs to be loaded, SmartLoader checks the cache for its name and includes the appropriate PHP file. In case the class cannot be found or loaded, the cache is recreated and there will be another inclusion attempt.
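
The lookup-then-rebuild logic of Step 3 might look something like this. It is a minimal sketch under assumptions, not SmartLoader’s implementation: load_class_from_map() is a hypothetical name, and $rebuild stands for any callback that returns a freshly scanned class name => file map.

```php
<?php
/* Hypothetical sketch: tries the cached map first, rebuilds it once on a
   miss or stale entry, then tries again. */
function load_class_from_map($class_name, array &$map, $rebuild) {
  for ($attempt = 0; $attempt < 2; $attempt++) {
    if (isset($map[$class_name]) && is_file($map[$class_name])) {
      require_once $map[$class_name];
      /* second argument false: do not trigger autoloading again here */
      if (class_exists($class_name, false) || interface_exists($class_name, false)) {
        return true;
      }
    }
    if ($attempt === 0) {
      $map = call_user_func($rebuild); /* rescan once, then retry */
    }
  }
  return false;
}
```

Because only names present in the map are ever included, a value that is not a known class name never reaches require_once.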

Getting ready

smartloader.class.php contains the autoload function as well as the actual SmartLoader class. The only thing that has to be done is to customize the autoload method:


  function __autoload($class_name) {
    /* using a static loader object rather than a singleton to reduce overhead */
    static $ldr = null;
 
    /* initializing and configuring the loader on the first call only */
    if(!$ldr) {
      $ldr = new SmartLoader();
 
      /* defining cache file, make sure it is writable */
      $ldr->setCacheFilename('cache/smartloader_cache.php');
 
      /* adding directories to parse. better use absolute paths. */
      $ldr->addDir("classes");
 
      /* what are the endings of your class files? */
      $ldr->setClassFileEndings(array('.php', '.class'));
 
      /* should SmartLoader follow symbolic links? */
      $ldr->setfollowSymlinks(false);
 
      /* it should probably ignore hidden dirs/files */
      $ldr->setIgnoreHiddenFiles(true);
    }
 
    /* load the class or trigger some fatal error on failure */
    if(!$ldr->loadClass($class_name)) {
      trigger_error("Cannot load class '".$class_name."'", E_USER_ERROR);
    }
  }

After that you just need to include smartloader.class.php in your scripts and never worry about class includes anymore.

Advantages

  • Convenience: SmartLoader simplifies class file management. You can rename them, move them around or reorganize them in (package) folders. As long as they are in your webspace, SmartLoader can find them.
  • Speed: With SmartLoader, class files are only loaded when they are actually needed. This approach, called “lazy loading” or “just in time” (not to be confused with interactive programming), can save PHP a lot of parsing and compiling caused by redundant inclusions.
  • Portability, Backwards Compatibility: You can easily adopt SmartLoader without breaking your existing applications. By default, __autoload() only gets involved when an undefined class is being instantiated.

Potential pitfalls

  • Error handling: You may use any flavor of error handling you like. Well, almost: exceptions can’t be thrown out of __autoload(). Remember, that’s a feature, not a bug. It is possible with a scary eval() hack, though, but I won’t go further into that.
  • Version control systems: If your webspace is a working copy of some version control system, you should probably tell SmartLoader to ignore hidden files. This will improve its performance and prevent it from scanning those suspicious outdated copies of files sitting in the .SVN/.CVS folders.
  • I shouldn’t be telling you that, but don’t let SmartLoader scan symlink loops.
  • Profiling: Zend Profiler gets confused by SmartLoader and starts calculating invalid results for its execution time.
  • Opcode caches: I don’t know the inner workings of APC, Zend Optimizer and the likes well enough, but I can imagine that dynamic inclusions as used in SmartLoader negate the performance gained by opcode caching. I haven’t been able to verify this, since Zend Profiler refuses to work properly with SmartLoader (see above). I’d be happy to receive any feedback on this matter.

  • Vennie

    Impressive first contribution. I didn’t see much use for the autoload method before, but I’ll surely give this a shot.

  • http://www.phppatterns.com HarryF

    Interesting approach. On the one hand, I find __autoload a little dubious; I suspect it may lead to security concerns. One possible scenario came up here.

    On the other, calls to require_once, even when a file has been included, can get expensive and it looks like this approach would really help reduce that.

  • Eko Budi Setiyo

    To me, __autoload is “very useful”, especially when combined with an include path class. I created an include_path class that treats the include path like a “normal PHP variable”.

    Regards
    Eko Budi Setiyo

  • http://www.phpism.net Maarten Manders

    [quote]To me, __autoload is “very useful”, especially when combined with an include path class. I created an include_path class that treats the include path like a “normal PHP variable”.[/quote]

    Eko Budi Setiyo, thanks for your comment! However, I had to strip the source code to keep things clear and easy to read. You should probably provide it for download and place a link to it. Thank you for understanding.

  • http://www.cyberlot.net cyberlot

    Nice code, the only real problem I see is you have to edit the class/file itself to change the settings.

    I would split this into 2 separate files: a class file and an “example” file.

  • http://www.cyberlot.net cyberlot

    Oh, and a note to HarryF: this method is basically using a “whitelist” of valid classes, so it would be a lot harder to exploit than a generic autoload method.

  • http://www.phpism.net Maarten Manders

    [quote]Nice code, the only real problem I see is you have to edit the class/file itself to change the settings.[/quote]

    Very true. I just put those two together for this blog post.

  • Piotrek

    Hi,

    The autoloading feature is really neat. I’ve implemented it in my simple framework (you can find it here).

    It works like this:
    – __autoload() function is called,
    – Autoloader class scans all php files in the directory tree and generates a map of classes along with their location on a disk. This operation is performed only when you need to add new files to an application.
    – if a class had been declared in the map – it will be loaded. Otherwise, an exception is thrown.

    There’s no need to use require/include in the application (actually there’s only 2 includes in the whole code).

  • http://www.phpism.net Maarten Manders

    [quote]Oh, and a note to HarryF: this method is basically using a “whitelist” of valid classes, so it would be a lot harder to exploit than a generic autoload method.[/quote]

    That’s security by obscurity which is not good. :-)

    Where that’s a problem is someone could “inject” class names via the serialized string and PHP’s unserialize will create an object from them, meaning the class constructor gets executed.

    The loader just makes classes available for instantiation. Which class exactly gets instantiated should be taken care of by the application. Harry, isn’t there a way to sign (parts of the) client-server communication to prevent abuse? Otherwise there’s always the possibility to exclude sensitive classes from being scanned.

  • http://www.cyberlot.net cyberlot

    [quote]That’s security by obscurity which is not good.[/quote]

    Your method basically creates a “whitelist” of valid classes on the fly, making it impossible to inject a class to be autoloaded after the fact.

    I wouldn’t call that security by obscurity. Obscurity means you’re trying to hide the method you’re using to secure something, out of fear that knowing said method would make it easier to exploit your code.

    Whitelists are a good thing, I use whitelists all over, right down to the SSH connection to my boxes ( a whitelist of valid ips to connect from ). There is nothing Obscure about that.

    I think what harry is talking about is the ability to inject a “non-legal” class name that might force the auto loading of a file you do not wish to be loaded to begin with. By creating a list of valid classes at runtime ( IE a whitelist ) You prevent such attacks from happening in your code.

  • http://www.lopsica.com BerislavLopac

    I have a Classloader singleton, where you can register your library and which will __autoload ClassName.class.php files whenever you need a ClassName class. That way you have control over what libraries are used (sorta kinda like Python’s import) while still not actually loading classes until they’re needed. Libraries are registered using the oft-neglected include_path. If a class doesn’t exist, a MissingClassException is thrown.

  • http://www.phppatterns.com HarryF

    I’m coming from the angle that if end users can make your code do things you hadn’t planned, it’s “bad”. __autoload itself isn’t evil but someone is surely going to do something like this one day;

    $class = $_GET['class'];
    $obj = new $class;

    It then depends on what they’re doing inside autoload. If they’re also including files, perhaps the file also contains other procedural logic that gets executed as well. Consider this:

    function __autoload($name) {
      require_once('classes/'.$name.'.php');
    }

    What if $name has the value ‘../admin/resetdb’ ?

  • http://www.cyberlot.net cyberlot

    [quote]function __autoload($name) {
      require_once('classes/'.$name.'.php');
    }

    What if $name has the value ‘../admin/resetdb’ ?[/quote]

    Which is why the ideas in this method are so useful: the above exploit is just not possible.

    But in the end it all comes down to proper validation and escaping of data, the danger here in reality is not the __autoload but instead the poor use of require_once and lack of proper filtering or some sort of whitelist.

  • http://www.phppatterns.com HarryF

    [quote]But in the end it all comes down to proper validation and escaping of data, the danger here in reality is not the __autoload but instead the poor use of require_once and lack of proper filtering or some sort of whitelist.[/quote]

    Agreed.

  • warjockey

    I just have an array that specifies the path to each class inside __autoload().
    Besides I like naming my files after the class name, it’s a java standard if you have used that before.

  • http://www.realityedge.com.au mrsmiley

    [quote]$class = $_GET['class'];
    $obj = new $class;

    What if $name has the value ‘../admin/resetdb’ ? [/quote]

    Wouldn’t that cause a parse error, because you can’t have . or / in the class name? It shouldn’t even get to the point of running the class, unless I’ve misunderstood the PHP parser.

  • Nikos

    Here is the __autoload I personally use. It’s wonderful and very short :o)

    function __autoload($classname)
    {
      # class 'abc_def_ghi' is searched in 'class/abc/def/ghi.php'
      $file = 'class/' . str_replace('_', '/', $classname) . '.php';

      # Here is the good trick:
      # The classes are searched according to php.ini's include_path.
      # The fopen(..., ..., true) thing could be equivalent to file_exists(),
      # but works with include_path (file_exists doesn't do that).
      # It's also better than just "@include $file", because then
      # I couldn't see any error message happening in $file...
      if ($fp = @fopen($file, 'r', true))
      {
        fclose($fp);
        include $file;
      }
    }

  • http://www.realityedge.com.au mrsmiley

    The problem is and always has been the performance of the stat lookups the engine does when trying to determine if a file is there or not. I heard a while ago that there was supposed to be a better file stat cache built into the Zend engine to try and combat this problem, but I would guess that such a cache is only really useful (depending on how it’s implemented) when run in a SAPI/module environment as opposed to CGI. Having a pre-parsed mapping of class name to file location is the ideal method, as it performs the best. Couple that with the appropriate validation, and it should be an excellent solution.

  • Heimi

    Forgive me if this sounds stupid, but how is this any different from doing


    // Stuff...
    require_once 'thing1.php';
    $thing1 = new Thing1();
    // More Stuff...
    require_once 'thing2.php';
    $thing2 = new Thing2();

  • Ben

    It isn’t.

    But you don’t need the require statements if you use this method. A useful thing when you have a large app/site with loads of classes.

  • Etnu

    I find __autoload invaluable, because it allows for a great deal of flexibility when it comes to moving libraries around, as well as helping reduce the clutter of all the include statements.

    I usually tweak autoload just slightly to do the following:

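    /* the opening line of this snippet was lost; the variable name $classes is assumed */
    $classes = array(
      'Class1'=>GLOBAL_LIB_DIR.'Classes/Class1.php',
      'Class2'=>GLOBAL_LIB_DIR.'Classes/Class2.php',
      'Interface1'=>GLOBAL_LIB_DIR.'Interfaces/Interface1.php',
    );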

    and so on and so forth.

    This allows me to easily define where any given class / set of classes will reside. I generally try to keep everything in a common lib folder, but within that you’ll still want to break down into subfolders for different “packages”.

  • http://www.freelanceninja.net/ coffee_ninja

    <devils_advocate>

    Harry had stated toward the beginning of this thread that

    [quote]…calls to require_once, even when a file has been included, can get expensive and it looks like this approach would really help reduce that.[/quote]

    But how much cheaper is a call to __autoload() that searches the directory structures in the include path for some generic file name? Aren’t you essentially mimicking the functionality that require, require_once, include and include_once already provide (that is, searching the include path for the desired file)? It would be interesting to test and compare these two methods.

    I worry that many programmers are going to use __autoload() as a wrapper to require_once() and accomplish nothing but add complexity to their applications. The only time I can see a benefit to using __autoload() is in the event that an object must be instantiated whose class isn’t known until runtime. And even then you’d need to be extremely careful for the security reasons mentioned above.

    </devils_advocate>

  • SuperBetaTester

    Little bug:
    SmartLoader does not cache classes when the bracket is attached directly to the class name:
    class ClassName2 extends ClassName2{
    ClassName2 won’t be cached. Some libraries use this (‘bad’) coding standard.

  • dasluq [at( gmail )d0t] com

    another small bug:

    SmartLoader fails with an error if the class was not found. However, it may have been called through class_exists(), in which case a non-existent class name is perfectly OK as an argument.

    IMHO the trigger_error is not necessary, so I just stripped it, replacing

    if(!$ldr->loadClass($class_name)) {
    trigger_error("SmartLoader: Cannot load class '".$class_name."'", E_USER_ERROR);
    }

    by


    return $ldr->loadClass($class_name);

  • Robert Schmelzer

    I just implemented your Smartloader into my project. Very fine stuff. Good work.

    But I encountered a bug:

    The preg_match_all statement did not work in my PHP 5.1.2 distribution. The result array was empty. So I had to change it to the following:

    if($buf = fread($php_file, filesize($file_path))) { $result = array();
    if(preg_match_all("%(interface|class)\s+(\w+)\s+(extends\s+(\w+)\s+)?(implements\s+\w+\s*(,\s*\w+\s*)*)?{%", $buf, $result)) {
    foreach($result[2] as $class_name)

  • Pingback: SitePoint Blogs » SmartLoader Reloaded

  • Kristian

    Nice work, I tested it online but am getting this weird error: Warning: fread() [function.fread]: Length parameter must be greater than 0 in /home/public_html/dev/includes/smartloader.class.php on line 299

    any idea why?