Alternatives to autoloading

I’m continuing from this thread: http://www.sitepoint.com/forums/showthread.php?t=681987

Basically I asked Lastcraft if there was a reason he didn’t use autoload.

[QUOTE=lastcraft;4616206]It tends to lead to classes having a subtle dependency on the file system. It also tends to distort class naming. When it doesn’t distort the name (a good thing) you need to maintain a global mapping of names to locations and tracking down the code can get mysterious.[/quote]
I agree it’s a dependency on the file system, and it occasionally makes for odd class names. But at the same time, it’s easy to know where to locate a class given its name. As for the global mapping, are you referring to the case where one chooses not to use the naming convention in which an underscore signifies a directory separator? Because otherwise, mapping names to locations and tracking down code is easy.

I’ve always gone with one “public” class per file. Other classes that exist only to serve a single “public” class, normally exception classes or tightly coupled classes, go in the same file. I also use autoloading simply because it’s the most fail-safe approach. PHP unfortunately cares about the order in which classes are included. So if, for example, you have the following class structure…

interface ITest {}
class Test implements ITest {}
class SpecialTest extends Test {}

…you have to include the class files in order. If you were to include the file containing the declaration of the “SpecialTest” class, you would have to make sure you’d already included the file containing the “ITest” interface, as otherwise execution would be aborted with a fatal error. When using autoload, you never have to think about this. As a bonus, thanks to namespaces, it’s now a lot easier to manage directory structures and autoloading.
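A minimal sketch of the autoloading side, assuming the three declarations live in a hypothetical `App` namespace with one file per name under a temporary directory (both assumptions, chosen so it runs standalone). Only `SpecialTest` is requested; PHP calls the autoloader for `Test` and `ITest` itself while declaring it, which is why the include order stops mattering.

```php
<?php
// Hypothetical layout: classes in namespace App live under <base>/App/<Name>.php.
$baseDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'autoload_order_' . getmypid();

// The declarations from the example above, one per file (SpecialTest extends Test).
$sources = [
    'ITest'       => "<?php\nnamespace App;\ninterface ITest {}\n",
    'Test'        => "<?php\nnamespace App;\nclass Test implements ITest {}\n",
    'SpecialTest' => "<?php\nnamespace App;\nclass SpecialTest extends Test {}\n",
];
$appDir = $baseDir . DIRECTORY_SEPARATOR . 'App';
if (!is_dir($appDir)) {
    mkdir($appDir, 0777, true);
}
foreach ($sources as $name => $code) {
    file_put_contents($appDir . DIRECTORY_SEPARATOR . $name . '.php', $code);
}

$loaded = [];
spl_autoload_register(function (string $class) use ($baseDir, &$loaded): void {
    // App\SpecialTest -> <base>/App/SpecialTest.php
    $path = $baseDir . DIRECTORY_SEPARATOR
          . str_replace('\\', DIRECTORY_SEPARATOR, $class) . '.php';
    if (is_file($path)) {
        $loaded[] = $class;
        require $path;
    }
});

// No include statements and no ordering: declaring SpecialTest triggers the
// autoloader for its parent Test, which in turn triggers it for ITest.
$object = new App\SpecialTest();
```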

It would be nice if PHP checked for the existence of parent classes/interfaces when the class is first used, instead of when the class is declared.

I tend to agree with Marcus’ posts a lot, as I think he’s a smart guy. I can’t really relate to the statements referred to above, though. Yes, autoloading does generally imply a dependency between the class name and its location on the filesystem, but it doesn’t have to.

I use the latter system: I have a file with class-name-to-filename mappings, which is regenerated whenever the autoloader can no longer find a class while in development mode. This way, my class names are not mutilated with their location, but I still have the ability to load a class without requiring its file first.
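A rough sketch of such a class-map autoloader, assuming the map is stored as a PHP file that returns an array and that a rebuild callable scans the sources. `ClassMapLoader`, `$rebuild` and the file names are hypothetical. Rebuilding at most once per request is a design choice, so that repeated lookups of a genuinely missing class don’t rescan the tree every time.

```php
<?php
// Class-map autoloader that regenerates its map on a miss, but only in
// development mode and at most once per request. ClassMapLoader and the
// $rebuild callable are hypothetical; $rebuild would normally scan the sources.
final class ClassMapLoader
{
    /** @var array<string, string> class name => file path */
    private $map = [];
    /** @var callable returning array<string, string> */
    private $rebuild;
    private $mapFile;
    private $devMode;
    private $rebuilt = false;

    public function __construct(string $mapFile, callable $rebuild, bool $devMode)
    {
        $this->mapFile = $mapFile;
        $this->rebuild = $rebuild;
        $this->devMode = $devMode;
        if (is_file($mapFile)) {
            $this->map = require $mapFile;
        }
    }

    public function load(string $class): void
    {
        if (!isset($this->map[$class]) && $this->devMode && !$this->rebuilt) {
            $this->map = ($this->rebuild)();
            $this->rebuilt = true;
            // Persist the new map as a PHP file that simply returns the array.
            file_put_contents($this->mapFile, '<?php return ' . var_export($this->map, true) . ";\n");
        }
        if (isset($this->map[$class])) {
            require $this->map[$class];
        }
    }
}

// Demo: the class name says nothing about the file it lives in.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'classmap_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
file_put_contents($dir . '/misc_helpers.php', "<?php\nclass InvoiceFormatter {}\n");
$mapFile = $dir . '/classmap.php';
if (is_file($mapFile)) {
    unlink($mapFile); // start without a map, as on a fresh checkout
}

$loader = new ClassMapLoader(
    $mapFile,
    function () use ($dir): array {
        return ['InvoiceFormatter' => $dir . '/misc_helpers.php'];
    },
    true
);
spl_autoload_register([$loader, 'load']);

$formatter = new InvoiceFormatter(); // miss -> rebuild -> map written -> file required
```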

On the plus side, you’re not including files you don’t necessarily use, and it’s easier to move a class into another file without having to replace all of the include statements. On the downside, it is magic, but it is well documented, so I can live with that.

Not a fan of autoload, and definitely not a fan of a single class per file.


$container->registerShared('Foo', function()
  {
      include 'SomeIncludeWithAFooImplementationAndAssociatedClasses.php';

      return new SomeFooImplementation();
  });

This includes files only on demand, and removes the need for the _once() variants.
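A minimal sketch of a container that could sit behind that snippet, assuming a “shared” service means its factory runs once and the result is cached. `Container`, `registerShared()` and `get()` are hypothetical here, not the API of a particular library; the temporary file stands in for the real include.

```php
<?php
// Minimal shared-service container. registerShared() and get() are
// hypothetical names modelled on the snippet above, not a specific library.
final class Container
{
    /** @var array<string, callable> registered factories */
    private $factories = [];
    /** @var array<string, object> instances created so far */
    private $instances = [];

    public function registerShared(string $name, callable $factory): void
    {
        $this->factories[$name] = $factory;
    }

    public function get(string $name): object
    {
        // The factory, and therefore any include inside it, runs only on first use.
        if (!isset($this->instances[$name])) {
            $this->instances[$name] = ($this->factories[$name])();
        }
        return $this->instances[$name];
    }
}

// Demo: keep the implementation in its own file so the include is observable.
$fooFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'foo_impl_' . getmypid() . '.php';
file_put_contents($fooFile, "<?php\nclass SomeFooImplementation {}\n");

$container = new Container();
$container->registerShared('Foo', function () use ($fooFile) {
    // A plain include is safe here: this closure is invoked at most once.
    include $fooFile;
    return new SomeFooImplementation();
});

$loadedBefore = class_exists('SomeFooImplementation', false); // nothing included yet
$first = $container->get('Foo');
$second = $container->get('Foo'); // cached: no second include, no redeclaration error
```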

I don’t think there is any performance difference between these schemes in a production setup using APC with apc.stat off, and perhaps the APC lazy loading patch.

The APC lazy loading patch was included in APC 3.1.3.

Hi…

This is a relatively minor issue, and I’m not about to campaign for the abolition of autoload :).

That’s right, there are two cases. The Very_Long_Class_Name_DumbLoader one and the one where there is a centralised map.

If I were in a situation where my benchmarks told me that autoload was the only way of speeding up a struggling app (I wasn’t allowed a byte code cache?), then I’d probably go for the centralised map, as that’s less intrusive. And even then probably only for the top level classes. That use case has never happened so far, and frankly I don’t think it ever will.

I’m not always the instigator of such schemes though. When I have the Very_Long_ClassNames inflicted on me, it correlates with other problems. Usually cluttered code, overlong methods and classes, too few classes, developers tracking code down with grep, and other inefficiencies.

These are mainly empirical observations, but if you are trying to keep code clean, anything that screws with class names has got to be a step back. I try to rip out every last unnecessary character in my code with a certain amount of zeal.

I’m not against magic per se, but I only use it when the developers are going to use the code long term. I only want to play the magic card a few times though. There is a great cost to casual and occasional users, so I need to get a real win from each incantation. Every change of staff on the project will hurt.

require_once(dirname(__FILE__) …) is my default until forced to do something else.
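Spelled out, the `require_once(dirname(__FILE__) …)` style looks roughly like this: the dependency is named at the top of the file that needs it, relative to that file’s own location. `Mailer`, `Notifier` and the temporary directory are hypothetical, used only so the sketch runs on its own.

```php
<?php
// Explicit style: each file requires its own dependencies at the top,
// relative to its own location. Mailer and Notifier are hypothetical.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'require_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
file_put_contents($dir . '/Mailer.php', "<?php\nclass Mailer {}\n");

// dirname(__FILE__) is the directory of Notifier.php itself, so the require
// works regardless of which script or working directory pulls Notifier in.
// Since PHP 5.3, __DIR__ is an equivalent shorthand.
$notifierCode = <<<'PHP'
<?php
require_once(dirname(__FILE__) . '/Mailer.php');

class Notifier
{
    public function createMailer(): Mailer
    {
        return new Mailer();
    }
}
PHP;
file_put_contents($dir . '/Notifier.php', $notifierCode);

require_once $dir . '/Notifier.php';
require_once $dir . '/Notifier.php'; // already loaded: a no-op, no redeclaration
$mailer = (new Notifier())->createMailer();
```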

yours, Marcus

Hi…

Where it’s instantiated. Someone perusing the code will need to know this as they follow the execution path.

Yes, at the top. And take them out when the instantiation is moved. Makes dependencies super explicit.

This isn’t true. I can drop code with require_once() straight into an app.

It’s redundant.

Then I’m in the middle of a significant refactoring. Probably class relationships and a whole bunch of other stuff is changing too. The least of my work will be changing the require_once() from the caller.

In addition I can change part of the code at a time by duplicating the location. As each require_once is scoped there won’t be a clash unless both versions get loaded at once. __autoload() has no such scoping.

I’ve yet to find it to be an issue. YMMV. If it was, then I would roll out an __autoload() scheme if that was the way forward.

All magic tricks have that property when used in isolation. It’s the weight of “conventions” and “magic” and “configurations” and “environment” and all the other behind the curtains stuff that quietly kills you when it hits a critical level. Autoload isn’t evil per se, but you have to get it right, and you are playing one of your jokers.

yours, Marcus

I bypass autoloading by using an indirect autoloader with class path naming conventions similar to Zend’s. My framework uses a single method on a facade to instantiate all classes, which allows Java-like package syntax instead of the actual class name, for readability.


$daoBlog = $this->_facade->getInstance('Component.Blog.Module.DAO.DAOBlog',array($this->_facade));

The advantage of this is that the package path can be modified by the application’s facade. Where this comes in particularly handy is swapping packages for different sites, since my framework is based on managing multiple sites with the same code base.


$indexModule = $this->_facade->getInstance('Site.*.Module.Index',array($this->_facade));

Here the index module exists for every site but is defined slightly differently. This method makes it very easy to modify the package path and use dynamic package resolution: in this case, the * is swapped for the directory of the site that the domain name is mapped to.
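The facade’s internals aren’t shown, so the following is only a guess at the mapping step: a hypothetical `resolvePackage()` that substitutes the site directory for `*` and turns the dotted package path into a file path. The `.php` suffix and the one-file-per-package layout are assumptions.

```php
<?php
// Hypothetical mapping from a dotted package path to a file path. '*' is
// replaced by the directory of the site the current domain maps to; the
// '.php' suffix and one-file-per-package layout are assumptions.
function resolvePackage(string $package, string $siteDir, string $sep = DIRECTORY_SEPARATOR): string
{
    $segments = explode('.', $package);
    foreach ($segments as $i => $segment) {
        if ($segment === '*') {
            $segments[$i] = $siteDir;
        }
    }
    // Building every path in this one place keeps the separator portable.
    return implode($sep, $segments) . '.php';
}

// resolvePackage('Site.*.Module.Index', 'sitename', '/') returns 'Site/sitename/Module/Index.php'
```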

The full path is also injected into any object that is a resource of my framework, so that, within the class, paths relative to its origin can be created, similar to a directory structure.


// Inside Index Module
$parent = $this->getPkg('..'); // Site.sitename.Module
$current = $this->getPkg('.'); // Site.sitename.Module.Index
$grandParent = $this->getPkg('../..'); // Site.sitename

That becomes pretty handy for instantiating sub modules relative to a base class.
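A hypothetical helper with the same behaviour as the `getPkg()` calls in the comments above: `.` returns the current package and each `..` drops one trailing segment. The name `relativePackage()` and the error handling are assumptions, not the framework’s code.

```php
<?php
// Hypothetical stand-in for getPkg(): resolve '.', '..', '../..' against the
// current dotted package path, as the comments above describe.
function relativePackage(string $current, string $relative): string
{
    $segments = explode('.', $current);
    foreach (explode('/', $relative) as $step) {
        if ($step === '.') {
            continue; // stay in the current package
        }
        if ($step !== '..') {
            throw new InvalidArgumentException("Unsupported step '$step'");
        }
        if (count($segments) <= 1) {
            throw new InvalidArgumentException("Cannot go above '$current'");
        }
        array_pop($segments); // '..' drops the last segment
    }
    return implode('.', $segments);
}

$current = 'Site.sitename.Module.Index';
// A sub-module path, built the same way as the BlogList example.
$blogList = relativePackage($current, '.') . '.BlogList'; // Site.sitename.Module.Index.BlogList
```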


// Inside Index Module
$indexBlogList = $this->_facade->getInstance(
    "{$this->getPkg('.')}.BlogList" // returns Site.sitename.Module.Index.BlogList
    ,array($this->_facade)
); // returns instance of Site.sitename.Module.Index.BlogList

Also, switching between Windows and Apache servers becomes very easy, since one central method imports the actual class files mapped to the abstracted package paths.

This.

I use autoload in combination with namespaces and set_include_path(). This way, I avoid long class names that encode directory structures. Essentially, any directory that can contain class files is added to the include path. Any plugin in the system can either add its paths to the include path or perform custom actions (e.g. its own mapping array).

I haven’t measured performance, but it’s native code finding the correct directory so shouldn’t be overly slow. Though it’s certainly slower than explicitly requiring a file, of course.
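A minimal sketch of this set-up, with hypothetical `core` and `plugins` directories in a temporary location. The autoloader turns `Core\Router` into `Core/Router.php` and lets `stream_resolve_include_path()` search every include-path entry in order.

```php
<?php
// Namespaces plus set_include_path(): every directory that may hold classes
// goes on the include path, and one autoloader maps Ns\Name to Ns/Name.php
// relative to any of them. Directory and class names are hypothetical.
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'incpath_demo_' . getmypid();
$files = [
    $root . '/core/Core/Router.php'  => "<?php\nnamespace Core;\nclass Router {}\n",
    $root . '/plugins/Blog/Post.php' => "<?php\nnamespace Blog;\nclass Post {}\n",
];
foreach ($files as $path => $code) {
    if (!is_dir(dirname($path))) {
        mkdir(dirname($path), 0777, true);
    }
    file_put_contents($path, $code);
}

// A plugin could append its own directory to the include path in the same way.
set_include_path(implode(PATH_SEPARATOR, [$root . '/core', $root . '/plugins', get_include_path()]));

spl_autoload_register(function (string $class): void {
    $relative = str_replace('\\', DIRECTORY_SEPARATOR, $class) . '.php';
    // Searches the include path in order; returns false when nothing matches.
    $path = stream_resolve_include_path($relative);
    if ($path !== false) {
        require $path;
    }
});

$router = new Core\Router();
$post = new Blog\Post();
```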

I have several issues with explicit require_once calls.

Where do you put it? I may want to create a User object in several places. Do I put a require_once in every place a User object is instantiated, or before the definition of the class it’s used in? If the latter, this gets messy: I include the controller file, which includes every model it uses, which includes anything the User model may ever need, and anything those files may ever want. Putting it right before “new User()” is simply messy, IMHO. Why should an arbitrary method be including files? Giving a method more than one responsibility isn’t very OO.

It breaks portability. If I move my application from an Apache server to a Windows server, there may be issues. I hope you’ve used DIRECTORY_SEPARATOR everywhere! And what if your directory structure has to change?

Performance: this isn’t a huge issue IMHO, but it’s a lot more difficult to stop files from being included that the application never actually uses. Secondly, you will certainly end up with multiple require_once calls to the same file. require_once is quite expensive; __autoload avoids this because it is only ever called once per class.
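A small sketch to illustrate the once-per-class point, with a hypothetical `User` class written to a temporary file: the autoloader is only consulted while the class is undefined, so three `new User()` calls trigger a single load.

```php
<?php
// Count autoloader calls to show it runs once per class, however many places
// use the class. The User class and temporary file are illustrative only.
$userFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'User_' . getmypid() . '.php';
file_put_contents($userFile, "<?php\nclass User {}\n");

$autoloadCalls = 0;
spl_autoload_register(function (string $class) use ($userFile, &$autoloadCalls): void {
    if ($class === 'User') {
        $autoloadCalls++;
        require $userFile;
    }
});

// Three instantiations, as if from three different methods...
$a = new User();
$b = new User();
$c = new User();
// ...but the autoloader, and its require, ran only once.
```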

Using __autoload is magic; however, it puts all the responsibility for loading files in a single location, rather than dispersing it throughout all your files. In my opinion, the benefits of using it hugely outweigh the minimal downsides.

[quote]If you go the 1 class-1 file route, the file feels kinda redundant.[/quote]

Redundant? On the contrary I think. It results in less redundancy, or at least less overhead.

[quote]I’m curious: Is the overhead of managing the mapping file a lot less than including what you don’t need? Did you do any benchmarking on that?[/quote]

In my experience, yes. The more atomic you make a file, the less overhead you incur.

If you keep several classes in one file there will almost always come a time where you need only a single class but including the file results in bringing in extra baggage.

[quote]I’m imagining that the mapping file could grow quite large on a somewhat large project?[/quote]

If you insist on going with the mapping file, sure, of course it could. The alternative is to base your class names on a convention that automates lookup and loading, much like Zend or PEAR.
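A sketch of the PEAR-style convention, where the path is computed from the name rather than looked up in a map. `pearClassToPath()` is a hypothetical helper, and resolving the result against the include path is one possible choice, not the only one.

```php
<?php
// PEAR / Zend Framework 1 convention: every underscore in a class name stands
// for a directory separator, so the file location is computed from the name.
function pearClassToPath(string $class, string $sep = DIRECTORY_SEPARATOR): string
{
    return str_replace('_', $sep, $class) . '.php';
}

// One possible registration: resolve the computed path against the include path.
spl_autoload_register(function (string $class): void {
    $path = stream_resolve_include_path(pearClassToPath($class));
    if ($path !== false) {
        require $path;
    }
});

// pearClassToPath('Zend_Controller_Front', '/') returns 'Zend/Controller/Front.php'
```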

There was a time (several years ago) when I believed that a naming scheme like PEAR’s or Zend’s was bad practice because of the coupling between directory structure and class name, one dictating the other or vice versa. However, after experimenting with both, I eventually decided the benefits outweighed the negatives.

It does require more thought and planning to go into your directory structure, but then again, that is not necessarily a bad thing… It simply means you cannot create files willy-nilly like most developers seem to do.

IMO it forces a cleaner code base and directory structure.

[quote]In the end, I don’t like the magic involved. An include/require[_once] makes things nice and clear, it’s explicit[/quote]

That was part of my argument as well. However, implicit is not evil in its own right; magic is only bad when it’s just that, magic. If others cannot understand how classes are being loaded, then I would say it’s a bad thing. However, if you clearly document or explain to new developers how an automation happens, then it is no longer magic, it’s convention.

Cheers,
Alex

Another benefit of using autoload is a performance optimization called ‘compilation’. Here we concatenate the most frequently used files into one single file, thus eliminating numerous includes.

If we used require_once instead, we’d have to do something about the require_once directives after such a concatenation.
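A sketch of what such a ‘compilation’ step might look like. `compileClassFiles()` and the file names are hypothetical; it assumes each source file contains only PHP code, starts with an opening tag and has no require_once lines of its own, which is the situation autoloading makes possible. Parents and interfaces must come before the classes that extend or implement them.

```php
<?php
// 'Compilation': concatenate frequently used class files into one, so that a
// single include replaces many. Assumes each file holds only PHP code that
// starts with an opening tag and contains no require_once lines. List parent
// classes and interfaces before the classes that depend on them.
function compileClassFiles(array $files, string $target): void
{
    $out = "<?php\n";
    foreach ($files as $file) {
        $code = file_get_contents($file);
        $code = preg_replace('/^<\?php\s*/', '', $code);   // leading open tag
        $code = preg_replace('/\?>\s*$/', '', $code);      // optional trailing close tag
        $out .= $code . "\n";
    }
    file_put_contents($target, $out);
}

$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'compile_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
file_put_contents($dir . '/Request.php', "<?php\nclass Request {}\n");
file_put_contents($dir . '/Response.php', "<?php\nclass Response {}\n");

compileClassFiles([$dir . '/Request.php', $dir . '/Response.php'], $dir . '/compiled.php');
require $dir . '/compiled.php'; // one include defines both classes
```

Any class left out of the compiled file can still be picked up by the autoloader as usual.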

Definitely with you on that one: I think having units of classes in one file is easier to work with than having one class per file. It is personal preference though.

You can obviously divide the load into multiple files if you want to. In one of my bigger projects, the mappings file did get a bit large and I decided to create a mapping file for each module of the application. Nevertheless, the size of the mapping file doesn’t really matter that much, as it’s auto-generated, so you don’t look at it all too much.

I haven’t done any benchmarking, no. It might be faster to simply include or require the file, but the obvious upside of being able to move a class into another file, or add classes to files, without adjusting any other code is far more important to me than the milliseconds you could save by using a different strategy for loading your classes.

There’s not a lot of managing to do on the mapping file. It’s automatically generated by a simple script that uses a tokenizer to find the class-names from each file. You’d only use that in development, obviously. Anyway, again: it comes down to personal preference. I like the way I work, you like your way, and Pear_Or_Zend_People like their way. They all accomplish the loading of classes, so this is rather trivial, I’d imagine.
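A sketch of what such a generator might look like, using PHP’s `token_get_all()`. It only handles non-namespaced classes and interfaces, and the function names and output format (a PHP file returning an array) are assumptions. Because it works on tokens, class names inside comments are ignored, and the sketch also skips `Foo::class` and anonymous classes.

```php
<?php
// Tokenizer-based class-map generator. Assumes the tokenizer extension
// (enabled by default) and only handles non-namespaced classes and
// interfaces. declaredClassNames() and writeClassMap() are hypothetical names.

/** Return the class and interface names declared in a chunk of PHP source. */
function declaredClassNames(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $names = [];
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || !in_array($tokens[$i][0], [T_CLASS, T_INTERFACE], true)) {
            continue;
        }
        // Skip "Foo::class", which also produces a T_CLASS token.
        $prev = $i - 1;
        while ($prev >= 0 && is_array($tokens[$prev]) && $tokens[$prev][0] === T_WHITESPACE) {
            $prev--;
        }
        if ($prev >= 0 && is_array($tokens[$prev]) && $tokens[$prev][0] === T_DOUBLE_COLON) {
            continue;
        }
        // The declared name is the next non-whitespace token; anonymous classes
        // have "(", "{" or "extends" there instead and are skipped.
        $next = $i + 1;
        while ($next < $count && is_array($tokens[$next]) && $tokens[$next][0] === T_WHITESPACE) {
            $next++;
        }
        if ($next < $count && is_array($tokens[$next]) && $tokens[$next][0] === T_STRING) {
            $names[] = $tokens[$next][1];
        }
    }
    return $names;
}

/** Scan a directory tree and write "class => file" as a PHP array file. */
function writeClassMap(string $sourceDir, string $mapFile): array
{
    $map = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($sourceDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getExtension() !== 'php') {
            continue;
        }
        foreach (declaredClassNames(file_get_contents($file->getPathname())) as $name) {
            $map[$name] = $file->getPathname();
        }
    }
    ksort($map);
    file_put_contents($mapFile, '<?php return ' . var_export($map, true) . ";\n");
    return $map;
}

// Demo on a throwaway source tree.
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'mapgen_demo_' . getmypid();
$srcDir = $root . DIRECTORY_SEPARATOR . 'src';
if (!is_dir($srcDir)) {
    mkdir($srcDir, 0777, true);
}
file_put_contents(
    $srcDir . DIRECTORY_SEPARATOR . 'tests.php',
    "<?php\ninterface ITest {}\nclass Test implements ITest {}\n// class NotReal {} is only a comment\n\$name = Test::class;\n"
);
$map = writeClassMap($srcDir, $root . DIRECTORY_SEPARATOR . 'classmap.php');
```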

I would think so, but as I don’t use the PEAR standard, I’m unable to tell if that brings up any problems. I know Zend Framework will start using namespaces from now on, so Zend_Controller_Front becomes Zend\Controller\Front, located in /Zend/Controller/Front.php. I also know they’ve run into some problems with reserved keywords :)
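A sketch of the mapping rule the PHP-FIG published as PSR-0, which covers both spellings: namespace separators become directory separators, and so do underscores in the class-name part (but not in the namespace). `psr0ClassToPath()` is a hypothetical helper name.

```php
<?php
// PSR-0 style mapping: namespace separators become directory separators, and
// underscores in the final class-name segment do too. Both the old and the
// namespaced spelling of a Zend class land on the same file.
function psr0ClassToPath(string $class, string $sep = DIRECTORY_SEPARATOR): string
{
    $class = ltrim($class, '\\');
    $path = '';
    $lastNsPos = strrpos($class, '\\');
    if ($lastNsPos !== false) {
        $namespace = substr($class, 0, $lastNsPos);
        $class = substr($class, $lastNsPos + 1);
        $path = str_replace('\\', $sep, $namespace) . $sep;
    }
    return $path . str_replace('_', $sep, $class) . '.php';
}

// Both psr0ClassToPath('Zend_Controller_Front', '/') and
// psr0ClassToPath('Zend\\Controller\\Front', '/') return 'Zend/Controller/Front.php'.
```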

Well only if you put 1 class in 1 file, no? Personally, I’m not too keen on that, I think the file is a good unit to put related things together. If you go the 1 class-1 file route, the file feels kinda redundant.

I’m curious: Is the overhead of managing the mapping file a lot less than including what you don’t need? Did you do any benchmarking on that? I’m imagining that the mapping file could grow quite large on a somewhat large project?

In the end, I don’t like the magic involved. An include/require[_once] makes things nice and clear: it’s explicit, only what’s needed is included, and there’s no need to manage a (large?) mapping file. Furthermore, you can put related stuff in the same file, avoiding excessive includes.

Also, since I don’t use autoloading, I’m wondering whether namespaces would complicate the various class loading mechanisms (especially the 1-class-1-file-underscore-directory-separator ones)?