Handling Collections of Aggregate Roots – the Repository Pattern

One of the most typical aspects of traditional Domain-Driven Design (DDD) architectures is the persistence agnosticism imposed on the Domain Model. In more conservative designs, including several implementations based on Active Record or Table Data Gateway (which, in pursuit of a rather deceptive simplicity, often end up polluting domain logic with infrastructure), there’s always an explicit notion of an underlying storage mechanism living and breathing down the line, usually a relational database. Domain Models, on the other hand, are designed from the beginning to be strictly “storage-unaware,” which shifts any persistence logic out of their boundaries.

Even though DDD is somewhat elusive when it comes to making a direct reference to the “database,” in the real world there will most likely be at least one sitting behind the scenes, since the Domain Model must ultimately be persisted in one form or another. It’s therefore pretty usual to have a mapping layer deployed somewhere between the Model and the Data Access layer. Not only does this actively push for a decent level of isolation between the layers, but it also shields client code from every complex detail involved in moving domain objects back and forth across their seams.

Mea culpa aside, it’s fair to admit that dealing with the oddities of a layer of Data Mappers is quite a burden, often handled with a “code once/use forever” strategy. Even so, this scheme performs decently well in fairly simple conditions, where there are just a few domain classes handled by a small number of mappers. The situation can become a lot more awkward, however, when the model starts to bloat and increase in complexity, since additional mappers will surely be added over time.

This shows in a nutshell that achieving persistence ignorance when working with rich Domain Models, composed of several complex aggregate roots, can be quite difficult in practice, at least without having to create expensive object graphs in multiple places or treading the sinful path of duplicated implementations. Worse, in large systems that need to pull expensive collections of aggregate roots matching different criteria from the database, the query process itself can become a prolific promoter of this duplication when it isn’t centralized through a single entry point.

In such convoluted use cases, an additional abstraction layer, commonly known in DDD parlance as a Repository, which mediates between the Data Mappers and the Domain Model, can help to reduce query logic duplication to a minimum while exposing to the Model the semantics of a real in-memory collection. Unlike mappers, though, which are part of the infrastructure, a repository speaks the model’s language, as it’s intimately bound to it. And because it delegates to the mappers, it preserves persistence ignorance as well, therefore providing a higher level of data abstraction, much closer to the domain objects.

It’s sad but true that the benefits a repository brings to the table can’t be so easily realized for every single application out there, hence its implementation is only worthwhile if the situation warrants it. Anyway, it’d be pretty informative to build a small repository from scratch so that you can see its inner workings and unveil what’s actually beneath its rather esoteric shell.

Doing some Preliminary Groundwork

The process of implementing a repository can be pretty complex, because it hides all the nuts and bolts of injecting and handling the Data Mappers behind a simplified collection-like API; the mappers in turn are injected with some kind of persistence adapter, and so on. This successive injection of dependencies, coupled with the hiding of extensive logic, explains why a repository is often considered a plain Façade, even though some opinions currently diverge from that view. In either case, the first step that we should take to get a functional repository up and running is to create a basic Domain Model. The one that I plan to use here will be charged with the task of modelling generic users, and its bare-bones structure looks like this:

<?php
namespace Model;

interface UserInterface
{
    public function setId($id);
    public function getId();
    
    public function setName($name);
    public function getName();
    
    public function setEmail($email);
    public function getEmail();
    
    public function setRole($role);
    public function getRole();
}

<?php
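namespace Model;
// Import the SPL exceptions; without this they would resolve inside Model.
use BadMethodCallException,
    InvalidArgumentException;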

class User implements UserInterface
{
    const ADMINISTRATOR_ROLE = "Administrator";
    const GUEST_ROLE         = "Guest";
    
    protected $id;
    protected $name;
    protected $email;
    protected $role;

    public function __construct($name, $email, $role = self::GUEST_ROLE) {
        $this->setName($name);
        $this->setEmail($email);
        $this->setRole($role);
    }
    
    public function setId($id) {
        if ($this->id !== null) {
            throw new BadMethodCallException(
                "The ID for this user has been set already.");
        }
        if (!is_int($id) || $id < 1) {
            throw new InvalidArgumentException(
                "The user ID is invalid.");
        }
        $this->id = $id;
        return $this;
    }
    
    public function getId() {
        return $this->id;
    }
    
    public function setName($name) {
        if (strlen($name) < 2 || strlen($name) > 30) {
            throw new InvalidArgumentException(
                "The user name is invalid.");
        }
        $this->name = htmlspecialchars(trim($name), ENT_QUOTES);
        return $this;
    }

    public function getName() {
        return $this->name;
    }

    public function setEmail($email) {
        if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
            throw new InvalidArgumentException(
                "The user email is invalid.");
        }
        $this->email = $email;
        return $this;
    }
    
    public function getEmail() {
        return $this->email;
    }
    
    public function setRole($role) {
        if ($role !== self::ADMINISTRATOR_ROLE
            && $role !== self::GUEST_ROLE) {
            throw new InvalidArgumentException(
                "The user role is invalid.");
        }
        $this->role = $role;
        return $this;
    }
    
    public function getRole() {
        return $this->role;
    }
}
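To see what “validating itself” means in practice, here is a minimal sketch. SketchUser is a hypothetical, trimmed-down stand-in for the User class above (only the name and email rules are kept, so that the snippet runs on its own); it is not the class used in the rest of the article.

```php
<?php
// Trimmed-down stand-in for the article's User class (hypothetical name),
// keeping only the name and email rules so the snippet is self-contained.
class SketchUser
{
    private $name;
    private $email;

    public function __construct($name, $email) {
        $this->setName($name);
        $this->setEmail($email);
    }

    public function setName($name) {
        if (strlen($name) < 2 || strlen($name) > 30) {
            throw new InvalidArgumentException("The user name is invalid.");
        }
        $this->name = htmlspecialchars(trim($name), ENT_QUOTES);
        return $this;
    }

    public function setEmail($email) {
        if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
            throw new InvalidArgumentException("The user email is invalid.");
        }
        $this->email = $email;
        return $this;
    }

    public function getName() { return $this->name; }
    public function getEmail() { return $this->email; }
}

// A valid user is constructed without complaint.
$user = new SketchUser("Rachel", "rachel@example.com");

// An invalid email never makes it into an object: the constructor throws.
$error = null;
try {
    new SketchUser("Rachel", "not-an-email");
} catch (InvalidArgumentException $e) {
    $error = $e->getMessage();
}
echo $user->getName() . " / " . $error . PHP_EOL;
```

Because every setter guards its own field, client code can never hold a user object in an invalid state, which is about as much behavior as this thin model needs.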

In this case in particular, the Domain Model is a pretty skeletal layer, barely above a plain data holder capable of validating itself, which defines through just a segregated interface and a banal implementer the data and behavior of some fictional users. To keep things uncluttered and easy to understand, I’m going to keep the model that thin. With the model already going about its business in relaxed isolation, let’s make it a little bit richer by adding to it an additional class, responsible for handling collections of user objects. This “addendum” component is just a classic array wrapper implementing the Countable, ArrayAccess and IteratorAggregate SPL interfaces:

<?php
namespace Model\Collection;
use Mapper\UserCollectionInterface,
    Model\UserInterface,
    InvalidArgumentException,
    ArrayIterator;
    
class UserCollection implements UserCollectionInterface
{
    protected $users = array();
    
    public function add(UserInterface $user) {
        // offsetSet() expects a key first; null appends to the collection.
        $this->offsetSet(null, $user);
    }
    
    public function remove(UserInterface $user) {
        $this->offsetUnset($user);
    }
    
    public function get($key) {
        return $this->offsetGet($key);
    }
    
    public function exists($key) {
        return $this->offsetExists($key);
    }
    
    public function clear() {
        $this->users = array();
    }
    
    public function toArray() {
        return $this->users;
    }
    
    public function count() {
        return count($this->users);
    }
    
    public function offsetSet($key, $value) {
        if (!$value instanceof UserInterface) {
            throw new InvalidArgumentException(
                "Could not add the user to the collection.");
        }
        if (!isset($key)) {
            $this->users[] = $value;
        }
        else {
            $this->users[$key] = $value;
        }
    }
    
    public function offsetUnset($key) {
        if ($key instanceof UserInterface) {
            $this->users = array_filter($this->users,
                function ($v) use ($key) {
                    return $v !== $key;
                });
        }
        else if (isset($this->users[$key])) {
            unset($this->users[$key]);
        }
    }
    
    public function offsetGet($key) {
        if (isset($this->users[$key])) {
            return $this->users[$key];
        }
    }
    
    public function offsetExists($key) {
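        // in_array() in strict mode returns a real boolean; array_search()
        // would return the matching key, and key 0 is falsy.
        return ($key instanceof UserInterface)
            ? in_array($key, $this->users, true)
            : isset($this->users[$key]);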
    }
    
    public function getIterator() {
        return new ArrayIterator($this->users);
    }
}
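The three SPL interfaces are what let client code treat such a collection like a native array. The sketch below illustrates that with a hypothetical, compact SketchCollection that accepts any value (the article’s UserCollection only accepts UserInterface instances). The return types and the ReturnTypeWillChange attribute are additions assumed here to keep recent PHP versions quiet; they are not part of the original class.

```php
<?php
// Hypothetical compact collection; it stores any value to stay short.
class SketchCollection implements Countable, ArrayAccess, IteratorAggregate
{
    private $items = array();

    public function count(): int {
        return count($this->items);
    }

    public function offsetExists($key): bool {
        return isset($this->items[$key]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($key) {
        return isset($this->items[$key]) ? $this->items[$key] : null;
    }

    public function offsetSet($key, $value): void {
        if ($key === null) {
            $this->items[] = $value;      // triggered by $collection[] = $value
        } else {
            $this->items[$key] = $value;  // triggered by $collection[$key] = $value
        }
    }

    public function offsetUnset($key): void {
        unset($this->items[$key]);
    }

    public function getIterator(): Traversable {
        return new ArrayIterator($this->items);
    }
}

$collection = new SketchCollection();
$collection[] = (object) array("name" => "Rachel");       // ArrayAccess: append
$collection["boss"] = (object) array("name" => "Susan");  // ArrayAccess: keyed set

$names = array();
foreach ($collection as $item) {                           // IteratorAggregate
    $names[] = $item->name;
}
echo count($collection) . ": " . implode(", ", $names) . PHP_EOL; // Countable

unset($collection["boss"]);                                // ArrayAccess: removal
```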

In fact, placing this array collection within the model’s boundaries is entirely optional, as pretty much the same results can be achieved with a plain array. In this case, however, relying on a standalone collection class makes it easier to access sets of user objects fetched from the database through an object-oriented API. In addition, considering that the Domain Model must be entirely ignorant of the underlying storage set down in the infrastructure, the next logical step is to implement a mapping layer that keeps it nicely separated from the database. Here are the elements that compose this tier:

<?php
namespace Mapper;
use Model\UserInterface;

interface UserCollectionInterface extends \Countable, \ArrayAccess, \IteratorAggregate
{
    public function add(UserInterface $user);
    public function remove(UserInterface $user);
    public function get($key);
    public function exists($key);
    public function clear();
    public function toArray();
}

<?php
namespace Mapper;
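use Library\Database\DatabaseAdapterInterface, // assumed to live next to PdoAdapter
    Model\Repository\UserMapperInterface,
    Model\User;

class UserMapper implements UserMapperInterface
{
    protected $entityTable = "users";
    protected $adapter;
    protected $collection;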

    public function __construct(DatabaseAdapterInterface $adapter, UserCollectionInterface $collection) {
        $this->adapter = $adapter;
        $this->collection = $collection;
    }
    
    public function fetchById($id) {
        $this->adapter->select($this->entityTable,
            array("id" => $id));
        if (!$row = $this->adapter->fetch()) {
            return null;
        }
        return $this->createUser($row);
    }
    
    public function fetchAll(array $conditions = array()) {
        $this->adapter->select($this->entityTable, $conditions);
        $rows = $this->adapter->fetchAll();
        return $this->createUserCollection($rows);
        
    }
    
    protected function createUser(array $row) {
        $user = new User($row["name"], $row["email"],
            $row["role"]);
        // Database drivers often return IDs as strings; setId() requires an int.
        $user->setId((int) $row["id"]);
        return $user;
    }
    
    protected function createUserCollection(array $rows) {
        $this->collection->clear();
        if ($rows) {
            foreach ($rows as $row) {
                $this->collection[] = $this->createUser($row);
            }
        }
        return $this->collection;
    }
}

Out of the box, the batch of tasks performed by UserMapper is fairly straightforward, limited to exposing a couple of generic finders charged with pulling in users from the database and reconstructing the corresponding entities through the createUser() method. If you’ve already sunk your teeth into a few mappers before, or even written your own mapping masterpieces, the above should be pretty easy to understand. Quite possibly the only subtle detail worth stressing is that UserCollectionInterface has been placed in the mapping layer rather than in the model’s. I did so deliberately: that way the abstraction (the protocol) that the user collection conforms to is explicitly declared and owned by the higher-level UserMapper, in consonance with the guidelines promoted by the Dependency Inversion Principle.

With the mapper already set, we could just consume it right out of the box and pull in a few user objects from storage to get the model hydrated in a snap. While at first glance this would seem to be the right path to take, we’d be unnecessarily polluting application logic with infrastructure, as the mapper is effectively a part of it. What if down the road it becomes necessary to query user entities according to more distilled, domain-specific conditions than the blanket ones exposed by the mapper’s finders? In such cases, there would be a real need to place an additional layer on top of the mapping one, which would not only provide a higher level of data access, but also carry chunks of query logic through one single point. This is, ultimately, the wealth of benefits we’d expect to get from a repository.

Implementing a User Repository

In production, repositories can implement under their surface pretty much everything one can think of in order to expose to the model the illusion of an in-memory collection of aggregate roots. Nevertheless, in this case we can’t be so naive as to expect such expensive luxuries for free, since the repository that we’ll be building will be a pretty contrived structure, responsible only for fetching users from the database:

<?php
namespace Model\Repository;

interface UserMapperInterface
{
    public function fetchById($id);
    public function fetchAll(array $conditions = array());
}

<?php
namespace Model\Repository;

interface UserRepositoryInterface
{
    public function fetchById($id);
    public function fetchByName($name);
    public function fetchByEmail($email);
    public function fetchByRole($role);
}

<?php
namespace Model\Repository;

class UserRepository implements UserRepositoryInterface
{
    protected $userMapper;
    
    public function __construct(UserMapperInterface $userMapper) {
        $this->userMapper = $userMapper;
    }
    
    public function fetchById($id) {
        return $this->userMapper->fetchById($id);
    }
    
    public function fetchByName($name) {
        return $this->fetch(array("name" => $name));
    }
    
    public function fetchByEmail($email) {
        return $this->fetch(array("email" => $email));
    }
    
    public function fetchByRole($role) {
        return $this->fetch(array("role" => $role));
    }
    
    protected function fetch(array $conditions) {
        return $this->userMapper->fetchAll($conditions);
    }
}

Although sitting on top of a somewhat lightweight structure, the implementation of UserRepository is pretty intuitive, considering that its API pulls in collections of user objects from storage that conform to refined predicates closely related to the model’s language. In its current state, the repository exposes just some simplistic finders to client code, which in turn exploit the functionality of the data mapper to gain access to the storage. In a more realistic environment, a repository should be capable of persisting aggregate roots as well. If you’re in the mood to pitch an insert() method or something else along that line to UserRepository, feel free to do so. In either case, one effective way to grasp the actual advantages of using a repository is by example:

<?php
use Library\Loader\Autoloader,
    Library\Database\PdoAdapter,
    Mapper\UserMapper,
    Model\Collection\UserCollection,
    Model\Repository\UserRepository;

require_once __DIR__ . "/Library/Loader/Autoloader.php";
$autoloader = new Autoloader;
$autoloader->register();

$adapter = new PdoAdapter("mysql:dbname=users", "myfancyusername", "mysecretpassword");
$userRepository = new UserRepository(new UserMapper($adapter, 
    new UserCollection()));

$users = $userRepository->fetchByName("Rachel");
foreach ($users as $user) {
    echo $user->getName() . " " . $user->getEmail() . "<br>";
}

$users = $userRepository->fetchByEmail("username@domain.com");
foreach ($users as $user) {
    echo $user->getName() . " " . $user->getEmail() . "<br>";
}

// Role values match the User::ADMINISTRATOR_ROLE and User::GUEST_ROLE constants.
$administrators = $userRepository->fetchByRole("Administrator");
foreach ($administrators as $administrator) {
    echo $administrator->getName() . " " . 
        $administrator->getEmail() . "<br>";
}

$guests = $userRepository->fetchByRole("Guest");
foreach ($guests as $guest) {
    echo $guest->getName() . " " . $guest->getEmail() . "<br>";
}
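Since the repository only depends on UserMapperInterface, it can also be exercised without any database at all. The following sketch assumes a hypothetical InMemoryUserMapper that filters plain arrays, plus a cut-down SketchUserRepository with the same shape as the one above; rows are arrays rather than User objects purely to keep the snippet short.

```php
<?php
// Same contract as the article's UserMapperInterface, repeated here so the
// snippet is self-contained.
interface UserMapperInterface
{
    public function fetchById($id);
    public function fetchAll(array $conditions = array());
}

// Hypothetical test double: filters an array of rows instead of querying a
// database.
class InMemoryUserMapper implements UserMapperInterface
{
    private $rows;

    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    public function fetchById($id) {
        $matches = $this->fetchAll(array("id" => $id));
        return $matches ? $matches[0] : null;
    }

    public function fetchAll(array $conditions = array()) {
        $matches = array_filter($this->rows, function ($row) use ($conditions) {
            // A row matches when every condition equals the row's value.
            foreach ($conditions as $field => $value) {
                if (!array_key_exists($field, $row) || $row[$field] !== $value) {
                    return false;
                }
            }
            return true;
        });
        return array_values($matches); // reindex from 0
    }
}

// Cut-down repository with the same shape as the article's UserRepository.
class SketchUserRepository
{
    private $userMapper;

    public function __construct(UserMapperInterface $userMapper) {
        $this->userMapper = $userMapper;
    }

    public function fetchById($id) {
        return $this->userMapper->fetchById($id);
    }

    public function fetchByRole($role) {
        return $this->userMapper->fetchAll(array("role" => $role));
    }
}

$repository = new SketchUserRepository(new InMemoryUserMapper(array(
    array("id" => 1, "name" => "Rachel", "role" => "Administrator"),
    array("id" => 2, "name" => "Tom", "role" => "Guest"),
    array("id" => 3, "name" => "Susan", "role" => "Guest"),
)));

$guests = $repository->fetchByRole("Guest");
foreach ($guests as $guest) {
    echo $guest["name"] . PHP_EOL;
}
```

Client code calls the same finders whether a PDO-backed mapper or an array-backed one sits underneath, which is the practical payoff of keeping the mapper behind an interface.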

As noted previously, the repository exchanges business terminology with client code (the so-called Ubiquitous Language coined by Eric Evans in his book Domain-Driven Design), rather than a lower-level, technical one. Unlike the generic finders of the data mapper, the repository’s methods describe themselves in terms of “name,” “email,” and “role,” which are certainly among the attributes that model user entities. This distilled, higher level of data abstraction, along with the capabilities required to encapsulate query logic in complex systems, is certainly among the most compelling reasons that make repositories appealing in multi-tiered design. Of course, most of the time there’s an implicit trade-off between getting those benefits up front and going through the hassle of deploying an additional abstraction layer, which in more modest applications may be bloated overkill.
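For readers who want to try the persistence side mentioned earlier, here is one possible sketch of a repository that can also store users. Everything in it is an assumption made for illustration: the WritableUserMapperInterface contract, the array-backed mapper, the add() method name and the trimmed SketchUser entity do not appear in the code above.

```php
<?php
// Minimal stand-in for the article's User entity (hypothetical).
class SketchUser
{
    private $id;
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function setId($id) { $this->id = $id; return $this; }
    public function getId() { return $this->id; }
    public function getName() { return $this->name; }
}

// Hypothetical write-side contract. The article's UserMapperInterface only
// declares finders, so insert() is an assumption made for this sketch.
interface WritableUserMapperInterface
{
    public function insert(SketchUser $user);
}

// Array-backed mapper that hands out sequential IDs, standing in for a
// mapper that would issue an INSERT through the database adapter.
class ArrayUserMapper implements WritableUserMapperInterface
{
    private $rows = array();

    public function insert(SketchUser $user) {
        $id = count($this->rows) + 1;
        $this->rows[$id] = array("name" => $user->getName());
        $user->setId($id);
        return $id;
    }

    public function countRows() {
        return count($this->rows);
    }
}

class WritableUserRepository
{
    private $userMapper;

    public function __construct(WritableUserMapperInterface $userMapper) {
        $this->userMapper = $userMapper;
    }

    // Collection-like verb: client code "adds" a user and never sees how or
    // where the data ends up being stored.
    public function add(SketchUser $user) {
        $this->userMapper->insert($user);
        return $user;
    }
}

$mapper = new ArrayUserMapper();
$repository = new WritableUserRepository($mapper);
$rachel = $repository->add(new SketchUser("Rachel"));
$tom = $repository->add(new SketchUser("Tom"));
echo $rachel->getName() . " got ID " . $rachel->getId() . PHP_EOL;
```

Naming the method add() rather than insert() is a design choice that keeps the collection metaphor intact; either name works as long as the storage details stay behind the mapper.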

Closing Thoughts

Being one of the central concepts of Domain-Driven Design, repositories can be found in applications written in several other languages, like Java and C#, just to name a few. In PHP, however, they’re still relatively unknown, just taking their first shy steps in the world. Despite this, there are some well-trusted frameworks, such as FLOW3 and of course Doctrine 2.x, which will help you embrace the DDD paradigm. As with any development methodology out there, you don’t have to use repositories in your applications, or burden them unnecessarily with the pile of concepts sitting behind DDD. Just use common sense and pick them up only when you think they’re going to fit your needs. It’s really just that simple.

Frequently Asked Questions (FAQs) about Handling Collections of Aggregate Roots

What is an Aggregate Root in Domain-Driven Design?

In Domain-Driven Design (DDD), an Aggregate is a cluster of associated objects that are treated as a single unit. These objects are bound together by a root entity, known as the Aggregate Root. The Aggregate Root maintains the consistency of changes being made within the aggregate by forbidding external objects from holding references to its members.

How does an Aggregate Root differ from regular entities?

The main difference between an Aggregate Root and regular entities lies in their responsibilities. While regular entities encapsulate behavior and state, an Aggregate Root additionally ensures the integrity of the entire aggregate by controlling access to its members. It’s the only member of the aggregate that outside objects are allowed to hold references to.

How do I identify an Aggregate Root in my domain model?

Identifying an Aggregate Root requires a deep understanding of the business domain. It’s typically a high-level entity that has a global identity and encapsulates other entities and value objects. For example, in an e-commerce domain, an Order could be an Aggregate Root that encapsulates line items and shipping information.

How should I handle collections of Aggregate Roots?

Handling collections of Aggregate Roots can be challenging. It’s important to remember that each Aggregate Root is a consistency boundary, so changes to one should not affect others. Therefore, when dealing with collections, it’s often best to load and persist each Aggregate Root separately to maintain consistency.

Can an Aggregate Root reference another Aggregate Root?

Yes, an Aggregate Root can reference another Aggregate Root, but it should do so by identity only. This means it should not hold a direct reference to the other Aggregate Root object, but rather its ID. This helps to maintain the consistency boundary of each Aggregate Root.
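A minimal sketch of referencing by identity follows. The Customer and Invoice classes and the array-backed CustomerRepository are hypothetical names chosen for illustration; the point is only that Invoice stores a customer ID and the Customer object is resolved through a repository when it is actually needed.

```php
<?php
// Hypothetical aggregate roots living in two separate aggregates.
class Customer
{
    private $id;
    private $name;

    public function __construct($id, $name) {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId() { return $this->id; }
    public function getName() { return $this->name; }
}

class Invoice
{
    private $id;
    private $customerId; // identity of the other aggregate root, not the object

    public function __construct($id, $customerId) {
        $this->id = $id;
        $this->customerId = $customerId;
    }

    public function getId() { return $this->id; }
    public function getCustomerId() { return $this->customerId; }
}

// Array-backed repository (an assumption for this sketch) used to resolve
// the ID into a Customer only when one is required.
class CustomerRepository
{
    private $customers = array();

    public function add(Customer $customer) {
        $this->customers[$customer->getId()] = $customer;
    }

    public function fetchById($id) {
        return isset($this->customers[$id]) ? $this->customers[$id] : null;
    }
}

$customers = new CustomerRepository();
$customers->add(new Customer(7, "Rachel"));

$invoice = new Invoice(100, 7);
$customer = $customers->fetchById($invoice->getCustomerId());
echo $customer->getName() . PHP_EOL;
```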

How does an Aggregate Root relate to a Repository in DDD?

In DDD, a Repository provides methods to retrieve and store Aggregate Roots. It abstracts the underlying storage mechanism, allowing the domain model to remain ignorant of the details of data persistence. Each Aggregate Root typically has its own Repository.

What is the role of an Aggregate Root in enforcing business rules?

An Aggregate Root plays a crucial role in enforcing business rules. It ensures that all changes to the aggregate leave it in a valid state. This means that any business rule that spans multiple entities or value objects should be enforced by the Aggregate Root.
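As a rough illustration, the sketch below uses a hypothetical Order aggregate root whose line items can only be changed through it. The specific rules (a maximum of three lines, no changes after placement, no empty orders) are made up for the example.

```php
<?php
// Hypothetical Order aggregate root. Line items are only reachable through
// it, so every rule that spans them can be checked in one place.
class Order
{
    const MAX_LINES = 3; // made-up business rule for the sketch

    private $lines = array();
    private $placed = false;

    public function addLine($sku, $unitPrice, $quantity) {
        if ($this->placed) {
            throw new LogicException("A placed order cannot be changed.");
        }
        if ($quantity < 1 || $unitPrice < 0) {
            throw new InvalidArgumentException("Invalid line item.");
        }
        if (count($this->lines) >= self::MAX_LINES) {
            throw new LogicException("An order holds at most "
                . self::MAX_LINES . " lines.");
        }
        $this->lines[] = array(
            "sku" => $sku, "unitPrice" => $unitPrice, "quantity" => $quantity);
        return $this;
    }

    public function place() {
        if (!$this->lines) {
            throw new LogicException("An order needs at least one line.");
        }
        $this->placed = true;
        return $this;
    }

    public function getTotal() {
        $total = 0;
        foreach ($this->lines as $line) {
            $total += $line["unitPrice"] * $line["quantity"];
        }
        return $total;
    }
}

$order = new Order();
$order->addLine("BOOK-1", 20, 2)->addLine("PEN-9", 3, 5)->place();

$message = null;
try {
    $order->addLine("LATE-1", 1, 1); // rejected: the order is already placed
} catch (LogicException $e) {
    $message = $e->getMessage();
}
echo $order->getTotal() . " / " . $message . PHP_EOL;
```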

How does an Aggregate Root contribute to reducing complexity in a domain model?

By acting as a consistency boundary and controlling access to its members, an Aggregate Root helps to reduce complexity in a domain model. It simplifies the model by providing a single point of interaction for each aggregate, making it easier to reason about the system.

Can an Aggregate Root be part of more than one aggregate?

No, an Aggregate Root should not be part of more than one aggregate. This would violate the consistency boundary of the aggregates and could lead to inconsistencies in the domain model.

How should I handle concurrency issues with Aggregate Roots?

Concurrency issues with Aggregate Roots can be handled using various strategies, such as optimistic locking or pessimistic locking. The choice of strategy depends on the specific requirements of your application and the nature of the concurrency issues you are facing.
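Optimistic locking can be sketched with a version number that must match at write time. VersionedStore below is a hypothetical in-memory stand-in for a table with a version column; against a SQL database the same check is commonly expressed as an UPDATE whose WHERE clause includes the expected version.

```php
<?php
// Hypothetical in-memory store keeping a version number next to each row.
class VersionedStore
{
    private $rows = array(); // id => array("data" => ..., "version" => int)

    public function insert($id, array $data) {
        $this->rows[$id] = array("data" => $data, "version" => 1);
    }

    public function load($id) {
        return $this->rows[$id]; // arrays are copied, so this is a snapshot
    }

    public function update($id, array $data, $expectedVersion) {
        // Reject the write if someone else saved a newer version meanwhile.
        if ($this->rows[$id]["version"] !== $expectedVersion) {
            throw new RuntimeException(
                "Stale write: the aggregate changed since it was loaded.");
        }
        $this->rows[$id] = array(
            "data" => $data, "version" => $expectedVersion + 1);
    }
}

$store = new VersionedStore();
$store->insert(1, array("status" => "new"));

// Two clients load the same aggregate at version 1.
$first = $store->load(1);
$second = $store->load(1);

// The first write succeeds and bumps the version to 2.
$store->update(1, array("status" => "paid"), $first["version"]);

// The second write still carries version 1 and is rejected.
$conflict = false;
try {
    $store->update(1, array("status" => "cancelled"), $second["version"]);
} catch (RuntimeException $e) {
    $conflict = true;
}

$current = $store->load(1);
echo $current["data"]["status"] . " v" . $current["version"] . PHP_EOL;
```

Pessimistic locking would instead block the second client from loading the aggregate for writing until the first one finishes, which trades throughput for the absence of rejected writes.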