Repository Design Pattern Demystified

Amit Gupta

What is the Repository Design Pattern?

To put it simply, it is an implementation of a brokering layer between the application and a data source. Neither party needs to be aware of the other to do its job. This gives us a decoupled architecture, which in turn helps the application scale into the big leagues without hard dependencies.

Why should you care?

Let us understand this with an example. Imagine we are building an online store which sells orange-flavored candies. It’s a small store and it keeps local inventory, so we don’t need anything fancy here. The storefront application can just hook into the database and take orders online based on how much inventory is at hand. This will work fine since the store has only one supply warehouse and a limited area of operation. But what happens if the store wants to expand? It might want to move into another city or across the country, and a single central inventory system would then become cumbersome.

Now if we are still using data models, then we have a somewhat tightly coupled application. The storefront application needs to be aware of every data source it has to interact with, and that is poor application design. The job of the storefront application is to let customers place orders for candy; it should not be concerned with the data source, nor should it have to keep track of all the different data sources. This is where data repositories come into play. Per the Repository Design Pattern, a public API is exposed via an interface, and every consumer (our storefront application in this case) uses that API to talk to the data source. Which data source is being used, and how it’s being connected to, are of no concern to the application. The application is only concerned with the data it gets and the data it sends to be saved.

Once the Repository Design Pattern is implemented, a repository can be created for each data source. The storefront application no longer needs to keep track of any data source; it just uses the repository API to get the data it needs.

Is it the magic bullet?

Well, no it is not. Like every design pattern it has its ups and downs, pros and cons.

Pros:

  • Separation of concerns; the application need not know about or track any or all data sources.
  • Allows easy unit testing as the repositories are bound to interfaces which are injected into classes at run time.
  • DRY (Don’t Repeat Yourself) design; the code to query and fetch data from data source(s) is not repeated.

Cons:

  • Adds another layer of abstraction, which adds a certain level of complexity and makes it overkill for small applications.

How to go about it?

Let us see this with a little code example. I will use Laravel here to leverage its excellent dependency injection feature. If you use any modern PHP framework, it should already have a Dependency Injection/IoC container. Dependency Injection is required to implement the Repository Design Pattern because without it you cannot bind a data repository to the repository interface, and the whole idea is to code to an interface to avoid hard coupling. If you are not using a framework, or your framework of choice does not have an IoC container, you can use an off-the-shelf IoC container (check footnotes).

Let’s crack on. First, we set up our namespace and autoloading in Composer. Open up composer.json and add PSR-4 autoloading for our namespace (in the autoload node, right after classmap).

    "autoload": {
        "classmap": [
            "app/commands",
            "app/controllers",
            "app/models",
            "app/database/migrations",
            "app/database/seeds",
            "app/tests/TestCase.php"
        ],
        "psr-4": {
            "RocketCandy\\": "app/RocketCandy"
        }
    },

After saving it, execute composer dump-autoload -o in the terminal to register autoloading for the new namespace. Create OrangeCandyRepository.php in app/RocketCandy/Repositories/OrangeCandyRepository/. This will be our repository interface.

<?php

namespace RocketCandy\Repositories\OrangeCandyRepository;

interface OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 );

    public function get_detail( $candy_id = 0 );

}

Now that we have an interface, we can create a repository. Create CityAOrangeCandyRepository.php in app/RocketCandy/Repositories/OrangeCandyRepository/.

<?php

namespace RocketCandy\Repositories\OrangeCandyRepository;

class CityAOrangeCandyRepository implements OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 ) {
        //query the data source and get the list of
        //candies
    }

    public function get_detail( $candy_id = 0 ) {
        //query the data source and get the details of
        //candies
    }

}
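The method bodies above are left as comments. As a rough sketch of what they could contain, the hypothetical class below keeps its candies in a plain PHP array instead of a real data source. The class name ArrayOrangeCandyRepository, the id/name/stock keys, and the conventions that a limit of 0 means "no limit" and that a missing candy yields null are all assumptions made for illustration; the namespace is dropped so the snippet runs on its own.

```php
<?php

// Same API as the article's interface, repeated so the sketch is standalone.
interface OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 );

    public function get_detail( $candy_id = 0 );

}

// Hypothetical repository backed by an in-memory array (illustration only).
class ArrayOrangeCandyRepository implements OrangeCandyRepository {

    protected $_candies;

    public function __construct( array $candies ) {
        $this->_candies = array_values( $candies );
    }

    public function get_list( $limit = 0, $skip = 0 ) {
        // Assumption: a limit of 0 means "no limit".
        $length = ( $limit > 0 ) ? $limit : null;

        return array_slice( $this->_candies, $skip, $length );
    }

    public function get_detail( $candy_id = 0 ) {
        foreach ( $this->_candies as $candy ) {
            if ( $candy['id'] === $candy_id ) {
                return $candy;
            }
        }

        // Assumption: null signals "no such candy".
        return null;
    }

}

$repository = new ArrayOrangeCandyRepository( array(
    array( 'id' => 1, 'name' => 'Orange Drop', 'stock' => 40 ),
    array( 'id' => 2, 'name' => 'Orange Chew', 'stock' => 0 ),
    array( 'id' => 3, 'name' => 'Orange Fizz', 'stock' => 12 ),
) );

$page   = $repository->get_list( 2, 1 ); // two candies, skipping the first one
$detail = $repository->get_detail( 3 );  // the candy whose id is 3
```

A database or web service repository would keep the same two method signatures and replace only the bodies.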

To bind the CityAOrangeCandyRepository repository to the OrangeCandyRepository interface, we will leverage Laravel’s IoC container. Open up app/start/global.php and add the following to the end of the file.

//OrangeCandyRepository
App::bind(
    'RocketCandy\Repositories\OrangeCandyRepository\OrangeCandyRepository',
    'RocketCandy\Repositories\OrangeCandyRepository\CityAOrangeCandyRepository'
);

Note: I have put the IoC binding in global.php only for demonstration. Ideally, you would put all the IoC bindings into a separate file of their own and load that file here in global.php, or you would create Service Providers to register each IoC binding. You can read more here.
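For readers without Laravel, the idea behind such a binding can be approximated in a few lines of plain PHP. The TinyContainer class below is hypothetical and deliberately minimal: it only maps an abstract name to a concrete class name and instantiates it. It is not how Laravel's container is implemented, and a real container handles constructor dependencies, closures, singletons and more. The repository methods return placeholders, and the namespace is omitted so the snippet is standalone.

```php
<?php

interface OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 );

    public function get_detail( $candy_id = 0 );

}

class CityAOrangeCandyRepository implements OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 ) {
        // Placeholder; a real repository would query its data source.
        return array();
    }

    public function get_detail( $candy_id = 0 ) {
        // Placeholder; a real repository would query its data source.
        return null;
    }

}

// Hypothetical, deliberately tiny container (illustration only).
class TinyContainer {

    protected $_bindings = array();

    // Remember which concrete class stands behind an abstract name.
    public function bind( $abstract, $concrete ) {
        $this->_bindings[ $abstract ] = $concrete;
    }

    // Build an instance of whatever is bound to the abstract name.
    public function make( $abstract ) {
        // Fall back to the requested name itself when nothing is bound to it.
        $concrete = isset( $this->_bindings[ $abstract ] )
            ? $this->_bindings[ $abstract ]
            : $abstract;

        return new $concrete();
    }

}

$container = new TinyContainer();
$container->bind( 'OrangeCandyRepository', 'CityAOrangeCandyRepository' );

// The caller asks for the interface and receives the bound repository.
$repository = $container->make( 'OrangeCandyRepository' );
```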

Now we can go ahead and make use of the repository via the interface, in our CandyListingController.php located in app/controllers/.

<?php

use RocketCandy\Repositories\OrangeCandyRepository\OrangeCandyRepository;

class CandyListingController extends BaseController {

    /**
     * @var RocketCandy\Repositories\OrangeCandyRepository\OrangeCandyRepository
     */
    protected $_orange_candy;

    public function __construct( OrangeCandyRepository $orange_candy ) {
        $this->_orange_candy = $orange_candy;
    }

}

Here we inject the OrangeCandyRepository interface into our controller and store its object reference in a class variable, which can now be used by any function in the controller to query data. Since we bound the OrangeCandyRepository interface to the CityAOrangeCandyRepository repository, it will be as if we were using the CityAOrangeCandyRepository repository directly.
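Because the controller depends only on the interface, a test can hand it a fake repository, which is the unit-testing benefit listed in the Pros. The sketch below is a simplified stand-in and not the article's exact controller: it drops BaseController so it runs without Laravel, and both the list_names() method and the FakeOrangeCandyRepository class are hypothetical names introduced for illustration.

```php
<?php

interface OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 );

    public function get_detail( $candy_id = 0 );

}

// Simplified stand-in for the article's controller (no BaseController).
class CandyListingController {

    protected $_orange_candy;

    public function __construct( OrangeCandyRepository $orange_candy ) {
        $this->_orange_candy = $orange_candy;
    }

    // Hypothetical action: collect the names of all listed candies.
    public function list_names() {
        $names = array();

        foreach ( $this->_orange_candy->get_list() as $candy ) {
            $names[] = $candy['name'];
        }

        return $names;
    }

}

// Hypothetical fake used only in tests; it never touches a real data source.
class FakeOrangeCandyRepository implements OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 ) {
        return array(
            array( 'id' => 1, 'name' => 'Orange Drop' ),
            array( 'id' => 2, 'name' => 'Orange Chew' ),
        );
    }

    public function get_detail( $candy_id = 0 ) {
        return array( 'id' => $candy_id, 'name' => 'Orange Drop' );
    }

}

$controller = new CandyListingController( new FakeOrangeCandyRepository() );
$names      = $controller->list_names();
```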

So now, the type and kind of data source is the sole concern of CityAOrangeCandyRepository. Our application knows only of the OrangeCandyRepository interface and the API it exposes, to which every repository implementing it must adhere. The repository is resolved out of the IoC container at run time, which means the interface <=> repository binding can be set as needed. The interface can be bound to any data repository, and our application would not have to be concerned about the change in data source which can now be a database or a web service or a trans-dimensional hyper-data conduit.
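To make the swap concrete, here is a minimal sketch with two data sources behind the same interface. CityBOrangeCandyRepository, the first_candy_source() function, the source key and the $bindings/$active configuration are all hypothetical; both repositories return hard-coded rows that merely stand in for a database and a web service.

```php
<?php

interface OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 );

    public function get_detail( $candy_id = 0 );

}

// Stands in for a database-backed source (hard-coded for illustration).
class CityAOrangeCandyRepository implements OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 ) {
        return array(
            array( 'id' => 1, 'name' => 'Orange Drop', 'source' => 'city-a-database' ),
        );
    }

    public function get_detail( $candy_id = 0 ) {
        return array( 'id' => $candy_id, 'name' => 'Orange Drop', 'source' => 'city-a-database' );
    }

}

// Hypothetical second source; stands in for a remote web service.
class CityBOrangeCandyRepository implements OrangeCandyRepository {

    public function get_list( $limit = 0, $skip = 0 ) {
        return array(
            array( 'id' => 7, 'name' => 'Orange Fizz', 'source' => 'city-b-web-service' ),
        );
    }

    public function get_detail( $candy_id = 0 ) {
        return array( 'id' => $candy_id, 'name' => 'Orange Fizz', 'source' => 'city-b-web-service' );
    }

}

// Consumer code: identical no matter which repository it is handed.
function first_candy_source( OrangeCandyRepository $orange_candy ) {
    $candies = $orange_candy->get_list( 1 );

    return $candies[0]['source'];
}

// Hypothetical configuration: changing $active is the only edit needed
// to move the application to another data source.
$bindings = array(
    'city_a' => 'CityAOrangeCandyRepository',
    'city_b' => 'CityBOrangeCandyRepository',
);
$active = 'city_b';

$class      = $bindings[ $active ];
$repository = new $class();
$source     = first_candy_source( $repository );
```

With a container such as Laravel's, the same switch would be expressed by changing the second argument of the binding rather than a configuration key.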

One size does not fit all

As I mentioned above in the Cons of the Repository Design Pattern, it adds a bit of complexity to the application. So if you are making a small application and you do not see it graduating to the big leagues, where more than one data source might be called into service, you will be better off not implementing this and sticking to good old data models. Knowing something is different from knowing when to use it. It is a very handy design pattern that saves a lot of headaches, both when creating an application and when that application has to be maintained or scaled up (or down), but it is not a magic bullet for every application.

I used Laravel specific code to demonstrate the implementation above, but it is fairly simple and similar with any decent IoC container. Got questions? Fire away in the comments below.

Footnotes:

  • http://loige.com/ Luciano Mammino

    Hi Amit. Great article! The repository pattern is the one I have liked most since I started using it with Doctrine 2.

    I don’t think the drawback you mentioned is real… I mean, creating an interface for a repository (even in a small application) gives you room to think in advance about how you would retrieve your data. Then writing the implementation for that interface and using the resulting class within your code should be quite simple and will lead to more organized code.

    Anyway, it’s a great post! I suggest you write a similar article for the “data manager” or “manager” design pattern.

    PS: Pimple is another great DI container to consider, it’s absolutely worth mentioning! ;)

    • http://amitgupta.in/ Amit Gupta

      Thanks for liking the article.

      > I don’t think the drawback you mentioned is real… I mean, creating an interface
      > for a repository (also on a small application) gives you room to think in advance
      > about how you would retrieve your data.

      The drawback I have mentioned does not refer to having to create an interface for the repository but to having to implement the Repository Pattern itself. Not every application needs the Repository Pattern, and it does add some complexity (or over-abstraction) which would be wasteful if you don’t intend to use multiple data sources. For example, if I’m making a blog app for myself, it would be overkill to implement the Repository Pattern when I know I’m just going to use a single MySQL database. :)

      • http://loige.com/ Luciano Mammino

        Yes, I got your point Amit.
        You are probably right about “over abstraction”, but I believe that most of the time the advantages of having a repository class (even if you don’t use multiple data sources) are far greater than the disadvantage of spending a bit more time on writing it.
        That’s just to say, in my honest opinion, it does not only help you to write multi data sources code, but also on keeping your code well organized, decoupled and clean.

  • Tony Marston

    You say that this pattern is good when changing from a single data source to multiple data sources, but you have failed to identify what each source is, or to show code which deals with more than one source. Is each data source supposed to be a separate database at a different location or what?
    I have built an application which allows inventory to exist in any number of facilities in any number of containers, and all that is in a single database. It is one thing for the website to ask “do we have anything in inventory for this item?” as the actual facility/container are irrelevant and are only required in the back office application when picking from inventory prior to shipping the order.
    I completely disagree that it is necessary to have a separate layer between the application and the data source. In my own framework the application layer contains all my Models from the MVC pattern, and my data layer consists of a Data Access Object (DAO). There is a separate DAO class for each DBMS engine (MySQL, PostgreSQL, Oracle and SQL Server), but the Model does not know which of those objects it is talking to. The DAO object is *not* injected into the Model, as I have a function which operates like a cross between a Factory and a Singleton which provides the correct object at runtime only when it is actually needed.
    Whether you like it or not the application *is* dependent on the data source (it has to, otherwise the application won’t work), and trying to hide it away behind several layers of indirection does not make the code better, it makes it more complicated and difficult to understand. When I see the amount of code you need to generate in order to get the application layer to talk to the data layer I just have to shake my head in amazement. Why are you taking a simple concept and making it more complicated than it need be?
    When you say that simple code doesn’t scale you are forgetting one thing – if you have more code than necessary to achieve an objective then that excess code is nothing but an unnecessary overhead, and it is the volume of unnecessary overheads which has the greatest effect on scalability. An application which requires no more than five classes to achieve a result will always scale better than a similar application which requires fifty.

    • http://amitgupta.in/ Amit Gupta

      > You say that this pattern is good when changing from a single data source to
      > multiple data sources, but you have failed to identify what each source is,
      > or to show code which deals with more than one source.

      Teaching how to code to connect to multiple data sources at lower level was not in the scope for this article.

      > Is each data source supposed to be a separate database at a different location or what?

      To quote from the article:

      >> our application would not have to be concerned about the change in data source which
      >> can now be a database or a web service or a trans-dimensional hyper-data conduit

      Meaning: a data source can be anything; it can be an RDBMS, flat text files, a NoSQL database, a web service or even an organic (or artificial) neural network. The application itself doesn’t need to know which one is being used; a repository is bound to the interface and the application trusts the interface and the API it exposes for consumption.

      > I have built an application which allows inventory to exist in any number of facilities
      > in any number of containers, and all that is in a single database. It is one thing for
      > the website to ask “do we have anything in inventory for this item?” as the actual
      > facility/container are irrelevant and are only required in the back office application
      > when picking from inventory prior to shipping the order.

      Good for you. The example in the article is an illustration to help a person unfamiliar with the Repository Pattern understand how it can be implemented. Rocket Candy is a fictional store here.

      • Tony Marston

        Saying that this pattern makes it easier to swap from one data source to another is not obvious unless your example shows at least two data sources, otherwise I can only see it being tied to a single data source.
        In my own framework I can switch the data source between MySQL, PostgreSQL, Oracle and SQL Server simply by changing a single line in the config file, which results in the Data Access Object being instantiated from the relevant class. Each of these classes has exactly the same interfaces but a different implementation, and each application object is totally unaware of which database is being used. The ability to use the same interface with different implementations can be achieved using nothing more than polymorphism, so saying that you should use a separate repository pattern is moving away from simplicity and adding unnecessary complexity.

        • Alessandro Pellizzari

          Hi Tony,

          I understand your doubts, because I had them too.
          But your framework deals only with SQL sources.

          Imagine having to retrieve some data from MongoDB, some from Cassandra and some from a REST API.
          It’s not an uncommon scenario in medium/big architectures.

          This pattern is part of what’s needed to achieve this kind of decoupling. It’s not complete, but showing the complete picture would require many articles or pages, maybe a complete manual.

          The concept is well explained, but I would have left the DIC out of the picture (it complicates things a little) and would have just used DI to inject the object directly, for the sake of the example.

          • Tony Marston

            If every data source has a common set of APIs then the implementation behind those APIs will be irrelevant. That is what polymorphism already provides out of the box.
            My data sources are all persistent data stores which are databases, relational or otherwise. Nobody in the real world will switch from a database to a text file, or a spreadsheet, or a REST web service, so the need for that level of complexity does not exist.

          • gggeek

            Interesting point.

            In my own experience (not claiming universality of course), 99% of time apps do not switch databases during their entire lifetime, either. When they do, it’s at the same time that they get rewritten from scratch. So one could say that even using an abstraction to insulate the app from the sql dialect is a waste.

            Otoh I was working at an airport company where the main operational db had to be switched from sybase to oracle. The app had been coded close-to-metal, with no clear separation of concerns / modules, taking advantage of all features of the db at hand. It took 3 years not to do the change – only to find a contractor who dared to submit a quote for doing the switch. They all ran away at the mere thought of the complexity of the thing.

          • colinwiseman

            This is what I was going to say. No one changes source mid-project, especially if you use source-specific code, e.g. MSSQL’s way to get a paged set of data is entirely different from MySQL’s.

          • http://amitgupta.in/ Amit Gupta

            Generally that holds true but it’s not a constant. I’ve seen apps that started out with one database but after X time moved to another database for one or more reasons. Like one app started out with CouchDB (for whatever reason) and then after a couple of years or so it didn’t seem like a good fit so they moved to MySQL. In such a case, if you put DB-specific code right into the app, mixed with other business logic, then you would have a problem. But if you have an abstraction layer like the Repository Pattern implemented then it would just be a matter of writing new repository classes and switching the interface bindings.

            As I mentioned in the article above, if something is available then that does not mean it must be used.

          • colinwiseman

            Ah see this is a good point. An upgrade to the software like that I get. Changing database not just for the heck of it. And yes SQL and business logic should rarely mix. It happens though especially around optimisation – taking a whole bunch of calls into a single call to build an object is a great way to reduce load. Doing certain bits of logic e.g. summation of numbers in large data sets should always be done in the db. It’s better than sending large swathes of data across the network to do it there.

            Separation is good. But if your app is not going to the outside world, KISS. Too many young developers take articles like these and go to town. Everything becomes an interface. But interfaces have problems, e.g. changing the interface breaks the “contract”.

            Don’t get me wrong, this is a good article, and I would love to see the behemoth that is WordPress get a ground-up rewrite using these ideas :-)

          • http://amitgupta.in/ Amit Gupta

            > Too many young developers take articles like these and go to town.
            > Everything becomes an interface.

            Knowing something & knowing when to use that something are two completely different things. The latter comes with experience & application of common sense! :)

            > But interfaces have problems, e.g. changing the interface breaks the “contract”.

            Yup, which is why one should never change an interface if it’s already in production. Either replace it or extend it (to add more stuff) or better yet, create another interface. A class can implement multiple interfaces so they need not be epic tomes.

            > would love to see the behemoth that is WordPress get a ground-up rewrite using these ideas

            Indeed, that would be kinda awesome! :D

          • colinwiseman

            By adding more interfaces you start to get yourself into so much of a muddle though! IEntity, IEntityPlus, IEntityExtra, IEntityFly, IEntityDoThis, IEntityDoThat.

            This is another good point that comes with layers. Don’t expose everything as an interface. If you have a Post object, don’t run around creating IPost between layers. Otherwise you will need to have IPostExtended, etc. Only have interfaces to things that may need to change their implementation, e.g. the Persistence layer (from Database to File!!!). And only have really simple interfaces like IDisplay, so that you know there is a void Display() {} method available.

            I have seen code (look at NOPCommerce) where every single object, be it a simple POCO or a complicated database layer, had an interface. It was interlaced with mad amounts of IOC and DI to the point you really didn’t know what you were working with. And the thing was, it was .NET using SQL server and that would never change! I got the plugin architecture side of things – interfaces work amazingly well with plugins, and all they need to be is simple interfaces – IBuild, IRun or whatever. But this thing was wild coding!

            The abstraction went to the point where it was just far too complicated; too much instantiation was happening due to IOC and DI and the code ground to a halt!!

            Crazy! Anyways, I digress.

    • ludofleury

      “An application which requires no more than five classes to achieve a result will always scale better than a similar application which requires fifty.”

      For most applications, scaling is cheap & maintainability is expensive.
      That’s why you see some people adding more code where you think it’s not needed.

      • http://loige.com/ Luciano Mammino

        Totally agree!

  • gggeek

    I partially agree with the comments from Tony: to really understand the advantages and disadvantages of adopting a Repository, more complex cases and details are probably needed.

    I can bring to the table the example of the eZ Publish CMS, version 5, where each domain object is handled by a separate repository service. Currently all the services talk to the same database (as the main goal was keeping complete compatibility with the existing db schema), but the line of thinking is that in the future the storage layer might evolve, and possibly the system could even use different storages for different domain objects.

    Some of the “hard bits” which had to be solved:

    1. transaction usage. You want to make sure repo A and B are always updated using transactional code, when two domain objects relate (when moving, say, object X from inventory A to inventory B). And distributed transactions are a huge pain. They have always been slow and error prone in the db world, and in the webservices-based world of today they might be even slower and more error prone (have you ever tried to wrap a transaction around a series of http REST requests? In many cases you end up rewriting the service to be not-rest-oriented but rpc-oriented…)

    2. “domain objects” become quite more complex, as each of those now needs to implement 2 separate interfaces/classes: the one exposed to the app for business-logic concerns and the one used by the repository to store/retrieve the object (supposing some domain objects are complex enough not to fit in a single db table / have relations). And that’s just with a db storage layer. There will be more code needed when adopting f.e. nosql storages

    3. queries which span 2 db tables can now become requests which span 2 separate repositories. The chances to avoid N+1 requests decrease radically, performance suffers

    4. managing roles and policies: a complex policy model is best implemented by pushing down the policy filters all the way down into the model. This makes it easy to add different layers on top (rest api, website, cli app) without duplicating access controls. The problem here is that as soon as you have implemented that, you find out that you need “sudo” access in a lot of functions, and separating those away from “plain functions” is hard by itself. F.e. in eZPublish the user accounts service and the content service ended up having a circular dependency which is quite hard to untangle

    PS: I liberally mixed the storage-layer and the repository-service in the above description. I think the point is that it is really hard not to have the storage-layer semantics percolate in the repository-service api for anything but trivial cases.

    PPS: Of course there are benefits as well, not only drawbacks :-)

    • http://amitgupta.in/ Amit Gupta

      > to really understand the advantages and disadvantages of adopting a Repository,
      > more complex cases and details are probably needed

      Yup, but my intention for this article was to dumb it down so it is easy to understand for those who are new to this stuff. Once the basic understanding is achieved then one can focus on the nuances.