Maphper: A PHP DataMapper

TomB · April 16, 2015, 2:59pm

I’ve been working on this on and off for a while now and it’d be nice to get some feedback.

It’s a Data Mapper that treats a database table (or other data source) as an array so that once the mapper is defined you can call:

$user = $users[123];

To query the users table for the user with the id 123.

Similarly you can write a record using:

$user = new User;
$user->name = 'Tom';

$users[] = $user;

Which is the basic usage. It also supports looping:


foreach ($users as $user) {
echo $user->name . "\n";
}

Which will print out the name of each user in the table… as well as filtering:

foreach ($users->filter(['type' => User::ADMIN]) as $user) {
  
}

to loop through all the admin users.
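
For readers wondering how array syntax on an object can work at all, here is a minimal sketch built on PHP's ArrayAccess and IteratorAggregate interfaces. It is not Maphper's code: the class name ArrayMapperSketch, the in-memory storage and the string value 'admin' are assumptions made for illustration, and it is written for PHP 8.

```php
<?php
// Illustrative sketch for PHP 8+: an in-memory stand-in, not Maphper's real code.
final class ArrayMapperSketch implements ArrayAccess, IteratorAggregate
{
    private array $rows = [];   // records keyed by primary key
    private int $nextId = 1;    // simulates an auto-increment column

    // $mapper[123] looks a record up by primary key (null if missing).
    public function offsetGet(mixed $id): mixed
    {
        return $this->rows[$id] ?? null;
    }

    // $mapper[] = $record inserts; $mapper[5] = $record writes under key 5.
    public function offsetSet(mixed $id, mixed $record): void
    {
        if ($id === null) {
            $id = $this->nextId;
        }
        $this->nextId = max($this->nextId, $id + 1);
        $record->id = $id;
        $this->rows[$id] = $record;
    }

    public function offsetExists(mixed $id): bool
    {
        return isset($this->rows[$id]);
    }

    public function offsetUnset(mixed $id): void
    {
        unset($this->rows[$id]);
    }

    // foreach ($mapper as $record) walks every stored record.
    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->rows);
    }

    // Returns a new mapper holding only records whose fields match $criteria.
    public function filter(array $criteria): self
    {
        $result = new self();
        foreach ($this->rows as $id => $row) {
            foreach ($criteria as $field => $value) {
                if (($row->$field ?? null) !== $value) {
                    continue 2; // a field differs, skip this record
                }
            }
            $result[$id] = $row;
        }
        return $result;
    }
}

$users = new ArrayMapperSketch();

$tom = new stdClass;
$tom->name = 'Tom';
$tom->type = 'admin';
$users[] = $tom;      // stored with id 1

$scott = new stdClass;
$scott->name = 'Scott';
$scott->type = 'member';
$users[] = $scott;    // stored with id 2

foreach ($users->filter(['type' => 'admin']) as $user) {
    echo $user->name . "\n"; // prints "Tom"
}
```

A real data source would turn offsetGet into a SELECT and offsetSet into an INSERT or UPDATE; the array here only shows which language hooks make the syntax possible.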

It also handles relationships, so related records can be read as properties:

echo $users[123]->address->country;

with unlimited chaining e.g.

$order = $users[123]->orders->item(0);
foreach ($order->products as $product) {
   echo $product->name .  ' ' . $product->cost . "\n";
}

Composite primary keys are represented as a 2D array:

//Two manufacturers may use the same code for their product so the PK needs to be manufacturer id + product code
$product = $products[$manufacturerId][$productCode];
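
One way chained brackets like this can be made to work is to have the first [] return an object that remembers the partial key and waits for the next []. The sketch below shows that idea only; CompositeKeyLookupSketch, the column names and the sample rows are hypothetical and say nothing about how Maphper implements it internally. Written for PHP 8.

```php
<?php
// Illustrative sketch for PHP 8+, not Maphper's real code.
final class CompositeKeyLookupSketch implements ArrayAccess
{
    /**
     * @param array $rows     all records, as associative arrays
     * @param array $keyNames primary key columns, in lookup order
     * @param array $keySoFar key values collected by earlier [] calls
     */
    public function __construct(
        private array $rows,
        private array $keyNames,
        private array $keySoFar = []
    ) {
    }

    public function offsetGet(mixed $value): mixed
    {
        $key = [...$this->keySoFar, $value];
        if (count($key) < count($this->keyNames)) {
            // Key still incomplete: return an object that waits for the next [].
            return new self($this->rows, $this->keyNames, $key);
        }
        // Key complete: find the record whose key columns all match.
        foreach ($this->rows as $row) {
            $rowKey = array_map(fn ($name) => $row[$name], $this->keyNames);
            if ($rowKey === $key) {
                return $row;
            }
        }
        return null;
    }

    public function offsetExists(mixed $value): bool
    {
        return $this->offsetGet($value) !== null;
    }

    // Writing is out of scope for this read-only sketch.
    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('read-only sketch');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('read-only sketch');
    }
}

$products = new CompositeKeyLookupSketch(
    [
        ['manufacturerId' => 1, 'code' => 'A100', 'name' => 'Widget'],
        ['manufacturerId' => 2, 'code' => 'A100', 'name' => 'Gadget'],
    ],
    ['manufacturerId', 'code']
);

echo $products[2]['A100']['name'], "\n"; // prints "Gadget"
```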

It also supports:

One to one relationships
Many to many relationships
Creating database tables on-the-fly
Optimising database tables on-the-fly (including automatically adding indexes where they’re useful)

And a bunch more stuff. It’s all documented over at GitHub: https://github.com/TomBZombie/Maphper and I’d be interested in comments/suggestions.

This project is a bit of a slow burner as I’ve just been adding features as I need them, but it’s at a stage where some feedback would be useful.

Any comments/suggestions are welcome.

s_molinari · April 16, 2015, 3:43pm

Sounds like an ORM “light”. I like how you’ve simplified working with the data. I’d love to have this working for MongoDB!

Scott

oddz · April 16, 2015, 6:11pm

I’m intrigued by the concept of using array indexes to specify filters and possibly sorting, limits, grouping, etc. However, that is really the only thing here that hasn’t been done a thousand times before. Therefore, I would much rather have the aforementioned concept added to an existing, well-known ORM or ActiveRecord than build a completely new one. Just seems like there would be more value in that than creating yet another one. This would work pretty well with Eloquent, for example. I know Eloquent is an ActiveRecord and not a Mapper, but the concept would add another level of elegance to the system.

Ex.

$users = User['status=1']['name=blah']->find();

The main issue with that would be specifying bind parameters, and differentiating between field context when it comes to managing different clauses like sorting, limits, etc.

Maybe:

$users = User['status=?'][$value]['name=?'][$value2]->order()['name asc']->find();

Interesting change-up of the standard API nonetheless, but I would never use an unknown, untested solution over a well-known one just because of this alone.

s_molinari · April 17, 2015, 5:13am

Nice docs.

I have a question. Wouldn’t it be better, if the class instantiation for the object is made a lot simpler too? You are giving us very nice methods for filtering, but instantiating classes is quite mofugly.

$authors = new \Maphper\Maphper(new \Maphper\DataSource\Database($pdo, 'author'));
$blogs = new \Maphper\Maphper(new \Maphper\DataSource\Database($pdo, 'blog', 'id'));
$blogs->addRelation('author', new \Maphper\Relation\One($authors, 'authorId', 'id'));

Need I say, there is too much duplication?

To me, instantiating the map should be something as simple as

$blogs = new Maphper('blog'); //and that is it!

Because we had mapped the blog object with the author relationship once earlier at some point in time (and the core reason of having a mapping system to begin with, right?), then the code in the example in your docs could also be a bit reduced.

$blog = new stdClass;
$blog->title = 'My First Blog';
$blog->date = new \DateTime();
$blog->author->name = 'Tom Butler';

$blogs[] = $blog;

Ok, it is only one line, but it is simpler. I think the author relationship should be automatically instantiated “internally” through the mapping done earlier.

Does that make any sense?

Edit: I am also missing some sort of unit of work. I can’t imagine you’d want to give the dev the ability to fire off a database query one-to-one with every write in the style you explained above. So you might need to offer something like…

 $blogs[] = $blog;
 $blogs[] = $blog->flush();

or

$blogs[] = $blog->persist();

where “flush” or “persist” is the final “ok, I am now ready to finally start working with the database” methods.

Scott

s_molinari · April 17, 2015, 5:23am

Tom has some pretty good docs about how to do this.

foreach ($blogs->limit(5)->sort('date desc') as $blog) {
  echo $blog->title;
}

Jasmine · April 17, 2015, 5:30am

Tom, I’ve moved this to ShowCase as this is great and I’d like people outside of PHP to see it

TomB · April 17, 2015, 8:54am

Actually, the only thing that you can use the array indexes for is primary key lookup, e.g. $users[$pk1][$pk2][$pk3], but those fields have to be defined as the primary key.

What it does do that hasn’t (AFAIK) been done before is generating database tables without any kind of developer supplied metadata. Consider a completely empty schema and running this code:

$users = new Maphper(new \Maphper\DataSource\Database($pdo, 'user', 'id', ['editmode' => true]));

$user = new stdclass;
$user->name = 'Tom';
$user->level = 1;
$user->registrationDate = new \DateTime;

$users[] = $user;

Will create a table user with fields int id, varchar name, int level and datetime registrationDate
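
To make the idea concrete, here is a rough sketch of how column types could be guessed from the values on an object. The function names guessColumnType and buildCreateTable, the exact SQL types and the AUTO_INCREMENT clause are assumptions for illustration, not what Maphper actually emits. Written for PHP 8.

```php
<?php
// Illustrative sketch only; Maphper's real type rules may differ.

// Picks a SQL column type that can hold the given PHP value.
function guessColumnType(mixed $value): string
{
    return match (true) {
        $value instanceof DateTimeInterface => 'DATETIME',
        is_int($value) => 'INT',
        is_float($value) => 'DOUBLE',
        // Anything else is stored as text, widened when longer than 255 bytes.
        default => 'VARCHAR(' . max(255, strlen((string) $value)) . ')',
    };
}

// Builds a CREATE TABLE statement from the public properties of one record.
// Identifiers are not escaped here, which a real implementation would need.
function buildCreateTable(string $table, string $primaryKey, object $record): string
{
    $columns = ["$primaryKey INT PRIMARY KEY AUTO_INCREMENT"];
    foreach (get_object_vars($record) as $name => $value) {
        if ($name !== $primaryKey) {
            $columns[] = "$name " . guessColumnType($value);
        }
    }
    return "CREATE TABLE $table (" . implode(', ', $columns) . ')';
}

$user = new stdClass;
$user->name = 'Tom';
$user->level = 1;
$user->registrationDate = new DateTime();

echo buildCreateTable('user', 'id', $user), "\n";
// prints: CREATE TABLE user (id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(255), level INT, registrationDate DATETIME)
```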

Once you do a query:

$users->filter(['type' => User::ADMIN])->sort('name DESC');

It will add indexes to relevant fields e.g. in this case ‘type ASC’ and ‘name DESC’

Obviously this is designed for development and has a performance overhead so should be turned off in production

With editmode turned on, Maphper inverts the usual control: the application, rather than the database, dictates the schema. It reshapes the database to fit the supplied data so that the data can always be saved. No more errors or lost data when trying to write a 256-character string to a VARCHAR(255) column; it will just resize the column and then save the data.

Actually, you can do that already:


foreach ($users->filter(['name' => 'tom', 'level' => User::ADMIN])->sort('name DESC')->limit(10) as $user) {
	
}

It also supports some pretty advanced filtering, although I’ve not convinced myself this is the best way of doing it yet, so I’ve not documented it:

$author->books->filter([\Maphper\Maphper::OR => [
		\Maphper\Maphper::FIND_GREATER => [
			'sales' => 500000
		],
		\Maphper\Maphper::FIND_LESS => [
			'date' => new \DateTime('1999-01-01')
		]
	]
]);

Which will find any of the author’s books with sales greater than 500,000 or a publication date before 1999.
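
As a rough illustration of how a nested filter array like that could be turned into SQL, here is a small recursive sketch. The constants FIND_OR, FIND_GREATER and FIND_LESS and the function buildWhere belong to this sketch only; they stand in for the \Maphper\Maphper constants used above and are not the library's real values or internals.

```php
<?php
// Illustrative sketch; constant names and output format are assumptions.
const FIND_OR = 'or';
const FIND_GREATER = '>';
const FIND_LESS = '<';

/**
 * Turns a nested filter array into [sqlFragment, boundValues].
 * Field names are interpolated as-is, so they must come from trusted code.
 */
function buildWhere(array $filter, string $glue = 'AND', string $operator = '='): array
{
    $parts = [];
    $params = [];
    foreach ($filter as $key => $value) {
        if ($key === FIND_OR) {
            // Children of an OR group are joined with OR.
            [$sql, $bound] = buildWhere($value, 'OR', $operator);
            $parts[] = "($sql)";
        } elseif ($key === FIND_GREATER || $key === FIND_LESS) {
            // Children of a comparison group use that comparison operator.
            [$sql, $bound] = buildWhere($value, 'AND', $key);
            $parts[] = "($sql)";
        } else {
            // Plain field: emit a placeholder and bind the value.
            $parts[] = "$key $operator ?";
            $bound = [$value instanceof DateTimeInterface ? $value->format('Y-m-d') : $value];
        }
        array_push($params, ...$bound);
    }
    return [implode(" $glue ", $parts), $params];
}

[$where, $params] = buildWhere([
    FIND_OR => [
        FIND_GREATER => ['sales' => 500000],
        FIND_LESS => ['date' => new DateTime('1999-01-01')],
    ],
]);

echo $where, "\n"; // prints: ((sales > ?) OR (date < ?))
```

Here $params ends up as [500000, '1999-01-01'], ready to pass to a prepared statement.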

I totally agree and actually have done something to fix that problem but the reason Maphper needs a complex instantiation is that it will support other data sources e.g.

new Maphper(new \Maphper\DataSource\CSV('../afile.csv'));
new Maphper(new \Maphper\DataSource\XML('./file.xml'));
new Maphper(new \Maphper\DataSource\TwitterFeed('username', $oauth));

etc.

I agree that the instantiation is a bit much, but it’s the only way to keep the code flexible and extensible. What I’ve actually done is create a factory that loads an XML file (for now) describing all the mappers and their relationships in the system, which then allows:

$loader = new \Maphper\Loader\Xml('maphperconfig.xml');
$blogs = $loader->getMapper('blogs');

and takes an XML file that looks like this:


		<database name="blogs">
			<table>blog</table>
			<relation name="author">
				<to>authors</to>
				<type>one</type>
				<localKey>authorId</localKey>
				<foreignKey>id</foreignKey>
			</relation>
		</database>


		<database name="authors">
			<table>authors</table>
			<relation name="blogs">
				<to>blogs</to>
				<type>many</type>
				<localKey>id</localKey>
				<foreignKey>authorId</foreignKey>
			</relation>
		</database>

Again, I’ve not convinced myself this is the best way yet so I haven’t released it.

s_molinari:

$blog = new stdClass;
$blog->title = 'My First Blog';
$blog->date = new \DateTime();
$blog->author->name = 'Tom Butler';

$blogs[] = $blog;
Ok, it is only one line, but it is simpler. I think the author relationship should be automatically instantiated “internally” through the mapping done earlier.

Does that make any sense?

I like this idea, but the problem with this approach is that you can’t use stdClass for the objects: a freshly created stdClass obviously won’t have the author property set to an object, so the $blog->author->name = '' line will fail. Thinking about it, this should work already:

$blog = $blogs->createNew();
$blog->title = 'My First Blog';
$blog->date = new \DateTime();
$blog->author->name = 'Tom Butler';

$blogs[] = $blog;

Of course this makes the assumption that the mapper is available in the place the $blog object is constructed.

In the name of simplicity (Treating the mapper like an array), I can’t see any direct advantage of the $blog->persist() call. I’m happy to be shown otherwise but what is the practical difference between:

 $blogs[] = $blog;

and

$blogs[] = $blog->persist();

If it’s a case of explicitly saying “I want to store the data in the database” then the first is surely saying “I want to store the data in the array”.

TomB · April 17, 2015, 10:55am

One other thing: part of that is down to the namespaces. If you use the use keyword, a lot of that verbosity is lost:

use Maphper\Maphper;
use Maphper\DataSource\Database;
use Maphper\Relation\One;

$authors = new Maphper(new Database($pdo, 'author'));
$blogs = new Maphper(new Database($pdo, 'blog', 'id'));
$blogs->addRelation('author', new One($authors, 'authorId', 'id'));

oddz · April 17, 2015, 3:39pm

So it builds the persistent storage schema as you fetch and save data. I’m not too sure how I feel about that. Seems brittle and prone to breakage. Perhaps a nice thing for quick prototyping though. You’re kinda taking agile to the extreme with this.

oddz · April 17, 2015, 3:43pm

One of the first issues I see with this is updating an existing db for new features. If a db already exists, you would have to run these commands at least once on the environment to sync up the environment schema. Which means that at least once the application would need to alter the schema of the persistent storage device in a production environment. You have essentially created a poor man’s “db” update, taking after the concept of a poor man’s cron, where application requests build the structure on the db as needed.

TomB · April 17, 2015, 3:53pm

That’s a very good point. Perhaps I should log any CREATE/ALTER statements so that development and production environments can be easily synced.
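
A minimal sketch of what such a log might look like, assuming every generated SQL string passes through one place before execution. SchemaChangeLog and its methods are hypothetical names for this illustration; nothing like this is confirmed to exist in Maphper.

```php
<?php
// Illustrative sketch; class and method names are assumptions, not part of Maphper.
final class SchemaChangeLog
{
    private array $statements = [];

    // Call with every SQL string before it is executed; only DDL is kept.
    public function record(string $sql): void
    {
        if (preg_match('/^\s*(CREATE|ALTER)\b/i', $sql) === 1) {
            // Normalise so each logged statement ends with exactly one semicolon.
            $this->statements[] = rtrim(trim($sql), ';') . ';';
        }
    }

    // Returns the collected statements as a script to replay on another environment.
    public function toScript(): string
    {
        return implode("\n", $this->statements);
    }
}

$log = new SchemaChangeLog();
$log->record('CREATE TABLE user (id INT PRIMARY KEY, name VARCHAR(255))');
$log->record('INSERT INTO user (name) VALUES (?)');          // ignored: not DDL
$log->record('ALTER TABLE user MODIFY name VARCHAR(256)');

echo $log->toScript(), "\n";
```

The resulting script contains only the CREATE and ALTER statements, in the order they were issued, so it could be reviewed and run against production by hand instead of letting the application alter the schema there.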

TomB · April 17, 2015, 4:07pm

Not sure I agree with this. If anything it’s less brittle as it will always save data given to it. The upshot of this is that an insert query will never fail. The downside, of course, is that the database doesn’t enforce types. Think of it like json_encode() or csv/xml storage. There’s no type checking of the data it’s just stored so that it can be retrieved later.

Essentially I’m trying to reduce the need to describe the data in both the application and the database. If data passes the application’s validation rules, then the database is amended on the fly to run with it. If a new column is added in the application, it’s added in the database automatically rather than having to add the column in the database and then reference it in the code, referencing it in the code is enough.

Where it breaks, of course, is if other applications need to access the same database. However, for what I need 99% of the time this is not an issue (and if it is, it’s easy enough to turn off the DB modification feature).

s_molinari · April 18, 2015, 5:15am

That will work!

Now that I think about it, you are right. The assignment to the array could mean “persist now”.

Scott

s_molinari · April 18, 2015, 5:34am

What does MultiPk stand for? Edit: Never mind. Multiple Primary Keys. Got it!

Scott

TomB · April 20, 2015, 10:44am

Fair point. Do you think CompositeKey or MultipleKeys would be a more suitable class name?

s_molinari · April 20, 2015, 11:22am

Since you explain it as it being a “Composite Primary Key” in the docs, I think something along those lines might be the best name? CompPriKey? LOL!

Scott

mefisto · May 6, 2015, 1:52am

Strange how this “data mapper” doesn’t look anything like the data mapper pattern, but instead seems a lot like an active record implementation.

Curious.

TomB · May 6, 2015, 8:08am

Not sure how you’ve come to that conclusion. In AR, the record has a dependency on the storage mechanism (the database) and methods to save/load, e.g:

$person = new Person($pdo);
$person->name = "Tom";
$person->save();

In Data Mapper, the entity object is not coupled to the storage mechanism at all, e.g.:

$person = new Person();
$mapper = new PersonMapper($pdo);
$person->name = "Tom";
$mapper->save($person);

This implementation is most certainly the latter.
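
For anyone who wants to run the comparison, here is a self-contained version of the two snippets with an ArrayObject standing in for the database. The class bodies are assumptions written for this illustration (PHP 8); only the shape of the calls comes from the examples above.

```php
<?php
// Illustrative sketch; an ArrayObject stands in for the database in both styles.

// Active Record: the record itself holds the storage and knows how to save.
final class ActiveRecordPerson
{
    public string $name = '';

    public function __construct(private ArrayObject $storage)
    {
    }

    public function save(): void
    {
        $this->storage->append(['name' => $this->name]);
    }
}

// Data Mapper: the entity is a plain object with no storage dependency...
final class Person
{
    public string $name = '';
}

// ...and a separate mapper is the only class that touches storage.
final class PersonMapper
{
    public function __construct(private ArrayObject $storage)
    {
    }

    public function save(Person $person): void
    {
        $this->storage->append(['name' => $person->name]);
    }
}

$storage = new ArrayObject();

$record = new ActiveRecordPerson($storage);
$record->name = 'Tom';
$record->save();

$person = new Person();
$person->name = 'Tom';
(new PersonMapper($storage))->save($person);

echo count($storage), "\n"; // prints 2
```

Person can be constructed and tested without any storage at all, which is the decoupling being described.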
