DAO -- should they be based on one table?

I’ve been creating DAOs that work with multiple tables. Apparently Martin Fowler
and others advise using one DAO per table. Is this true?

Why not create your DAOs with usage in mind, rather than just a gateway to one
table? For instance, I might have an article DAO that let’s me get data about
articles. But that data might span multiple tables, so I may have to do sql joins
in the DAO to get the data.
The relational model might be a good way to store data, but why force that way
of thinking on the rest of your application? In most cases, it seems to be
an unnatural way to think about things(unless you’re an RDBMS).
If I need to get information about articles all throughout my application,
then I don’t see a problem designing a DAO around a “concept” of an article, rather
than around how the data is stored.

Am I wrong? If so, how?

–ed

Hi…

Not exactly. The “DAO” is Java terminology and frankly I find it a bit vague. You end up looking at the JavaDocs trying to figure out what they mean. Fowler uses RowDataGateway and TableGateway for the low level DAO access, which is indeed one class per table/view/proc unless you count joins. You wouldn’t normally use these at the application level though unless you are working with mainly tabular data.

What you are trying to build is either ActiveRecord (if a domain object) or DataMapper (if a domain service).

At least that is what I think it all means :).

yours, Marcus

Ahh, so it’s terminology glitch then. So I guess I’m ok doing it the way I’m doing it?

–ed

I think that a Data Access Object (DAO) may have originated partly from the Facade pattern (GoF). It “hides” the details of database access (SQL calls) creating a unified interface.

JT

Hey coo_t2,

I’ve been thinking along the same lines lately, but I haven’t come to a conclusion yet. As far as I see it, both ways have their advantages and disadvantages:

  1. 1 DAO can query multiple tables (i.e. there’s a JOIN in the DAO taking place):

    • very practical
    • you lose your encapsulation. If you use DAO#1 to query table A and DAO#2 to query A and B together and you change a column name in table A, both DAOs will be affected. Might have repercussions all over the place, and might create bugs hard to trace.
  2. 1 DAO per table

    • pure design, good encapsulation
    • can be rather impractical and performance-costly. If you need info from several tables at once, you would have to query the DAOs for all those tables seperately and then maybe aggregate this info in an aggregate DAO. So you’re not using the power of relational db systems, i.e. the JOIN feature. Seems a bit stupid, right?

So, I don’t know what’s best here, either… Maybe someone can clarify things?

I’ve come to use the following.

DAO
Gateway
Manager

More detail at here.

It would help to know why the data would span multiple tables. That would make the discussion less abstract.

I’m trying to do is to seperate the “select” and “delete,update” database actions.

Every table has its own UpdateDao class which is responsible for “insert,delete,update” actions, while I also build some QueryDao classes arround the real object concept which is reponsible for "select " only actions.

The QueryDao can search several tables to retrieve enough information to form a real object or a resultset.

Ji Tao

Hm, I think this is a common situation: Imagine you have a db table with news articles. You have one column with a foreign key that identifies the author of this article. Then you have another table with all infos on authors.

On the page where you retrieve and display all news, imagine you want to display the authors name and other info along with each news item. In order to do this, you have to use a JOIN in your db query. All this info constitutes 1 complete news “object” for you, from the business perspective. But because of relational db systems and normalisation issues, you have to keep this info in several tables (which the business layer should know nothing about).

But in using a JOIN in a DAO thats supposed to query only 1 table, you violate the principle of encapsulation. See what I mean?

Ah, took me a while to see what you mean. There is still this problem: If you change the definition of one of the db tables, all related “Query DAOs” are affected by this and would have to be changed as well. You cannot always know which DAOs will be affected and you have to go through all of them. From the maintainability perspective, this is not ideal, because it’s easy to forget one of them, and this would create a bug.

Yes, but what I’m thinking at the moment is why not create an instance of an object, ie NewsMerger yes ?

But pass in as parameters, one for the news, and one for the authors ?

The actual class your creating an instance of would then know what to do with these 2 parameters yes ?

That’s about as the best method I can think of to get around the problem… Of course you can also use the authors and news classes independently of each other as well in other instances :slight_smile:

Yes, you could create an “aggregator” dao that queries the two simple DAOs and combines the information. But then you have at least 2 db queries instead of 1 (with the join), and db queries are always costly (performance-wise). There always seems to be a trade-off.

I have to admit this is not the pefect solution to bridge the RDBM and the OOP scripts, but it seems working well for me.

  1. It is very efficient.
  2. It provides good encapsulation.(The Query Daos are built arround the objects not tables)
  3. It may cause bugs when table changes, but easier to trace.

Can’t remeber which Query Dao is affected when you alter a table’s definition? Don’t worry, here are several solutions:

The brain, pencil and paper, MS Excel, another small script that will scan your source code and mark the related Query Daos and remind you fix them.
:slight_smile:

mkrz has probably given a better example than the one I have.
But in my particular case of storing articles, I have article information split
up into two tables. I have an “article” table with id, title, timestamp(could add more fields in the future), then I have an “article_sections” table that has id, article_id, title, content, timestamp(maybe some other fields that I don’t remember right now) .
The reason I chose to split the “article_sections” from “articles” is so that I will easily be able to break up articles into pages.

I’m also joining a “topics” table, and a “articles_topics_join” table, in other methods of the Dao. The topics stores all the nodes/categories of my site, the articles_topics_join table links topics and articles.

For example:

// returns articles that belong to this node.
// This method joins “articles”, “articles_sections”,
// and “article_topics_join”.
getByNode($nodeId)

// returns the id of the node that this article belongs to.
// This uses “articles_topics_join” to get the id of the topic
// that this article belongs to.
getNodeIdByArticleId($articleId)

Now I’m not really worried with violating encapsulation by using the “articles”, and “articles_sections” tables. Since these tables are both only really gonna(as far as I can see?) be used to help constitute an article.

But I do see how using the “articles_topics_join”, and “topics” tables in the dao could get me into trouble in the future. Because those tables can and will be used other places.

I guess it really is a choice between violating encapsulation or throwing away the advantages that an RDBMS brings?
I’m probably just going to be very liberal with using multiple tables in my daos, unless the project is expected to be huge. If it’s not a huge app, and you don’t have a huge number of DAOs, I don’t think fixing problems in the DAOs when table definitions change would be a huge nightmare.

–ed

Yes, the more I think about it, the more it seems stupid to me to manually recreate what a JOIN would do automatically by doing several db queries and somehow cobbling this together. Just seems rather impractical.

OK. I got two interesting answers to my question about why the data would span multiple tables. It seems to me the reason is similar: the data logically belongs to different tables. And it would seem that from an “ideal” object-oriented point of view, the data would belong to different objects. So ideally, you would retrieve a news article object from the database and a person object representing the author. The news article object would contain a reference to the person object, and you’d get the author name directly or indirectly.

This requires mapper objects to assemble the object structure. Fowler has a pattern called Foreign Key Mapping that can be used in this particular case.

The other alternative is to include the author name in the news article object. This is simpler but less flexible. In my view, the simpler solution is always preferable unless you actually need that flexibility. Another reason might be if you need a lot of information about the author.

In either case, it’s possible to get the data using a join. A mapper can extract the data from the joined tables and construct both objects. I’ve never actually tried this, but I imagine it’s critical how many authors there are. If you retrieve 1000 news items and there are only three authors, it seems silly to get hundreds of copies of the information about each author from the database.

I have now finally read this article in Fowler’s book. Funnily enough, he actually says this:

While it’s conceptually clean to issue one query per table, it’s often inefficient since SQL consists of remote calls and remote calls are slow. Therefore, it’s often worth finding ways to gather information from multiple tables in a single query. I can modify the above example to use a single query to get both the album and the artist information with a single SQL call. The first alteration is that of the SQL for the find statement.


class AlbumMapper... 
   public Album find(Long id) {
     return (Album) abstractFind(id);   }
   protected String findStatement() {
      return "SELECT a.ID, a.title, a.artistID, r.name " +
            " from albums a, artists r " +
            " WHERE ID = ? and a.artistID = r.ID";
   }

I then use a different load method that loads both the album and the artist information together.

class AlbumMapper... 
   protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
      String title = rs.getString(2);
      long artistID = rs.getLong(3);
      ArtistMapper artistMapper = MapperRegistry.artist();
      Artist artist;
      if (artistMapper.isLoaded(artistID))
         artist = artistMapper.find(artistID);
      else
         artist = loadArtist(artistID, rs);
      Album result = new Album(id, title, artist);
      return result;
   }
   private Artist loadArtist(long id, ResultSet rs) throws SQLException {
      String name = rs.getString(4);
      Artist result = new Artist(new Long(id), name);
      MapperRegistry.artist().register(result.getID(), result);
      return result;
   }

Appearently Fowler does actually not advise against querying several tables at once? For example, under “Table Data Gateway” (which is somewhat like a Dao?) he says:

Most times when you use Table Data Gateway, you’ll have one for each table in the database. For very simple cases, however, you can have a single Table Data Gateway that handles all methods for all tables. You can also have one for views or even for interesting queries that aren’t kept in the database as views.

So it seems to be not so bad after all to have a JOIN in a DAO?

I have two other loosely related questions though:

  1. In the code above, he returns an Album object that in turn contains another object, Artist. Is this possible in PHP?

  2. He advises using “Identity Maps” to cache objects and to reduce the number of db calls. Is this feasible with PHP and does it make sense, considering that PHP is stateless (if I understood this correctly)?

Yes. I read that the same way you do.

Sure. But you need to be careful about references in PHP 4. In this case it’s likely to work (but perhaps slower) without references.

(In case someone gets confused: the “code above” is Java code even though it’s labeled as PHP.)

It’s possible. This won’t work without references, though. I’ve done it, but only in PHP 5. (Serialization of object references won’t work properly in PHP 5 beta 3; you need beta 4.)

Does it make sense in PHP? Good question. It could, but probably not without sessions. It would mean getting the object from session storage instead of from the database. Pulling additional data from an existing session can be very fast, so that could improve performance. But is it enough to be worth the extra effort? You have to measure it in the actual application to know.

Dagfinn, could you please provide a small example for an aggregate object in PHP (I hope the term is right)? Would I include the subobject as a property, or is this bad practice? And could I then access it like this:

$artistname = $album.artist->getArtistName

(if the subobject Artist has a getter method “getArtistname”)?

Or would I create another getter and setter method to access the subobject. In that case, could/would I do sth. like this?

$artistname = $album->getArtist->getArtistName

I hope you see what I mean?

Getting back to the original DAO problem, after some more thinking, I think I have found an approach that would suit me. It might mean some extra work, but I don’t mind that as long as the solution is somewhat pure (design-wise).

Even though this has little to do with my solution, I’d like to state how I have been handling things so far. I communicate with my DAOs with the help of Transfer Objects. I know this is overkill for PHP, but I like the concept, because it provides strong encapsulation. It all worked very well, it was just that I recently thought about the problem of accessing several tables with one DAO. Before it gets too astract, I’ll provide a sample TO:



class ReviewTO {
	var $_arrValues; // private
  
	function ReviewTO() {
  $this->_arrValues = array();
 }
 
 // ACCESSORS 
 function getID() {return $this->_arrValues['id'];}
 function getBand() {return $this->_arrValues['band'];}
 function getTitle() {return $this->_arrValues['title'];}
 function getRating() {return $this->_arrValues['rating'];}
 function getLabel() {return $this->_arrValues['label'];}
 function getArticle() {return $this->_arrValues['article'];}
 function getAuthor() {return $this->_arrValues['author'];}
 function getLinkHref() {return $this->_arrValues['linkHref'];}
 function getLinkText() {return $this->_arrValues['linkText'];}
 function getImageName() {return $this->_arrValues['imageName'];}
 
// MANIPULATORS
 function setID($id) {$this->_arrValues['id'] = $id;}
 function setBand($band) {$this->_arrValues['band'] = $band;}
 function setTitle($title) {$this->_arrValues['title'] = $title;}
 function setRating($rating) {$this->_arrValues['rating'] = $rating;}
 function setLabel($label) {$this->_arrValues['label'] = $label;}
 function setArticle($article) {$this->_arrValues['article'] = $article;}
 function setAuthor($author) {$this->_arrValues['author'] = $author;}
 function setLinkHref($linkHref) {$this->_arrValues['linkHref'] = $linkHref;}
 function setLinkText($linkText) {$this->_arrValues['linkText'] = $linkText;}
 function setImageName($imageName) {$this->_arrValues['imageName'] = $imageName;}
}


The TOs are just very simple and not more than a value object. Because most business requirements in web pages are actually so simple, these TOs often double as my business objects as well (no need to create extra business objects that actually have no further functionality).
Because I often need a list view, what I do with DAOs is not to return 1 TO at a time but a collection of TOs (actually an array of TOs). Actually, in the fowler article mentioned before, he also covers collections, but he completely lost me there, so I had to come up with something else for my problem.

Ah, guess I’ll just post my DAO as well (it’s loosely based on Harry Fuecks’ DAO article).



<?php
/**
 *  Data Access Object for Review Table
 */
class ReviewDao extends DBDao {
	
 //! A constructor
	/**
	* Constructs the ReviewDao
	* @param $da instance of the DataAccess class
	*/
	function ReviewDao ( & $da ) {
		DBDao::DBDao($da);
  $this->_strSelectStatement = 
	  'SELECT  id, band, title, article, 
	  author, rating, creation_date AS date, 
	  image_name, label, link_address, link_text
	  FROM reviews ';
  $this->_arrCollection = array();
 }
 //! A private setter method
	/**
	* Stores query result in internal review collection
	* @param $objQueryResult a QueryResult object
 * @return void
	*/
	function _setResultCollection (& $objQueryResult) 
 {
  $this->_arrCollection = array();
  for ($it =& new QueryIterator($objQueryResult); $it->isValid(); $it->next())
  {
   $row = $it->getCurrent();
   $review =& new Review();
   
   $review->setID($row['id']);
   $review->setBand($row['band']);
   $review->setTitle($row['title']);
   $review->setRating($row['rating']);
   $review->setLabel($row['label']);
   $review->setArticle($row['article']);
   $review->setAuthor($row['author']);
   $review->setLinkHref($row['link_address']);
   $review->setLinkText($row['link_text']);
   $review->setImageName($row['image_name']);
   
   $this->_arrCollection[] =& $review;
  } 
 }
 
 //! An accessor
	/**
	* Searches all reviews
	* @return void
	*/
	function getAll($start=false,$rows=false) 
 {
		$sql= $this->_strSelectStatement . ' ORDER BY date DESC';
		if ( $start ) {
			$sql.=" LIMIT ".$start;
			if ( $rows )
				$sql.=", ".$rows;
		}
	 $this->_setResultCollection($this->_queryDb($sql));
 }
 
 //! An accessor
	/**
	* Searches reviews by band name
	* @return void
	*/
	function searchBand($name) 
 {
		$sql= $this->_strSelectStatement . "WHERE band='$name'" . ' ORDER BY date DESC';
		$this->_setResultCollection($this->_queryDb($sql));
	}
 
 //! An accessor
	/**
	* Searches reviews by first letter of band name
	* @return object a result object
	*/
	function searchBands($letter) 
 {
		$sql= $this->_strSelectStatement . "WHERE band LIKE '$letter%'" .
			" ORDER BY date DESC";
		$this->_setResultCollection($this->_queryDb($sql));
	}
 
	
	function totalRows() 
 {
		$sql= "SELECT count(*) as count FROM reviews";
		return $this->_queryDb($sql);
	}
}
?>


I might use this DAO like this:


$reviews =& new ReviewDao($da);  // passing in db connection
$reviews->getAll();  // populating result collection
$result = $reviews->fetchResultCollection();

I should probably mention that I have some base functionality in the superclass DBDAO, with methods for inserting etc., but hese are partly empty uin the superclass and can be overriden in the subclasses if and when needed. OK, let’s post this as well (again loosely based on Harrys article).


/**
 *  Base class for Data Access Objects
 */
class DBDao {
	/**
	* Private
	* $_objDA stores DataAccess object
	*/
	var $_objDA;
 
 /**
	* Private
	* $_collection represents the internal result collection
	*/
	var $_arrCollection;
 
 /**
	* Private
	* $_strSelectStatement stores a SQL SELECT statement
	*/
	var $_strSelectStatement;
	//! A constructor
	/**
	* Constructs the Dao
	* @param $da instance of the DataAccess class
	*/
	function DBDao ( & $da ) 
 {
		$this->_objDA =& $da;
	}
	//! A private accessor method
	/**
	* Queries the database
	* @param $sql the query string
	* @return mixed either false if error or object DataAccessResult
	*/
	function & _queryDb($sql) 
 {
		if (DEBUG) echo "sql = $sql"; // for debugging purposes
  $result=& $this->_objDA->query($sql);
		if ($result->isSuccess()) {
			return $result;
  } else {
		trigger_error($this->_objDA->getErrorMessage());
		return false;	
		}
	}
	//! A private setter method
	/**
	* Stores query result in internal result collection
	* @param $objQueryResult a QueryResult object
 * @return void
	*/
 function _setResultCollection(& $objQueryResult) 
	{
		die('Private method <b>_setResultCollection</b> of class <b>DAO</b> is not implemented.');
	}
 
 //! An accessor
	/**
	* Gets the result collection (an array of objects)
	* @return array  
	*/
 function & fetchResultCollection() 
	{
		 return $this->_arrCollection;
	}
 
 //! An accessor
	/**
	* Retrieves all rows in the database
	* @return void
	*/
	function getAll($start=false,$rows=false)
 {
		die('Method <b>getAll</b> of class <b>DAO</b> is not implemented.');
	}
 
 //! An accessor
	/**
	* Searches an element by its id
	* @return void
	*/
	function getByID($id)
 {
	  $sql= $this->_strSelectStatement . "WHERE id='" . $id . "'";
  $this->_setResultCollection($this->_queryDb($sql));
	}
 
 //! A manipulator method
	/**
	* Inserts an element/object (i.e. a row) into the database
	* @return mixed (id number if the insertion was successful, otherwise false)
	*/
	function insertDataObject($obj)
 {
		die('Method <b>insertDataObject</b> of class <b>DAO</b> is not implemented.');
	}
 
 //! A manipulator method
	/**
	* Updates an element/object (i.e. a row) in the database
	* @return bool (true if the update was successful, otherwise false)
	*/
 function updateDataObject($obj)
 {
		die('Method <b>updateDataObject</b> of class <b>DAO</b> is not implemented.');
	}
 
 //! A manipulator method
	/**
	* Deletes an element/object (i.e. a row) in the database
	* @return bool (true if the update was successful, otherwise false)
	*/
 function deleteDataObject($obj)
 {
		die('Method <b>deleteDataObject</b> of class <b>DAO</b> is not implemented.');
	}
}


OK, let’s take the example with news and authors again. My approach with several tables was to do a join in the DAO for the news and to expand the news TO so that it contained author data as well. I didn’t find this ideal, because the news TO might get a little bloated if you do a join on 3 or more tables.

Now, I thought, why not do it like this: I create “simple” DAOs and TOs for every DB table. Sometimes I really need info just from 1 table. Because these are so simple and standardized (especially the TOs) I might even write a little code generator to generate these automatically. So that’s really not much work. Now, whenever I have a page that requires info from several tables (maybe you could call this a compound view? Not sure), I create a special DAO for this (manually) that I even give a special kind of name, sth. like AggregatorNewsDAO, for example, so that I know straight-away that it contains aggregate data from several tables. I do a join in the DAO and take the TOs that I already have for the single tables and fill them with the data from the join. So, now I don’t return 1 TO (or, in my case, a collection with 1 “layer”), but an array that contains 2 TOs instead (or a collection with two “layers” of TOs). That way I only have to write a new DAO and can reuse the code for the TOs. In this respect, I think this is actually better than for example including the author as a subobject for a news object (see my post above) because i don’t need to change my original “simple” news object (TO).

Of course, there will be, for example, many Author TOs in my result collection that are exactly alike (because there are only a few authors, but many news items), but actually the alternative (creating a “special”, expanded TO that contains the news and the author info), would also carry this superfluous info, so I don’t think this is really wasteful. And, moreover, i still get this all with just 1 db call.

Not to forget that the scope of potential repercussions of db changes remains strictly limited: I know that a db table layout change has results in only the DAO and TO for this table and potentially in all DAOs that have an “Aggregate” in their name. I can be sure that all other “simple” DAOs remain unaffected.

Phew, hope this makes some sense? I’m really not sure about best practices yet. Comments welcome.