SitePoint Sponsor

User Tag List

Results 1 to 12 of 12
  1. #1
    SitePoint Member kmeister's Avatar
    Join Date
    May 2003
    Location
    MI
    Posts
    7
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    RecordSet Pattern

    Hello,

    I'm in the process of refactoring some existing code, focusing on the database / data access portions. Right now, a lot of the code is structured around querying the database and then looping through the results, displaying the data or doing some simple business logic.

    For example:
    <CODE>
    $conn = mysql_query($query);
    while($row = mysql_fetch_array($conn))
    {
    echo "$row['first_name'], $row['last_name'] <a href='account.php'>\n";
    ......etc..
    }
    </CODE>

    Looking for patterns in the code / researching enterprise patterns, I have come across the RecordSet pattern as an ideal goal for where I want the refactor to head.
    Now I have a question for the pattern pundits:
    Should the RecordSet pattern be used only in a way that it holds the entire query results in memory (like WACT does, and PEAA suggests),
    Or is it not considered bad practice to only fetch from the db as the need arises (i.e. the next() member is called)

    I know there are advantages to holding all the data in memory, I'm just not sure I need too and it seemed like it would be worth saving the memory by using a pattern that encourages incremental result fetching from the db. I'm just not sure if there's a pattern other than RecordSet that already encourages this.

  2. #2
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi...

    Fetching everything into memory is the preferred option for Java people where the data can be cached between requests. It is a really bad idea for PHP where the data will be thrown away at the end of the query.

    Your instincts are right. You want to iterate if the set is large. How big are your result sets?

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  3. #3
    SitePoint Member kmeister's Avatar
    Join Date
    May 2003
    Location
    MI
    Posts
    7
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by lastcraft
    Hi...

    Fetching everything into memory is the preferred option for Java people where the data can be cached between requests. It is a really bad idea for PHP where the data will be thrown away at the end of the query.

    Your instincts are right. You want to iterate if the set is large. How big are your result sets?

    yours, Marcus
    My result sets are not more than a few hundred at present, but keeping in mind that this class may be used in the future for larger sets, I wanted too ensure that i'm on the right path.

    One of the things that initially attracted me to the idea of using RecordSet was that you could run a query, get a record set, then play with the results of that record set (formatting, validation etc..) and pass it around without worrying about being coupled to the db still. You obviously cannot do that if you are still fetching from the db internally.

    Also, You are contributing to WACT, correct? Why then do they use RecordSet and cache the results in memory?
    Quoting from the WACT api docs..

    http://wact.sourceforge.net/index.php/RecordSet

    From EnterprisePatterns : a RecordSet is an in memory representation of tabular data.

    WACT implements a RecordSet that is "UI aware" - it understands the DataSpace so after performing a query, it's iterator populates the DataSpace row by row, allowing components like the ListComponent to iterate over the RecordSet
    Of course, I haven't actually looked at the source, just the api docs so maybe they aren't doing that. Selkirk may be the best person to ask I suppose.

    Thanks for commenting.

  4. #4
    SitePoint Evangelist ghurtado's Avatar
    Join Date
    Sep 2003
    Location
    Wixom, Michigan
    Posts
    591
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have to say that I am more inclined to think that it is a better option to leave it up to the iterator to retreive each row from the DB as it is needed. Without a doubt this is the sensible option (performance-wise) in stateless / cacheless languages like PHP, but I even wonder what the purpose is of getting all the rows from the DB at once in any language whatsoever. Where we say today "fetch a row from the DB", we could be saying tomorrow "make a remote LDAP query" or "establish a RPC connection and fetch some XML data". These other possibities are probably more time consuming to run, and as such, I think its only "fair" that the iterator returns the results as it gets them, rather than grabbing them all at once.

  5. #5
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    ...but I even wonder what the purpose is of getting all the rows from the DB at once in any language whatsoever.
    It is suggested from PEAA that in most cases though it is always better to seek more results (rows) than what you may actually need in the sense that one query (more general) is far, far better than multiple (more specific) queries.

  6. #6
    SitePoint Zealot ZangBunny's Avatar
    Join Date
    Jul 2003
    Location
    Mainz, Germany
    Posts
    119
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Dr Livingston
    It is suggested from PEAA that in most cases though it is always better to seek more results (rows) than what you may actually need in the sense that one query (more general) is far, far better than multiple (more specific) queries.
    This doesn't mean that you should always get as many rows as possible, only that you should rather get some more than you need than get not enough and be forced to do another query for the rest. (One should also be aware that even though PEAA applies well to PHP in general, it is still very Java centric, and in an environment where you can cache resultsets in memory between page requests, getting all rows can make sense.)

    "Slightly more than just enough" would be a good rule of thumb. While it is true that extra database queries carry a high overhead that should be avoided, moving big amounts of data from the database to the PHP script (which often run on different machines with only a shared 100Mbit/s connection between them.) can also be a bottleneck, especially if your data contains large text fields or BLOBs.

    One script that went through my hands at work would fetch all records and then use array_split() instead of a SQL "LIMIT". (The reason was, I found out, that the script was ported to MySQL from MS.SQL, where there is no "LIMIT" clause...)

  7. #7
    SitePoint Guru
    Join Date
    Nov 2002
    Posts
    841
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by kmeister
    Why then do they use RecordSet and cache the results in memory?
    Our RecordSet interface really represents an abstract cursor as you would get from a database query. As such, only a single record is cached in memory at one time.

    Quote Originally Posted by kmeister
    One of the things that initially attracted me to the idea of using RecordSet was that you could run a query, get a record set, then play with the results of that record set (formatting, validation etc..) and pass it around without worrying about being coupled to the db still.
    We've developed a couple techniques for dealing with this on wact.

    Our generalized concept of a record, called a DataSpace has a built in filtering mechanism. So you can register a filter class with a DataSpace, say for output filtering, and the framework will call the filter at appropriate times to reformat values for output.

    Here is an example filter (from a real application) that adds a calculated field to a DataSpace:
    PHP Code:
    class CategoryFilter {

        function 
    doFilter(&$vars) {
            
    $CategoryId intval(@$vars['CategoryId']);
            if (empty(
    $CategoryId)) {
                
    $ArticleCount 0;
            } else {
                
    $ArticleCount DBC::getOneValue("
                    SELECT 
                        COUNT(*) 
                    FROM 
                        ArticleCategories 
                    WHERE 
                        CategoryId='
    $CategoryId'");
            }
            
    $vars['ArticleCount'] = $ArticleCount;
        }


    The code that uses it (to display a list):
    PHP Code:
    $Page =& new Template('/category/index.html');
    $List =& $Page->getChild('CategoryList');
    $List->registerDataSet(
        
    DBC::NewPagedRecordSet('SELECT * FROM Categories ORDER BY Name'
        
    $Page->getChild('pagenav'), 
        new 
    CategoryFilter()));
    $Page->display(); 
    This code loads a template/component definition. Then it creates a paged and filtered RecordSet and assigns it to the template to be displayed.

    (Note that there are more efficient ways to do this in WACT SQL wise.)

    The templates. /category/index.html:

    Code:
    <core:SET title="Category List">
    <core:WRAP file="/layout/mainlayout.html" placeholder="Content">
    <h1 align="center">Categories</h1>
    <DIV ALIGN="RIGHT">
    <menu:menu>
        <menu:command icon="/admin/images/category.gif" url="/admin/category/add.php">
        Add category
        </menu:command>
    </menu:menu>
    </DIV>
    <core:include file="list.html">
    mainlayout defines a standard header & footer CSS file, etc for the site. The SET title is used to pass a parameter to the layout file to set the <TITLE> tag for the page. <menu:menu> and <menu:command> are custom tags I'm using in my own code which hasn't yet become a part of WACT. The WACT template engine allows you do define your own tags.

    Here is /category/list.html:

    Code:
    <TABLE width="100%" cellpadding="5" ALIGN="CENTER" CLASS="RecordList">
    <TR CLASS="ColumnHeading">
        <TD>Category</TD>
        <TD>Articles</TD>
        <TD>&nbsp;</TD>
    </TR>
    <list:LIST id='CategoryList'>
        <list:ITEM>
            <core:include file="summary.html">
        </list:ITEM>
    <list:DEFAULT>
    <TR>
        <TD COLSPAN="3">No categories defined</TD>
    </TR>
    </list:DEFAULT>
    </list:LIST>
    </TABLE>
    <core:include file="/common/pager.html">
    Category lists are sometimes embedded at parts of other pages, so they have their own template.

    Here is /category/summary.html:
    Code:
    <TR>
      <TD>
        <a href="/admin/category/details.php?CategoryId={$CategoryId}">
            {$Name}
        </a>
      </TD>
      <TD>{$ArticleCount}</TD>
      <TD align="right" valign="top" width="10%" nowrap>
        <core:include file="menu.html">
      </TD>
    </TR>
    Category summaries are likewise sometimes used for a single record (not a part of a list), so they have their own template, too.

    And finally /common/pager.html:
    Code:
    <DIV ALIGN="RIGHT">
    Page: 
    <page:navigator id="pagenav" items="30">
        <page:first hideforcurrentpage>First</page:first> 
        <page:prev hideforcurrentpage>Prev</page:prev>
        <page:list>
            <page:number>
            <page:elipses> ... </page:elipses>
            <page:separator> </page:separator>
        </page:list>
        <page:next hideforcurrentpage>Next</page:next> 
        <page:last hideforcurrentpage>Last</page:last>
    </page:navigator>
    </DIV>
    All of the lists in the application use the same format for a pager.

    This might not be the best way to organize the templates. I've done some whitespace editing on the templates to make them easier to read on this forum (shorter lines).

    One thing that I am especially proud of in WACT is how the items attribute of <page:navigator> tag automatically propagates from the template to the LIMIT clause issued in the SELECT statement (generated by the NewPagedRecordSet call) items="30" means put 30 items on a page. You can set this either in the template, or in the code. I leave it up to the HTML/Template designer.

    Quote Originally Posted by kmeister
    Also, You are contributing to WACT, correct?
    I wish. No, really I do. I am very much impressed by simpletest. Actually, we are looking for more contributors. There is much work to do and many good opportunities to make an impact. The door is open.

  8. #8
    SitePoint Guru
    Join Date
    Nov 2002
    Posts
    841
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Oh, I forgot. Here is /category/menu.html:

    Code:
    <menu:menu>
    <menu:command icon="/admin/images/edit.gif" url="/admin/category/edit.php" param="CategoryId">
        Edit
    </menu:command>
    <menu:command icon="/admin/images/delete.gif" url="/admin/category/delete.php" param="CategoryId">
        Delete
    </menu:command>
    <menu:command icon="/admin/images/details.gif" url="/admin/category/details.php" param="CategoryId">
        Details
    </menu:command>
    </menu:menu>
    Each item in the category list has its own menu.

    I can't really show /layout/mainlayout.html because it is too big and and too specific to the site I am working on. (plus it includes even more files.)

  9. #9
    SitePoint Zealot Sork's Avatar
    Join Date
    Jul 2002
    Location
    Portugal
    Posts
    143
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    [offtopic]this is for shure off topic but i just want to say that i'm so impacient for the wact download release to implement my "fast ezpublish based system". Damn, im so drunk, and sorry for the off topic![/offtopic]

  10. #10
    SitePoint Guru
    Join Date
    Nov 2002
    Posts
    841
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Here is another technique in WACT. Harry created a DataSetDecorator class for extending existing DataSets. (in WACT, a RecordSet is a database oriented DataSet. A DataSet is a cursor of DataSpaces and a RecordSet is a cursor of Records. A RecordSet implements the DataSet interface and the Record implements the DataSpace interface. Jon has completed a DataSet implementation that uses arrays only.)

    From the same application, Articles can belong to more than one category. Thus, a list of articles is actually a nested list. For each article, there is a list of categories. The following class powers the inner category list based on the outer article list. (once again, this is not ideal SQL efficiency wise.)

    PHP Code:
    class ArticleCategoryDataSet extends DataSetDecorator {
        
        var 
    $CategorySource;
        
        function 
    ArticleCategoryDataSet(&$CategorySource) {
            
    $this->CategorySource =& $CategorySource;
        }
        
        function 
    prepare() {

            
    $Categories $this->CategorySource->get('CategoryId');
            
            if (!empty(
    $Categories)) {
                
    $sep '';
                
    $CategoryList '';
                foreach (
    $Categories as $Category) {
                    
    $CategoryList .= $sep DBC::makeLiteral($Category'integer');
                    
    $sep ',';
                }
                
    $this->dataset =& DBC::NewRecordSet("
                    SELECT 
                        Name 
                    FROM
                        Categories 
                    WHERE 
                        CategoryId IN (
    $CategoryList)");
            } else {
                require_once 
    WACT_FRAMEWORK_ROOT 'util/emptydataset.inc.php';
                
    $this->dataset =& new EmptyDataSet();
            }
            
    parent::prepare();
        }

    The registration code for the nested lists. (excerpt only):
    PHP Code:
    $List =& $Page->getChild('ArticleList');
    ...
    $CategoryList =& $Page->getChild('ArticleCategoryList');
    $CategoryList->registerDataSet(new ArticleCategoryDataSet($List)); 
    Because of this registration, the inner list resets everytime the outer changes. A classical master-detail display.

    The problem with it is that it issues one SQL statement per outer (article) record on the list instead of one master SQL statement. Its still a work in progress, but demonstrates filtering and adapting DataSets in WACT.

  11. #11
    SitePoint Guru
    Join Date
    Nov 2002
    Posts
    841
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Sork
    [offtopic]this is for shure off topic but i just want to say that i'm so impacient for the wact download release to implement my "fast ezpublish based system". Damn, im so drunk, and sorry for the off topic![/offtopic]
    We are shooting for a Dec 1 release date for a downloadable Public Alpha Preview Release version .0001.

  12. #12
    SitePoint Guru
    Join Date
    Nov 2002
    Posts
    841
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ok. One last post. I apologize If I am too far off topic and too long winded.

    Quote Originally Posted by kmeister
    One of the things that initially attracted me to the idea of using RecordSet was that you could run a query, get a record set, then play with the results of that record set (formatting, validation etc..) and pass it around without worrying about being coupled to the db still.
    The previous examples showed formating. Here is an example for Validation. In WACT, the built in validator operates on the generic DataSpace interface.

    You instantiate a validator and assign rules to it which are used to check values in a DataSpace.

    Here is a custom validator for the the same application for the Category table:

    PHP Code:
    class CategoryValidator extends Validator {

        function 
    CategoryValidator() {
            require_once 
    APP_LIB_ROOT 'includes/duplicaterecordrule.inc.php';
            
    $this->addRule(new RequiredRule('Name'));
            
    $this->addRule(new SizeRangeRule('Name'63));
            
    $this->addRule(new DuplicateRecordRule('Categories''CategoryId'
                array(
    'Name'),
                
    'duplicate''DUPLICATE_CATEGORY'));
        }

    The addRule calls could be done an an instance of Validator. It is not necessary to define a class for this. Its just the way I happened to do it.

    I won't go through how the validator gets registered with the formController class (and how the validation errors get displayed in the form template) because that part will probably change in the next week or two.

    Here is the definition of the DuplicateRecordRule class. This is also something I have not yet checked in to WACT. (Not good enough yet.)

    PHP Code:
    class DuplicateRecordRule extends Rule {

        var 
    $IdField;
        var 
    $Table;
        var 
    $KeyFields;
        var 
    $Group;
        var 
    $Id;

        function 
    DuplicateRecordRule($Table$IdField$KeyFields
                
    $Group 'duplicate'$Id 'DUPLICATE') {
            
    $this->IdField $IdField;
            
    $this->Table $Table;
            
    $this->KeyFields $KeyFields;
            
    $this->Group $Group;
            
    $this->Id $Id;
        }

        function 
    validate(&$DataSpace, &$ErrorList) {
            
    $sql 'SELECT ' $this->IdField ' FROM ' $this->Table ' WHERE ';
            
    $sep '';
            foreach(
    $this->KeyFields as $field) {
                
    $sql .= $sep $field '=' .
                    
    DBC::MakeLiteral($DataSpace->get($field));
                
    $sep ' AND ';
            }
            
            
    $idValue $DataSpace->get($this->IdField);
            if (isset(
    $idValue)) {
                
    $sql .= 'AND ' $this->IdField ' <> ' 
                    
    DBC::MakeLiteral($DataSpace->get($this->IdField));
            }
            
    $dupcheck DBC::getOneValue($sql);
            if (!empty(
    $dupcheck)) {
                
    $ErrorList->addError($this->Group$this->Id$this->KeyFields);
                return 
    FALSE;
            }
            return 
    TRUE;
        }

    This validation rule can be used on both the edit and create stages because idfield is only defined during an edit, never during an add. The group 'duplicate' and id 'DUPLICATE_CATEGORY' are the error message codes. They are used to lookup error messages in a language-independent way.

    Also, some of you might be thinking, "This application has a pretty weak model layer." You would be right.

    There are also some duplicate code problems that I have to work on.

    sorry if I've strayed too far.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •