SitePoint Sponsor

User Tag List

Page 1 of 2 12 LastLast
Results 1 to 25 of 28
  1. #1
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Rome
    Posts
    77
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    DB staging and deploying with full automatic flavour

    After having injected into my projects the good principles of agility and having fully started to integrate continously and come over the problems about auto-testing DB driven web applications, now I'm facing what I think is a bad smell...

    The database that stands aside of the application goes under the same long and endless pipeline of changes and refactoring. But while powerful tools like CVS or Subversion let the developers share their changes and even easily upgrade the staging and production servers, I don't find an easy way to keep my DBs up to date all across the team and all along the development process.

    I think it must not be an uncommon issue so I also think that there should be a full plethora of good practices to follow to get over this obstacle in software development.

    Do you know any?

    p.s. down to newbie level: what's the exact difference between development and staging server?

  2. #2
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi.

    This was briefly a hot topic, bt it seems to have waned, which is a shame. There was some work being done at Thoughtworks on database refactoring, and there was a book called agile databases or thereabouts by Scott Ambler.

    We have an unusal approach of creating an XSLT transformation for the data to port it on rollout. I wouldn't recommand this for a large amount of data though .

    Development server are the ones you and your team code on day to day. Staging server is the one you all upload a clean build to for a complete dress rehearsal of the rollout.

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  3. #3
    SitePoint Guru BerislavLopac's Avatar
    Join Date
    Sep 2004
    Location
    Zagreb, Croatia
    Posts
    830
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have been thinking of an approach for database refactoring, especially when there are several developers and each can make some changes on the database.

    1. Each developer has a DB structure SQL script, maintained in plain text. It can be edited manually, or if the DB is modified via some tool (I recently discovered the excellent SQLManager family of products) it can be exported to text.

    2. Each developer has a standard data SQL script, which contains "general" data which have to be in the DB even if there is no other content (e.g. countries, languages, some common codes etc.). This data is usually determined in advance and rarely modified.

    3. Each developer has an SQL script containing "placeholder" data, which represents what will be in the database when the product is deployed. For a CMS, this would have article texts etc. Each developer may modify this as necessary.

    4. All three scripts are controlled by CVS, Subversion or another VCS, along with all other code.

    5. At deployment, an automated script (in Ant or Phing) uses those scripts to build the database from their final release versions. The "placeholder" values are replaced here by a special script which contains the initial production data.

    Haven't tried that approach in reality yet, though, so any comments would be most welcome.

  4. #4
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Rome
    Posts
    77
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I can understand and easily agree with the separation of invariant and variant data in different scripts. But I don't understand how it all copes with DB schema changes... could you explain better please?

    Marcus, if staging server is the one you all upload a clean build to for a complete dress rehearsal of the rollout, I have development server coincident with the staging one! Complete regression and integration testing ensure me to have a complete application in my hands to put on production.... am I wrong? Completely wrong? Not so wrong?

    Thanks both for answers, btw.

  5. #5
    simple tester McGruff's Avatar
    Join Date
    Sep 2003
    Location
    Glasgow
    Posts
    1,690
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Eeek. That sounds scary. Would you say that DataMapper would be pretty much essential? I'm trying to think how you could keep code changes to a minimum with a volatile schema.

  6. #6
    SitePoint Guru silver trophy Luke Redpath's Avatar
    Join Date
    Mar 2003
    Location
    London
    Posts
    794
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Perhaps you could look at the RubyOnRails migration functionality and write something similar for PHP? I believe it uses a table in the database to manage versioning of migrations.

    http://api.rubyonrails.com/classes/A...Migration.html

    Still quite a new feature to Rails so probably not "feature complete" as it were.

  7. #7
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Rome
    Posts
    77
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by McGruff
    Would you say that DataMapper would be pretty much essential? I'm trying to think how you could keep code changes to a minimum with a volatile schema.
    Volatile is not the word that comes to my mind when I'm just improving application design by refactoring and accordingly improving DB design.
    I don't understand. It seems you know what you're saying and I'm interested in. Could you please explain better?

  8. #8
    SitePoint Guru BerislavLopac's Avatar
    Join Date
    Sep 2004
    Location
    Zagreb, Croatia
    Posts
    830
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by jakuza
    I can understand and easily agree with the separation of invariant and variant data in different scripts. But I don't understand how it all copes with DB schema changes... could you explain better please?
    The script #1 above takes care of that aspect. The idea is not that the database and the program evolve constantly -- in my experience the most common case is that the database is defined and then, during development of the application, a lot of minor improvements are called for. This approach should work well with that, regardless of the database access strategy.

    That being said, some kind of Data Mapper or Active Record would be very helpful here so there is only one (or a few) place in the application code that should be changed to accomodate for the changes in the database.

  9. #9
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi.

    We have a code generated DAO system and the schema is in XML. On rollout a check is done for semantic changes and if there are some the rollout script demands a porting script in it's arguments, some sample data and some tests to check that data. Now the fun begins.

    We have two databases on live servers, one for the roll out tests, and one live. The rollout script does a full port of some sample data and runs the tests against it (along with the standard acceptance tests in the system). If all goes well, it then ports the live data (the only time a rollout leads to downtime). The port script can be SQL, PHP or XSLT or a combination (they are boring to write, so we like to have options). Eventually we are hoping to build a library of standard porting operations in PHP.

    So regarding coincident staging and live boxes, we have the same set-up.

    We also have a couple of test servers in the office. These are handy for the actual writing of the porting scripts as we can test them with live data.

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  10. #10
    simple tester McGruff's Avatar
    Join Date
    Sep 2003
    Location
    Glasgow
    Posts
    1,690
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by jakuza
    Volatile is not the word that comes to my mind when I'm just improving application design by refactoring and accordingly improving DB design.
    By "volatile" I just meant "changes regularly". I hope that didn't have a negative edge.

    I haven't faced this kind of problem so yes I'm half asking a question and half making a suggestion. I thought that you might also want to look at the php design as part of the integration strategy, trying to isolate the code from the db schema as much as possible.

    There's a brief description of DataMapper here (the book is highly recommended).

  11. #11
    SitePoint Addict pachanga's Avatar
    Join Date
    Mar 2004
    Location
    Russia, Penza
    Posts
    265
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    That's exactly the topic i was meaning to start myself and it's great to see there are folks who face the same problem. Keeping code in sync with db is the one of the most important tasks for me. I haven't yet found the general solution and only can share a set of practices we use:

    1) Each developer box has its own local db server where all databases of projects being developed reside isolated.

    2) Usually each project has at least 3 databases(so each developer has 3 local databases for each project): "live"(sorry can't find the better word) database, unit testing database and functional testing database.

    "Live" db is used for storing the actual db schema and data of the project. Someday "live" db is copied to the production server and it starts living its own life.

    Unit testing db is used for...er..unit tests only. And functional db is used for end-to-end testing. Why having 2 separate dbs for testing? Unit tests db usually contains more tables than functional one since we run not only project specific unit tests but also tests of the used libraries to make sure everything works.

    3) We store the following sql dumps under SVN for the project:

    - full sql dump with data of the "live" db. Storing "live" sql db dump under SVN is essential IMHO - i simply can't figure out the better way to restore fragile db data. Sometimes we store big immutable tables data as separate archived chunks of sql dumps. E.g. we had a project with constant "code" table which had more than 500k records and dumping it everytime was getting too tedious. We dumped code table into code.tgz, checked it in and never included this table into the whole db dump again.

    - unit testing db sql schema dumps, we usually don't need to store data for unit tests since this is what the test fixture is for.

    - simplified sql snapshot of the "live" db which is used for functional testing.

    4) It's the developer's responsibility to update/commit these sql dumps and inform the rest of the team about the changes. Each project has a number of very simple shell scripts which do most of the dirty job: make a copy of the local db, clean some tables(e.g. session, log), dump plain text sql into the working copy, etc.

    I'm still not sure when exactly these dumping utils should start, ideally as a client SVN hook before each commit. But firstly SVN doesn't seem to support client hooks and secondly some smart check is required to see if developer db actually requires dumping(i think it has something to do with semantic changes mentioned ny Marcus ...) Given these difficulties all dumping scripts are executed manually.

    Standard PHP porting operations library with clean OO interface, this is what we really miss, i would be really glad to hear more of your thoughts on this, Marcus

    5) The biggest problem is resolving conflicts with "live" databases. We resolve them by hand and i can't recommed you this or that solution - every case is unique . This problem is even getting worse because at some point of the project development we copy "live" project db onto the local shared db server. This shared db server is used by managers, content writers, etc for filling project database with real data. From this moment we sync this shared "live" db with a developer "live" db very carefully.

    When we have to change the db schema of the shared db server it becomes inaccessible for everyone but the developer who is skilled to make that sort of change(we use SQLYog for detecting schema differences, however it doesn't scale very well and a tip for a better tool is highly appreciated). Once the shared db is changed the same developer is obliged to dump new "live" sql to his working copy and commit it.

    At the same time keeping tests databases in sync among all devs is no problem since no real data is stored there and we can safely change/commit their structure provided all tests pass.

    6) We don't use staging db for a complete dress rehearsal of the rollout on the production server yet(which is a bad thing). The developer's who is responsible for the rollout must run all tests on his local db server and decide if the rollout can be made.

    Here's a db layout we use for each project:
    Code:
    dev.box-a:
     - "live" db
     - unit tests db
     - functional tests db
    
    dev-box-b:
     - "live" db
     - unit tests db
     - functional tests db
    
    ...
    
    shared-dev-db-server:
     - "live" db
    
    production-db-server:
     - production db
    As you can see these practices may work pretty well for a small team with strong internal communication among developers. All our devs are in the same office and communication is not a problem however i'm not sure if it all will work for a bigger team...

    I'll post some more thoughts on abstracting SQL dumps and storing them as XML a bit later.
    Last edited by pachanga; Sep 27, 2005 at 22:47.

  12. #12
    SitePoint Wizard silver trophy kyberfabrikken's Avatar
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Changing the structure of a database is normally done in a subset of SQL called Data Definition Language (DDL). Even with high-level utilities such as phpmyadmin, the point-and-click is translated to underlying statements.

    If at each time a developer manipulates the database in this way, she would log all queries involved, then she could save the changes to a textfile (script), and include this in commit. The difference between doing this and simply taking a dump is that this script could be used on a live database with data in it, without loosing this data. (At least to some extend)

    Am I overly simplifying things ?

  13. #13
    SitePoint Addict pachanga's Avatar
    Join Date
    Mar 2004
    Location
    Russia, Penza
    Posts
    265
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by kyberfabrikken
    Changing the structure of a database is normally done in a subset of SQL called Data Definition Language (DDL). Even with high-level utilities such as phpmyadmin, the point-and-click is translated to underlying statements.

    If at each time a developer manipulates the database in this way, she would log all queries involved, then she could save the changes to a textfile (script), and include this in commit. The difference between doing this and simply taking a dump is that this script could be used on a live database with data in it, without loosing this data. (At least to some extend)
    Yep, this is what we exactly do for resolving db conflicts. However resolving this stuff manually is very tedious and error prone. Still i don't know any better way of doing that I remember Selkirk mentioned briefly about SQL structure changing utility, i hope he will shed some more light on this

  14. #14
    SitePoint Wizard silver trophy kyberfabrikken's Avatar
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by pachanga
    However resolving this stuff manually is very tedious and error prone.
    Wouldn't it for a start be possible to hack phpmyadmin to log all queries to a file ?
    The biggest problem with this method seems to be a lack of rollback, but I can't see any way around this except a full dump.

  15. #15
    SitePoint Addict pachanga's Avatar
    Join Date
    Mar 2004
    Location
    Russia, Penza
    Posts
    265
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by kyberfabrikken
    Wouldn't it for a start be possible to hack phpmyadmin to log all queries to a file ?
    I'm not sure if i'd like to hack phpmyadmin internals, i'd rather use a small highly specialized OO library for that(Marcus?) or try writing it myself....

  16. #16
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Rome
    Posts
    77
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Pachanga, my situation is exactly the same as yours. Not so good, I agree with you...
    kyberfabrikken your solution is nicer schema-safety-wise but it seems letting too much power in the hands of developers, that must be read as _less_ power... Tedious means errors...

    So we have a general yet unsolved issue here? I can't believe it...
    Let's count 2-3 days, then we'll start raising people to solve this stuff altogether once for all

  17. #17
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi.

    Regarding some porting classes, this is very much the direction we want to go in and we have deliberately built our infrastructure with this goal in mind. Too many years spent porting means I need a better way.

    We haven't started it yet . However it's scheduled for the iteration after this one...

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  18. #18
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Rome
    Posts
    77
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Do you need help? Are going to release this package? Are you going to make it work for several DBMS?

    Last question, OT: are you going to be at PHP London on November 3rd?

  19. #19
    SitePoint Enthusiast siteartwork's Avatar
    Join Date
    Jan 2005
    Location
    Germany
    Posts
    67
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Not sure if this could be from interest from you, but MySQLDiff provides a tool for checking differences in the database-structure between 2 dbs...

  20. #20
    SitePoint Wizard silver trophy kyberfabrikken's Avatar
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by siteartwork
    Not sure if this could be from interest from you, but MySQLDiff provides a tool for checking differences in the database-structure between 2 dbs...
    Nice. Have you used it ? Any idea how complete it is ? Can it be run from the commandline ?

  21. #21
    SitePoint Enthusiast siteartwork's Avatar
    Join Date
    Jan 2005
    Location
    Germany
    Posts
    67
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by kyberfabrikken
    Nice. Have you used it ? Any idea how complete it is ? Can it be run from the commandline ?
    I had used it on a commercial project last month where we had similiar problems as described here... one dev-db, one live-db, much changes where made to the dev and it was hard to keep up with them.

    Well, I cannot state how complete it is, since we used it for appr. 10 minutes: install, configure, take a look at the diff and alter the tables on the live-db.

    It worked fine so far.

  22. #22
    ********* Victim lastcraft's Avatar
    Join Date
    Apr 2003
    Location
    London
    Posts
    2,423
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi...

    Quote Originally Posted by jakuza
    Do you need help? Are going to release this package? Are you going to make it work for several DBMS?
    It's part of the "Changes" project. Once we are confident in the end to end system, we will spin the project off as open source. progress is sporadic though as other issues arise.

    Quote Originally Posted by jakuza
    Last question, OT: are you going to be at PHP London on November 3rd?
    Yes. I am a bit of a regular . Rasmus Lerdorf is coming to the upcoming meeting...
    http://www.phplondon.org/wiki/Meeting2005October6

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  23. #23
    SitePoint Addict pachanga's Avatar
    Join Date
    Mar 2004
    Location
    Russia, Penza
    Posts
    265
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I haven't looked into mysqldiff very close but something tells me it's not the right tool As i said earlier i'd be glad to have a highly specialized OO library for this task, please have a look at the following imaginary script using such kind of a library.

    PHP Code:
    <?php

    $local_db 
    = new DSN('mysq://root:secret@localhost/dev');
    $shared_db = new DSN('mysq://root:secret@shared.db/dev');

    $c = new MySQLDbComparator($local_db$shared_db);
    if(
    $c->isIdentical())
    {
        echo 
    'Databases are identical';
        exit(
    0);
    }
        
    if(
    $c->isSchemaConflict())
    {   
       
    //here we check if shared db schema needs to be updated:   
       
    if($diff $c->getLeftSchemaDiff())
       {           
            
    //it's almost 100% safe to apply changes if we simply 
            //modified indexes, added new tables or fields,
            //the most dangerous changes concern table and fields drops, 
            //Below we check if changes are critical and require manual 
            //control otherwise we backup the shared db and apply diff to it 
            //automatically
            
            
    if($diff->isDangerous())
            {
                echo 
    'Dangerous changes in new schema:'
                echo 
    $diff->toString();
                exit(
    1);
            }
            
            
    $dumper = new MySQLDumper($shared_db);        
            
    $dumper->dumpAll('/path/to/shared/db/backup/sql');
            
            echo 
    'Applying schema changes'
            echo 
    $diff->toString();
            
    $diff->applyTo($shared_db);
       }
    }

    if(
    $c->isDataConflict())
    {
        
    //and here we check if local db data needs to be updated:   
        
    if($diff $c->getRightDataDiff())
        {
            
    //we believe that data changes on the shared server can
            //be safely applied to the local server...
            
            
    $dumper = new MySQLDumper($local_db);      
            
            
    $dumper->dumpAll('/path/to/local/db/backup/sql');

            echo 
    'Applying data changes'
            echo 
    $diff->toString();
            
    $diff->applyTo($local_db);        
        }
    }

    exit(
    0);
    ?>
    The following list of classes would be very nice to see in such a library:

    1) A whole bunch of db meta data classes: DbInfo, TableInfo, FieldInfo, IndexInfo. I think we could use and extend them from Propel(if Hans doesn't mind ).

    2) DbDumper which is capable of dumping concrete db into sql using different settings(dump only data, dump only schema, dump only certain tables, etc). It may also be capable of loading dumps into the database but that can be another class as well.

    3) DbComparator which would accept 2 db meta data objects in the constructor and analyze and return differences between them(DbSchemaDiff and DbDataDiff).

    4) DbSchemaDiff, DbDataDiff - these two would encapsulate actual changes between 2 databases and would be able to apply the diff to the "right" or "left" part.

    5) DbSandBox - a wrapper around some db object which creates a temporary physical copy of the database transparently to the user. Upon its destruction the temporary db gets dropped silently. It can be very useful if one has to change some db data before, say, dumping it without messing up with the original db.

    Please feel free to post any kind of feedback on this pre-alpha list of classes!
    Last edited by pachanga; Sep 29, 2005 at 22:44.

  24. #24
    simple tester McGruff's Avatar
    Join Date
    Sep 2003
    Location
    Glasgow
    Posts
    1,690
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    There's a zip file of some classes I use to do at least some of these things in this thread. No diffing though.

  25. #25
    SitePoint Guru
    Join Date
    Nov 2002
    Posts
    841
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    We used to check in to source code control the structures of all the tables in our application, each a separate file. For some tables, we also checked in some data. The installer builder could compare the SCC version of this information against a developer's personal test environment. It would then generate a update script that represented the differences. The programmer could edit the script to do any special kinds of setup up that was needed.

    The change script was checked in. An automated script would check out the structures and data, apply the update script, and then check back in the new structures and data.

    The whole thing was coordinated with tags in SCC.

    When a new version was released, all of the update scripts were collected together and packaged into a single install script. (The installer could consolidate operations)

    Tests were done on the installer with copies of production systems to make sure that nothing would break, then the installation CDs would be sent out and applied at individual locations.

    I yearn for some tools to do this in PHP.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •