Select latest revisions

Hi guys,

I designed a pair of tables thinking that I had a great idea, and now I’m not so sure that it was such a good idea…

What I have, is a parent table, that has details about a particular document. There is then a child table, that has many parts (sections) to the parent document. When a section is updated its revision number increases by one, as does the revision of the parent. Sections that have not been updated will remain the same however.

So, I create a new document, it will have a revision of 1 and so will all of the sections. When I update a section however, the document becomes rev 2 and the updated section becomes rev 2 (rev 1 remains in the table for history) but all the others remain rev 1.

The way I thought about it was something like this:

Select all sections for document X, that are at revision Y or below, but only get the latest revision of each section

Made sense at the time, but now I’m writing the query to get the data out I’m struggling. My first thought was to use GROUP BY, but there’s no guarantee that it’ll get me the revision I want is there, even with an ORDER BY?

Would anyone be able to help nudge me in the right direction?

Cheers

OK, I tried that query and it works fine while everything is on rev 1, but when I update one of the sections (taking it and the parent policy to rev 2, but the other section remains at rev1) the query no longer works correctly. All I see now is the one section at rev 2. The PKs in the policies_sections are:

1,1 (Section 1, revision 1)
2,1 (Section 2, revision 1)
2,2 (Section 2, revision 2)

I’m playing with it to see what I can do with it, but I haven’t found a solution yet…

Afraid I’m not at work now so can’t get the exact names, but essentially you’re looking at something like:

Table: policies
Fields: pol_id, pol_rev, create_date, update_date

Table: policies_sections
Fields: sec_id, pol_id, pol_rev. create_date

pol_id is the common field as I’m sure you can see. sec_id is unique to the section and the two, pol_id and sec_id are the PK.

If pol_id 1 is at pol_rev 3, then I need to find all sec_id where pol_id = 1 but only get the highest pol_rev per sec_id, where the pol_rev in each section is 3 or less. If there’s three entries for sec_id 2. with pol_rev 1, 2 and 3, I only need number 3. However, if there’s an entry for sec_id at 1 and 2, I need the one from 2 only.

Does that make any sense?

Right, I’ve switched it to use a lookup table. It’s quicker and easier. I should’ve just done it that way to start. It’s now working exactly as I want it to.

Thanks

Ah, interesting, thanks. Will give that a try

SELECT policies.pol_id
     , policies.pol_rev
     , policies.create_date AS pol_create_date
     , policies.update_date
     , policies_sections.sec_id
     , policies_sections.pol_rev AS latest_sec_rev
     , policies_sections.create_date AS sec_create_date
  FROM policies
INNER
  JOIN ( SELECT pol_id
              , MAX(pol_rev) AS latest_sec_rev
           FROM policies_sections
          WHERE pol_rev <= [COLOR="red"]Y[/COLOR]
         GROUP
             BY pol_id ) AS s
    ON s.pol_id = policies.pol_id
INNER
  JOIN policies_sections
    ON policies_sections.pol_id = s.pol_id
   AND policies_sections.pol_rev = s.latest_sec_rev
 WHERE policies.pol_id =[COLOR="Red"] X[/COLOR]
SELECT policies.pol_id
     , policies.pol_rev
     , policies.created_date AS pol_create_date
     , policies.updated_date
     , policies_entries.polent_id
     , policies_entries.pol_rev AS latest_sec_rev
     , policies_entries.created_date AS sec_create_date
  FROM policies
INNER
  JOIN ( SELECT pol_id
              , MAX(pol_rev) AS latest_sec_rev
           FROM policies_entries
          WHERE pol_rev <= [COLOR="Red"]2[/COLOR]
         GROUP
             BY pol_id ) AS s
    ON s.pol_id = policies.pol_id
INNER
  JOIN policies_entries
    ON policies_entries.pol_id = s.pol_id
   AND policies_entries.pol_rev = s.latest_sec_rev
 WHERE policies.pol_id = [COLOR="red"]1[/COLOR]

after changing column names in the query, here’s what i get…


pol_id pol_rev pol_create_date     updated_date        polent_id  latest_sec_rev  sec_create_date
   1      2    2010-09-20 12:14:59 2010-09-20 12:14:59    2          2            2010-09-20 13:20:34

i don’t see any problem here, do you?

Yes, because the entry at polent_id 1 is still valid in rev 2, it is just the same. It hasn’t been removed. I need to get all entries that apply to policy 1, but I only need to get the latest revision of each. Does that make sense? Like I say, I think it’s a nice idea badly implemented

I’m wondering now if I should just accept defeat and change the table structure and rewrite the existing PHP. I think it was a nice idea badly implemented. Maybe it would make more sense to do it with a table of policies, a table of the individual sections/entries, and then a lookup table which listed which entries belong to which policies and in which order?

How’s this? (realise some table/field names are different, sorry)

CREATE TABLE `policies` (
 `pol_id` int(10) unsigned NOT NULL auto_increment COMMENT 'Policy or procedure ID',
 `pol_ref` varchar(255) NOT NULL COMMENT 'Existing reference of the policy',
 `pol_rev` tinyint(3) unsigned NOT NULL COMMENT 'The latest revision of this policy',
 `created_date` datetime NOT NULL COMMENT 'Datetime that this policy was created',
 `updated_date` datetime NOT NULL COMMENT 'Datetime that this policy was last updated',
 `creator_id` int(10) NOT NULL COMMENT 'AUID of the policy author',
 `updator_id` int(10) NOT NULL COMMENT 'AUID of the last person to update the policy',
 `pol_title` varchar(255) NOT NULL COMMENT 'The title of this policy',
 `mandatory` enum('Y','N') NOT NULL default 'N' COMMENT 'Flag to determine whether this is a policy or procedure, by the nature of procedures',
 `pol_active` enum('Y','N') NOT NULL default 'N' COMMENT 'Flag to determine whether this policy is active or not',
 PRIMARY KEY  (`pol_id`),
 UNIQUE KEY `pol_ref` (`pol_ref`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COMMENT='Company policies and procedures'

CREATE TABLE `policies_entries` (
 `polent_id` int(10) unsigned NOT NULL COMMENT 'Policy entry ID',
 `pol_id` int(10) unsigned NOT NULL COMMENT 'Parent policy',
 `pol_rev` tinyint(3) unsigned NOT NULL COMMENT 'The earliest revision that this entry relates to',
 `parent` varchar(50) default NULL COMMENT 'What section does this belong to, ie shall it be section 1, section 1.1 etc',
 `created_date` datetime NOT NULL COMMENT 'Date/time that this entry was made',
 `creator_id` int(10) NOT NULL COMMENT 'AUID of the author',
 `polent_title` varchar(255) NOT NULL COMMENT 'Heading for this policy',
 `polent_body` text NOT NULL COMMENT 'Plain text (Most HTML removed, simple formatting allowed) for this entry',
 `polent_enabled` enum('Y','N') NOT NULL default 'Y' COMMENT 'Should this section be displayed?',
 PRIMARY KEY  (`polent_id`,`pol_rev`),
 KEY `pol_id` (`pol_id`),
 KEY `polent_enabled` (`polent_enabled`),
 CONSTRAINT `policies_entries_ibfk_1` FOREIGN KEY (`pol_id`) REFERENCES `policies` (`pol_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Individual entries (and history) making up the full policies'

INSERT INTO `policies` (`pol_id`, `pol_ref`, `pol_rev`, `created_date`, `updated_date`, `creator_id`, `updator_id`, `pol_title`, `mandatory`, `pol_active`) VALUES
(1, 'POL/001', 2, '2010-09-20 12:14:59', '2010-09-20 12:14:59', 115, 115, 'Security Policy', 'Y', 'Y');

INSERT INTO `policies_entries` (`polent_id`, `pol_id`, `pol_rev`, `parent`, `created_date`, `creator_id`, `polent_title`, `polent_body`, `polent_enabled`) VALUES
(1, 1, 1, NULL, '2010-09-20 12:14:59', 115, 'Introduction', '(the Company) take the security of customer information seriously. To demonstrate its commitment to the control of sensitive customer information...', 'Y'),
(2, 1, 1, NULL, '2010-09-20 12:14:59', 115, 'Scope', 'Using the ... as a framework, the company will:', 'Y'),
(2, 1, 2, NULL, '2010-09-20 13:20:34', 115, 'Scope', 'Using ... as a framework, the company will:', 'Y');

If you get policies row 1 (the only one at the mo), you should get from policies_entries rows 1 and 3 from that data, because polent_id 1 has only the one revision, and polent_id 2 has two revisions, and we only want the latter. Does that help?

perhaps you could give us a test case to work with?

Sure, just a sec and I’ll get you some data. Cheers

do you want the the latest revision of each section, or the latest revision that isn’t greater than Y?

it would also help if you shared your table and column names