I have two database tables which track documents and the document release dates. The ‘document’ table only contains the current version of each document. The ‘document_replaced’ table only contains previous versions of the documents.
When querying the database we always specify a project_id and a document_category.
I need to create a list of document release dates which takes into account the following rules/attributes:
- Each date should only appear once in the list UNLESS a specific document was updated two or more times on the same day.
- The first version isn’t always ‘1’ - it can be any number.
- Documents that haven’t been updated will have an entry in the ‘documents’ table only.
Here is some sample data from the two database tables (document and document_replaced):
SELECT document_id,document_version,DATE(document_date),document_project,document_category FROM document WHERE document_project = 242 AND document_category=‘Architectural’ order by document_id;
±------------±-----------------±--------------------±-----------------±------------------+
| document_id | document_version | DATE(document_date) | document_project | document_category |
±------------±-----------------±--------------------±-----------------±------------------+
| 12503 | 7 | 2016-04-11 | 242 | Category_1 |
| 12504 | 2 | 2016-04-07 | 242 | Category_1 |
| 12505 | 2 | 2016-04-07 | 242 | Category_1 |
| 12506 | 3 | 2016-04-07 | 242 | Category_1 |
| 12507 | 4 | 2016-04-07 | 242 | Category_1 |
| 12508 | 2 | 2016-04-07 | 242 | Category_1 |
| 12509 | 1 | 2016-04-11 | 242 | Category_1 |
±------------±-----------------±--------------------±-----------------±---------------+
SELECT parent_doc,document_version,DATE(document_date),parent_project,document_category FROM document_replaced WHERE parent_project = 242 AND document_category=‘Architectural’ order by parent_doc;
±-----------±-----------------±--------------------±---------------±------------------+
| parent_doc | document_version | DATE(document_date) | parent_project | document_category |
±-----------±-----------------±--------------------±---------------±------------------+
| 12503 | 1 | 2016-04-03 | 242 | Category_1 |
| 12503 | 2 | 2016-04-07 | 242 | Category_1 |
| 12503 | 3 | 2016-04-07 | 242 | Category_1 |
| 12503 | 4 | 2016-04-07 | 242 | Category_1 |
| 12503 | 5 | 2016-04-07 | 242 | Category_1 |
| 12503 | 6 | 2016-04-07 | 242 | Category_1 |
| 12504 | 1 | 2016-04-07 | 242 | Category_1 |
| 12508 | 1 | 2016-04-07 | 242 | Category_1 |
±-----------±-----------------±--------------------±---------------±---------------+
The ‘parent_doc’ field in the ‘document_replaced’ table corresponds with the ‘document_id’ field in the ‘document’ table. When a document is replaced we copy the field values over from the ‘document’ table and set the ‘parent_doc’ field in the document_replaced table to the document_id of the document that is being replaced.
It might be easier to understand what I’m looking for if I include a crude visualisation of how the final data is displayed. Document names on the left, revision dates across the top and corresponding version numbers in the grid:
3-Apr-16 7-Apr-16 7-Apr-16 7-Apr-16 7-Apr-16 7-Apr-16 11-Apr-16
Document 12503 1 2 3 4 5 6 7
Document 12504 1 2
Document 12505 2
Document 12506 3
Document 12507 4
Document 12508 1 2
Document 12509 1
I’m not even sure if what I need can be done with a single query but I’ve tried doing it with a couple of queries, some PHP code and an array and it’s doing my head in. I’m looking to get that list of dates but if a single query can gather all of the related data then that would be a bonus.
Gary