Path Enumeration: how to retrieve whole tree in right order

I am storing a tree-like structure of nested categories using the Path Enumeration method as described here: http://www.slideshare.net/billkarwin/models-for-hierarchical-data and I find it useful for easily accessing the path to a category for breadcrumbs, which I will use on my site. I am mixing it with the Adjacency List method to ensure some foreign key constraints for data integrity and for DELETE CASCADE of subcategories. Here is my cat table sample data:


+--------+-----------+--------+-------------+
| cat_id | parent_id | path   | name        |
+--------+-----------+--------+-------------+
|      1 |      NULL | 1/     | Cameras     |
|      2 |         1 | 1/2/   | Canon       |
|      3 |         1 | 1/3/   | Nikon       |
|      4 |      NULL | 4/     | Cards       |
|      5 |         4 | 4/5/   | SmartMedia  |
|      6 |         5 | 4/5/6/ | Original    |
|      7 |         5 | 4/5/7/ | Replacement |
+--------+-----------+--------+-------------+

My problem is: how do I retrieve the whole tree in the order as seen above, preferably using 1 query? The categories will be sorted by name but more importantly I want to get the first top category first, then subitems within this category, then subitems within any existing subitems, etc. - in the exact order like you would see an expanded tree of items.

For adjacency list (using parent_id) I used to use multiple queries, first selecting top level items and then all subitems for each nested level in a loop. Now I’d like to use fewer queries if possible. Is this possible with the Path Enumeration method?

SELECT *
  FROM cat
ORDER
    BY path

Thanks, but I’m afraid this isn’t going to work. True, this will work in the case of the sample data I provided but bear in mind that the numbers in path are id’s - in this example they are sequential numbers but they may not be, they may very well be random numbers (or even natural keys made from strings). Also, it’s only by accident that in my example items further in alphabetical order have higher id’s - again, this is not guaranteed at all.

i still think sorting on path is exactly what you want

:slight_smile:

Maybe it is what I want but don’t know it yet! Can you explain how I would deal with the problems that can arise as I explained in my previous post? Rearrange sequentially all id’s whenever an update is made to the table? A bit convoluted but I can’t see any other way.

whenever an update is made to the table, you must also update the path column

okay, example

let’s say that you delete id 3 and replace it with id 11, also under 1 like 3 was

so then your paths will sort like this –

1/
1/11/
1/2/

correct?

that’s still 100% accurate – both 2 and 11 follow 1 in the sort order

I already do that and I think I have this part figured out. But I already can see one shortcoming in your example:

okay, example

let’s say that you delete id 3 and replace it with id 11, also under 1 like 3 was

so then your paths will sort like this –

1/
1/11/
1/2/

correct?

that’s still 100% accurate – both 2 and 11 follow 1 in the sort order

Well, it’s not 100% accurate because the last two rows switched their positions. Before the replacement the order was:


+--------+-----------+--------+-------------+
| cat_id | parent_id | path   | name        |
+--------+-----------+--------+-------------+
|      1 |      NULL | 1/     | Cameras     |
|      2 |         1 | 1/2/   | Canon       |
|      3 |         1 | 1/3/   | Nikon       |

After replacing 3 with 11:


+--------+-----------+--------+-------------+
| cat_id | parent_id | path   | name        |
+--------+-----------+--------+-------------+
|      1 |      NULL | 1/     | Cameras     |
|     11 |         1 | 1/11/  | Nikon       |
|      2 |         1 | 1/2/   | Canon       |

This could be solved with 0-padded fixed length numbers but again - it may happen that 2 gets replaces by 22 and then even padding won’t help. And imagine what happens if 1 gets replaced with 20 - the whole top level nodes switch positions. So on every update to the table, apart from updating paths I’d have to first reorder id’s sequentially and make them 0-padded fixed length.

Alternatively, I could create a separate column sort_order that would be updated whenever a change is made and then I don’t have to worry about renumbering ids and padding.

So I kind of see solutions but would like to choose the simplest one. Yours seems the simplest but unfortunately I can’t see how sorting only by path can work here.

but not their place in the hierarchy – they’re both still siblings

the path enumeration remains accurate as far as the hierarchy is concerned, right?

you might as well do away with the path enumeration column then, since you would have to apply the sortorder column at each level, i.e. you’d have to break apart the path enumeration strings at each slash to insert the sortorder column at that point in the ORDER BY – yikes, look what the cat dragged in!!

Yes, the path enumeration remains accurate but my main issue is not preserving accurate path enumeration but accurate sorting - and since you offered ORDER BY path then changing id’s breaks sorting :). I need to have both enumeration and sorting accurate!

you might as well do away with the path enumeration column then, since you would have to apply the sortorder column at each level, i.e. you’d have to break apart the path enumeration strings at each slash to insert the sortorder column at that point in the ORDER BY – yikes, look what the cat dragged in!!

Okay, then this is what seems sensible to me:


SELECT * FROM cat ORDER BY sort;

+--------+-----------+--------+-------------+------+
| cat_id | parent_id | path   | name        | sort |
+--------+-----------+--------+-------------+------+
|      1 |      NULL | 1/     | Cameras     |    1 |
|      2 |         1 | 1/2/   | Canon       |    2 |
|      3 |         1 | 1/3/   | Nikon       |    3 |
|      4 |      NULL | 4/     | Cards       |    4 |
|      5 |         4 | 4/5/   | SmartMedia  |    5 |
|      6 |         5 | 4/5/6/ | Original    |    6 |
|      7 |         5 | 4/5/7/ | Replacement |    7 |
+--------+-----------+--------+-------------+------+

As you said I probably won’t need the path column here, when updating the sort column I can use parent_id to traverse the whole tree in a loop with multiple queries but after that I can simply order by sort whether I select the whole tree or just one branch.

But anyway I want to have the path column just for the sake of easily accessing the full path to a category.

I don’t know what you meant by the cat dragged in - you seem to have suggested something more complicated if I understood you correctly. I don’t think I have dragged in any wild cat with the simple sort column above because I don’t want to break apart path strings, either :slight_smile:

“look what the cat dragged in” is an expression meant to indicate that breaking apart the path enumeration is ugly and messy

yes, you will avoid the sort-column-value-within-siblings by using a sort-column-value-for-the-entire-table, but this means you might have to update large portions of the table for a single row change

good luck with your update algorithm :wink:

I don’t mind updating even all the rows for a single row change since there will not be many categories and the updates will be rare.

Anyway, thanks for the discussion!