Has anyone successfully implemented the model outlined in this article in MySQL? The first four operations -- add node, delete node, move node, add link -- are pretty straightforward but the last "delete link" operation did me in (MySQL 4.1.2).
| SitePoint Sponsor |
Has anyone successfully implemented the model outlined in this article in MySQL? The first four operations -- add node, delete node, move node, add link -- are pretty straightforward but the last "delete link" operation did me in (MySQL 4.1.2).



No.Originally Posted by i_m_making_a_cms
I stumbled on that page a while back, and the early parts convinced me that the author was pretty good at using unicode glyphs, but not so hot at database work.
If you're implementing trees in a RDBMS, you need to make some hard estimations about how many nodes you'll have, and how deep the tree will be -- will it be wide or thin, short or tall? Once you have this in hand, generate some sample data: just randomly create a tree with a hundred, or a hundred thousand, or a hundred million nodes (honest, it's not that hard to write the code). Then test your algorithm.
Assuming you're not writing small trees (in which case who cares?) you'll almost immediately fall into issues of indexing and statement optimization. Some of Vadim Tropashko's articles on the subject go into the details of query analysis for Oracle (his employer). His nested intervals approach has considerably better performance for tree reordering operations.
Tropashko points out, correctly, that nested sets is only really useful for small trees or static large trees-- where little or no reorganization will take place-- because of the high reorg cost. MySQL has weighed in on the subject, with a wanton disregard for performance leading them to suggest a three-way self-join. Boy, I hope their optimizer is up to the task: 1M records x 1M x 1M = 1,000,000,000,000,000,000 combinations. Life is so much easier when the sample size is small.
(Tropashko goes on to author this article on using Farey Fractions (honest: I can't make this stuff up) to get equivalent performance without the slow reorganization behavior.)
Next, give some thought to your operation mix: will you be reorganizing frequently? Is reorg performance critical, even if it is uncommon? After all, adjacency lists and materialized path schemes (which is essentially the theme of the article you cite) are murder for path and subtree queries with large datasets, but they perform quite well on reorganization operations.
OTOH, if you genuinely need a DAG, you're really in a world of hurt. I very much doubt that you'll get any joy from a single-table solution. Consider Amazon's categorization behavior, versus Yahoo's.
The Yahoo categories are pretty obviously either implemented as filesystem elements, or implemented using some sort of filesystem emulation: the "Links" are blatant: @Link-to-somewhere-else even looks like the output of 'ls'.
Amazon, however, lists all the paths through all the categories that a book appears in. Looking up, for a randomexample, "Joe Celko's Trees and Hierarchies in SQL for Smarties" gives these categories:
(man I hope the formatter doesn't destroy that...)
- Subjects > Computers & Internet > Databases > Specific Databases > SQL > General
- Subjects > Computers & Internet > Databases > Database Management Systems
- E-Books & E-Docs > Adobe > Computers & Internet
- E-Books & E-Docs > Business & Investing > Management
- E-Books & E-Docs > Adobe > Business & Investing > Management
- E-Books & E-Docs > Computers & Internet > General
- E-Books & E-Docs > Computers & Internet > Programming
Anyhow, if I were a gambling man I'd bet cash money that the categories tree doesn't change much -- a book may move from one place to another, but the basic categories probably don't get changed that much.
Likewise, I bet the "shape" of the Sitepoint forums tree doesn't change too much, although articles are constantly being moved from one place to another (if you're writing a cms, as your handle suggests, you can see where I'm going with this).
So maybe it's a clever idea to manage two tables: one for the hierarchy, and another one for the leaf nodes. And of course there's a third table to link them:
Code:SELECT book.title, category.path FROM book, link, category WHERE book.title LIKE %joe%celko% AND book.id = link.fk_bookid AND category.id = link.fk_categoryid
In this scenario, of course, deleting links is trivial.
=Austin
Hi Austin,
First of all, thanks for your informative response. I am indeed making (at least planning to make) a cms. I plan to organize site objects in a hierarchical format in the same vein as Zope and Limb CMS. Since the hierarchy will map to the URL path info, paths generally will not exceed 6 levels. Furthermore, I anticipate only the deepest nodes will be fat. Needless to say, queries will far outnumber updates so node retrieval is key. Updates are secondary but must be tolerably quick since they'll show up in the frontend from time to time.
I've looked into the main tree/relational mapping models and came to the following conclusion:
1. Adjacency - ugh, recursion
2. Nested Sets - cms requires too many updates
3. Nested Intervals - (apparently) suited for RDBMSes that support stored procedures, triggers, etc. Frankly, I'm a bit relieved because I have no hope of understanding this model anyway.
4. Materialized Path - probably the best route for me to take. It's denormalized however (even worse for DAGs) and I'm suspicious of indices that rely on string parsing.
That's why I thought I was on to something when I found the article that I cited in the OP. The model looks like a normalized variant of the materialized path, as you mentioned. A glance at the model suggests that path and subtree queries would be quick, even with a million+ nodes; they're simply selects against id or ancestor_id afterall).
I'm not quite sure where I'm going with this post, but having elaborated a bit on my requirements I was wondering if you could offer any suggestions or advice. You certainly seem to know a lot more about this than me.
And perhaps I'm just slow but could you please have a quick look at that last "delete link" query in that article and explain to me why there are two link tables (ActorLink, DataLink)? The previous queries were fairly straightforward mainly because the tables and columns actually map to the tables in the diagram, but that last query, specifically the second derived table, has really manhandled me. My tables look something like this:
data
--------------------
id | parent_id
path
--------------------
id | ancestor_id
link
--------------------
from_id | to_id
Thanks again.
I've hit this problem time and time again, and so far the one which has caused me the least hassle has been as Austin_Hastings suggested, and that's storing the structure in another table.
The table structure is simple.
categories( id, category_name, display_order );
categories_tree ( parent_id, child_id );
Same question here without actually going through all the elements of the tree with a recursive function ??Originally Posted by [ArcanE]
Any one with good math
Geza


So in reading this whole thread and thinking about the benefits of both the adjacency list model and the modified preorder tree traversal model I'm thinking that the (of course depending on your situation) the adjacency list model might be the way to go for creating menus.
Let me explain. First it is really slow with recursion creating the menus. Why not create the menu once seralize it and throw it into another table, basically caching it. This way if you get a lot of requests for the menu it's one database call and unserialize and poof. Fast right?
Then there are the updates, inserts, deletes. In an adjacency model those are easy, from what I've heard (well a lot easier than the modified preorder tree tracersal). So after every update, insert, delete just recache and serialize the tree again.
I'm assuming that your menu is going to get rendered a bunch of times and not updated all that much. If that's the case why not go this route? hm.


if you are thinking of a hierarchy for the navigation menu of a web site, you're not going to go more than 3, maybe 4, surely not 5 or 6 levels deep, right? (if you do, either you have a huge site, or maybe you have an information architecture problem)
in that case, you do not need recursion, just use left outer self-joins
see http://sqllessons.com/categories.html
I have the following structure:
My table structure is 'category_id, name, lft, rgt'.PHP Code:- TB
- cat a
- cat b
- sub-cat b
- sub-cat b
- sub-cat b
- cat c
The category_id is linked in with my products table.
Any ideas how I can display my products in the following fashion:
If I'm not being clear enough, let me know.PHP Code:- TB
- cat a
- product from sub category a
- cat b
- sub-cat bb
- product from sub category bb
- sub-cat bbb
- product from sub category bbb
- product from sub category bbb
- sub-cat bbbb
- cat c
- product from sub category c
Thanks!
Bookmarks