Storing Hierarchical Data in a Database, Part 2

    Gijs Van Tulder
    Gijs Van Tulder
    Share

    Modified Preorder Tree Traversal

    Now, let’s have a look at another method for storing trees. Recursion can be slow, so we would rather not use a recursive function. We’d also like to minimize the number of database queries. Preferably, we’d have just one query for each activity.

    We’ll start by laying out our tree in a horizontal way. Start at the root node (‘Food’), and write a 1 to its left. Follow the tree to ‘Fruit’ and write a 2 next to it. In this way, you walk (traverse) along the edges of the tree while writing a number on the left and right side of each node. The last number is written at the right side of the ‘Food’ node. In this image, you can see the whole numbered tree, and a few arrows to indicate the numbering order.

    1105_numbering

    We’ll call these numbers left and right (e.g. the left value of ‘Food’ is 1, the right value is 18). As you can see, these numbers indicate the relationship between each node. Because ‘Red’ has the numbers 3 and 6, it is a descendant of the 1-18 ‘Food’ node. In the same way, we can say that all nodes with left values greater than 2 and right values less than 11, are descendants of 2-11 ‘Fruit’. The tree structure is now stored in the left and right values. This method of walking around the tree and counting nodes is called the ‘modified preorder tree traversal’ algorithm.

    Before we continue, let’s see how these values look in our table:

    1105_table2

    Note that the words ‘left’ and ‘right’ have a special meaning in SQL. Therefore, we’ll have to use ‘lft’ and ‘rgt’ to identify the columns. Also note that we don’t really need the ‘parent’ column anymore. We now have the lft and rgt values to store the tree structure.

    Retrieve the Tree

    If you want to display the tree using a table with left and right values, you’ll first have to identify the nodes that you want to retrieve. For example, if you want the ‘Fruit’ subtree, you’ll have to select only the nodes with a left value between 2 and 11. In SQL, that would be:

    SELECT * FROM tree WHERE lft BETWEEN 2 AND 11;

    That returns:

    1105_table3

    Well, there it is: a whole tree in one query. To display this tree like we did our recursive function, we’ll have to add an ORDER BY clause to this query. If you add and delete rows from your table, your table probably won’t be in the right order. We should therefore order the rows by their left value.

    SELECT * FROM tree WHERE lft BETWEEN 2 AND 11 ORDER BY lft ASC;

    The only problem left is the indentation.

    To show the tree structure, children should be indented slightly more than their parent. We can do this by keeping a stack of right values. Each time you start with the children of a node, you add the right value of that node to the stack. You know that all children of that node have a right value that is less than the right value of the parent, so by comparing the right value of the current node with the last right node in the stack, you can see if you’re still displaying the children of that parent. When you’re finished displaying a node, you remove its right value from the stack. If you count the elements in the stack, you’ll get the level of the current node.

    <?php  
    function display_tree($root) {  
       // retrieve the left and right value of the $root node  
       $result = mysql_query('SELECT lft, rgt FROM tree '.  
                              'WHERE title="'.$root.'";');  
       $row = mysql_fetch_array($result);  
     
       // start with an empty $right stack  
       $right = array();  
     
       // now, retrieve all descendants of the $root node  
       $result = mysql_query('SELECT title, lft, rgt FROM tree '.  
                              'WHERE lft BETWEEN '.$row['lft'].' AND '.  
                              $row['rgt'].' ORDER BY lft ASC;');  
     
       // display each row  
       while ($row = mysql_fetch_array($result)) {  
           // only check stack if there is one  
           if (count($right)>0) {  
               // check if we should remove a node from the stack  
               while ($right[count($right)-1]<$row['rgt']) {  
                   array_pop($right);  
               }  
           }  
     
           // display indented node title  
           echo str_repeat('  ',count($right)).$row['title']."n";  
     
           // add this node to the stack  
           $right[] = $row['rgt'];  
       }  
    }  
    ?>

    If you run this code, you’ll get exactly the same tree as with the recursive function discussed above. Our new function will probably be faster: it isn’t recursive and it only uses two queries.

    The Path to a Node

    With this new algorithm, we’ll also have to find a new way to get the path to a specific node. To get this path, we’ll need a list of all ancestors of that node.

    With our new table structure, that really isn’t much work. When you look at, for example, the 4-5 ‘Cherry’ node, you’ll see that the left values of all ancestors are less than 4, while all right values are greater than 5. To get all ancestors, we can use this query:

    SELECT title FROM tree WHERE lft < 4 AND rgt > 5 ORDER BY lft ASC;

    Note that, just like in our previous query, we have to use an ORDER BY clause to sort the nodes. This query will return:

    +-------+
    | title |
    +-------+
    | Food |
    | Fruit |
    | Red |
    +-------+

    We now only have to join the rows to get the path to ‘Cherry’.

    How Many Descendants

    If you give me the left and right values of a node, I can tell you how many descendants it has by using a little math.

    As each descendant increments the right value of the node with 2, the number of descendants can be calculated with:

    descendants = (right – left - 1) / 2

    With this simple formula, I can tell you that the 2-11 ‘Fruit’ node has 4 descendant nodes and that the 8-9 ‘Banana’ node is just a child, not a parent.

    Go to page: 1 | 2 | 3

    Frequently Asked Questions on Hierarchical Data and Databases

    What is the difference between Adjacency List Model and Path Enumeration Model?

    The Adjacency List Model and Path Enumeration Model are two different ways of representing hierarchical data in a database. The Adjacency List Model uses a parent ID to link each record to its parent, creating a chain of parent-child relationships. This model is simple and efficient for inserting and deleting nodes, but it can be slow for retrieving subtrees or depth of nodes. On the other hand, the Path Enumeration Model stores the path from the root to each node. This model makes it easy and fast to retrieve subtrees or depth of nodes, but it can be slow and complex for inserting, deleting, or moving nodes because it requires updating the paths of all descendant nodes.

    How does the Nested Set Model handle hierarchical data?

    The Nested Set Model is another method for representing hierarchical data in a database. It uses a pair of left and right values to define the boundaries of each node and its descendants. This model is efficient for retrieving subtrees or depth of nodes, and for operations like checking if a node is a descendant of another node. However, it can be complex and slow for inserting, deleting, or moving nodes because it requires updating the left and right values of all affected nodes.

    What is the Closure Table Model?

    The Closure Table Model is a method for representing hierarchical data in a database that uses a separate table to store the ancestor-descendant relationships of all nodes. This model is efficient for retrieving subtrees, depth of nodes, or all ancestors or descendants of a node. It also makes it easy to insert, delete, or move nodes. However, it requires more storage space because it stores every ancestor-descendant relationship.

    How does the Materialized Path Model work?

    The Materialized Path Model is a method for representing hierarchical data in a database that stores the path from the root to each node as a string of IDs. This model makes it easy and fast to retrieve subtrees or depth of nodes, and to insert, delete, or move nodes. However, it requires careful management of the path strings to ensure data integrity.

    What is the difference between depth-first and breadth-first tree traversal?

    Depth-first and breadth-first are two different methods for traversing a tree. Depth-first traversal explores as far as possible along each branch before backtracking, which is useful for tasks like searching a specific path or finding a specific node. Breadth-first traversal explores all the nodes at the present depth before moving on to nodes at the next depth level, which is useful for tasks like finding the shortest path or level order traversal.

    How can I use SQL for recursive tree traversal?

    SQL can be used for recursive tree traversal by using recursive common table expressions (CTEs). A recursive CTE is a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement, and it can refer to itself. This allows you to write a query that can traverse a tree by repeatedly executing the same query with different parameters.

    What are the advantages and disadvantages of storing a tree in a relational database?

    Storing a tree in a relational database has several advantages, such as the ability to use SQL for querying and manipulating the data, and the support for transactions and concurrency control. However, it also has some disadvantages, such as the complexity of representing hierarchical relationships and the potential for performance issues when dealing with large trees or deep levels of nesting.

    How can I use a graph database for storing a tree?

    A graph database is a type of NoSQL database that uses graph structures with nodes, edges, and properties to represent and store data. A tree is a type of graph, so it can be naturally represented and stored in a graph database. This allows for efficient and flexible operations like traversing the tree, finding the shortest path, or finding all nodes connected to a specific node.

    What is the difference between a tree and a graph in terms of data structure?

    A tree and a graph are both types of data structures used to represent relationships between elements. A tree is a type of graph, but it has specific constraints: it is acyclic (it has no cycles), connected (there is a path between any two nodes), and each node has at most one parent. A graph is more general and can represent more complex relationships, with cycles and nodes having multiple parents.

    How can I handle updates in a tree data structure in a database?

    Handling updates in a tree data structure in a database depends on the model used to represent the tree. For example, in the Adjacency List Model, you can update a node by simply updating its parent ID. In the Nested Set Model or the Closure Table Model, you need to update the left and right values or the ancestor-descendant relationships of all affected nodes. In the Materialized Path Model, you need to update the path strings of all descendant nodes.