varlena
varlena
PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

08-Mar-2004 Issue: 65

Archives | General Tidbits | Google General Bits | Docs | Castellano | Português | Subscriptions | Notifications | | Prev | Next

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Handling Trees in SQL
Practical Tree Handling 06-Mar-2004

There are many different models for handling tree structures in SQL. Google it and see what you can find. Here I am presenting a practical and relatively simple approach to a trees with an example using items and topics. The example is based on, say, a google search. There are many topics under which are items (eg. URLS) as well as other topics.

The topics are the structures of the tree and the items are the leaves. This example assumes a web interface, but we will not go into that here. What we will concentrate on is handling the selection and manipulation of the tree that will facilitate writing a good user interface.

To create the basic structure of the tree there will be an item table, a topic table and an item_topic table. Each topic in the topic table will reference its parent topic, creating the tree. The item_topic table is an aggregate table which maps items to topics without limiting any item to just one topic.

   CREATE TABLE items (
      item_id  SERIAL PRIMARY KEY,
      item_name TEXT);
   CREATE TABLE topics (
      topic_id SERIAL PRIMARY KEY,
      topic_name  TEXT,
      parent_id   INTEGER REFERENCES topics (topic_id) );
   CREATE TABLE item_topics (
      item_id  INTEGER,
      topic_id INTEGER,
      FOREIGN KEY (item_id) REFERENCES items (item_id)
         ON UPDATE CASCADE ON DELETE CASCADE,
      FOREIGN KEY (topic_id) REFERENCES topics (topic_id)
         ON UPDATE CASCADE
      PRIMARY KEY (item_id, topic_id) );

To insert an item, an insert into both the item table and the topic table is required. In the following example, a topic and item are inserted and then the item_topics record is created the hard way. In the user interface one would have a nice interface which allowed you to assign topics to items--a drop down or a list.

   INSERT INTO topics VALUES (default, 'History', NULL );
   INSERT INTO items VALUES (default, 'Studying History' );
   INSERT INTO item_topics VALUES (
      (SELECT item_id FROM items 
       WHERE item_name = 'Studying History'),
      (SELECT topic_id FROM topics 
       WHERE topic_name='History' ));
The aggregate table, item_topics, should be updated if ever the item table key or the topic table key is updated. If an item is deleted, it should be removed from the aggregate table. The CASCADE directives implement this policy.

A view of the item_topics table would probably be helpful to the user interface developer. This would show the pertinent data from joining the three tables together. Here we can also see some "real" data.

   CREATE VIEW it AS
   SELECT i.item_id, i.item_name, it.topic_id, t.parent_id, t.topic_name
   FROM item_topics it JOIN items i USING (item_id) 
      JOIN topics t USING (topic_id);

   elein=# SELECT item_name, topic_name FROM it;
                item_name              |      topic_name       
   ------------------------------------+-----------------------
    Studying History                   | History
    Spanish California                 | California History
    Gold Rush                          | California History
    Dot Coms                           | California History
    American Presidents                | American History
    Civil War                          | American History
    Civil Rights                       | American History
    The Making of the Constitution     | American History
    The Golden Gate                    | San Francisco History
    Digging up Market Street           | San Francisco History
    Ada Lovelace: Scientific Computing | History
    Ada Lovelace: Scientific Computing | Ada
    The Camel Book                     | Perl
    Bad Perl, Good Perl                | Perl
    Comparing Databases                | Databases
    General Bits                       | Postgres
    Practicing Postgres                | Postgres
    UCBerkeley Archives: Ingres        | Ingres
    Relational Technology, Inc.        | Ingres
   (19 rows)

But what happens if a topic is deleted? If the topic had any items and we deleted the topic, the items would be orphans if they just happened to not be associated with another topic. If the topics had any children topics, they would lose their link to their grandparents, possibly disrupting the associations of the tree.

We could adopt a policy of not deleting topics, that does not seem very helpful. This is what we've decided to do if a topic is deleted. If the topic to be deleted had both items and a parent, then the parent would inherit the deleted topic's items. If a topic to be deleted had child topics and a parent topic, then those children will have their parent id reassigned to their "grand"parent.

The way to implement this policy is with a trigger on the deletion of a topic. The trigger function is del_topic(). If there are items for the topic but there is parent of the topic, an exception is raised with a helpful (we hope) error message. Otherwise, each item is reassigned to it's previous topic's parent. That takes care of the items. The children are easier. Any topic which had this topic as a parent is reassigned to this topic's parent.

   CREATE OR REPLACE function del_topic()
   RETURNS TRIGGER AS '
   DECLARE
      r_rec RECORD;
   BEGIN
      FOR r_rec IN SELECT item_id, topic_id 
                   FROM item_topics 
                   WHERE topic_id = OLD.topic_id LOOP
         IF OLD.parent_id IS NULL THEN
            RAISE EXCEPTION 
            ''Cannot delete topic % until its records are reassigned.'',
            OLD.topic_name;
         ELSE
            UPDATE item_topics SET topic_id = OLD.parent_id 
            WHERE rec_id = r_id AND topic_id =  t_id;
         END IF;
      LOOP;
   
      UPDATE topics SET parent_id=OLD.parent_id
      WHERE parent_id = OLD.topic_id;
      RETURN OLD;
   END;
   ' language 'plpgsql';
   
   CREATE TRIGGER del_topic BEFORE DELETE ON topics
      FOR EACH ROW EXECUTE PROCEDURE del_topic();

We have outlined the variations of adding, updating and deleting both topics and items using CASCADE directives and a trigger. Those variations we did not handle explicitly will work correctly via standard inserts, updates and deletes.

The next thing to think about is to provide ways to access the tree hierarchy. The developer will want a way to trace the topic path of an item or a topic. We will look at two ways here. The first way is to provide the path as a comma separated list. The second way is to expanded the path and return an ordered set of tuples.

To show the path as a comma separated list, we must start with an item's topic and recurse up the tree, concatenating (aggregating) the topic nodes as we go.

   CREATE OR REPLACE FUNCTION get_topic_path( integer )
   RETURNS TEXT AS '
   DECLARE
      path  text;
      topic RECORD;
   BEGIN
      SELECT INTO topic topic_name, parent_id FROM topics WHERE topic_id = $1;
      path := topic.topic_name;
      IF topic.parent_id IS NOT NULL
      THEN
         path := (SELECT get_topic_path(topic.parent_id)) || '', '' || path;
      END IF;
      RETURN path;
   END;
   ' LANGUAGE 'plpgsql';
The function get_topic_path() can be used to get both topic paths by topic or topic paths by item.

   elein=# select topic_name, get_topic_path( topic_id ) from topics;
         topic_name       |                   get_topic_path                   
   -----------------------+----------------------------------------------------
    History               | History
    California History    | History, California History
    American History      | History, American History
    San Francisco History | History, California History, San Francisco History
    Computer Science      | Computer Science
    Computer Languages    | Computer Science, Computer Languages
    Ada                   | Computer Science, Computer Languages, Ada
    Perl                  | Computer Science, Computer Languages, Perl
    Databases             | Computer Science, Databases
    Postgres              | Computer Science, Databases, Postgres
    Ingres                | Computer Science, Databases, Ingres
   (11 rows)

  elein=# select item_name, get_topic_path(topic_id) from it;
               item_name              |                   get_topic_path                   
  ------------------------------------+----------------------------------------------------
   Studying History                   | History
   Spanish California                 | History, California History
   Gold Rush                          | History, California History
   Dot Coms                           | History, California History
   American Presidents                | History, American History
   Civil War                          | History, American History
   Civil Rights                       | History, American History
   The Making of the Constitution     | History, American History
   The Golden Gate                    | History, California History, San Francisco History
   Digging up Market Street           | History, California History, San Francisco History
   Ada Lovelace: Scientific Computing | History
   Ada Lovelace: Scientific Computing | Computer Science, Computer Languages, Ada
   The Camel Book                     | Computer Science, Computer Languages, Perl
   Bad Perl, Good Perl                | Computer Science, Computer Languages, Perl
   General Bits                       | Computer Science, Postgres
   Practicing Postgres                | Computer Science, Postgres
   UCBerkeley Archives: Ingres        | Computer Science, Ingres
   Relational Technology, Inc.        | Computer Science, Ingres
   Comparing Databases                | Computer Science
  (19 rows)

With the function get_topic_path() we can test the delete trigger fairly easily. We will delete the topic 'Databases' and expect that the child nodes adopt 'Computer Science' as parents and the item 'Comparing Databases' will be reassigned to topic 'Computer Science'. In the selection below, this is exactly what is shown.

   elein=# delete from topics where topic_id = 9;
   DELETE 1
   
   elein=# select item_id, item_name, get_topic_path(topic_id) from it;
   elein=# select item_id, item_name, get_topic_path(topic_id) from it;
    item_id |             item_name              |  get_topic_path                   
   ---------+------------------------------------+----------------------------------
   [ ... rows deleted for readability ...]
         16 | General Bits                       | Computer Science, Postgres
         17 | Practicing Postgres                | Computer Science, Postgres
         18 | UCBerkeley Archives: Ingres        | Computer Science, Ingres
         19 | Relational Technology, Inc.        | Computer Science, Ingres
         15 | Comparing Databases                | Computer Science
   (19 rows)

We are assuming that a comma separated list is a good way for the user interface to get the path of an item or topic. However, sometimes it might be better for the user interface to slice this a little bit differently. Another way to do this is trickier. A function which returns sets of tuples will be used to select an ordered set of tuples, each of which represents the next node in the path.

In this case let's look at the results first. The selection is for topic id 11, 'Ingres'. We saw in one of the previous queries that the path to this topic is: "Computer Science->Databases->Ingres" We are joining topics into the query twice; once for the base topic name and once for the name of the topic on the path. The order here is pertinent. The topic id numbers do not necessarily correspond to the order of the tree from the top down. We are relying strictly on the parent id link.

   elein=# select t.topic_name as base_topic, tn_id, t2.topic_name
           from topics t, get_topic_node(11) g, topics t2
           where t.topic_id = 11 and t2.topic_id = g.tn_id;
    base_topic | tn_id |    topic_name    
   ------------+-------+------------------
    Ingres     |     5 | Computer Science
    Ingres     |     9 | Databases
    Ingres     |    11 | Ingres
   (3 rows)

So, let's take a look at the get_topic_node() function first. First the return tuple must be created as a proper composite type, topic_node. Then in the function, starting at the topic_id which was passed in, we fetch its parent, if there is one and then call get_topic_node() again on the parent topic. The end of the recursion occurs when the parent topic is empty.

When the highest tuple is fetched, it is returned to the previous caller. We return t2 for the parent node and t for the current node. The last return will return a NULL tuple signifying the end of the function.

   CREATE TYPE topic_node AS (tn_id integer, tn_parent integer);

   CREATE or REPLACE FUNCTION get_topic_node( integer )
   RETURNS SETOF topic_node AS '
   DECLARE
      t_parent INTEGER;
      t topic_node;
      t2 topic_node;
   BEGIN
      FOR t IN SELECT topic_id, parent_id 
         FROM topics 
         WHERE topic_id = $1
      LOOP
         IF t.tn_parent IS NOT NULL
         THEN
            FOR t2 IN SELECT * FROM get_topic_node(t.tn_parent) LOOP
               RETURN NEXT t2;
            END LOOP;
         END IF;
         RETURN NEXT t;
      END LOOP;
      RETURN;
   END;
   ' language 'plpgsql';

As we saw in the query above, the tuple set returning function get_topic_node relies on a parameter. The parameter was hard coded. It is next to impossible to make the parameter part of the query. The technique we will use to parameterize the query is to wrap the function in another tuple set returning function and qualify the results of the second function. The pros and cons of this technique are flexibility vs. speed. The topic tree is not planned to be very large, but it is anticipated to be complex, so we are opting for flexibility. There are probably other, possibly better ways to do this.

This function, get_item_path() returns an item and its topic path in an ordered set of tuples by relying on the previous function get_topic_node()

   CREATE TYPE item_path AS (item_id integer, topic_id integer);
   
   CREATE OR REPLACE FUNCTION item_path ()
   RETURNS SETOF item_path AS '
   DECLARE
      it item_path;
      i record;
      tn topic_node;
   BEGIN
      FOR i IN select item_id, topic_id from item_topics LOOP
         it.item_id = i.item_id;
         FOR tn IN SELECT  * from get_topic_node ( i.topic_id ) LOOP
            it.topic_id = tn.tn_id;
            RETURN NEXT it;
         END LOOP;
      END LOOP;
      RETURN it;
   END;
   ' language 'plpgsql';
We create a composite type, item_path for the function item_path() to return. This is a simple recursive call linking all topics to their topic paths. Each node is returned as an item_id and a topic_id. The order counts in the same way order counts for get_topic_path.

The query to show all topics paths for all items follows. To show only those for certain items, qualify the selection with item_id.

   elein=# SELECT it.item_id, i.item_name, it.topic_id, t.topic_name
           FROM items i, topics t, item_path() it
           WHERE it.item_id = i.item_id AND it.topic_id = t.topic_id;
   
    item_id |             item_name              | topic_id |      topic_name       
   ---------+------------------------------------+----------+-----------------------
          1 | Studying History                   |        1 | History
          2 | Spanish California                 |        1 | History
          2 | Spanish California                 |        2 | California History
          3 | Gold Rush                          |        1 | History
          3 | Gold Rush                          |        2 | California History
          4 | Dot Coms                           |        1 | History
          4 | Dot Coms                           |        2 | California History
          5 | American Presidents                |        1 | History
          5 | American Presidents                |        3 | American History
          6 | Civil War                          |        1 | History
          6 | Civil War                          |        3 | American History
          7 | Civil Rights                       |        1 | History
          7 | Civil Rights                       |        3 | American History
          8 | The Making of the Constitution     |        1 | History
          8 | The Making of the Constitution     |        3 | American History
          9 | The Golden Gate                    |        1 | History
          9 | The Golden Gate                    |        2 | California History
          9 | The Golden Gate                    |        4 | San Francisco History
         10 | Digging up Market Street           |        1 | History
         10 | Digging up Market Street           |        2 | California History
         10 | Digging up Market Street           |        4 | San Francisco History
         12 | Ada Lovelace: Scientific Computing |        1 | History
         12 | Ada Lovelace: Scientific Computing |        5 | Computer Science
         12 | Ada Lovelace: Scientific Computing |        6 | Computer Languages
         12 | Ada Lovelace: Scientific Computing |        7 | Ada
         13 | The Camel Book                     |        5 | Computer Science
         13 | The Camel Book                     |        6 | Computer Languages
         13 | The Camel Book                     |        8 | Perl
         14 | Bad Perl, Good Perl                |        5 | Computer Science
         14 | Bad Perl, Good Perl                |        6 | Computer Languages
         14 | Bad Perl, Good Perl                |        8 | Perl
         15 | Comparing Databases                |        5 | Computer Science
         15 | Comparing Databases                |        9 | Databases
         16 | General Bits                       |        5 | Computer Science
         16 | General Bits                       |        9 | Databases
         16 | General Bits                       |       10 | Postgres
         17 | Practicing Postgres                |        5 | Computer Science
         17 | Practicing Postgres                |        9 | Databases
         17 | Practicing Postgres                |       10 | Postgres
         18 | UCBerkeley Archives: Ingres        |        5 | Computer Science
         18 | UCBerkeley Archives: Ingres        |        9 | Databases
         18 | UCBerkeley Archives: Ingres        |       11 | Ingres
         19 | Relational Technology, Inc.        |        5 | Computer Science
         19 | Relational Technology, Inc.        |        9 | Databases
         19 | Relational Technology, Inc.        |       11 | Ingres
   (45 rows)

These pieces of examples raise even more questions. I'll raise some key related questions and leave the solutions as an exercise to the reader.

  • What, if anything, would need to change in the table structures to support a parent to child topic search?
  • How would the function(s) be to show the children trees of a a topic node?
  • Since these path lists are ordered, what changes, if any, would be required to sort the results of get_topic_node() or get_item_path() by item or base topic?

The code and the data used for this article are available in Tidbits.

Contributors: elein @ varlena.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009

Top
Google
Search General Bits & varlena.com Search WWW