Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

08-Mar-2004 Issue: 65

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Handling Trees in SQL
Practical Tree Handling	06-Mar-2004

There are many different models for handling tree structures in SQL. Google it and see what you can find. Here I am presenting a practical and relatively simple approach to a trees with an example using items and topics. The example is based on, say, a google search. There are many topics under which are items (eg. URLS) as well as other topics.

The topics are the structures of the tree and the items are the leaves. This example assumes a web interface, but we will not go into that here. What we will concentrate on is handling the selection and manipulation of the tree that will facilitate writing a good user interface.

To create the basic structure of the tree there will be an item table, a topic table and an item_topic table. Each topic in the topic table will reference its parent topic, creating the tree. The item_topic table is an aggregate table which maps items to topics without limiting any item to just one topic.

CREATE TABLE items ( item_id SERIAL PRIMARY KEY, item_name TEXT); CREATE TABLE topics ( topic_id SERIAL PRIMARY KEY, topic_name TEXT, parent_id INTEGER REFERENCES topics (topic_id) ); CREATE TABLE item_topics ( item_id INTEGER, topic_id INTEGER, FOREIGN KEY (item_id) REFERENCES items (item_id) ON UPDATE CASCADE ON DELETE CASCADE, FOREIGN KEY (topic_id) REFERENCES topics (topic_id) ON UPDATE CASCADE PRIMARY KEY (item_id, topic_id) );

To insert an item, an insert into both the item table and the topic table is required. In the following example, a topic and item are inserted and then the item_topics record is created the hard way. In the user interface one would have a nice interface which allowed you to assign topics to items--a drop down or a list.

INSERT INTO topics VALUES (default, 'History', NULL ); INSERT INTO items VALUES (default, 'Studying History' ); INSERT INTO item_topics VALUES ( (SELECT item_id FROM items WHERE item_name = 'Studying History'), (SELECT topic_id FROM topics WHERE topic_name='History' ));

The aggregate table, item_topics, should be updated if ever the item table key or the topic table key is updated. If an item is deleted, it should be removed from the aggregate table. The CASCADE directives implement this policy.

A view of the item_topics table would probably be helpful to the user interface developer. This would show the pertinent data from joining the three tables together. Here we can also see some "real" data.

CREATE VIEW it AS SELECT i.item_id, i.item_name, it.topic_id, t.parent_id, t.topic_name FROM item_topics it JOIN items i USING (item_id) JOIN topics t USING (topic_id); elein=# SELECT item_name, topic_name FROM it; item_name | topic_name ------------------------------------+----------------------- Studying History | History Spanish California | California History Gold Rush | California History Dot Coms | California History American Presidents | American History Civil War | American History Civil Rights | American History The Making of the Constitution | American History The Golden Gate | San Francisco History Digging up Market Street | San Francisco History Ada Lovelace: Scientific Computing | History Ada Lovelace: Scientific Computing | Ada The Camel Book | Perl Bad Perl, Good Perl | Perl Comparing Databases | Databases General Bits | Postgres Practicing Postgres | Postgres UCBerkeley Archives: Ingres | Ingres Relational Technology, Inc. | Ingres (19 rows)

But what happens if a topic is deleted? If the topic had any items and we deleted the topic, the items would be orphans if they just happened to not be associated with another topic. If the topics had any children topics, they would lose their link to their grandparents, possibly disrupting the associations of the tree.

We could adopt a policy of not deleting topics, that does not seem very helpful. This is what we've decided to do if a topic is deleted. If the topic to be deleted had both items and a parent, then the parent would inherit the deleted topic's items. If a topic to be deleted had child topics and a parent topic, then those children will have their parent id reassigned to their "grand"parent.

The way to implement this policy is with a trigger on the deletion of a topic. The trigger function is del_topic(). If there are items for the topic but there is parent of the topic, an exception is raised with a helpful (we hope) error message. Otherwise, each item is reassigned to it's previous topic's parent. That takes care of the items. The children are easier. Any topic which had this topic as a parent is reassigned to this topic's parent.

CREATE OR REPLACE function del_topic() RETURNS TRIGGER AS ' DECLARE r_rec RECORD; BEGIN FOR r_rec IN SELECT item_id, topic_id FROM item_topics WHERE topic_id = OLD.topic_id LOOP IF OLD.parent_id IS NULL THEN RAISE EXCEPTION ''Cannot delete topic % until its records are reassigned.'', OLD.topic_name; ELSE UPDATE item_topics SET topic_id = OLD.parent_id WHERE rec_id = r_id AND topic_id = t_id; END IF; LOOP; UPDATE topics SET parent_id=OLD.parent_id WHERE parent_id = OLD.topic_id; RETURN OLD; END; ' language 'plpgsql'; CREATE TRIGGER del_topic BEFORE DELETE ON topics FOR EACH ROW EXECUTE PROCEDURE del_topic();

We have outlined the variations of adding, updating and deleting both topics and items using CASCADE directives and a trigger. Those variations we did not handle explicitly will work correctly via standard inserts, updates and deletes.

The next thing to think about is to provide ways to access the tree hierarchy. The developer will want a way to trace the topic path of an item or a topic. We will look at two ways here. The first way is to provide the path as a comma separated list. The second way is to expanded the path and return an ordered set of tuples.

To show the path as a comma separated list, we must start with an item's topic and recurse up the tree, concatenating (aggregating) the topic nodes as we go.

CREATE OR REPLACE FUNCTION get_topic_path( integer ) RETURNS TEXT AS ' DECLARE path text; topic RECORD; BEGIN SELECT INTO topic topic_name, parent_id FROM topics WHERE topic_id = $1; path := topic.topic_name; IF topic.parent_id IS NOT NULL THEN path := (SELECT get_topic_path(topic.parent_id)) || '', '' || path; END IF; RETURN path; END; ' LANGUAGE 'plpgsql';

The function get_topic_path() can be used to get both topic paths by topic or topic paths by item.

elein=# select topic_name, get_topic_path( topic_id ) from topics; topic_name | get_topic_path -----------------------+---------------------------------------------------- History | History California History | History, California History American History | History, American History San Francisco History | History, California History, San Francisco History Computer Science | Computer Science Computer Languages | Computer Science, Computer Languages Ada | Computer Science, Computer Languages, Ada Perl | Computer Science, Computer Languages, Perl Databases | Computer Science, Databases Postgres | Computer Science, Databases, Postgres Ingres | Computer Science, Databases, Ingres (11 rows) elein=# select item_name, get_topic_path(topic_id) from it; item_name | get_topic_path ------------------------------------+---------------------------------------------------- Studying History | History Spanish California | History, California History Gold Rush | History, California History Dot Coms | History, California History American Presidents | History, American History Civil War | History, American History Civil Rights | History, American History The Making of the Constitution | History, American History The Golden Gate | History, California History, San Francisco History Digging up Market Street | History, California History, San Francisco History Ada Lovelace: Scientific Computing | History Ada Lovelace: Scientific Computing | Computer Science, Computer Languages, Ada The Camel Book | Computer Science, Computer Languages, Perl Bad Perl, Good Perl | Computer Science, Computer Languages, Perl General Bits | Computer Science, Postgres Practicing Postgres | Computer Science, Postgres UCBerkeley Archives: Ingres | Computer Science, Ingres Relational Technology, Inc. | Computer Science, Ingres Comparing Databases | Computer Science (19 rows)

With the function get_topic_path() we can test the delete trigger fairly easily. We will delete the topic 'Databases' and expect that the child nodes adopt 'Computer Science' as parents and the item 'Comparing Databases' will be reassigned to topic 'Computer Science'. In the selection below, this is exactly what is shown.

elein=# delete from topics where topic_id = 9; DELETE 1 elein=# select item_id, item_name, get_topic_path(topic_id) from it; elein=# select item_id, item_name, get_topic_path(topic_id) from it; item_id | item_name | get_topic_path ---------+------------------------------------+---------------------------------- [ ... rows deleted for readability ...] 16 | General Bits | Computer Science, Postgres 17 | Practicing Postgres | Computer Science, Postgres 18 | UCBerkeley Archives: Ingres | Computer Science, Ingres 19 | Relational Technology, Inc. | Computer Science, Ingres 15 | Comparing Databases | Computer Science (19 rows)

We are assuming that a comma separated list is a good way for the user interface to get the path of an item or topic. However, sometimes it might be better for the user interface to slice this a little bit differently. Another way to do this is trickier. A function which returns sets of tuples will be used to select an ordered set of tuples, each of which represents the next node in the path.

In this case let's look at the results first. The selection is for topic id 11, 'Ingres'. We saw in one of the previous queries that the path to this topic is: "Computer Science->Databases->Ingres" We are joining topics into the query twice; once for the base topic name and once for the name of the topic on the path. The order here is pertinent. The topic id numbers do not necessarily correspond to the order of the tree from the top down. We are relying strictly on the parent id link.

So, let's take a look at the get_topic_node() function first. First the return tuple must be created as a proper composite type, topic_node. Then in the function, starting at the topic_id which was passed in, we fetch its parent, if there is one and then call get_topic_node() again on the parent topic. The end of the recursion occurs when the parent topic is empty.

When the highest tuple is fetched, it is returned to the previous caller. We return t2 for the parent node and t for the current node. The last return will return a NULL tuple signifying the end of the function.

CREATE TYPE topic_node AS (tn_id integer, tn_parent integer); CREATE or REPLACE FUNCTION get_topic_node( integer ) RETURNS SETOF topic_node AS ' DECLARE t_parent INTEGER; t topic_node; t2 topic_node; BEGIN FOR t IN SELECT topic_id, parent_id FROM topics WHERE topic_id = $1 LOOP IF t.tn_parent IS NOT NULL THEN FOR t2 IN SELECT * FROM get_topic_node(t.tn_parent) LOOP RETURN NEXT t2; END LOOP; END IF; RETURN NEXT t; END LOOP; RETURN; END; ' language 'plpgsql';

As we saw in the query above, the tuple set returning function get_topic_node relies on a parameter. The parameter was hard coded. It is next to impossible to make the parameter part of the query. The technique we will use to parameterize the query is to wrap the function in another tuple set returning function and qualify the results of the second function. The pros and cons of this technique are flexibility vs. speed. The topic tree is not planned to be very large, but it is anticipated to be complex, so we are opting for flexibility. There are probably other, possibly better ways to do this.

This function, get_item_path() returns an item and its topic path in an ordered set of tuples by relying on the previous function get_topic_node()

CREATE TYPE item_path AS (item_id integer, topic_id integer); CREATE OR REPLACE FUNCTION item_path () RETURNS SETOF item_path AS ' DECLARE it item_path; i record; tn topic_node; BEGIN FOR i IN select item_id, topic_id from item_topics LOOP it.item_id = i.item_id; FOR tn IN SELECT * from get_topic_node ( i.topic_id ) LOOP it.topic_id = tn.tn_id; RETURN NEXT it; END LOOP; END LOOP; RETURN it; END; ' language 'plpgsql';

We create a composite type, item_path for the function item_path() to return. This is a simple recursive call linking all topics to their topic paths. Each node is returned as an item_id and a topic_id. The order counts in the same way order counts for get_topic_path.

The query to show all topics paths for all items follows. To show only those for certain items, qualify the selection with item_id.

These pieces of examples raise even more questions. I'll raise some key related questions and leave the solutions as an exercise to the reader.

What, if anything, would need to change in the table structures to support a parent to child topic search?
How would the function(s) be to show the children trees of a a topic node?
Since these path lists are ordered, what changes, if any, would be required to sort the results of get_topic_node() or get_item_path() by item or base topic?

The code and the data used for this article are available in Tidbits.

Contributors: elein @ varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top