Handling Trees in SQL
|
Practical Tree Handling
|
06-Mar-2004
| |
There are many different models for handling tree structures in SQL.
Google it and see what you can find. Here I am presenting a practical
and relatively simple approach to a trees with an example using
items and topics. The example is based on, say, a google search.
There are many topics under which are items (eg. URLS) as well as other topics.
The topics are the structures of the tree and the items are the
leaves. This example assumes a web interface, but we will not go
into that here. What we will concentrate on is handling the selection
and manipulation of the tree that will facilitate writing a good
user interface.
To create the basic structure of the tree there will be
an item table, a topic table and an item_topic
table. Each topic in the topic table will reference its parent
topic, creating the tree. The item_topic table is an
aggregate table which maps items to topics without limiting
any item to just one topic.
CREATE TABLE items (
item_id SERIAL PRIMARY KEY,
item_name TEXT);
CREATE TABLE topics (
topic_id SERIAL PRIMARY KEY,
topic_name TEXT,
parent_id INTEGER REFERENCES topics (topic_id) );
CREATE TABLE item_topics (
item_id INTEGER,
topic_id INTEGER,
FOREIGN KEY (item_id) REFERENCES items (item_id)
ON UPDATE CASCADE ON DELETE CASCADE,
FOREIGN KEY (topic_id) REFERENCES topics (topic_id)
ON UPDATE CASCADE
PRIMARY KEY (item_id, topic_id) );
|
To insert an item, an insert into both the item table and
the topic table is required. In the following example,
a topic and item are inserted and then the item_topics
record is created the hard way. In the user interface
one would have a nice interface which allowed you to
assign topics to items--a drop down or a list.
INSERT INTO topics VALUES (default, 'History', NULL );
INSERT INTO items VALUES (default, 'Studying History' );
INSERT INTO item_topics VALUES (
(SELECT item_id FROM items
WHERE item_name = 'Studying History'),
(SELECT topic_id FROM topics
WHERE topic_name='History' ));
|
The aggregate table, item_topics, should be updated
if ever the item table key or the topic table key is updated.
If an item is deleted, it should be removed from the aggregate
table. The CASCADE directives implement this policy.
A view of the item_topics table would probably be
helpful to the user interface developer. This would show the
pertinent data from joining the three tables together. Here we
can also see some "real" data.
CREATE VIEW it AS
SELECT i.item_id, i.item_name, it.topic_id, t.parent_id, t.topic_name
FROM item_topics it JOIN items i USING (item_id)
JOIN topics t USING (topic_id);
elein=# SELECT item_name, topic_name FROM it;
item_name | topic_name
------------------------------------+-----------------------
Studying History | History
Spanish California | California History
Gold Rush | California History
Dot Coms | California History
American Presidents | American History
Civil War | American History
Civil Rights | American History
The Making of the Constitution | American History
The Golden Gate | San Francisco History
Digging up Market Street | San Francisco History
Ada Lovelace: Scientific Computing | History
Ada Lovelace: Scientific Computing | Ada
The Camel Book | Perl
Bad Perl, Good Perl | Perl
Comparing Databases | Databases
General Bits | Postgres
Practicing Postgres | Postgres
UCBerkeley Archives: Ingres | Ingres
Relational Technology, Inc. | Ingres
(19 rows)
|
But what happens if a topic is deleted? If the topic had any
items and we deleted the topic, the items would be orphans if
they just happened to not be associated with another topic.
If the topics had any children topics, they would lose their
link to their grandparents, possibly disrupting the associations
of the tree.
We could adopt a policy of not deleting topics, that
does not seem very helpful. This is what we've decided to
do if a topic is deleted.
If the topic to be deleted had both items and a parent, then
the parent would inherit the deleted topic's items.
If a topic to be deleted had child topics and a parent topic,
then those children will have their parent id reassigned to
their "grand"parent.
The way to implement this policy is with a trigger on the
deletion of a topic. The trigger function is del_topic().
If there are items for the topic but there is parent of the
topic, an exception is raised with a helpful (we hope) error
message. Otherwise, each item is reassigned to it's previous
topic's parent. That takes care of the items. The children
are easier. Any topic which had this topic as a parent is
reassigned to this topic's parent.
CREATE OR REPLACE function del_topic()
RETURNS TRIGGER AS '
DECLARE
r_rec RECORD;
BEGIN
FOR r_rec IN SELECT item_id, topic_id
FROM item_topics
WHERE topic_id = OLD.topic_id LOOP
IF OLD.parent_id IS NULL THEN
RAISE EXCEPTION
''Cannot delete topic % until its records are reassigned.'',
OLD.topic_name;
ELSE
UPDATE item_topics SET topic_id = OLD.parent_id
WHERE rec_id = r_id AND topic_id = t_id;
END IF;
LOOP;
UPDATE topics SET parent_id=OLD.parent_id
WHERE parent_id = OLD.topic_id;
RETURN OLD;
END;
' language 'plpgsql';
CREATE TRIGGER del_topic BEFORE DELETE ON topics
FOR EACH ROW EXECUTE PROCEDURE del_topic();
|
We have outlined the variations of adding, updating and deleting
both topics and items using CASCADE directives and a trigger.
Those variations we did not handle explicitly will work correctly
via standard inserts, updates and deletes.
The next thing to think about is to provide
ways to access the tree hierarchy. The developer will want a
way to trace the topic path of an item or a topic.
We will look at two ways here.
The first way is to provide the path as a comma separated list.
The second way is to expanded the path and return an ordered
set of tuples.
To show the path as a comma separated list, we must start with
an item's topic and recurse up the tree, concatenating (aggregating)
the topic nodes as we go.
CREATE OR REPLACE FUNCTION get_topic_path( integer )
RETURNS TEXT AS '
DECLARE
path text;
topic RECORD;
BEGIN
SELECT INTO topic topic_name, parent_id FROM topics WHERE topic_id = $1;
path := topic.topic_name;
IF topic.parent_id IS NOT NULL
THEN
path := (SELECT get_topic_path(topic.parent_id)) || '', '' || path;
END IF;
RETURN path;
END;
' LANGUAGE 'plpgsql';
|
The function get_topic_path() can be used to get both
topic paths by topic or topic paths by item.
elein=# select topic_name, get_topic_path( topic_id ) from topics;
topic_name | get_topic_path
-----------------------+----------------------------------------------------
History | History
California History | History, California History
American History | History, American History
San Francisco History | History, California History, San Francisco History
Computer Science | Computer Science
Computer Languages | Computer Science, Computer Languages
Ada | Computer Science, Computer Languages, Ada
Perl | Computer Science, Computer Languages, Perl
Databases | Computer Science, Databases
Postgres | Computer Science, Databases, Postgres
Ingres | Computer Science, Databases, Ingres
(11 rows)
elein=# select item_name, get_topic_path(topic_id) from it;
item_name | get_topic_path
------------------------------------+----------------------------------------------------
Studying History | History
Spanish California | History, California History
Gold Rush | History, California History
Dot Coms | History, California History
American Presidents | History, American History
Civil War | History, American History
Civil Rights | History, American History
The Making of the Constitution | History, American History
The Golden Gate | History, California History, San Francisco History
Digging up Market Street | History, California History, San Francisco History
Ada Lovelace: Scientific Computing | History
Ada Lovelace: Scientific Computing | Computer Science, Computer Languages, Ada
The Camel Book | Computer Science, Computer Languages, Perl
Bad Perl, Good Perl | Computer Science, Computer Languages, Perl
General Bits | Computer Science, Postgres
Practicing Postgres | Computer Science, Postgres
UCBerkeley Archives: Ingres | Computer Science, Ingres
Relational Technology, Inc. | Computer Science, Ingres
Comparing Databases | Computer Science
(19 rows)
|
With the function get_topic_path() we can test the delete trigger
fairly easily. We will delete the topic 'Databases' and expect that the child
nodes adopt 'Computer Science' as parents and the item 'Comparing Databases'
will be reassigned to topic 'Computer Science'. In the selection below,
this is exactly what is shown.
elein=# delete from topics where topic_id = 9;
DELETE 1
elein=# select item_id, item_name, get_topic_path(topic_id) from it;
elein=# select item_id, item_name, get_topic_path(topic_id) from it;
item_id | item_name | get_topic_path
---------+------------------------------------+----------------------------------
[ ... rows deleted for readability ...]
16 | General Bits | Computer Science, Postgres
17 | Practicing Postgres | Computer Science, Postgres
18 | UCBerkeley Archives: Ingres | Computer Science, Ingres
19 | Relational Technology, Inc. | Computer Science, Ingres
15 | Comparing Databases | Computer Science
(19 rows)
|
We are assuming that a comma separated list is a good way for the
user interface to get the path of an item or topic. However,
sometimes it might be better for the user interface to slice this
a little bit differently. Another way to do this is trickier. A function
which returns sets of tuples will be used to
select an ordered set of tuples, each of which represents the
next node in the path.
In this case let's look at the results first.
The selection is for topic id 11, 'Ingres'. We saw in one of the previous
queries that the path to this topic is: "Computer Science->Databases->Ingres"
We are joining topics into the query twice; once for the base topic
name and once for the name of the topic on the path. The order here is
pertinent. The topic id numbers do not necessarily correspond to the order
of the tree from the top down. We are relying strictly on the parent id link.
elein=# select t.topic_name as base_topic, tn_id, t2.topic_name
from topics t, get_topic_node(11) g, topics t2
where t.topic_id = 11 and t2.topic_id = g.tn_id;
base_topic | tn_id | topic_name
------------+-------+------------------
Ingres | 5 | Computer Science
Ingres | 9 | Databases
Ingres | 11 | Ingres
(3 rows)
|
So, let's take a look at the get_topic_node() function first.
First the return tuple must be created as a proper
composite type, topic_node. Then in the function,
starting at the topic_id which was passed in, we fetch its
parent, if there is one and then call get_topic_node()
again on the parent topic. The end of the recursion occurs when the
parent topic is empty.
When the highest tuple is fetched, it is returned to the previous caller.
We return t2 for the parent node and t for the current
node. The last return will return a NULL tuple signifying the end
of the function.
CREATE TYPE topic_node AS (tn_id integer, tn_parent integer);
CREATE or REPLACE FUNCTION get_topic_node( integer )
RETURNS SETOF topic_node AS '
DECLARE
t_parent INTEGER;
t topic_node;
t2 topic_node;
BEGIN
FOR t IN SELECT topic_id, parent_id
FROM topics
WHERE topic_id = $1
LOOP
IF t.tn_parent IS NOT NULL
THEN
FOR t2 IN SELECT * FROM get_topic_node(t.tn_parent) LOOP
RETURN NEXT t2;
END LOOP;
END IF;
RETURN NEXT t;
END LOOP;
RETURN;
END;
' language 'plpgsql';
|
As we saw in the query above, the tuple set returning function
get_topic_node relies on a parameter.
The parameter was hard coded. It is next to
impossible to make the parameter part of the query. The technique we will
use to parameterize the query is to wrap the function in another tuple set
returning function and qualify the results of the second function. The
pros and cons of this technique are flexibility vs. speed. The topic tree
is not planned to be very large, but it is anticipated to be complex, so we
are opting for flexibility. There are probably other, possibly better ways
to do this.
This function, get_item_path() returns an item and its
topic path in an ordered set of tuples by relying on the previous
function get_topic_node()
CREATE TYPE item_path AS (item_id integer, topic_id integer);
CREATE OR REPLACE FUNCTION item_path ()
RETURNS SETOF item_path AS '
DECLARE
it item_path;
i record;
tn topic_node;
BEGIN
FOR i IN select item_id, topic_id from item_topics LOOP
it.item_id = i.item_id;
FOR tn IN SELECT * from get_topic_node ( i.topic_id ) LOOP
it.topic_id = tn.tn_id;
RETURN NEXT it;
END LOOP;
END LOOP;
RETURN it;
END;
' language 'plpgsql';
|
We create a composite type, item_path for the function
item_path() to return. This is a simple recursive
call linking all topics to their topic paths. Each node
is returned as an item_id and a topic_id.
The order counts in the same way order counts for get_topic_path.
The query to show all topics paths for all items follows. To show
only those for certain items, qualify the selection with
item_id.
elein=# SELECT it.item_id, i.item_name, it.topic_id, t.topic_name
FROM items i, topics t, item_path() it
WHERE it.item_id = i.item_id AND it.topic_id = t.topic_id;
item_id | item_name | topic_id | topic_name
---------+------------------------------------+----------+-----------------------
1 | Studying History | 1 | History
2 | Spanish California | 1 | History
2 | Spanish California | 2 | California History
3 | Gold Rush | 1 | History
3 | Gold Rush | 2 | California History
4 | Dot Coms | 1 | History
4 | Dot Coms | 2 | California History
5 | American Presidents | 1 | History
5 | American Presidents | 3 | American History
6 | Civil War | 1 | History
6 | Civil War | 3 | American History
7 | Civil Rights | 1 | History
7 | Civil Rights | 3 | American History
8 | The Making of the Constitution | 1 | History
8 | The Making of the Constitution | 3 | American History
9 | The Golden Gate | 1 | History
9 | The Golden Gate | 2 | California History
9 | The Golden Gate | 4 | San Francisco History
10 | Digging up Market Street | 1 | History
10 | Digging up Market Street | 2 | California History
10 | Digging up Market Street | 4 | San Francisco History
12 | Ada Lovelace: Scientific Computing | 1 | History
12 | Ada Lovelace: Scientific Computing | 5 | Computer Science
12 | Ada Lovelace: Scientific Computing | 6 | Computer Languages
12 | Ada Lovelace: Scientific Computing | 7 | Ada
13 | The Camel Book | 5 | Computer Science
13 | The Camel Book | 6 | Computer Languages
13 | The Camel Book | 8 | Perl
14 | Bad Perl, Good Perl | 5 | Computer Science
14 | Bad Perl, Good Perl | 6 | Computer Languages
14 | Bad Perl, Good Perl | 8 | Perl
15 | Comparing Databases | 5 | Computer Science
15 | Comparing Databases | 9 | Databases
16 | General Bits | 5 | Computer Science
16 | General Bits | 9 | Databases
16 | General Bits | 10 | Postgres
17 | Practicing Postgres | 5 | Computer Science
17 | Practicing Postgres | 9 | Databases
17 | Practicing Postgres | 10 | Postgres
18 | UCBerkeley Archives: Ingres | 5 | Computer Science
18 | UCBerkeley Archives: Ingres | 9 | Databases
18 | UCBerkeley Archives: Ingres | 11 | Ingres
19 | Relational Technology, Inc. | 5 | Computer Science
19 | Relational Technology, Inc. | 9 | Databases
19 | Relational Technology, Inc. | 11 | Ingres
(45 rows)
|
These pieces of examples raise even more questions. I'll raise some key
related questions and leave the solutions as an exercise to the reader.
- What, if anything, would need to change in the table structures to
support a parent to child topic search?
- How would the function(s) be to show the children trees of a
a topic node?
- Since these path lists are ordered, what changes, if any, would
be required to sort the results of get_topic_node() or get_item_path()
by item or base topic?
The code and
the data used
for this article are available in Tidbits.
Contributors:
elein @ varlena.com
|