This is a special rollup issue covering many different articles from the archives about Triggers. I hope you enjoy this collection.
A bad trigger was created to update an id with the next value of a sequence.
CREATE OR REPLACE FUNCTION "f_products"() RETURNS "opaque" AS '
BEGIN
    NEW.ID = NEXTVAL(''products_id_seq'');
    RETURN NEW;
END' LANGUAGE 'plpgsql';

CREATE TRIGGER "t_classes" AFTER INSERT ON "products"
    FOR EACH ROW EXECUTE PROCEDURE f_products();
This does not work because to change a value in the row, you must use a BEFORE trigger rather than an AFTER trigger.
It was also pointed out that it is much easier to use a SERIAL data type, or to set a DEFAULT to the value of nextval(), to achieve the same effect. You can find more information in the documentation.
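For example, either of these definitions (the table and sequence names here are only illustrative) generates ids without any trigger:

```sql
-- SERIAL creates the sequence and the DEFAULT implicitly.
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name text
);

-- Or equivalently, with an explicit sequence and DEFAULT:
CREATE SEQUENCE products2_id_seq;
CREATE TABLE products2 (
    id integer PRIMARY KEY DEFAULT nextval('products2_id_seq'),
    name text
);
```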
The following trigger was created to validate the tag_number column for uniqueness on update. However, whenever columns other than tag_number are updated, the update is rejected: when, say, the item_name column is updated, both the OLD and the NEW records have the same value for tag_number, and therefore the exception is raised.
CREATE OR REPLACE FUNCTION validate_tag_number() RETURNS OPAQUE AS '
BEGIN
    IF new.tag_number = old.tag_number THEN
        RAISE EXCEPTION ''Number is already present'';
    END IF;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER validate_tag_number BEFORE UPDATE ON tag_vendor
    FOR EACH ROW EXECUTE PROCEDURE validate_tag_number();
What you really want to do to ensure uniqueness is to define the column with a unique index by declaring it to be UNIQUE.
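A minimal sketch of the preferred approach, using the column names from the example above:

```sql
-- The UNIQUE constraint creates a unique index that enforces
-- uniqueness on every insert and update; no trigger is needed.
CREATE TABLE tag_vendor (
    tag_number integer UNIQUE,
    item_name text
);
```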
Another issue with bulk loading is triggers firing with each row inserted. If you are sure your data is trustworthy and already meets your referential integrity requirements, you can turn off triggers for the bulk load and turn them back on immediately afterward. You should not use this option when your data is not completely clean.
The reltriggers field in the pg_class table contains the number of triggers active for each table. It can be set to 0 to disable the triggers, but it must be reset to the proper number of triggers to re-enable them.
UPDATE pg_class SET reltriggers = 0 WHERE relname = 'tablename';

UPDATE pg_class SET reltriggers = (
    SELECT count(*) FROM pg_trigger
    WHERE pg_class.oid = tgrelid)
WHERE relname = 'tablename';
The decision factor for rules vs. triggers is that rules are for rewriting the query and triggers are for changes executed on rows. The timing is different. Rules are invoked before a query gets executed and may affect the statement to be executed. The rule is only invoked at the start, but the ultimate statement may affect many rows.
A trigger, on the other hand, is a reaction to a change of data. During the statement execution, if there is a trigger PER ROW, each row affected will fire off the trigger. The triggers PER STATEMENT are not yet implemented so they are not considered here.
It is a common mistake to try to use a rule to track input and delete counts, for example. A rule which adds or subtracts a counter in some table will fire once, regardless of whether the statement affects one or many rows. In the example in the item above, if rules were used instead of triggers then when several rows were deleted by a single SQL statement the count would only have been decremented by one. Likewise, when we inserted multiple rows in one statement, the row count would have been incremented by one.
A good use of rules is to enable updates, inserts and deletes for views. These rules can redirect the data into logical statements operating on the underlying tables.
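As a sketch (the table and view names here are hypothetical), an INSERT rule on a view can redirect the new row into the underlying table:

```sql
CREATE TABLE person (id integer, name text);
CREATE VIEW person_view AS SELECT id, name FROM person;

-- Rewrite inserts on the view into inserts on the base table.
CREATE RULE person_view_insert AS ON INSERT TO person_view
    DO INSTEAD INSERT INTO person VALUES (NEW.id, NEW.name);
```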
Both plpgsql and plpython functions can be trigger functions. The return values are different for plpython functions, in keeping with the plpython language. This item describes plpgsql behavior only.

As of 7.3, trigger functions must be created to return the type trigger. The previously used return type, opaque, has been deprecated.
The information value returned from plpgsql triggers is row data and an indication of the disposition of the row. Even if NULL is returned, as is legal in certain situations, it indicates (non-)data and a disposition.
If the function is not returning NULL, it is usually returning the row structure that was made available to the function via the variables NEW and OLD. INSERTs only have a NEW row. UPDATEs have both NEW and OLD rows. DELETEs only have an OLD row. The value returned from the function is usually NULL or NEW, depending on what the function is supposed to do.
A BEFORE trigger is fired before the row is acted upon, that is, before it is inserted, updated or deleted. Only a BEFORE UPDATE or BEFORE INSERT trigger can modify the row being inserted or updated. The UPDATE or INSERT can be ignored by returning NULL instead of NEW. This does not abort the transaction.
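For instance, this sketch of a BEFORE INSERT trigger (the table and column names are hypothetical) silently discards rows rather than inserting them:

```sql
CREATE OR REPLACE FUNCTION skip_unnamed() RETURNS TRIGGER AS '
BEGIN
    IF NEW.name IS NULL THEN
        RETURN NULL;  -- ignore this row; no error, nothing inserted
    END IF;
    RETURN NEW;       -- proceed with the insert
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER skip_unnamed BEFORE INSERT ON names
    FOR EACH ROW EXECUTE PROCEDURE skip_unnamed();
```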
An AFTER trigger occurs after the insertion, update or deletion occurs. It cannot change the data or void the action on the particular row.
All triggers, however, can raise an error to abort the transaction.
In plpgsql trigger functions, the row triggering the function is made available in variables so that its values can be used by the trigger function. NEW is the row about to be inserted or updated. OLD is the row as it was before the update or delete.
The column values of these variables are accessed using the "dot" syntax; for example, NEW.salary and OLD.salary refer to the salary column of the new and old rows respectively.
BEFORE triggers on INSERT and UPDATE statements allow you to update or change column values based on your trigger's criteria. In UPDATE triggers, these criteria can be based on what the row looked like before the update started.
If the task at hand is to change or assign column values in the row in question, you simply make the assignment to the NEW variable. No separate update is required.
This is an example of a plpgsql trigger function for BEFORE UPDATE of a salary table. If salary is the field being updated, the previous salary is saved in a separate column, oldsalary, and the approvedby field is set to NULL so the raise can be approved by some other mechanism.
CREATE OR REPLACE FUNCTION newsal() RETURNS TRIGGER AS '
BEGIN
    IF NEW.salary <> OLD.salary THEN
        NEW.oldsalary = OLD.salary;
        NEW.approvedby = NULL;
    END IF;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER newsal BEFORE UPDATE ON salary
    FOR EACH ROW EXECUTE PROCEDURE newsal();
A transaction in PostgreSQL encompasses all of the functions and triggers in the transaction. If you are not in a transaction block (BEGIN ... COMMIT), the single primary statement initiates the transaction.
In this (nonsense) SQL, the primary statement is INSERT. Let us also suppose there is a BEFORE INSERT trigger on thattable which calls yet another function checkme().
INSERT INTO thattable
SELECT f(id), x(sdate), y(stuff)
FROM mytable
WHERE id IN
    (SELECT id FROM anothertable a, badaccts b
     WHERE a.id = b.id);

The execution of this statement invokes all of the triggers, functions and sub-selects in the context of the single transaction begun by the INSERT. This includes the functions f(), x(), y(), the trigger function checkme() and the subselect in the last WHERE clause. Any cast functions invoked implicitly are also included, as well as any input or output functions for the data types. If there is an error in any of the SQL or auxiliary functions, the transaction as a whole will be rolled back.
Suppose we had a table representing process tasks like this:
=# \d procs
         Table "procs"
  Column  |         Type          | Modifiers
----------+-----------------------+-----------
 pid      | integer               | not null
 task     | character varying(10) | not null
 owner    | character varying(10) | not null
 IsActive | boolean               |

The tasks are executed in single-threaded mode, one after the other. No two processes can be active at the same moment. What is wanted is the exclusive OR behavior, like a radio button set.
The problem is how to keep the table updated with one and only one record in the table with IsActive set to TRUE. When a process is made active, any other active process must be set to inactive.
The implementation is a trigger that fires when a process is updated to active, and which sets the row that was previously active to inactive. Here are the function and trigger.
CREATE FUNCTION proc_current_trigger() RETURNS TRIGGER AS '
BEGIN
    IF NEW.IsActive THEN
        UPDATE procs SET IsActive = ''f''
        WHERE pid != NEW.pid AND IsActive = ''t'';
    END IF;
    RETURN NEW;
END' LANGUAGE 'plpgsql';

CREATE TRIGGER proc_current AFTER UPDATE OR INSERT ON procs
    FOR EACH ROW EXECUTE PROCEDURE proc_current_trigger();
Both INSERT and UPDATE cases are being handled by the trigger to ensure that integrity remains. Because there is no DELETE trigger and no action when a process is set to inactive, it is possible to have no active processes.
The function checks whether the new or updated row is set to active, and if so, sets all other active rows (there should be only one!) to inactive. The new row itself is excluded by pid in the WHERE clause.
Constraints are restrictions on columns in a table definition. There are several kinds of constraints available for both the table and the columns and they are implemented in several ways. Some constraints are implemented as triggers, some as indexes, and some are attributes of a column definition.
When you create a table, there are some hints about what triggers and indexes are created for references and primary keys. These are the notices:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 't1_pkey' for table 't1'
NOTICE: CREATE TABLE / UNIQUE will create implicit index 't1_unq_key' for table 't1'
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)

But when those messages are long gone, it is nice to know exactly how the constraints you defined are being implemented. The following describes the possible constraints and how each is implemented.
References and Foreign Keys
A table which defines a FOREIGN KEY reference also implicitly installs update and delete triggers on the table being referenced. By default, those triggers implement the NO ACTION behavior on the referenced tables. If the foreign key were defined with different directives, one of the other internal functions would be installed as the trigger instead. These internal functions are the ones used for referential integrity constraints.
Foreign key constraint names have the format $1, $2, etc. The trigger names are constructed using oid-based numbers.
Checks and other Constraints
DEFAULT values and NOT NULL are conceptually constraints, but they are not implemented like the other constraints. These limitations are attributes of columns and are stored in the pg_attribute and pg_attrdef tables.
CREATE TABLE showconstraints (
    id integer PRIMARY KEY,
    unq integer CONSTRAINT sc_unq UNIQUE,
    checkme integer CONSTRAINT posval CHECK (checkme > 0)
);
-- What indexes are on my table?
select * from pg_indexes where tablename = 'tablename';

-- What triggers are on my table?
select c.relname as "Table", t.tgname as "Trigger Name",
       t.tgconstrname as "Constraint Name", t.tgenabled as "Enabled",
       t.tgisconstraint as "Is Constraint", cc.relname as "Referenced Table",
       p.proname as "Function Name"
from pg_trigger t, pg_class c, pg_class cc, pg_proc p
where t.tgfoid = p.oid and t.tgrelid = c.oid
  and t.tgconstrrelid = cc.oid and c.relname = 'tablename';

-- What constraints are on my table?
select r.relname as "Table", c.conname as "Constraint Name",
       contype as "Constraint Type", conkey as "Key Columns",
       confkey as "Foreign Columns", consrc as "Source"
from pg_class r, pg_constraint c
where r.oid = c.conrelid and relname = 'tablename';
A trigger to maintain the row count is needed. The actual row count is important to the application, and count(*) is by nature too slow.
The way to implement a proper row counter is to create a trigger on the table which needs to be counted. This trigger will increment or decrement the count on insert or delete.
There are a couple of interesting issues with the row counting implementation. The first is that the counter trigger function can work for both insert and delete and with any table by using trigger information in plpgsql. The second is the issue of initializing the rowcount table.
Suppose you had the table mystuff and set up the table rowcount to hold the row count of many tables.
CREATE TABLE mystuff (
    name text NOT NULL,
    description text,
    PRIMARY KEY (name));

CREATE TABLE rowcount (
    table_name text NOT NULL,
    total_rows bigint,
    PRIMARY KEY (table_name));
Most people are familiar with the use of OLD and NEW variables in trigger functions, but there are several other informational variables available.
The count_rows() trigger function uses TG_OP and TG_RELNAME. TG_OP indicates whether it is a DELETE or INSERT, and TG_RELNAME is used to store the count in the rowcount table by table name. This is the function:
CREATE OR REPLACE FUNCTION count_rows() RETURNS TRIGGER AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE rowcount SET total_rows = total_rows + 1
        WHERE table_name = TG_RELNAME;
    ELSIF TG_OP = ''DELETE'' THEN
        UPDATE rowcount SET total_rows = total_rows - 1
        WHERE table_name = TG_RELNAME;
    END IF;
    RETURN NULL;
END;
' LANGUAGE plpgsql;
There are many ways the various TG variables can be used to write generic triggers. Different courses of action can be taken based on the exact definition of the trigger. The data in the row can be accessed via NEW and OLD and information about their datatypes is available if decisions or actions need to be done based on the data type.
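To make the idea concrete, here is a sketch of a generic debugging trigger function (the function name is mine) that reports its own context via the TG_* variables; it can be attached to any table:

```sql
CREATE OR REPLACE FUNCTION show_trigger_context() RETURNS TRIGGER AS '
BEGIN
    -- TG_NAME: trigger name; TG_WHEN: BEFORE or AFTER;
    -- TG_OP: INSERT, UPDATE or DELETE; TG_RELNAME: table name.
    RAISE NOTICE ''trigger % fired % % on %'',
        TG_NAME, TG_WHEN, TG_OP, TG_RELNAME;
    IF TG_OP = ''DELETE'' THEN
        RETURN OLD;
    END IF;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';
```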
Now we will look at what is necessary to initialize the trigger. If the trigger function and the tables are all defined in the same transaction, then initialization is not necessary, except for the creation of the row in rowcount for the table in question. However, most of the time if row counting functionality is being added to an existing table, the base row count must be ascertained.
For an existing table, the initialization must be done in a single transaction. The target table is locked to prevent updates during this initialization. Then the trigger is created and the rowcount is inserted with the current row count. Once the initialization transaction is committed, then your counter is operational. Don't forget to test it!
BEGIN;
-- Make sure no rows can be added to mystuff until we have finished
LOCK TABLE mystuff IN SHARE ROW EXCLUSIVE MODE;

CREATE TRIGGER countrows AFTER INSERT OR DELETE ON mystuff
    FOR EACH ROW EXECUTE PROCEDURE count_rows();

-- Initialise the row count record
DELETE FROM rowcount WHERE table_name = 'mystuff';
INSERT INTO rowcount (table_name, total_rows)
VALUES ('mystuff', (SELECT COUNT(*) FROM mystuff));

COMMIT;

--
-- Testing
--
insert into mystuff values ('abacus', 'mathematics');
insert into mystuff values ('bee', 'insect');
select * from rowcount;
insert into mystuff values ('dog', 'pet');
insert into mystuff values ('cathedral', 'building');
select * from rowcount;
select * from mystuff;
delete from mystuff where name = 'abacus';
select * from rowcount;
select * from mystuff;
The topic of using a trigger to count rows was discussed in Article #49-1. A subsequent question was raised regarding the effect of multiple triggers on the logic of the counter. The question was whether it was possible for one trigger to increment the counter and then have the subsequent trigger abort the original statement.
When a statement is executed, one or more triggers may also be executed to implement foreign key integrity, additional functionality such as the row counter, and CHECK clauses. All of the triggers are executed in the context of the transaction of the original statement. They are executed (arbitrarily) in alphabetical order. If any one of the triggers causes the original statement to be aborted, then all of the triggers' actions will also be rolled back. Everything is in the same transaction.
If the trigger for counting succeeded and a subsequent trigger rolled back the transaction, then the effects of the counting trigger would also be rolled back.
This is a nice little view which will list triggers in your database. This is a very helpful addition to your pg_catalog utilities because there is no psql utility to show triggers individually. Triggers are usually shown in relation to the tables they are attached to.
create view showtriggers as
select trg.tgname as trigger_name,
       tbl.relname as table_name,
       p.proname as function_name,
       case trg.tgtype & cast(2 as int2)
           when 0 then 'AFTER' else 'BEFORE' end as trigger_type,
       case trg.tgtype & cast(28 as int2)
           when 16 then 'UPDATE'
           when 8 then 'DELETE'
           when 4 then 'INSERT'
           when 20 then 'INSERT, UPDATE'
           when 28 then 'INSERT, UPDATE, DELETE'
           when 24 then 'UPDATE, DELETE'
           when 12 then 'INSERT, DELETE'
       end as trigger_event
from pg_trigger trg, pg_class tbl, pg_proc p
where trg.tgrelid = tbl.oid and trg.tgfoid = p.oid;
One technique for tracking updates of a table is to store the old values in another table as a log of changes. All or most of the columns of the original table are duplicated in the log table, and a timestamp or other key is added.
This technique is best done with a trigger instead of an UPDATE RULE, particularly if the UPDATE RULE was to apply to a view. A possible update rule might be this:
CREATE RULE update_org AS ON UPDATE TO orig_table_view
DO INSTEAD (
    UPDATE orig_table SET col = NEW.col...;
    INSERT INTO log_table ...( old.col... );
);

This rule will not work because once the original table is updated, the OLD values have changed and the new values are what is visible. Therefore the insert into the log table logs the new values rather than the old ones. If you switch the order of the UPDATE and INSERT statements, this RULE should work for logging.
However, with TRIGGERS you do not need to wrestle with the visibility rules. You will still need the rule for updating the view. The trigger will be on the base table and insert the log record only.
To see both techniques in action, see Logging Updates.
It is very simple to add a column holding a modification timestamp to any row in any table. This question is asked fairly often, particularly by MySQL converts. MySQL supports a feature where the first timestamp-typed column in a table is automatically updated to the current timestamp any time a row is inserted or updated. To avoid the automatic update in MySQL you must use a datetime type instead of the timestamp type. (Some believe it is sacrilegious for the server to update data without specific requests, but the feature is useful.)
If you have a modification timestamp column with the DEFAULT set to now(), then each INSERT will record the current timestamp. This only works for INSERTS and not UPDATES since the DEFAULT values are only evaluated for INSERTS.
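For example (the table here is hypothetical):

```sql
-- mod_date is filled in automatically on INSERT only;
-- an UPDATE leaves it untouched unless a trigger intervenes.
CREATE TABLE notes (
    note text,
    mod_date timestamp DEFAULT now()
);
```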
To update the modification timestamp on update, you will need a very simple function and a very simple trigger. If you have several tables with the same column name for the modification timestamp, you can reuse the function, but you will need to define the trigger for each table explicitly.

The trigger function simply sets the mod_date column in NEW to now().
create or replace function ts_row() returns TRIGGER as '
BEGIN
    NEW.mod_date = now();
    RETURN NEW;
END;
' language 'plpgsql';
For any table which has a timestamp column named mod_date, this function can be invoked as an UPDATE trigger.
create trigger upd_any_table BEFORE UPDATE ON any_table
    for each row EXECUTE PROCEDURE ts_row();
create trigger upd_other_table BEFORE UPDATE ON other_table
    for each row EXECUTE PROCEDURE ts_row();
Further discussion on this topic included a discussion on the implementation of the modification timestamp column as a RULE instead of a TRIGGER and a proposal to add an "ON UPDATE" qualification to the DEFAULT specification. RULES require more mechanics since they require an updatable view and the rule needs to be qualified so that it is not recursive. It was also pointed out that adding an ON UPDATE to the DEFAULT specification would not (necessarily) override explicit updates to the modification timestamp field.
Timestamping of rows in conjunction with a system of no deletions used to be a part of the University Postgres and was called "timetravel". Timetravel was the ability (pre-vacuum) to qualify a data set based on a time frame, and it enabled you to ask what the data values were for a week ago at 2:00pm. Timetravel was removed from PostgreSQL because the overhead of row size and update costs was considered too high. Timetravel is not too hard to reconstruct, however. The logging of a modification timestamp as described above is the key requirement.
There are many different models for handling tree structures in SQL. Google it and see what you can find. Here I am presenting a practical and relatively simple approach to trees, with an example using items and topics. The example is based on something like a Google search: there are many topics, under which are items (e.g. URLs) as well as other topics.
The topics are the structure of the tree and the items are the leaves. This example assumes a web interface, but we will not go into that here. What we will concentrate on is handling the selection and manipulation of the tree in ways that will facilitate writing a good user interface.
To create the basic structure of the tree there will be an item table, a topic table and an item_topic table. Each topic in the topic table will reference its parent topic, creating the tree. The item_topic table is an aggregate table which maps items to topics without limiting any item to just one topic.
CREATE TABLE items (
    item_id SERIAL PRIMARY KEY,
    item_name TEXT);

CREATE TABLE topics (
    topic_id SERIAL PRIMARY KEY,
    topic_name TEXT,
    parent_id INTEGER REFERENCES topics (topic_id)
);

CREATE TABLE item_topics (
    item_id INTEGER,
    topic_id INTEGER,
    FOREIGN KEY (item_id) REFERENCES items (item_id)
        ON UPDATE CASCADE ON DELETE CASCADE,
    FOREIGN KEY (topic_id) REFERENCES topics (topic_id)
        ON UPDATE CASCADE,
    PRIMARY KEY (item_id, topic_id)
);
To insert an item, an insert into both the items table and the item_topics table is required. In the following example, a topic and an item are inserted and then the item_topics record is created the hard way. In the user interface one would have a nicer mechanism for assigning topics to items, such as a drop-down or a list.
INSERT INTO topics VALUES (default, 'History', NULL);
INSERT INTO items VALUES (default, 'Studying History');
INSERT INTO item_topics VALUES (
    (SELECT item_id FROM items WHERE item_name = 'Studying History'),
    (SELECT topic_id FROM topics WHERE topic_name = 'History'));

The aggregate table, item_topics, should be updated whenever the items table key or the topics table key is updated. If an item is deleted, it should be removed from the aggregate table. The CASCADE directives implement this policy.
A view of the item_topics table would probably be helpful to the user interface developer. This would show the pertinent data from joining the three tables together. Here we can also see some "real" data.
CREATE VIEW it AS
SELECT i.item_id, i.item_name, it.topic_id, t.parent_id, t.topic_name
FROM item_topics it
     JOIN items i USING (item_id)
     JOIN topics t USING (topic_id);

elein=# SELECT item_name, topic_name FROM it;
             item_name              |      topic_name
------------------------------------+-----------------------
 Studying History                   | History
 Spanish California                 | California History
 Gold Rush                          | California History
 Dot Coms                           | California History
 American Presidents                | American History
 Civil War                          | American History
 Civil Rights                       | American History
 The Making of the Constitution     | American History
 The Golden Gate                    | San Francisco History
 Digging up Market Street           | San Francisco History
 Ada Lovelace: Scientific Computing | History
 Ada Lovelace: Scientific Computing | Ada
 The Camel Book                     | Perl
 Bad Perl, Good Perl                | Perl
 Comparing Databases                | Databases
 General Bits                       | Postgres
 Practicing Postgres                | Postgres
 UCBerkeley Archives: Ingres        | Ingres
 Relational Technology, Inc.        | Ingres
(19 rows)
But what happens if a topic is deleted? If the topic had any items and we deleted the topic, those items would become orphans if they happened not to be associated with another topic. If the topic had any child topics, they would lose their link to their grandparents, possibly disrupting the associations of the tree.
We could adopt a policy of not deleting topics, but that does not seem very helpful. Instead, this is what we've decided to do when a topic is deleted. If the topic to be deleted has both items and a parent, then the parent inherits the deleted topic's items. If the topic to be deleted has child topics and a parent topic, then those children have their parent_id reassigned to their "grand"parent.
The way to implement this policy is with a trigger on the deletion of a topic. The trigger function is del_topic(). If there are items for the topic but the topic has no parent, an exception is raised with a helpful (we hope) error message. Otherwise, each item is reassigned to its previous topic's parent. That takes care of the items. The children are easier: any topic which had this topic as a parent is reassigned to this topic's parent.
CREATE OR REPLACE FUNCTION del_topic() RETURNS TRIGGER AS '
DECLARE
    r_rec RECORD;
BEGIN
    FOR r_rec IN SELECT item_id, topic_id FROM item_topics
        WHERE topic_id = OLD.topic_id
    LOOP
        IF OLD.parent_id IS NULL THEN
            RAISE EXCEPTION
                ''Cannot delete topic % until its records are reassigned.'',
                OLD.topic_name;
        ELSE
            UPDATE item_topics SET topic_id = OLD.parent_id
            WHERE item_id = r_rec.item_id AND topic_id = r_rec.topic_id;
        END IF;
    END LOOP;
    UPDATE topics SET parent_id = OLD.parent_id
    WHERE parent_id = OLD.topic_id;
    RETURN OLD;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER del_topic BEFORE DELETE ON topics
    FOR EACH ROW EXECUTE PROCEDURE del_topic();
We have outlined the variations of adding, updating and deleting both topics and items using CASCADE directives and a trigger. Those variations we did not handle explicitly will work correctly via standard inserts, updates and deletes.
The next thing to think about is providing ways to access the tree hierarchy. The developer will want a way to trace the topic path of an item or a topic. We will look at two ways here. The first way is to provide the path as a comma separated list. The second way is to expand the path and return an ordered set of tuples.
To show the path as a comma separated list, we must start with an item's topic and recurse up the tree, concatenating (aggregating) the topic nodes as we go.
CREATE OR REPLACE FUNCTION get_topic_path( integer ) RETURNS TEXT AS '
DECLARE
    path text;
    topic RECORD;
BEGIN
    SELECT INTO topic topic_name, parent_id
    FROM topics WHERE topic_id = $1;
    path := topic.topic_name;
    IF topic.parent_id IS NOT NULL THEN
        path := (SELECT get_topic_path(topic.parent_id)) || '', '' || path;
    END IF;
    RETURN path;
END;
' LANGUAGE 'plpgsql';

The function get_topic_path() can be used to get topic paths by topic or topic paths by item.
elein=# select topic_name, get_topic_path( topic_id ) from topics;
      topic_name       |                   get_topic_path
-----------------------+----------------------------------------------------
 History               | History
 California History    | History, California History
 American History      | History, American History
 San Francisco History | History, California History, San Francisco History
 Computer Science      | Computer Science
 Computer Languages    | Computer Science, Computer Languages
 Ada                   | Computer Science, Computer Languages, Ada
 Perl                  | Computer Science, Computer Languages, Perl
 Databases             | Computer Science, Databases
 Postgres              | Computer Science, Databases, Postgres
 Ingres                | Computer Science, Databases, Ingres
(11 rows)

elein=# select item_name, get_topic_path(topic_id) from it;
             item_name              |                   get_topic_path
------------------------------------+----------------------------------------------------
 Studying History                   | History
 Spanish California                 | History, California History
 Gold Rush                          | History, California History
 Dot Coms                           | History, California History
 American Presidents                | History, American History
 Civil War                          | History, American History
 Civil Rights                       | History, American History
 The Making of the Constitution     | History, American History
 The Golden Gate                    | History, California History, San Francisco History
 Digging up Market Street           | History, California History, San Francisco History
 Ada Lovelace: Scientific Computing | History
 Ada Lovelace: Scientific Computing | Computer Science, Computer Languages, Ada
 The Camel Book                     | Computer Science, Computer Languages, Perl
 Bad Perl, Good Perl                | Computer Science, Computer Languages, Perl
 General Bits                       | Computer Science, Postgres
 Practicing Postgres                | Computer Science, Postgres
 UCBerkeley Archives: Ingres        | Computer Science, Ingres
 Relational Technology, Inc.        | Computer Science, Ingres
 Comparing Databases                | Computer Science
(19 rows)
With the function get_topic_path() we can test the delete trigger fairly easily. We will delete the topic 'Databases' and expect that the child nodes adopt 'Computer Science' as parents and the item 'Comparing Databases' will be reassigned to topic 'Computer Science'. In the selection below, this is exactly what is shown.
elein=# delete from topics where topic_id = 9;
DELETE 1
elein=# select item_id, item_name, get_topic_path(topic_id) from it;
 item_id |             item_name              |          get_topic_path
---------+------------------------------------+----------------------------------
[ ... rows deleted for readability ... ]
      16 | General Bits                       | Computer Science, Postgres
      17 | Practicing Postgres                | Computer Science, Postgres
      18 | UCBerkeley Archives: Ingres        | Computer Science, Ingres
      19 | Relational Technology, Inc.        | Computer Science, Ingres
      15 | Comparing Databases                | Computer Science
(19 rows)
We are assuming that a comma separated list is a good way for the user interface to get the path of an item or topic. However, sometimes it might be better for the user interface to slice this a little bit differently. Another way to do this is trickier. A function which returns sets of tuples will be used to select an ordered set of tuples, each of which represents the next node in the path.
In this case let's look at the results first. The selection is for topic id 11, 'Ingres'. We saw in one of the previous queries that the path to this topic is: "Computer Science->Databases->Ingres" We are joining topics into the query twice; once for the base topic name and once for the name of the topic on the path. The order here is pertinent. The topic id numbers do not necessarily correspond to the order of the tree from the top down. We are relying strictly on the parent id link.
elein=# select t.topic_name as base_topic, tn_id, t2.topic_name
        from topics t, get_topic_node(11) g, topics t2
        where t.topic_id = 11 and t2.topic_id = g.tn_id;
 base_topic | tn_id |    topic_name
------------+-------+------------------
 Ingres     |     5 | Computer Science
 Ingres     |     9 | Databases
 Ingres     |    11 | Ingres
(3 rows)
So, let's take a look at the get_topic_node() function first. The return tuple must first be created as a proper composite type, topic_node. Then in the function, starting at the topic_id which was passed in, we fetch its parent, if there is one, and then call get_topic_node() again on the parent topic. The recursion ends when the parent topic is empty.

When the highest tuple is fetched, it is returned to the previous caller. We return t2 for the parent nodes and t for the current node. The final bare RETURN signifies the end of the function.
CREATE TYPE topic_node AS (tn_id integer, tn_parent integer);

CREATE OR REPLACE FUNCTION get_topic_node( integer )
RETURNS SETOF topic_node AS '
DECLARE
    t topic_node;
    t2 topic_node;
BEGIN
    FOR t IN SELECT topic_id, parent_id FROM topics
        WHERE topic_id = $1
    LOOP
        IF t.tn_parent IS NOT NULL THEN
            FOR t2 IN SELECT * FROM get_topic_node(t.tn_parent) LOOP
                RETURN NEXT t2;
            END LOOP;
        END IF;
        RETURN NEXT t;
    END LOOP;
    RETURN;
END;
' LANGUAGE 'plpgsql';
As we saw in the query above, the tuple set returning function get_topic_node relies on a parameter. The parameter was hard coded. It is next to impossible to make the parameter part of the query. The technique we will use to parameterize the query is to wrap the function in another tuple set returning function and qualify the results of the second function. The pros and cons of this technique are flexibility vs. speed. The topic tree is not planned to be very large, but it is anticipated to be complex, so we are opting for flexibility. There are probably other, possibly better ways to do this.
This function, item_path(), returns an item and its topic path as an ordered set of tuples, relying on the previous function get_topic_node().
CREATE TYPE item_path AS (item_id integer, topic_id integer);

CREATE OR REPLACE FUNCTION item_path()
RETURNS SETOF item_path AS '
DECLARE
    it item_path;
    i record;
    tn topic_node;
BEGIN
    FOR i IN SELECT item_id, topic_id FROM item_topics
    LOOP
        it.item_id = i.item_id;
        FOR tn IN SELECT * FROM get_topic_node( i.topic_id )
        LOOP
            it.topic_id = tn.tn_id;
            RETURN NEXT it;
        END LOOP;
    END LOOP;
    RETURN it;
END;
' language 'plpgsql';

We create a composite type, item_path, for the function item_path() to return. This is a simple recursive call linking all items to their topic paths. Each node is returned as an item_id and a topic_id. The order counts in the same way it counts for get_topic_node().
The query to show all topic paths for all items follows. To show only those for certain items, qualify the selection with item_id.
elein=# SELECT it.item_id, i.item_name, it.topic_id, t.topic_name
elein-# FROM items i, topics t, item_path() it
elein-# WHERE it.item_id = i.item_id AND it.topic_id = t.topic_id;
 item_id |             item_name              | topic_id |      topic_name
---------+------------------------------------+----------+-----------------------
       1 | Studying History                   |        1 | History
       2 | Spanish California                 |        1 | History
       2 | Spanish California                 |        2 | California History
       3 | Gold Rush                          |        1 | History
       3 | Gold Rush                          |        2 | California History
       4 | Dot Coms                           |        1 | History
       4 | Dot Coms                           |        2 | California History
       5 | American Presidents                |        1 | History
       5 | American Presidents                |        3 | American History
       6 | Civil War                          |        1 | History
       6 | Civil War                          |        3 | American History
       7 | Civil Rights                       |        1 | History
       7 | Civil Rights                       |        3 | American History
       8 | The Making of the Constitution     |        1 | History
       8 | The Making of the Constitution     |        3 | American History
       9 | The Golden Gate                    |        1 | History
       9 | The Golden Gate                    |        2 | California History
       9 | The Golden Gate                    |        4 | San Francisco History
      10 | Digging up Market Street           |        1 | History
      10 | Digging up Market Street           |        2 | California History
      10 | Digging up Market Street           |        4 | San Francisco History
      12 | Ada Lovelace: Scientific Computing |        1 | History
      12 | Ada Lovelace: Scientific Computing |        5 | Computer Science
      12 | Ada Lovelace: Scientific Computing |        6 | Computer Languages
      12 | Ada Lovelace: Scientific Computing |        7 | Ada
      13 | The Camel Book                     |        5 | Computer Science
      13 | The Camel Book                     |        6 | Computer Languages
      13 | The Camel Book                     |        8 | Perl
      14 | Bad Perl, Good Perl                |        5 | Computer Science
      14 | Bad Perl, Good Perl                |        6 | Computer Languages
      14 | Bad Perl, Good Perl                |        8 | Perl
      15 | Comparing Databases                |        5 | Computer Science
      15 | Comparing Databases                |        9 | Databases
      16 | General Bits                       |        5 | Computer Science
      16 | General Bits                       |        9 | Databases
      16 | General Bits                       |       10 | Postgres
      17 | Practicing Postgres                |        5 | Computer Science
      17 | Practicing Postgres                |        9 | Databases
      17 | Practicing Postgres                |       10 | Postgres
      18 | UCBerkeley Archives: Ingres        |        5 | Computer Science
      18 | UCBerkeley Archives: Ingres        |        9 | Databases
      18 | UCBerkeley Archives: Ingres        |       11 | Ingres
      19 | Relational Technology, Inc.        |        5 | Computer Science
      19 | Relational Technology, Inc.        |        9 | Databases
      19 | Relational Technology, Inc.        |       11 | Ingres
(45 rows)
These examples raise even more questions. I'll raise some key related questions and leave the solutions as an exercise for the reader.
The code and the data used for this article are available in Tidbits.
The problem at hand is to track changes to a table. This is one of many possible solutions, and it has (at least) one bug in it. (Your challenge is to find the bug.)
We have created all of the tables we want to be logged with four columns named:
--
-- change log fields
--
log_user text,
log_create_ts timestamp,
log_mod_ts timestamp,
log_comment text

Every table we want logged also has a shadow log table named log_<tablename> with the same structure as the original table.
On an insert into any of these tables, the log_user and log_create_ts fields are set to current_user and now(), respectively. This is done via a trigger. When any of these tables is updated, the old row is written to its shadow log table, and the new row has its log_mod_ts field set to now(). On inserts or updates, the caller can optionally add a comment, supplied by the user or programmatically (or both), to record the reason for the change. On deletes, the row in the base table is deleted, but the logged rows remain.
This sounds fairly simple. The insert trigger function is straightforward. The same plpgsql function can be used for all tables, since they all use the same column names.
--
-- Standard Generic Table Logging: ALL FIELDS
-- Required fields: log_comment, log_user, log_mod_ts
-- Tables are <tablename> and log_<tablename>
--
CREATE OR REPLACE FUNCTION insert_trig() RETURNS TRIGGER AS '
BEGIN
    NEW.log_user := coalesce(NEW.log_user, current_user);
    NEW.log_comment := coalesce(NEW.log_comment, ''Inserted'');
    NEW.log_create_ts := now();
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';
The update trigger is a bit harder. In most languages you would have to write a separate function for each table because you don't know the column names. This is not true for plpython. Because the OLD row is passed to a plpython trigger as a dictionary, we can write one generic function that builds the insert from the dictionary's keys. This is the function:
CREATE OR REPLACE FUNCTION logme() RETURNS TRIGGER AS '
def mogrify(v):
    if v == None:
        v = "NULL"
    else:
        try:
            x = v + 1
            v = str(v)
        except:
            v = "''%s''" % v
    return v

NEW = TD["new"]
OLD = TD["old"]

if not NEW["log_comment"]:
    NEW["log_comment"] = "Updated"

r = plpy.execute("select now() as log_mod_ts", 1)
NEW["log_mod_ts"] = r[0]["log_mod_ts"]

r = plpy.execute("select relname from pg_class where oid = %(relid)s" % TD, 1)
relname = "log_" + r[0]["relname"]

vals = [ mogrify(OLD[k]) for k in OLD.keys() ]
cols = ", ".join(OLD.keys())
vals = ", ".join(vals)
insstr = "insert into %s (%s) values (%s)" % (relname, cols, vals)
r = plpy.execute(insstr, 0)

return "MODIFY"
' LANGUAGE plpythonu;
First we have a mogrify function which converts None to the string "NULL" and converts numbers to strings. Everything else it quotes. (bug hint) Then we assign TD["new"] and TD["old"] to the local names NEW and OLD to make the code friendlier.
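The bug hint can be made concrete outside the database. This is a standalone sketch of the mogrify logic (names are illustrative, not the article's runtime): a value containing a single quote produces a broken SQL literal, and one conventional fix is to double embedded quotes.

```python
def mogrify(v):
    # Original logic: None -> NULL, numbers -> str, everything else quoted.
    if v is None:
        return "NULL"
    try:
        v + 1              # numeric test, as in the article
        return str(v)
    except TypeError:
        return "'%s'" % v  # BUG: embedded quotes are not escaped

def mogrify_fixed(v):
    # Same logic, but double embedded single quotes (standard SQL escaping).
    if v is None:
        return "NULL"
    try:
        v + 1
        return str(v)
    except TypeError:
        return "'%s'" % str(v).replace("'", "''")

print(mogrify("O'Reilly"))        # 'O'Reilly'  -- breaks the INSERT statement
print(mogrify_fixed("O'Reilly"))  # 'O''Reilly' -- a valid literal
```

A value such as O'Reilly in any text column will make the generated insert statement fail with a syntax error (or worse); doubling the quote character is the minimal repair.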
The first order of business is to default the comment, if there isn't one. Then we need to set the modification timestamp in the NEW row. To set the modification timestamp we need to execute a quick query to get now(). The result set is in r. We get the first row, 0, column log_mod_ts from the query result.
Then we execute another query to get the relname, the table name from the TD dictionary item TD["relid"]. TD["relid"] is one of the values provided in the TD dictionary. (In plpgsql, you are provided with the variable RELNAME, but that is not surfaced in plpython.)
Now on to the good stuff. We create a list of the old values by looping through the keys and mogrifying the values. Then we join them into comma-separated strings and format them into the insert statement. The insert statement is then executed and we are done. Since we have modified the NEW row with the modification timestamp, we must return "MODIFY" to let the system know to make the change.
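The string assembly can be exercised on its own. This sketch uses sample data in place of TD["old"], and hard-codes the relname that the trigger looks up from pg_class:

```python
def mogrify(v):
    # None -> NULL, numbers -> str, everything else quoted (as in the article).
    if v is None:
        return "NULL"
    try:
        v + 1
        return str(v)
    except TypeError:
        return "'%s'" % v

# A stand-in for TD["old"]: column name -> old value.
OLD = {"id": 1, "title": "Art Objects", "log_comment": None}

relname = "log_books"  # the trigger derives this from pg_class

keys = list(OLD.keys())
cols = ", ".join(keys)
vals = ", ".join(mogrify(OLD[k]) for k in keys)
insstr = "insert into %s (%s) values (%s)" % (relname, cols, vals)
print(insstr)
```

Because columns and values are both taken from the same key list, they always line up, which is what makes the one function work for every logged table.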
Now let's see it in action. First we'll create a table and its shadow, define the functions in pylog.sql, and define the triggers.
elein=# create table books (
    id SERIAL,
    title text,
    author text,
    --
    -- change log fields
    --
    log_user text,
    log_create_ts timestamp,
    log_mod_ts timestamp,
    log_comment text
);
elein=# create table log_books as select * from books;
elein=# \i pylog.sql
CREATE FUNCTION
CREATE FUNCTION
elein=# CREATE TRIGGER log_books BEFORE UPDATE ON books
            FOR EACH ROW EXECUTE PROCEDURE logme();
elein=# CREATE TRIGGER ins_books BEFORE INSERT ON books
            FOR EACH ROW EXECUTE PROCEDURE insert_trig();

So let's try it out.
elein=# insert into books values
        (default, 'Art Objects', 'Jeanete Winterson');
elein=# insert into books values
        (default, 'The Sound and the Feury', 'William Faulkner');
elein=# select * from books;
 id |          title          |      author       | log_user |    log_create_ts    | log_mod_ts | log_comment
----+-------------------------+-------------------+----------+---------------------+------------+-------------
  1 | Art Objects             | Jeanete Winterson | elein    | 2004-03-13 21:36:37 |            | Inserted
  2 | The Sound and the Feury | William Faulkner  | elein    | 2004-03-13 21:36:37 |            | Inserted
(2 rows)

elein=# update books set author='Jeanette Winterson',
elein-#        log_comment='correct author'
elein-#        where id = 1;
UPDATE 1
elein=# select * from books;
 id |          title          |       author       | log_user |    log_create_ts    |     log_mod_ts      |  log_comment
----+-------------------------+--------------------+----------+---------------------+---------------------+----------------
  2 | The Sound and the Feury | William Faulkner   | elein    | 2004-03-13 21:36:37 |                     | Inserted
  1 | Art Objects             | Jeanette Winterson | elein    | 2004-03-13 21:36:37 | 2004-03-13 21:36:41 | correct author
(2 rows)

elein=# select * from log_books;
 id |    title    |      author       | log_user |    log_create_ts    | log_mod_ts | log_comment
----+-------------+-------------------+----------+---------------------+------------+-------------
  1 | Art Objects | Jeanete Winterson | elein    | 2004-03-13 21:36:37 |            | Inserted
(1 row)

The current row will always have the reason for the most current change and the current modification timestamp.
In addition to the bug mentioned above, the glaring feature that is missing is the ability to track only those fields that changed. Doing this in the plpython function is trivial. What is difficult is finding the key or keys of the table so that the data in the log table remains useful. Of course this data is also available in the system catalogs, and the implementation is left as an exercise for the reader.
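The "trivial" part, computing which columns actually changed, can be sketched in plain Python with hypothetical rows standing in for TD["old"] and TD["new"]:

```python
# Stand-ins for the OLD and NEW trigger rows.
old = {"id": 1, "title": "Art Objects", "author": "Jeanete Winterson"}
new = {"id": 1, "title": "Art Objects", "author": "Jeanette Winterson"}

# Columns whose value differs between the OLD and NEW rows.
changed = {k: (old[k], new[k]) for k in old if old[k] != new[k]}
print(changed)  # {'author': ('Jeanete Winterson', 'Jeanette Winterson')}
```

The hard part the article points at remains: without knowing the table's key columns, a log row holding only the changed fields cannot be tied back to the row it describes.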
Special thanks to David Gould for making my original clumsy plpython function into something much more pythonesque.
Rules, also known as rewrite rules, operate on a statement level. They should be used when an entire query should be substituted or rewritten. Triggers operate on a row level. A trigger function is executed for each row in the statement.
If you have an INSERT statement involved in a RULE and that statement includes a SERIAL column, the rewrite rule may not do what you want. Suppose there were a job_queue and a job_queue_trace table, and job_queue's primary key column was a SERIAL. When a job is inserted into job_queue, a copy of the record is wanted in the trace table for auditing.
CREATE RULE rule_job_queue_trace AS
    ON INSERT TO job_queue
    DO INSERT INTO job_queue_trace (job_id, ...)
        VALUES (NEW.job_id, ...);

This rule inserts rows into the trace table with an id one higher than the original job. The reason is that the substitution takes place per statement, before the data is finalized. The SERIAL column NEW.job_id is effectively replaced by nextval(...), and the nextval() function is called twice: once for the original row and once for the traced row.
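The double evaluation is easy to simulate. In this sketch, plain Python stands in for the rewriter (names are illustrative): because NEW.job_id is re-substituted as nextval() in each rewritten statement, the sequence advances once per target table.

```python
import itertools

seq = itertools.count(1)   # stands in for the job_id sequence

def nextval():
    return next(seq)

# The rule rewrites one INSERT into two statements; each one evaluates
# the nextval() substituted for NEW.job_id independently.
queued_id = nextval()      # row written to job_queue
traced_id = nextval()      # row written to job_queue_trace
print(queued_id, traced_id)  # 1 2  -- the trace id is off by one
```

A trigger avoids this because it runs after the default has been computed, so both inserts see the same job_id value.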
When you are operating on actual row values, a TRIGGER does what you want. In this case, this trigger would do the trick.
CREATE FUNCTION job_trace() RETURNS TRIGGER AS '
BEGIN
    INSERT INTO job_queue_trace (job_id, ...)
        VALUES (NEW.job_id, ...);
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER job_queue_trace BEFORE INSERT ON job_queue
    FOR EACH ROW EXECUTE PROCEDURE job_trace();
In this case, the job_id has already been calculated and its value is properly propagated to the job_queue_trace table.
With triggers you have the opportunity to change the inserted or updated record before it gets written, by accessing the tuple directly. It is not necessary to execute a separate INSERT into the table, and in fact that is the wrong thing to do.
If, for example, you have an insert trigger where you want to set a field to a value on insert, you do not want to execute INSERT INTO table VALUES ... because that will fire your trigger again. This means your trigger will be called recursively until the recursion limit is reached. In pl/python, this trigger defined on table t1 will be called recursively
CREATE FUNCTION testing() RETURNS trigger AS '
plan = plpy.prepare(''INSERT INTO t1 values ($1)'', [''text''])
plpy.execute(plan, [''blah''])
return ''MODIFY''
' LANGUAGE plpythonu;

and generate the following error:
DETAIL: exceptions.RuntimeError: maximum recursion depth exceeded.
To modify the tuple, change the NEW record and then return "MODIFY". In pl/pgsql the NEW record is called NEW, and its fields can be accessed as NEW.column_name. In pl/python, NEW is accessed via the trigger dictionary as TD["new"], and its fields are accessed as TD["new"]["column_name"].
The following example shows an update trigger on a table. The trigger sets a modification timestamp. If anything in the tuple is changed, the previous value of the val column is saved in the prevval column. In either case, a note is updated with the information about the last update's effect.
drop table pytrig cascade;
create table pytrig (
    val integer,
    prevval text,
    modtime timestamp,
    notes text);
drop trigger pytrig on pytrig;

create or replace function pytrig() returns TRIGGER as '
x = plpy.execute( "select now() as modtime;", 1 )
TD["new"]["modtime"] = x[0]["modtime"]
if TD["new"] <> TD["old"]:
    TD["new"]["prevval"] = TD["old"]["val"]
    TD["new"]["notes"] = "last update: old val <> new val"
    return "MODIFY"
else:
    TD["new"]["notes"] = "last update: old val == new val"
    return "MODIFY"
' language 'plpython';

create trigger pytrig BEFORE UPDATE on pytrig
    for each row execute procedure pytrig();

insert into pytrig values (1, NULL, now(), NULL );
insert into pytrig values (2, NULL, now(), NULL );
select * from pytrig;

update pytrig set val = val + 1;
select * from pytrig;

update pytrig set val = val;
select * from pytrig;
You can run this script to see the results of the updates. This type of trigger can also be used as a general model for updating and logging changes.