In this article, Graphing with pl/R by Robert Bernier, you will get a brief but practical introduction to R, the open source version of the statistical language S, and to pl/R, with the priceless example of creating graphs of data. Honestly, creating graphs from data has been a very frustrating problem for me. I have found the tools available, in general, woefully inadequate. Hardcoding the graph requirements instead of creating templates for them seemed inelegant at best. With this hands-on example, I think I've found the next addition to my set of reporting tools. Happy Graphing!
This is what happened as I tried to follow the instructions, and the intent of the instructions, from Robert Bernier's article Graphing with pl/R and from Joe Conway's instructions on installing plr. Following directions is a difficult thing to do sometimes. Also, for those of us on an older Linux system and without Debian, building things piecemeal can prove frustrating. (My upgrade is due any day now!)

The R source can be obtained from The R-Project. However, I (and others) ran into problems with the fortran compiler while building R. (Yes, it needs a fortran compiler.) After wrangling a copy of f2c, I found that gcc and f77 simply could not agree on the size of an int and/or a double. I ended up finding a binary copy of R in RPM form from The R-Project and using that.

When it came time to define the language in the database, I took the advice in Joe's document and used contrib/plr/plr.sql instead of the language definition function in Robert's document. I noticed that plr.sql provides additional interesting functions. When I defined the language using plr.sql, I got bit by the fact that I had not defined R_HOME before starting the server. There is a warning about this in Joe's document (see above: following directions is difficult).

I also found that the plr modules could not find libR.so. I copied libR.so into the postgres installation lib directory and finally was able to define the language and run Robert's test function. This is not the Right fix, though. The correct fix, thanks to Joe, is to make an entry in /etc/ld.so.conf for the directory where libR is located, since it isn't in any of the system standard places. It is usually in /usr/local/lib/R/bin or $R_HOME/bin. After you add the entry, you must then run ldconfig.

In summary:

   1. Install R, from source if your compilers cooperate, or from the binary RPM at The R-Project.
   2. Set R_HOME in the server's environment before starting the postmaster.
   3. Add the directory containing libR.so (usually $R_HOME/bin) to /etc/ld.so.conf and run ldconfig.
   4. Define the language and support functions with contrib/plr/plr.sql.
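Robert's test function itself is in his article and is not reproduced here. As a stand-in, any trivial function in the plr language is enough to confirm the installation works; a minimal sketch (the function name is mine, not from either document):

    CREATE OR REPLACE FUNCTION plr_alive() RETURNS text AS '
        # PL/R wraps this body in an R function, so return() is valid
        return("R is alive")
    ' LANGUAGE plr;

    SELECT plr_alive();

If the language was defined correctly and libR.so can be found, the select returns the string; if not, the errors you get are usually one of the R_HOME or ldconfig problems described above.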
For even more information on plr, I highly recommend revisiting Joe Conway's OSCON 2003 presentation, PostgreSQL - embedded Statistical Analysis with PL/R.
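To give the flavor of what the graphing article builds toward: a PL/R function can pull data out of a table with pg.spi.exec() and hand it to any R plotting routine. Here is a minimal sketch; the table measurements(value numeric) and the output path are made up for illustration and are not from Robert's article. (On a headless server, older R builds may need an X display for the png() device; bitmap() is a common workaround.)

    CREATE OR REPLACE FUNCTION graph_values() RETURNS text AS '
        # pull the column into an R data frame via the SPI interface
        df <- pg.spi.exec("select value from measurements")
        # render a histogram to a PNG file on the server
        png("/tmp/hist.png")
        hist(df$value, main="measurements")
        dev.off()
        return("/tmp/hist.png")
    ' LANGUAGE plr;

Calling select graph_values(); then leaves the image on the server's filesystem, which is the basic shape of the templated reporting mentioned in the introduction.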
The problem at hand is to track changes to a table. This is one solution of many solutions available. And this one has (at least) one bug in it. (Your challenge is to find the bug.)

We have created all of the tables we want logged with four columns named:

    --
    -- change log fields
    ---
    log_user text,
    log_create_ts timestamp,
    log_mod_ts timestamp,
    log_comment text

Every table we want logged also has a shadow log table named log_<tablename> with the same structure as the original table.

On an insert into any of these tables, the log_user and log_create_ts fields are set to current_user and now(), respectively. This is done via a trigger. When any of these tables is updated, the old row is written to its shadow log table, and the new row has its log_mod_ts field updated to now(). On inserts or updates, the caller can optionally add a comment, either from the user or programmatically (or both), to record the reason for the change. On deletes, the row in the base table is deleted, but the logged rows remain.

This sounds fairly simple. The insert trigger function is straightforward. The same plpgsql function can be used for all tables since they all use the same column names. (The function lives in pylog.sql; a sketch appears just before the demo below.) Its header reads:

    --
    -- Standard Generic Table Logging: ALL FIELDS
    -- Required fields: log_comment, log_user, log_mod_ts
    -- Tables are

The update trigger is a bit harder. In most languages, you would have to write a separate function for each table because you don't know the column names. This is not true for plpython. We can write a generic function to insert a record using the keys of the dictionary in which the OLD row is stored. This is the function:

    CREATE OR REPLACE FUNCTION logme() RETURNS TRIGGER AS '
    def mogrify(v):
        if v == None:
            v = "NULL"
        else:
            try:
                x = v + 1
                v = str(v)
            except:
                v = "''%s''" % v
        return v

    NEW = TD["new"]
    OLD = TD["old"]
    if not NEW["log_comment"]:
        NEW["log_comment"] = "Updated"

    r = plpy.execute("select now() as log_mod_ts", 1)
    NEW["log_mod_ts"] = r[0]["log_mod_ts"]

    r = plpy.execute("select relname from pg_class where oid = %(relid)s" % TD, 1)
    relname = "log_" + r[0]["relname"]

    vals = [ mogrify(OLD[k]) for k in OLD.keys() ]
    cols = ", ".join(OLD.keys())
    vals = ", ".join(vals)
    insstr = "insert into %s (%s) values (%s)" % (relname, cols, vals)
    r = plpy.execute(insstr, 0)
    return "MODIFY"
    ' LANGUAGE plpythonu;

First we have a mogrify function, which converts None to the string "NULL" and makes any number into a string. Everything else it quotes. (bug hint) Then we rename TD["new"] and TD["old"] to NEW and OLD to make the code friendlier.

The first order of business is to default the comment if there isn't one. Then we need to set the modification timestamp in the NEW row. To set it, we execute a quick query to get now(); the result set is in r, and we take the first row, 0, column log_mod_ts, from the query result. Then we execute another query to get relname, the table name, using the TD dictionary item TD["relid"]. TD["relid"] is one of the values provided in the TD dictionary. (In plpgsql, you are provided with the variable RELNAME, but that is not surfaced in plpython.)

Now on to the good stuff. We create a list of the old values by looping through the keys and mogrifying the values. Then we join them into comma-separated strings and format them into the insert statement. The insert statement is then executed and we are done. Since we have modified the NEW row with the modification timestamp, we must return "MODIFY" to let the system know to apply our changes.

Now let's see it in action.
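One gap to fill first: the plpgsql insert trigger, insert_trig(), lives in pylog.sql and was not reproduced above. A minimal sketch, consistent with the session output below (current_user, now(), and the default 'Inserted' comment), might look like this; the body is my reconstruction, not the original:

    CREATE OR REPLACE FUNCTION insert_trig() RETURNS TRIGGER AS '
    BEGIN
        -- stamp the creator and creation time on every new row
        NEW.log_user := current_user;
        NEW.log_create_ts := now();
        -- default the comment when the caller did not supply one
        IF NEW.log_comment IS NULL THEN
            NEW.log_comment := ''Inserted'';
        END IF;
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;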
First we'll create a table and its shadow, define the functions in pylog.sql, and define the triggers.

    elein=# create table books (
                id SERIAL,
                title text,
                author text,
                --
                -- change log fields
                ---
                log_user text,
                log_create_ts timestamp,
                log_mod_ts timestamp,
                log_comment text
            );
    elein=# create table log_books as select * from books;
    elein=# \i pylog.sql
    CREATE FUNCTION
    CREATE FUNCTION
    elein=# CREATE TRIGGER log_books BEFORE UPDATE ON books
                FOR EACH ROW EXECUTE PROCEDURE logme();
    elein=# CREATE TRIGGER ins_books BEFORE INSERT ON books
                FOR EACH ROW EXECUTE PROCEDURE insert_trig();

So let's try it out.

    elein=# insert into books values (default, 'Art Objects', 'Jeanete Winterson');
    elein=# insert into books values (default, 'The Sound and the Feury', 'William Faulkner');
    elein=# select * from books;
     id |          title          |      author       | log_user |    log_create_ts    | log_mod_ts | log_comment
    ----+-------------------------+-------------------+----------+---------------------+------------+-------------
      1 | Art Objects             | Jeanete Winterson | elein    | 2004-03-13 21:36:37 |            | Inserted
      2 | The Sound and the Feury | William Faulkner  | elein    | 2004-03-13 21:36:37 |            | Inserted
    (2 rows)

    elein=# update books set author='Jeanette Winterson',
    elein-#    log_comment='correct author'
    elein-#    where id = 1;
    UPDATE 1

    elein=# select * from books;
     id |          title          |       author       | log_user |    log_create_ts    |     log_mod_ts      |  log_comment
    ----+-------------------------+--------------------+----------+---------------------+---------------------+----------------
      2 | The Sound and the Feury | William Faulkner   | elein    | 2004-03-13 21:36:37 |                     | Inserted
      1 | Art Objects             | Jeanette Winterson | elein    | 2004-03-13 21:36:37 | 2004-03-13 21:36:41 | correct author
    (2 rows)

    elein=# select * from log_books;
     id |    title    |      author       | log_user |    log_create_ts    | log_mod_ts | log_comment
    ----+-------------+-------------------+----------+---------------------+------------+-------------
      1 | Art Objects | Jeanete Winterson | elein    | 2004-03-13 21:36:37 |            | Inserted
    (1 row)

The current row will always have the reason for the most recent change and the current modification timestamp.

In addition to the bug mentioned above, the glaring feature that is missing is the ability to track only those fields that changed. Doing this in the plpython function is trivial. What is difficult is finding the key or keys of the table so that the data in the log table would be useful. Of course this data is also available in the system catalogs, and the implementation is left as an exercise to the reader.

Special thanks to David Gould for making my original clumsy plpython function into something much more pythonesque.
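On the "trivial in plpython" claim: here is a hedged sketch of the one-liner that would restrict logging to the columns that actually changed, using the same NEW and OLD dictionaries as logme() (the variable name changed is mine):

    # inside logme(), after NEW and OLD have been pulled from TD:
    # keep only the columns whose values differ between old and new rows
    changed = [ k for k in OLD.keys() if OLD[k] != NEW[k] ]
    # cols and vals would then be built from changed instead of OLD.keys()

The hard part, as noted, remains carrying along the table's key columns so each logged fragment stays identifiable; that piece is still the reader's exercise.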
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009