|
||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
|
This issue contains several interesting articles that were contributed by some very helpful people. Due to (happy) family obligations, my own time this past week has been very limited. I expect to have this same dilemma for the issue on July 14 because I plan to be attending (and speaking at) O'Reilly's Open Source Conference the week before it is due. Content suggestions are always welcome, but are especially welcome for that issue. Suggestions for that issue are due by July 4. Special thanks to Sean Chittenden, Rao Kumar and the San Francisco PostgreSQL Users Group, and you for your patience.
In order to evaluation the overhead of calling plpgsql and C functions, Sean Chittenden wrote the following functions. The functions were run on an unloaded PIII 600Mz laptop. All times are the average of 50 calls to the function and timing is provided by EXPLAIN ANALYZE. The first functions compare plpgsql and a comparable C function. /* pl/pgsql code BEGIN */ CREATE OR REPLACE FUNCTION t1() RETURNS INT AS ' DECLARE v_i INT; BEGIN v_i := 1; RETURN v_i; END; ' LANGUAGE 'plpgsql'; CREATE OR REPLACE FUNCTION t2() RETURNS INT AS ' DECLARE v_i INT; BEGIN SELECT 1 INTO v_i; RETURN v_i; END; ' LANGUAGE 'plpgsql'; CREATE OR REPLACE FUNCTION t3() RETURNS INT AS '/tmp/t3.so' LANGUAGE C; /* End */ Where t3() is defined as: /* Begin */ #include "executor/spi.h" int t3(void); int t3(void) { int v_i; v_i = 1; return(v_i); } /* End */ These tests are intentionally very basic. They could all just return 1. But it's the assignment that counts even though the C version gets optimized to just return(1). That's the advantage of having a real compiler go to work for you. For the first call tests, a new connection is fired up for every iteration. Preloading didn't make any difference between whether or not the libraries were preloaded or not. In this set of results, the First call(1) is with plpgsql.so preloaded. First call(2) doesn't have plpgsql.so preloaded (new feature in pre-7.4). Function | First call(1) | First call(2) | Subsequent calls -----------+-----------------+-----------------+------------------- t1() | 2.35msec | 3.34msec | 0.24msec t2() | 2.46msec | 3.45msec | 0.28msec t3() | 0.37msec | 0.75msec | 0.16msec These results are not that interesting, but it shows what the overhead is for calling a function in the various circumstances. Here are a set of much more interesting tests, however, with t(4..6): CREATE OR REPLACE FUNCTION t4() RETURNS INT AS ' DECLARE v_i INT; ret INT; BEGIN FOR v_i IN 1..1000 LOOP ret := v_i; END LOOP; RETURN ret; END; ' LANGUAGE 'plpgsql'; CREATE OR REPLACE FUNCTION t5() RETURNS INT AS ' DECLARE v_i INT; ret INT; BEGIN FOR v_i IN 1..1000 LOOP SELECT v_i INTO ret; END LOOP; RETURN ret; END; ' LANGUAGE 'plpgsql'; CREATE OR REPLACE FUNCTION t6() RETURNS INT AS '/tmp/t6.so' LANGUAGE C; /* Begin C/t6() */ #include "executor/spi.h" int t6(void); int t6(void) { int v_i; int ret; for (v_i = 0; v_i < 1000; v_i++) ret = v_i; return(ret); } /* End C */ Here the difference starts to show. t5() uses SELECT INTO instead of :=. Every time SELECT INTO is used, it has to talk to the backend and has to setup a plan for the query, though plans are cached by pl/pgsql. Nothing quite touches C... it's unfazed by the extra 1,000 iterations. Function | First call(1) | First call(2) | Subsequent calls -----------+-----------------+-----------------+------------------- t4() | 6.16msec | 9.17msec | 3.54msec t5() | 34.52msec | 41.18msec | 29.19msec t6() | 0.37msec | 0.77msec | 0.16msec The C examples don't perform queries, but it wasn't necessary for the tests given. Fetching rows is IO bound and the laptop's HDD isn't server quality so purely mathematical functions were more appropriate for this level of testing. pl/pgsql has a nasty habit of copying data with ROWTYPEs/RECORDs, whereas in C you can use a fixed buffer and just realloc() the space.
Attached is a shell utility for listing and terminating postgreSQL sessions. This script will work with releases 7.2 and higher. This script uses kill -TERM, not kill -9 to terminate sessions. Script: pgsession pgsession : List/Kill database user sessions Usage: pgsession [OPTION]... [USER] Options: --h (help) show this help, then exit -l (list) list database sessions -k (kill) kill/terminate database sessions -f (force) force kill (do not ask for confirmation, use in conjunction with -k option) -u USER specify database user name -p PID specify database user process id (pid) Examples: pgsession -l list all sessions pgsession -l -uYou can download the script from pgsession on the Tidbits Page. Note: that a listing of sessions and their queries is also available in psql using: select * from pg_stat_activity;
If on your machine, getpagesize() returns 4096, does that mean that the page size for postgreSQL is 4096 bytes? The documentation for "Monitoring disk usage" says postgres page size is "usually" 8KB (8192 bytes). Which one is right? The postgreSQL pagesize is determined at compile time and given that most people don't fiddle it, it's going to be 8KB in 99.9% of cases.
The following query resulted in a "Cancelled Execution" message when using ODBC on windows and when the list contains several hundred items. select field1 from table where field2 in (list); The problem seems to be unique to the ODBC driver. The query executes correctly from psql and it also works correctly for much shorter lists. Two workarounds to this problem were suggested. The first suggestion created an equivalent statement using LIKE. SELECT field1 FROM table WHERE field2 IN (1,2,3,4,5,6,7); SELECT field1 FROM table WHERE ',1,2,3,4,5,6,7,' like '%,'||field2||',%'; The second workaround is probably faster at execution time. That suggestion is to put the list into a table, perhaps a temporary table, and change the query to join on the table. select field1 from table t, listtable l where t.field2 = l.list_item;
The talk I am giving at OSCON will be on Running Aggregates for PostgreSQL in plpython. This subject was covered briefly in Issue #23. At the conference the talk will have more examples and cover more detail. I look forward to publishing the extended examples and text after the conference. The short description of the talk is below. A perennial question brought up in many forums is how to perform functions on the current row based on the previous row. This talk will show one technique in detail. This technique can be applied to many different tasks to provide a new way of using your data. |
|||||||||||||||||||||||||||||||||||||
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!. Copyright A. Elein Mustain 2003 |