PostgreSQL General Bits

By A. Elein Mustain		9-Jun-2003 Issue: 29

General Bits is a column based on the PostgreSQL mailing list pgsql-general. To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Esta Edição em Português
Archives
General Tidbits
Artigos Português
Google General Bits
Notices
To receive email notices of new issues of General Bits, email Elein.
ae Consulting PostgreSQL Design & Implementation
Subscriptions

This Issue
This Issue	08-June-2003

This issue contains several interesting articles that were contributed by some very helpful people. Due to (happy) family obligations, my own time this past week has been very limited.

I expect to have this same dilemma for the issue on July 14 because I plan to be attending (and speaking at) O'Reilly's Open Source Conference the week before it is due. Content suggestions are always welcome, but are especially welcome for that issue. Suggestions for that issue are due by July 4.

Special thanks to Sean Chittenden, Rao Kumar and the San Francisco PostgreSQL Users Group, and you for your patience.

Editor: elein at varlena.com

Testing function calling overhead
by Sean Chittenden	30-May-2003

In order to evaluation the overhead of calling plpgsql and C functions, Sean Chittenden wrote the following functions. The functions were run on an unloaded PIII 600Mz laptop. All times are the average of 50 calls to the function and timing is provided by EXPLAIN ANALYZE. The first functions compare plpgsql and a comparable C function.

	/* pl/pgsql code BEGIN */
	CREATE OR REPLACE FUNCTION t1() RETURNS INT AS '
	DECLARE
		v_i INT;
	BEGIN
		v_i := 1;
		RETURN v_i;
	END;
	' LANGUAGE 'plpgsql';
	
	CREATE OR REPLACE FUNCTION t2() RETURNS INT AS '
	DECLARE
		v_i INT;
	BEGIN
		SELECT 1 INTO v_i;
		RETURN v_i;
	END;
	' LANGUAGE 'plpgsql';
	
	CREATE OR REPLACE FUNCTION t3() RETURNS INT AS '/tmp/t3.so' LANGUAGE C;
	/* End */
	
	Where t3() is defined as:
	
	/* Begin */
	#include "executor/spi.h"
	int t3(void);
	int t3(void) {
		int v_i;
		v_i = 1;
		return(v_i);
	}
	/* End */

These tests are intentionally very basic. They could all just return 1. But it's the assignment that counts even though the C version gets optimized to just return(1). That's the advantage of having a real compiler go to work for you. For the first call tests, a new connection is fired up for every iteration. Preloading didn't make any difference between whether or not the libraries were preloaded or not.

In this set of results, the First call(1) is with plpgsql.so preloaded. First call(2) doesn't have plpgsql.so preloaded (new feature in pre-7.4).

	 Function  |  First call(1)  |  First call(2)  |  Subsequent calls
	-----------+-----------------+-----------------+-------------------
	 t1()      |  2.35msec       |  3.34msec       |  0.24msec
	 t2()      |  2.46msec       |  3.45msec       |  0.28msec
	 t3()      |  0.37msec       |  0.75msec       |  0.16msec

These results are not that interesting, but it shows what the overhead is for calling a function in the various circumstances. Here are a set of much more interesting tests, however, with t(4..6):

	CREATE OR REPLACE FUNCTION t4() RETURNS INT AS '
	DECLARE
		v_i INT;
		ret INT;
	BEGIN
		FOR v_i IN 1..1000 LOOP
			ret := v_i;
		END LOOP;
		RETURN ret;
	END;
	' LANGUAGE 'plpgsql';
	
	CREATE OR REPLACE FUNCTION t5() RETURNS INT AS '
	DECLARE
		v_i INT;
		ret INT;
	BEGIN
		FOR v_i IN 1..1000 LOOP
			SELECT v_i INTO ret;
		END LOOP;
		RETURN ret;
	END;
	' LANGUAGE 'plpgsql';
	
	CREATE OR REPLACE FUNCTION t6() RETURNS INT AS '/tmp/t6.so' LANGUAGE C;
	
	/* Begin C/t6() */
	#include "executor/spi.h"
	int t6(void);
	int t6(void) {
		int v_i;
		int ret;
	
		for (v_i = 0; v_i < 1000; v_i++)
			ret = v_i;
		return(ret);
	}
	/* End C */

Here the difference starts to show. t5() uses SELECT INTO instead of :=. Every time SELECT INTO is used, it has to talk to the backend and has to setup a plan for the query, though plans are cached by pl/pgsql. Nothing quite touches C... it's unfazed by the extra 1,000 iterations.

	 Function  |  First call(1)  |  First call(2)  |  Subsequent calls
	-----------+-----------------+-----------------+-------------------
	 t4()      |   6.16msec      |   9.17msec      |   3.54msec
	 t5()      |  34.52msec      |  41.18msec      |  29.19msec
	 t6()      |   0.37msec      |   0.77msec      |   0.16msec

The C examples don't perform queries, but it wasn't necessary for the tests given. Fetching rows is IO bound and the laptop's HDD isn't server quality so purely mathematical functions were more appropriate for this level of testing. pl/pgsql has a nasty habit of copying data with ROWTYPEs/RECORDs, whereas in C you can use a fixed buffer and just realloc() the space.

Contributors: Sean Chittenden sean at chittenden.org

pgsession utility
By Rao Kumar raokumar@netwolves.com	28-May-2003

Attached is a shell utility for listing and terminating postgreSQL sessions. This script will work with releases 7.2 and higher. This script uses kill -TERM, not kill -9 to terminate sessions.

	Script: pgsession
		pgsession : List/Kill database user sessions

	Usage:
		pgsession [OPTION]... [USER]

	Options:
		--h (help)	show this help, then exit
		-l  (list)	list database sessions
		-k  (kill)	kill/terminate database sessions
		-f  (force)	force kill (do not ask for confirmation,
				use in conjunction with -k option)
		-u  USER	specify database user name
		-p  PID	specify database user process id (pid)

Examples:
	  pgsession -l		list all sessions
	  pgsession -l -u 	list user sessions
	  pgsession -k		kill all sessions
	  pgsession -k -f	force kill all sessions
	  pgsession -k -u 	kill user sessions
	  pgsession -k -p 	kill user session with a specific pid

You can download the script from pgsession on the Tidbits Page.

Note: that a listing of sessions and their queries is also available in psql using:

	select * from pg_stat_activity;

Contributors: Rao Kumar raokumar at netwolves.com

Page Size
[GENERAL] page size in postgresql	11-May-2003

If on your machine, getpagesize() returns 4096, does that mean that the page size for postgreSQL is 4096 bytes? The documentation for "Monitoring disk usage" says postgres page size is "usually" 8KB (8192 bytes). Which one is right?

The postgreSQL pagesize is determined at compile time and given that most people don't fiddle it, it's going to be 8KB in 99.9% of cases.

Contributors: sureshk2003 sureshk2003 at indiatimes.com, Martijn van Oosterhout kleptog at svana.org

Workaround for extra long IN list using ODBC
[GENERAL] "IN" statement causing execution cancel?	22-May-2003

The following query resulted in a "Cancelled Execution" message when using ODBC on windows and when the list contains several hundred items.

	select field1 from table where field2 in (list);

The problem seems to be unique to the ODBC driver. The query executes correctly from psql and it also works correctly for much shorter lists. Two workarounds to this problem were suggested.

The first suggestion created an equivalent statement using LIKE.

	SELECT field1 FROM table WHERE field2 IN (1,2,3,4,5,6,7);
	SELECT field1 FROM table WHERE ',1,2,3,4,5,6,7,' like '%,'||field2||',%';

The second workaround is probably faster at execution time. That suggestion is to put the list into a table, perhaps a temporary table, and change the query to join on the table.

	select field1 from table t, listtable l where t.field2 = l.list_item;

Contributors: Andrew Ayers aayers at eldocomp.com, Mark Wilson mark at mediasculpt.com, Richard Huxton dev at archonet.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running Aggregates in Plpython
	8-Jun-2003

The talk I am giving at OSCON will be on Running Aggregates for PostgreSQL in plpython. This subject was covered briefly in Issue #23. At the conference the talk will have more examples and cover more detail. I look forward to publishing the extended examples and text after the conference. The short description of the talk is below.

A perennial question brought up in many forums is how to perform functions on the current row based on the previous row. This talk will show one technique in detail. This technique can be applied to many different tasks to provide a new way of using your data.
This talk will demonstrate how to use pl/python functions in PostgreSQL in order to perform inter-row calculations like running sums, and time differences between one row and the next. Previous knowledge of pl/python is not required.

Contributors: elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!.