Everybody has a query go wild now and then. To begin tracking down which query is causing the problem, you first need to find the process id of the backend running wild.

The ideal way to determine the process id (and query information) is via the pg_stat_activity view. If you have stats_command_string set in postgresql.conf, this view will give you not only the process id but also the query in progress, and you'll have what you need. pg_stat_activity is available in 7.2 and later.

Alternatively, you can look at all of the postgres processes using ps. Your arguments to ps may vary.

   ps ax | grep postgres

If you do not have stats_command_string set, you can try the debugger, but only if the postgres executable is not stripped. You can tell whether the compiled postgres is stripped on unix-like systems by issuing a file command. This one is not stripped:

   $ file /local/pgsql73/bin/postgres
   /local/pgsql73/bin/postgres: ELF 32-bit LSB executable, Intel 80386,
   version 1, dynamically linked (uses shared libs), not stripped

If your executable is not stripped, you can attach gdb to the process in question and print the global variable debug_query_string. The value it prints should show you the query running on the process you have identified as wild.

   $ gdb /path/to/postgres
   gdb> attach PID-OF-PROCESS
   gdb> p debug_query_string
   gdb> quit
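As a sketch, assuming the 7.3-era column names procpid and current_query, the pg_stat_activity approach might look like this:

```sql
-- Requires stats_command_string = true in postgresql.conf;
-- otherwise current_query will not show the query text.
SELECT procpid, usename, current_query
FROM pg_stat_activity
ORDER BY procpid;
```

The procpid value is the process id you would pass to gdb's attach command if you need to fall back to the debugger.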
When querying an indexed column for IS NULL or IS NOT NULL, the index is not used, even if the NULL values are indexed. This means that when you have a table with a key on column one, then:

   select * from mytable where colone = 'value';

will probably use the index, but

   select * from mytable where colone IS NULL;

will use a sequential scan.

The reason indexed NULLs go unused is that the SQL syntax "IS NULL" and "IS NOT NULL" are not proper operators and therefore not indexable operators. The syntax is parsed inline and is not executable the way a functional or operator index is.

NULL values are indexed in B-trees. GiST supports NULLs in the second and subsequent columns of a multi-column index. Hash and R-tree don't store indexed NULLs at all, so the issue does not apply to them. Unique indexes enforce SQL92 unique constraints, which are defined by the spec (sec 4.10):

   A unique constraint is satisfied if and only if no two rows in a
   table have the same non-null values in the unique columns.

Multi-column indexes support indexing of NULL values. One reason NULLs are indexed is that it enables an index scan to substitute for a sequential scan plus a sort.

The rest of the thread discussed workarounds and possible solutions for creating an indexable structure for NULLs. Here are some of the discussions.

One person developed a user-defined set of operators to work on NULL values. However, creating operators for all of the operator classes is "cumbersome". There are also semantic issues; for example, NULL = NULL (the boolean expression, not the plpgsql assignment) should never return TRUE under the proper definition of NULLs. (See also: General Bits Issue #6.) The choice of operators must not confuse this delicate semantic difference.

In discussing a solution in PostgreSQL, Martijn van Oosterhout brought up the idea of using a boolean isnull() function to handle indexing for queries with IS NULL.
Tom Lane added to that idea by suggesting that the function take ANY, the pseudo-type, as an argument. The function would have to be special-cased in the planner. This seemed to be the most reasonable suggestion, as it incorporated existing capabilities for the most part. No promise of an implementation was made, however.
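You can see the behavior described above for yourself with EXPLAIN. This is a sketch against a hypothetical table; the exact plans depend on table size and statistics, and this applies to the 7.x releases discussed here:

```sql
-- Hypothetical table and index for illustration.
CREATE TABLE mytable (colone text, coltwo integer);
CREATE INDEX mytable_colone_idx ON mytable (colone);

-- With enough rows and an ANALYZE, an equality predicate can use the index:
EXPLAIN SELECT * FROM mytable WHERE colone = 'value';

-- IS NULL is not an indexable operator, so this falls back to a
-- sequential scan even though B-tree indexes store the NULL entries:
EXPLAIN SELECT * FROM mytable WHERE colone IS NULL;
```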
When you create a new function, you must consider what the result will be if any of the arguments is NULL. There are two possible behaviors when an argument is NULL: either the function is not called at all and its result is assumed to be NULL, or the function is called anyway and must handle the NULL input itself.
The syntax for defining the behaviour of the function changed between 7.2 and 7.3. To skip calling the function and assume a NULL return, these are the arguments:

   7.2: CREATE FUNCTION ... WITH (ISSTRICT);
   7.3: CREATE FUNCTION ... RETURNS NULL ON NULL INPUT
   7.3: CREATE FUNCTION ... STRICT

To ensure the function is called whether or not any arguments are NULL, in 7.2 do not use WITH (ISSTRICT), and in 7.3:

   7.3: CREATE FUNCTION ... CALLED ON NULL INPUT

Remember this is the default.

If a function mysteriously returns NULL, you should consider whether you are calling it with NULL arguments and whether or not it is defined to handle NULL input. Many built-in functions are defined ISSTRICT; for example, max() will return NULL when given NULL arguments. To see whether a function is defined to RETURN NULL ON NULL INPUT, this query can be used:

   select isstrict from pg_proc where proname = 'function name';
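As a small sketch of the 7.3 syntax, using a throwaway function add_one invented for illustration:

```sql
-- STRICT: if any argument is NULL, the body is never executed
-- and the result is assumed to be NULL.
CREATE FUNCTION add_one(integer) RETURNS integer AS
   'SELECT $1 + 1;'
   LANGUAGE 'sql' STRICT;

SELECT add_one(1);     -- 2
SELECT add_one(NULL);  -- NULL, without calling the function body

SELECT isstrict FROM pg_proc WHERE proname = 'add_one';  -- t
```

Defining the same function with CALLED ON NULL INPUT instead would hand the NULL to the body, and SELECT add_one(NULL) would then evaluate NULL + 1, which is also NULL here, but a function with several arguments could do something more useful with the non-NULL ones.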
The vacuum command is optimized for small sets of changes. When you have a very large table and you update all rows of the table, vacuuming can take a very long time. In this thread, the table was 40GB, and the vacuum, which usually took 12 hours, was finally killed after 55 hours. It was more than halfway into its job: the vacuum portion of the task had been completed, but it was in the middle of compacting the rows. At this stage it scans the table backwards, which is counter-intuitive to caching algorithms; however, this is only a third of the I/O. Tuple writes and Write Ahead Logging (WAL) are the other two thirds.

The primary type of workaround to this problem involved creating a new copy of the table, either by CREATE TABLE ... AS SELECT * FROM ... or by pg_dump and restore. The idea behind this is to make the reads and writes sequential instead of thrashing between the beginning and end of large files. The downside of this type of solution is that it requires at least twice as much space as the original table. In this case, got 40 gigs to spare?

The workaround which creates a new table requires that the indexes be rebuilt and that the constraints be moved to the new table by hand. The algorithm for this solution would be:

   begin transaction
   drop indexes on the old table
   create table newtable as select * from oldtable;
   create new indexes on the new table
   rename newtable
   reconstruct constraints on the new table
   commit transaction

There was some discussion on the diagnosis of how far the original vacuum was in its job. There was also mention of improving vacuum's algorithm so that it has an optional technique for when the table is large and there is enough space to recreate the table.
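The steps above can be sketched concretely. This assumes a hypothetical table oldtable with a single index and a single NOT NULL constraint; your own indexes, constraints, and names will differ, and DDL in PostgreSQL is transactional, so the whole rebuild can be rolled back if something goes wrong:

```sql
BEGIN;
DROP INDEX oldtable_col_idx;                      -- drop indexes on the old table
CREATE TABLE newtable AS SELECT * FROM oldtable;  -- one sequential read and write
CREATE INDEX newtable_col_idx ON newtable (col);  -- rebuild indexes on the copy
DROP TABLE oldtable;
ALTER TABLE newtable RENAME TO oldtable;          -- take over the old name
-- reconstruct constraints by hand, for example:
ALTER TABLE oldtable ALTER COLUMN col SET NOT NULL;
COMMIT;
```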
There is a bug in plpython in 7.3 where, if a function is used as a trigger for more than one table, it will crash the server. In the case of the bug, plpython was caching information about the row type on which the function was first called. Then, when the function was called on a different table, it used the wrong row type information.

plpython is a particularly good language in which to write triggers and other functions because its dictionaries enable you to store information easily across calls to the function within the same statement. However, storing information across calls is tricky at best, and sometimes wrong assumptions are made about what can be stored. plpython functions that store information across statements and sessions must be vigilant about when to initialize and reset stored data, in the same way an aggregate used with GROUP BY must. So the writers of plpython itself also need to distinguish the scope and context of the information it stores in its caches. A bug fix is in the works which may cache row type information separately for each table the trigger is applied to.
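To illustrate the kind of cross-call caching involved, here is a hedged sketch of a plpython trigger function that stores state in its SD dictionary. The function name and the notice text are made up; the point is that SD persists across calls, so anything cached there must be keyed carefully if the trigger is ever attached to more than one table:

```sql
CREATE FUNCTION log_insert() RETURNS opaque AS '
# SD persists across calls to this function within the session.
# Handy for caching, but exactly this kind of cache is what the
# 7.3 bug kept stale when the trigger fired on a second table.
if not SD.has_key("calls"):
    SD["calls"] = 0
SD["calls"] = SD["calls"] + 1
plpy.notice("insert number %d" % SD["calls"])
# Returning None accepts the row unchanged.
return None
' LANGUAGE 'plpython';
```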
On every insert into table A, we want to make a corresponding insert into table B, and we want the insert into B to use the sequence value generated by the insert on table A. To do this we'll need a trigger function on insert into A that does the insert for B. There are two ways to grab the new id: either from the NEW record or from currval(). Both ways are shown below. The tables are "contacts" and "contact_lists". Be sure, however, that the trigger is an AFTER trigger so that the new id has already been created.
   CREATE FUNCTION contact_list_trigger() RETURNS TRIGGER AS '
   BEGIN
      INSERT INTO contact_lists (contact_id) VALUES (new.id);
      RETURN new;
   END;' LANGUAGE 'plpgsql';

Or:

   CREATE FUNCTION contact_list_trigger() RETURNS TRIGGER AS '
   BEGIN
      INSERT INTO contact_lists (contact_id)
         VALUES (currval(''contact_id_seq''));
      RETURN new;
   END;' LANGUAGE 'plpgsql';

The trigger definition is:

   CREATE TRIGGER insert_contact_list AFTER INSERT ON contacts
      FOR EACH ROW EXECUTE PROCEDURE contact_list_trigger();

See also General Bits Issue #5.
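For completeness, here is a hypothetical schema the trigger could run against. Everything beyond contacts.id and contact_lists.contact_id is an assumption for illustration, and the sequence name used with currval() must match the one actually backing the id column:

```sql
CREATE TABLE contacts (
    id   SERIAL PRIMARY KEY,  -- backed by the sequence used in currval()
    name TEXT
);

CREATE TABLE contact_lists (
    contact_id INTEGER REFERENCES contacts (id)
);

-- With the trigger installed, each insert into contacts
-- produces a matching contact_lists row automatically:
INSERT INTO contacts (name) VALUES ('Alice');
SELECT contact_id FROM contact_lists;
```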
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009