Everybody has a query go wild now and then. To begin to find out which query causing the problem, you first need to find out the process id of the postmaster running wild.
The ideal way to determine the process id (and query information) is via the pg_stat_activity view. If you have the value stats_command_string set in postgresql.conf, this view will not only give you the process id, but the query in progress and you'll have what you need. pg_stat_activity is available in 7.2 and later. Alternatively, you can look at all of the postgres processes by using ps. Your arguments to ps may vary.
ps ax | grep postgres
If you do not have stats_command_string set, you can try the debugger. But only if the postgres is not stripped. You can tell whether the compiled postgres is stripped or not on unix-like systems by issuing a file command. This one is not stripped.
$ file /local/pgsql73/bin/postgres /local/pgsql73/bin/postgres: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), not stripped
If your executable is not stripped, you can attach to it using gdb. This example shows debugging a postgres executable and then attaching to the process in question. Then it prints the global variable debug_query_string. The value it prints should show you the query running on the process you have identified as wild.
$ gdb postgres executable gdb> attach PID-OF-PROCESS gdb> p debug_query_string gdb> quit
When querying on an indexed column for IS NULL or IS NOT NULL, the index is not used, even if the NULL values are indexed. This means that when you have a table with a key on column one, then:
select * from mytable where colone = 'value';will probably use the index, but
select * from mytable where colone IS NULL;will use a sequential scan.
The problem with the indexes not using the indexed NULLs is that the SQL syntax "IS NULL" and "IS NOT NULL" are not proper operators and therefore not indexable operators. The syntax is parsed inline and not as executable as a functional or operator index.
NULL values are indexed in B-Trees. GIST supports nulls in second and subsequent columns of a multi-column index. Hash and RTree don't store indexed nulls at all so the issue is not applicable. Unique indexes enforce SQL92 unique constraints, which are defined by the spec (sec 4.10):
A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns.Multi-column indexes support index of NULL values. One reason NULLs are indexed is that it enables an indexscan to be a substitute for seqscan plus a sort.
The rest of the thread discussed work arounds and possible solutions for creating an indexable structure for NULLs. Here are some of the discussions.
One person developed a user defined set of operators to work on NULL values. However, creating operators for all of the operator classes is "cumbersome". There are also semantic issues, for example, NULL = NULL (the boolean expression, not the pgplsql assignment) should never return TRUE with the proper definitions of NULLs. (See also: General Bits Issue#6). The choice of operators must not confuse this delicate semantic difference.
In discussing a solution in PostgreSQL, Martin Van Oosterhout brought up the idea of using a boolean isnull() function to use when handling index on queries with IS NULL. Tom Lane added onto that idea by suggesting that the function take ANY, the psuedo type, as an argument. The function would have to be special cased in the planner. This seemed to be the most reasonable suggestion as it incorporated existing capabilities for the most part. No implementation promise was made, however.
When you create a new function, you must consider what the result will be if any of the arguments is NULL. There are two behaviors possible if any arguments are NULL:
The syntax for defining the behaviour of the function changed between 7.2 and 7.3. To skip calling the function and assume a NULL return these are the arguments:
7.2 CREATE FUNCTION .... WITH (ISSTRICT); 7.3: CREATE FUNCTION ... RETURNS NULL ON NULL INPUT or 7.3: CREATE FUNCTION ... STRICTTo ensure the function is called whether or not any arguments are NULL, in 7.2 do not use the WITH (ISSTRICT) and in 7.3:
7.3: CREATE FUNCTION ... CALLED ON NULL INPUTRemember this is the default.
If you use a function which mysteriously returns NULL, you should consider whether you are calling the function with NULL arguments and whether or not the function is defined to handle NULL input. Many built-in functions are defined ISSTRICT. For example, max() will return NULL with NULL arguments. To see whether a function is defined to RETURN NULL ON NULL INPUT this query can be used:
select isstrict from pg_proc where proname = 'function name';
The vacuum command is optimized for small sets of changes. When you have a very large table and you update all rows of the table, then vacuuming can take a very long time. In this thread, the table was 40Gb and the vacuum which usually took 12 hours, was killed, finally, after 55 hours. It was more than halfway into its job. The vacuum portion of the task had been completed, but it was in the middle of compacting the rows. At this stage, it scans the table backwards which is counter-intuitive to caching algorithms, however, this is only a third of the I/O. Tuple writes and Write Ahead Logging (WAL) are the other two thirds.
The primary type of workaround to this problem involved creating a new copy of the table either by CREATE TABLE ... SELECT * FROM ... or pg_dump and restore. The idea behind this is to synchronize the reads and writes instead of thrashing between the beginning and end of large files.
The downside of this type of solution is that it will require at least twice as much space as the original table. In this case, got 40 gigs to spare? The workaround which creates a new table requires that the indexes be rebuilt and that the constraints be moved to the new table by hand. The algorithm for this solution would be:
begin transaction drop indexes on old table create table newtable as select * from oldtable; create new index on new table rename newtable reconstruct constraints on table commit transaction
There was some discussion on the diagnosis of how far the original vacuum was in its job. There was also mention of improving vacuum's algorithm so that it had an optional technique if the table is large and there were enough space to recreate the table.
There is a bug with plpython in 7.3 where if a function is used as a trigger for more than one table, then it will cause the server to be dropped.
In the case of the bug, plpython was caching the information about the row on which it was called. Then, when the function was called on a different row, it was using the wrong row type information.
plpython is a particularly good language in which to write triggers and other functions because its dictionaries enable you to store information easily across calls to the function in the same statement. However, storing information across calls is tricky at best and sometimes wrong assumptions are made about what can be stored across calls. plpython functions themselves storing information across statements and sessions must be vigilant about when to initialize and reset stored data in the same way an aggregate with group by must. And so also the writers of plpython need also to distinguish the scope and context of the information it stores in its caches. A bug fix is in the works which may cache rowtype information separately for each table the call is applied to.
On every insert into table A, we want to make a corresponding insert into table B. And we want the insert of B to use the generated sequence corresponding to the insert on table B. To do this we'll need a trigger function on insert of A to do the insert for B. There are two ways to grab the new id--either from the NEW record or from currval(). Both ways are shown below. The tables are "contacts" and "contact_lists". Be sure, however that the trigger is an AFTER trigger so that the new id has already been created.
CREATE FUNCTION contact_list_trigger() RETURNS TRIGGER AS ' BEGIN; INSERT INTO contact_lists (contact_id) VALUES (new.id); RETURN new; END;' LANGUAGE 'plpgsql'; Or: CREATE FUNCTION contact_list_trigger RETURNS TRIGGER as ' BEGIN; INSERT INTO contact_lists(contact_id) VALUES (currval(''contact_id_seq'')); RETURN new; END;' language 'plpgsql'; The trigger definition is: CREATE TRIGGER insert_contact_list AFTER INSERT ON contacts FOR EACH ROW EXECUTE PROCEDURE contract_list_trigger();See also General Bits Issue #5.
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009