Yet another question about why indexes are not being used. The table in question has 138459 rows:

   Column  |     Type     | Modifiers
-----------+--------------+-----------
 state     | character(2) | not null
 county    | text         | not null
 township  | character(5) |
 range     | character(5) |
 section   | integer      |
 meridian  | integer      |
 boundary  | polygon      |
Indexes: lgrng, lgsec, lgst, lgtwn

The queries are:

   select * from lg where state='NM';
   select * from lg where section=14;

EXPLAIN showed that both queries performed sequential scans. Several people pointed out two items that affect this scenario. First, use text or varchar() for the columns. Constants like 'NM' are seen as text, which is a different type than the character columns, and it was suggested that character types are treated differently by the optimizer. Second, from the EXPLAIN output it was surmised that the data was not very well clustered; the rows were in fairly random physical order. When retrieving about 10% of the data it is actually faster to do the sequential scan than to use the index, so the optimizer chose the right path. These suggestions were tested by running EXPLAIN ANALYZE with enable_seqscan off and then on, which showed that the sequential scan was in fact faster.

gds2=# explain analyze select boundary from lg where section=14;
NOTICE:  QUERY PLAN:

Index Scan using lgsec on lg  (cost=0.00..12167.45 rows=3836 width=32) (actual time=44.98..18325.87 rows=3759 loops=1)
Total runtime: 18344.06 msec

EXPLAIN
gds2=# set enable_seqscan=on;
SET VARIABLE
gds2=# explain analyze select boundary from lg where section=14;
NOTICE:  QUERY PLAN:

Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=32) (actual time=0.24..2974.65 rows=3759 loops=1)
Total runtime: 2987.61 msec

Note that the first number following cost= or actual time= shows the startup cost or time needed before the first row can be fetched.
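For illustration, here is a minimal sketch of the comparison described above, assuming the lg table and the lgsec index from the question:

   -- discourage sequential scans so the index plan is chosen
   set enable_seqscan = off;
   explain analyze select boundary from lg where section = 14;

   -- restore the default and compare the two timings
   set enable_seqscan = on;
   explain analyze select boundary from lg where section = 14;

If the index path really should win, physically reordering the rows with CLUSTER (and re-running ANALYZE) can make the index scan competitive, but check the CLUSTER caveats for your particular release before relying on it.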
To see the queries currently running on your installation, use the pg_stat_activity view:

   select * from pg_stat_activity;

You must be the postgres superuser to actually see the query text, and query monitoring must be turned on in your postgresql.conf file. The postgresql.conf line that enables query monitoring is:

   stats_command_string = true
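As a small usage sketch (assuming the 7.2-era column names datname, procpid and current_query, and that idle backends report '<IDLE>'), you can filter the view down to the backends that are actually doing work:

   select datname, procpid, current_query
   from pg_stat_activity
   where current_query <> '<IDLE>';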
The -o option to pg_ctl enables you to pass options directly to the postmaster. Unfortunately, pg_ctl does not actually "see" those options, so it may give some misleading messages. In this example a non-standard port number is passed to the postmaster:

   pg_ctl -w -o "-p 12345" start

However, pg_ctl tries to confirm that the postmaster started by connecting with psql using the default parameters in its environment. It then reports that the postmaster did not start when in fact it did. In this case, setting the PGPORT environment variable enables pg_ctl to do the right thing.
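A minimal sketch of that workaround, assuming a Bourne-style shell and reusing the example port 12345 from above:

   # tell pg_ctl (and its psql check) which port the postmaster will listen on
   PGPORT=12345
   export PGPORT
   pg_ctl -w -o "-p 12345" start
   # pg_ctl's confirmation connection now uses port 12345 as well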
In releases 6.5 up to but not including 7.2, in a select target list you could use the following expression and the result would appear to be true or false; internally the '= NULL' was silently substituted with IS NULL.

   SELECT col = NULL

This behavior is wrong and has been changed as of 7.2. Older versions of PostgreSQL allowed this comparison to NULL for compatibility with older versions of Access. Access has since been changed, too. Tom Lane said, "This should never have been made the default behavior IMHO, ... because it blatantly broke SQL compatibility, not to mention violating any usable concept of what NULL means. ... [Releases] 6.5, 7.0, and 7.1 lasted long enough to allow a lot of people to become confused :-( "

When testing for a NULL value you must always use IS NULL.

   SELECT col IS NULL from mytable where othercol IS NOT NULL;

Assignment of a variable to NULL in PL/pgSQL is done with the assignment operator ':=', which should never be confused with the boolean equality operator '='. It is a common misconception that NULL is a value which can be compared against another value. In fact NULL is not a value; it is the absence of a value. While values may be absent in two variables, the variables still cannot be compared for equality.
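For illustration, a small session sketch (assuming a 7.2 or later server with the default behavior, where '= NULL' is no longer rewritten) showing why IS NULL is required:

   select 1 = NULL;        -- yields NULL (unknown), not false
   select NULL = NULL;     -- also yields NULL, not true
   select 1 IS NULL;       -- false
   select NULL IS NULL;    -- true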
Returning multiple columns as well as multiple rows sometimes causes the two ideas to be confused. For multiple columns you must use a composite type. For multiple rows, you return a SETOF some type, which may or may not be composite. As of 7.3, returning a SETOF some type is possible in SQL, plpgsql and C. (Ed. What about plpython??)

In order to return multiple columns from a function, you must return a composite type. This is true whether you are returning one row or many rows. In this example, first a composite type named doubletext is defined and then a single row value is returned.

   CREATE TYPE doubletext AS (a text, b text);

   CREATE OR REPLACE FUNCTION test_multiple()
   RETURNS doubletext
   AS 'select ''a''::text, ''b''::text;'
   LANGUAGE 'sql';

   select * from test_multiple();

If you want to return many rows of doubletext, return a SETOF doubletext.

   CREATE OR REPLACE FUNCTION test_multiple2()
   RETURNS SETOF doubletext
   AS 'select ''a''::text, ''b''::text
       union
       select ''c''::text, ''d''::text;'
   LANGUAGE 'sql';

   select * from test_multiple2();
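For comparison, here is a minimal PL/pgSQL sketch of the same set-returning idea. The function name test_multiple3 is made up for this example, and it assumes the 7.3-era RETURN NEXT mechanism with a RECORD loop variable:

   CREATE OR REPLACE FUNCTION test_multiple3()
   RETURNS SETOF doubletext
   AS '
   DECLARE
       r RECORD;
   BEGIN
       -- build each row and hand it back with RETURN NEXT
       FOR r IN SELECT ''a''::text AS a, ''b''::text AS b
                UNION
                SELECT ''c''::text, ''d''::text LOOP
           RETURN NEXT r;
       END LOOP;
       RETURN;
   END;
   ' LANGUAGE 'plpgsql';

   select * from test_multiple3();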
As of 7.3, an empty string is no longer converted to 0 on assignment to an integer. The change was made in order to conform more closely with the SQL standard. In the past,

   insert into test (integercolumn) values ('');

resulted in 0 being assigned to the column value. Now this assignment raises an error. This behavior applies to integer columns as well as integer arrays. It was noted that float4, float8 and OID still accept '' as 0, while int8 and numeric have never accepted '' as 0. By the time you read this, float4, float8 and OID will probably be fixed so as not to accept '' as 0.

Another side observation concerned the proper treatment of spaces in quoted numbers. Currently leading spaces are accepted but trailing spaces are not. The SQL99 rules state that both leading and trailing spaces should be removed. This will probably also be fixed by the time you read this :-)

The primary discussion of this thread, however, concerned the fact that Request Tracker broke with this change. It was said that, "The author has gone so far as to notify all current users that Postgres 7.3 is not usable with RT, and to stick to 7.2, or to use MySQL." The general consensus seemed to be that this was very unfortunate, but that it was a bug in Request Tracker. Some people criticized the fact that this change was not widely advertised. It is generally accepted that behavior changes that affect many programs need to be implemented slowly so that their impact is minimized. Others pointed out that participation in the beta program is important. The beta program is the time to single out particular problems like this and to help the team choose which things to change and which to deprecate over a release. In the 7.3 beta program, this issue seems to have been raised once but was not deemed widespread enough to warrant a deprecation path. Lively discussion ensued. Bruce Momjian created a patch which allows the old behavior but also raises a warning. This patch is now available.
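A small sketch of the difference, assuming a hypothetical table named test with an integer column as in the statement quoted above:

   create table test (integercolumn integer);

   insert into test (integercolumn) values ('');    -- 7.2 and earlier: stores 0; 7.3: raises an error
   insert into test (integercolumn) values (0);     -- works in all versions: store zero explicitly
   insert into test (integercolumn) values (NULL);  -- use NULL if "no value" is what is meant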
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009