There is another new link on PostgreSQL General Bits for Portuguese speakers. It contains articles from General Bits translated by Juliano da Silva Ignacio. Welcome to our Portuguese readers; I'm sure you will enjoy Juliano's work. The first edition in the translated series is PostgreSQL General Bits Edition 26.
When creating a good pl/pgsql function, be sure that the declared variables do not have the same names as the fields or tables referenced in the function. The pl/pgsql variables take precedence, and that is not always what you intend. It should also be noted, while we are on the subject, that you should not use SQL keywords as variable names either, but you knew that already, right? It is particularly important to review your variable names for conflicts if you have a pl/pgsql function that is not behaving as you expect. In the example from the thread, a function declared project_code and later in the function selected project_code from the table.

   CREATE FUNCTION UpdateNextProjectCode() RETURNS OPAQUE AS '
   DECLARE
      project_code INTEGER;
      ...
   BEGIN
      ...
      OPEN used_project_codes FOR
         SELECT project_code FROM projects
         WHERE project_code > 0
         ORDER BY project_code ASC;
      ...

project_code had not yet been assigned in pl/pgsql, so its value was NULL. The pl/pgsql variables are expanded before the SELECT is executed, so the following statement was what was actually executed (oops):

   SELECT NULL FROM projects WHERE NULL > 0 ORDER BY NULL ASC;

This was probably not what was originally intended. The problem of names clashing in pl/pgsql is fairly common, so add it to your checklist when testing and debugging server functions.
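For illustration only (this is not the original poster's code), one way to sidestep the clash is to prefix declared variables so they cannot collide with column names. A minimal, simplified non-trigger sketch, assuming a projects table with an integer project_code column:

   CREATE OR REPLACE FUNCTION NextProjectCode() RETURNS INTEGER AS '
   DECLARE
      v_project_code INTEGER;   -- prefixed, so it cannot shadow projects.project_code
   BEGIN
      SELECT project_code INTO v_project_code
        FROM projects
       WHERE project_code > 0   -- now unambiguously refers to the column
       ORDER BY project_code ASC
       LIMIT 1;
      RETURN v_project_code;
   END;
   ' LANGUAGE 'plpgsql';

Any naming convention will do; the point is simply that declared names and column names never overlap.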
The size of a table is not as important for performance as a well-tuned server and good indexing. A few tens of thousands, or even hundreds of thousands, of rows is a perfectly reasonable size for a table, and many will still call it small. The question about porting a table of 95,000 rows brought up the following testimonials and cautions:

   I have a number of 1,000,000-plus row tables (very plus in some cases) running on some nasty low-end hardware (Celerons with 5400rpm IDE drives, Netras) and performance is quite adequate for typical use. -- Steve

   Almost laughably small :) Yeah, it's those sequential and tsearch index scans that kill it, but selective queries fly. -- Nigel

   You're absolutely correct that there are *many* other factors that determine performance aside from row count. That being said, I have tables with over a million entries on actively used systems that perform really well on queries that utilize an index, and acceptably well on queries that require a sequential scan. -- Ryan

If the queries are selective and typically indexed, or you have enough RAM, then I'd say there won't be much difference between splitting the tables or combining them into one. We use postgresql for lots of stuff that's more than 90,000 rows. We have a table we use to merge log files from multiple servers and then sort them by time.
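As the testimonials suggest, the payoff comes from indexing the columns your selective queries filter on. A minimal sketch, using a hypothetical merged log table (all names and dates here are invented for illustration):

   CREATE TABLE logs (
      server   TEXT,
      logtime  TIMESTAMP,
      message  TEXT
   );

   -- index the column that selective queries filter on
   CREATE INDEX logs_logtime_idx ON logs (logtime);

   -- a selective query that can use the index
   SELECT server, message
     FROM logs
    WHERE logtime >= '2003-06-01' AND logtime < '2003-06-02'
    ORDER BY logtime;

   -- EXPLAIN shows whether the planner chose an index scan or a sequential scan
   EXPLAIN SELECT * FROM logs
    WHERE logtime >= '2003-06-01' AND logtime < '2003-06-02';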
Especially with slow queries, it helps to be able to cache the result set and reuse it. This can be a bit complicated if your application is a web application. In that case it is important to know the scope in which you want to maintain the result set. By transaction? By connection? By session?

The "usual" way to cache query results is to use cursors:

   BEGIN;
   DECLARE cursorname CURSOR FOR query;
   FETCH number_of_rows FROM cursorname;
   MOVE {FORWARD|BACKWARD} number_of_rows IN cursorname;
   ...
   COMMIT;

This method has a problem when working across transactions and connections. Using persistent pooled connections is sometimes good, but sometimes the performance benefit is not worth the hassle. Scott Marlow writes, "Persistent connections are, generally, 1000 times (or more) faster at establishing than non-persistent, but when non-persistent connects only take about 1/10,000 of a second, it's no great gain speed wise to have persistent connects."

With persistent connections it is good practice to commit or roll back before releasing the connection to the pool, because it is difficult to maintain a transaction across pooled connections. Session tracking is required, as well as coordinating the sessions with their connections. Most connection pooling utilities do not accommodate that requirement. Cursors are required to be within a transaction, so their benefit is limited to the scope of a transaction. This tends to rule out connection pools with cursors as a solution when you want the cursor to last for the session.

The other "usual" way is for the client application to cache the data in the structures it has available. This is quite common, although the benefit of availability, storage and manipulation must be compared to the cost of reissuing the query. A temporary table to store the selected information will not work because it disappears when the connection is closed. With careful handling, temporary tables could work with connection pooling. The alternative is a regular table of results. Some consideration would have to be made for garbage collecting the temporary "regular" table, depending on the desired scope of the result set.
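To make the cursor approach concrete, here is a minimal sketch of paging through a slow query within a single transaction; the cursor name, table and page sizes are hypothetical:

   BEGIN;

   -- declare a cursor over the (hypothetical) slow query
   DECLARE big_report CURSOR FOR
      SELECT server, logtime, message FROM logs ORDER BY logtime;

   -- fetch the first page of 20 rows
   FETCH 20 FROM big_report;

   -- skip ahead 100 rows, then fetch the next page
   MOVE FORWARD 100 IN big_report;
   FETCH 20 FROM big_report;

   -- step back to re-read earlier rows
   MOVE BACKWARD 120 IN big_report;
   FETCH 20 FROM big_report;

   COMMIT;   -- the cursor disappears when the transaction ends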
The grammar for DISTINCT is easy to misinterpret. Here is the formal definition:

   SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
      * | expression [ AS output_name ] [, ...]

It is easy to write:

   SELECT DISTINCT ON (dai_genres.id), dsc FROM dai_genres

when what you really mean is:

   SELECT DISTINCT ON (dai_genres.id) dsc FROM dai_genres

Leave out the comma after the parenthesis. The DISTINCT clause comes before, and is separate from, the target list. The column used in the DISTINCT clause is not included in the target list unless you specifically include it by name or implicitly include it with a *:

   SELECT DISTINCT ON (dai_genres.id) * FROM dai_genres

It was discussed whether the grammar should have had curly brackets around the target list. This was suggested to separate the optional DISTINCT or ALL clause from the target list. But I'm not convinced; then again, nobody asked me :-) It was also pointed out that the OR'd part:

   * | expression [ AS output_name ] [, ...]

implied that you could not

   SELECT *, oid FROM ...
which as we all know is a valid target list.
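To make the behavior concrete, here is a small demonstration of DISTINCT ON using the dai_genres table from the thread; the data is invented for illustration:

   CREATE TABLE dai_genres (id INTEGER, dsc TEXT);
   INSERT INTO dai_genres VALUES (1, 'rock');
   INSERT INTO dai_genres VALUES (1, 'rock and roll');
   INSERT INTO dai_genres VALUES (2, 'jazz');

   SELECT DISTINCT ON (id) dsc
     FROM dai_genres
    ORDER BY id, dsc;
   -- returns 'rock' (for id 1) and 'jazz' (for id 2): one row per distinct id,
   -- and the ORDER BY decides which row of each group is the one that is kept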
We rely heavily on the grammars as our bridge from our understanding to the computer's understanding, and discussion of grammar minutiae helps us create a readable and easy to understand way of expressing the instructions. This is what the SQL Spec says about the symbols, in section 3.2, Notation:

   [ ]  Square brackets indicate optional elements in a formula. The portion of the formula within the brackets may be explicitly specified or may be omitted.

   { }  Braces group elements in a formula. The portion of the formula within the braces shall be explicitly specified.

   |    The alternative operator. The vertical bar indicates that the portion of the formula following the bar is an alternative to the portion preceding the bar. If the vertical bar appears at a position where it is not enclosed in braces or square brackets, it specifies a complete alternative for the element defined by the production rule. If the vertical bar appears in a portion of a formula enclosed in braces or square brackets, it specifies alternatives for the contents of the innermost pair of such braces or brackets.
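Applied to the SELECT grammar quoted above, the notation reads out as follows; the table and column names are placeholders for illustration:

   SELECT * FROM tab;                       -- both optional bracketed parts omitted
   SELECT ALL * FROM tab;                   -- first alternative in [ ALL | DISTINCT ... ]
   SELECT DISTINCT col FROM tab;            -- second alternative, inner [ ON (...) ] omitted
   SELECT DISTINCT ON (col) col2 FROM tab;  -- second alternative with the inner option included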
A question about how to loop over an array brought this very nice example
from Joe Conway of exactly how to do so.
This example requires 7.3.
For similar functions see also
Issue 24 Article 6.
   CREATE OR REPLACE FUNCTION atest(INT[]) RETURNS TEXT AS '
   DECLARE
      mya   ALIAS FOR $1;
      myret TEXT := '''';
      low   INT;
      high  INT;
   BEGIN
      low  := replace(split_part(array_dims(mya),'':'',1),''['','''')::int;
      high := replace(split_part(array_dims(mya),'':'',2),'']'','''')::int;
      FOR i IN low..high LOOP
         myret := myret || ''|'' || mya[i];
      END LOOP;
      RETURN myret;
   END;
   ' LANGUAGE 'plpgsql' IMMUTABLE STRICT;

   regression=# select atest('{1,2,3,4}'::int[]);
     atest
   ----------
    |1|2|3|4
   (1 row)
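As a side note, in later releases (7.4 and up) array_lower() and array_upper() return the bounds directly, so the string parsing of array_dims() can be dropped. A minimal sketch of the same loop under that assumption:

   CREATE OR REPLACE FUNCTION atest2(INT[]) RETURNS TEXT AS '
   DECLARE
      mya   ALIAS FOR $1;
      myret TEXT := '''';
   BEGIN
      -- array_lower()/array_upper() give the bounds of dimension 1 directly
      FOR i IN array_lower(mya, 1)..array_upper(mya, 1) LOOP
         myret := myret || ''|'' || mya[i];
      END LOOP;
      RETURN myret;
   END;
   ' LANGUAGE 'plpgsql' IMMUTABLE STRICT;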
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009