Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

31-Mar-2003 Issue: 19

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Double those backslashes
[GENERAL] Limitations in PL/perl	18-Mar-2003

When creating functions in pl/perl, expecially those using expression modifiers, don't forget to double the backslashes. Functions like the following which strips punctuation and spaces work like a charm--if you get the backslashes right.

	
	create or replace function strip_punc (text) returns text as
	'
		my $data = $_[0];
		$data =~ s/\\W|\\s//g ;
		return $data;
	' language 'plperl';

While you are at it, double check that you are also doubling your single quotes inside of all of your functions definitions. That one gets me every time.

Contributors: Josh Berkus josh at agliodbs.com, Tom Lane tgl at sss.pgh.pa.us

Replication puzzle pieces
[GENERAL] 32/64-bit transaction IDs? 20-Mar-2003

In this thread Ed Loehr is deep in the design of a serial asyncronous replication system based on triggers, like dbmirror. Tom Lane is helping him sort out the methodology necessary to properly order the transactions to replicate.

Ed's original concern was whether transaction ids could be used deterministicly to order the transactions for application on the target system. Part of this question was a concern about transaction ids used to order the transactions and what happened when the transaction ids wrap around (in one to three years). The logic required to track the wrap around transactions was a little messy. Ed's suggested that if transaction ids were 64 rather than 32 bits then the wrap around would be delayed. This is true, but this turned out not to be the interesting problem.

Tom, early on, cut to the core of the issue but it took a while for all of the pieces to come together. He said, "XIDs are issued in transaction start order, not transaction commit order." The real issue is the ordering of the transactions.

The discussion turned to using the write-ahead-log (WAL) to provide a transaction ordering number, but it would require a hook into the back end to surface the WAL log addresses to SQL. But it would provide a 64 bit identifier for the transactions.

However, the discussion ran into the catch-22. A trigger firing inside a transaction has no knowledge about whether or not its transaction would commit or rollback or when that would happen. And, more importantly, it is impossible for any process to write to the database outside of a transaction so recording the commit or rollback in SQL is impossible.

The problem was to store the transactions and select them back out in the proper order to replay them on the target database. What the transaction id or WAL address would provide is the start order of each transaction. However, if each transaction were committed in the order it was begun, it would not be an accurate replication of the original system.

Applying each transaction in the order they began does not take into account concurrent transactions with read-committed mode. The following shows an example of this situation. xact 1 was in read-committed mode:

	begin xact 1;
		update tuple X;
		...

	begin xact 2;
		update tuple Y;
	commit;

	...
	update tuple Y;
	commit;

Because of the read-committed mode, the change made in xact 2 to tuple Y should be overwritten by xact 1's update of tuple Y.

The transactions should be applied not in transaction id order, but rather in commit order. The transactions need to be be queued for the target database in the order they are applied in the master database.

As a footnote, the question was raised whether dbmirror applies transactions in transaction id order. If so it would also fail in the case of concurrent transactions and read-committed mode. There was no confirmation that dbmirror actually has this issue.

Contributors: Ed L. pgsql at bluepolka.net, Tom Lane tgl at sss.pgh.pa.us

Insert or Update?
[GENERAL] Solution to UPDATE...INSERT problem 26-Mar-2003

In an application, there is the perennial problem of whether to do an insert or an update. If the record is there, an update is required. If not, an insert is appropriate.

The logic is usually handled in the application by first updating the record and if it no rows are updated, then inserting it. This technique has a race condition. A row with the primary key in question may be inserted by another transaction between the update and insert statements, causing the insert to fail with a duplicate key message. The solution is to create a table level LOCK around the update/insert statements.

Handling this issue without the table lock would be ideal, but at this time is it still impossible due to race conditions. There is no

	SELECT * FROM FOO WHERE pkfield = 'XX' for INSERT;

which would block on uncommitted inserts/updates of pkfield='XX' and other selects for insert/update.

But the stabs at possible solutions are interesting. Although they are not 100% reliable, the window for the race condition is smaller than the update/insert technique. Still, these techniques are not recommended for production systems.

These two tries at a solution both create insert statements which always "succeed". The insert can then always be followed by the update statement or the resulting number of rows updated can be checked. The race condition exists between the time it is determined that the row exists and when the insert is done. Another connection may have inserted the row in that time slice.

In this first example, the parentheses are not necessary, however, they can clarify the statement a little. From the outside in, a selection is being inserted into table FOO. That selection is the values you want to insert except if there is a record in FOO with the same primary key.

	INSERT INTO FOO 
		((SELECT 'XX','Unknown' )
		EXCEPT 
		(SELECT 'XX', 'Unknown' FROM FOO WHERE pkfield = 'XX'));

The except construct is like a join, except that it returns all of the first query after it eliminates those rows which are present in the second query. The rows must match exactly in order to be eliminated.

In our example, the second SELECT will return an empty set if there is no pkfield = 'XX' and that in turn returns the values from the first SELECT to be inserted. If the second SELECT found a record where pkfield = 'XX', then it would return the same values as are in the first SELECT, eliminating them. In that case, no rows are inserted, but the INSERT "succeeds".

The second method is a bit simpler, however, it uses the same underlying technique of eliminating the rows being selected and therefore inserted in the row already exists.

	INSERT INTO FOO SELECT 'XX', 'Unknown' 
	WHERE NOT EXISTS (SELECT TRUE FROM FOO WHERE pkfield='XX');

The second SELECT will return either one row containing 'TRUE' or an empty set. The empty set causes "WHERE NOT EXISTS" to be TRUE allowing the insert. If the second SELECT returned TRUE, 'WHERE NOT EXISTS' would be FALSE, causing the first select to return an empty set which is "successfully" inserted into FOO.

While these examples are interesting, it is still recommended that you use a table level lock or some other application/client coordination system to handle the update/insert problem.

Contributors: Christopher Kings-Lynne chriskl at familyhealth.com.au, Tom Lane tgl at sss.pgh.pa.us, Haroldo Stenger hstenger at adinet.com.uy, Lincoln Yeoh lyeoh at pop.jaring.my, Dennis Gearon gearond at cvc.net, pgsql-hackers at postgresql.org, elein at varlena.com

Name Parser
Hint by way of Rajesh Kumar Mallah 29-Mar-2003

How many times have you written a name parser? Too many times to count? Here is a name parser contributed by Rajesh Mullah that you could put in your tool box. This function parses a titled name and returns it as a pipe | delimited string.

	   select parse_name( 'Ms. A. Elein Mustain');
	     parse_name      
	   ---------------------
	    Ms.|A|Elein|Mustain
	   (1 row)

	CREATE OR REPLACE FUNCTION parse_name (text) RETURNS text AS '

        my ($name) = @_;
        $name =~ s/^\\s*|\\s*$//g;

        if (! defined $name) { return q!|||!; }

        my $titles = 
			"Mrs|Ms|M/s|Mr|Miss|Dr|Sir|Dame|Reverend|Father|\
			 Captain|Capt|Colonel|Col|General|Gen|Major|Maj";

        my ($title,$fname,$mname,$lname,@lname);

        $name =~ s#^($titles)([\\.\\s])##i;

        $name ||=q!!;

        if ($1 || $2) { $title = "$1$2" ; }

        $name =~ s/^\\s*|\\s*$//g;

        ($fname,$mname,@lname) =  split /[\\s+\\.]+/ , $name ;
        $fname||=q!!;
        $mname||=q!!;
        $lname = join q! ! ,@lname;
        $lname||=q!!;

        unless ($lname) {
                $lname = $mname;
                $mname = q!!;
        }
        return "$title|$fname|$mname|$lname";
	' LANGUAGE 'plperl';

Originally, this function was defined as a plperlpu function. It is not recommended that you use untrusted languages in the server if there is any other recourse. This function works fine as trusted plperl.

Improvements to this function could be made by keeping the titles in a table instead of hard coded in the function. That way you could add or drop titles as necessary. No doubt you will have forgotten some title. The hard coded assignment of $titles would be replaced by a query.

By adding a plpgsql layer surrounding a call to this function, a name tuple could be returned instead of the pipe delimited string.

Disclaimer: Since TMTOWTDI*, your perl code can vary.

* There's More Than One Way To Do It Contributors: Rajesh Kumar Mallah mallah at trade-india.com, elein at varlena.com

Order by scope in UNION
[GENERAL] missing FROM-clause notice but nothing is missing ... 27-Mar-2003

When you UNION two queries, it is important to know the resulting target list names and how to use them in the ORDER BY clause. It is also important to understand how the ORDER BY applies to the UNION.

A query of the form

	SELECT ... UNION SELECT ... ORDER BY ...

is executed as

	( (SELECT ...) UNION (SELECT ...) ) ORDER BY ...

The point of this is that you must order by the result set of the unions.

If the column names of the UNIONed queries are different, the first query's column names are chosen. But also, the columns can be refered to by number in the ORDER BY clause.

The following example leaves the server to decide "id" is the target list of the UNION. Then when it gets down to the ORDER BY clause, it has no idea what products.id is. It thinks a from clause is missing because at that point there is no products table.

	SELECT products.id
	      FROM products
	      WHERE name ILIKE 'AA'
	
	      UNION
	
	      SELECT prod_id
	      FROM   rel_cast_crew_movies
	      WHERE  cast_crew_id=1012
	      ORDER BY products.id;
	
	NOTICE:  Adding missing FROM-clause entry for table "products"
	ERROR:  ORDER BY on a UNION/INTERSECT/EXCEPT result must be on one of 
	the result columns

The solution is, of course, in the naming of the columns. Instead of ORDER BY products.id, the target list name or column number should be used instead. Replacing that clause with either

	ORDER BY id

	ORDER BY 1

will fix this query. Perfectionists may want to change the second SELECT to be

	SELECT prod_id as id

to clarify the result set.

Contributors: Jean-Christian Imbeault jc at mega-bucks.co.jp, Hadley Willan hadley.willan at deeperdesign.co.nz, Greg Stark gsstark at mit.edu, Stephan Szabo sszabo at megazone23.bigpanda.com, scott.marlowe scott.marlowe at ihs.com

Constructing Points
[GENERAL] Earth distance 27-Mar-2003

The examples in the contrib/earthdistance functions all have hard coded points. How does one use these functions with points from code or tables?

As we saw in Issue #15 (with a correction in Issue #17) you cannot create a string constant by embedding variables or column values into it. You have to construct the string or use other methods to construct the data type.

In this case, you can see that z1.lat and z1.lon are going to be treated as text inside of the string.

	SELECT up.first_name, up.last_name, up.city, zip_code 
	FROM user_primary up, ziplocs zl 
	where zl.zip=up.zip_code::integer and 
		'(33.0, 97.1)'::point <@> '(zl.lat, zl.lon)'::point < 50;

The easy solution in this case is to use the Point() function. point(z1.lat,z1.lon) will get you what you want.

Contributors: Tad Marko tad at tadland.net, Bruno Wolff III bruno at wolff.to, Stephan Szabo sszabo at megazone23.bigpanda.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top