Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

4-Aug-2003 Issue: 37

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Bookmark Changes!
Check your bookmarked URLs	1-Aug-2003

The URLs for the individual issues of General Bits will be in flux as I do a little housecleaning. A bookmark to 32.html, for example, may no longer work. For consistent access to older issues, bookmark the Archive page listed below.

The following bookmarks will remain stable.

Current Issue: http://www.varlena.com/GeneralBits/
Archives: http://www.varlena.com/GeneralBits/archive.php
TidBits: http://www.varlena.com/GeneralBits/Tidbits/
Portuguese: http://www.varlena.com/GeneralBits/po.php

Editor: elein at varlena.com

Python Interface moved
[HACKERS] python interface 31-Jul-2003

The client side python interface for 7.4 has been removed from the regular CVS source tree. It can now be found at

http://www.pygresql.org

The interface for 7.3 and earlier remains in the ordinary distribution.

This change is somewhat out of the ordinary. In the past interfaces have moved to gborg.postgresql.org. On gborg the page for the PyGreSQL project the website link points to pygresql.org. It is unclear whether the CVS, bugs and features links will be maintained on gborg.

Contributors: Bruce Momjian pgman at candle.pha.pa.us, Marc G. Fournier scrappy at hub.org, Tom Lane tgl at sss.pgh.pa.us, D'Arcy J.M. Cain darcy at PyGreSQL.org

Trigger to implement Exclusive OR
[SQL] unique value - trigger? 17-Jul-2003

Suppose we had a table representing processes tasks like this:

	=# \d procs
	                Table "procs"
	   Column   |         Type          | Modifiers
	 -----------+-----------------------+-----------
	  pid       | integer               | not null
	  task      | character varying(10) | not null
	  owner     | character varying(10) | not null
	  IsActive  | boolean               |

The tasks are executed in single-threaded mode, one after the other. No two processes can be active at the same moment. What is wanted is the exclusive OR behavior like a radio button set.

The problem is how to keep the table updated with one and only one record in the table with IsActive set as TRUE. When a process is made active active, any other active process must be set to inactive.

The implementation is a trigger that fires when a processes updated to active which updates the row that was previously set to active as inactive. Here is the function and trigger.

	CREATE TRIGGER proc_current AFTER UPDATE OR INSERT ON procs
	FOR EACH ROW EXECUTE PROCEDURE proc_current_trigger();


	CREATE FUNCTION proc_current_trigger() RETURNS TRIGGER AS '
	BEGIN
	  IF NEW.IsActive THEN
	    UPDATE procs SET IsActive = ''f''
	    WHERE pid != NEW.pid AND IsActive = ''t'';
	  END IF;
	  RETURN NEW;
	END' LANGUAGE 'plpgsql';

Both INSERT and UPDATE cases are being handled by the trigger to ensure that integrity remains. Because there is no DELETE trigger and no action when a process is set to inactive, it is possible to have no active processes.

The function checks to see if the new or updated row is set to active, and if so, unset all (should only be one!) other rows to inactive. The other is accomplished by excluding the new row by pid in the where clause.

Contributors: Gary Stainburn gary.stainburn at ringways.co.uk, Dmitry Tkach dmitry at openratings.com, Richard Poole richard at ruthie.org

Upgrade in place
[GENERAL] Upgrading to 7.3.4? 30-Jul-2003

Minor releases can be upgraded in place. In that case nothing needs to be done to the database for the upgrade. Only the software needs upgrading.

A minor release is signaled by a change in the 3rd number, for example 7.3.4 is a minor release update to 7.3.2. When the second number changes, there are usually database changes to be done for the upgrade. For example, updates were necessary when transitioning from 7.2 to 7.3.

Minor releases can be done quite quickly and easily. Here are a couple examples of how people do minor updates. The first takes a cautious approach and first backs up the database moves the old installation out of the way. Then the new version is built and installed. The new data area has initdb run and then the data from the backup is loaded.

	pgdumpall > dbbackup                  -- backup and stop
	pg_ctl stop
	mv %PGHOME% /usr/local/pgsql.old      -- move old pgsql program

	cd /usr/local/src/postgresql-7.3.2    -- installs new pgsql version
	make install

	initdb -D %PGHOME%/data               -- start and restore db.
	pg_ctl start
	psql < dbackup

For the less cautious, this approach barrels through with no backup at all.

	tar xvfz postgresql-7.3.4.tar.gz ; 
	cd postgresql-7.3.4
	./configure ; 
	make ; 
	pg_ctl stop ; 
	make install
	pg_ctl start

A more moderate approach might be to just back up the data, just in case. But install the new release over the old one.

	pgdumpall > dbbackup 
	tar xvfz postgresql-7.3.4.tar.gz ; 
	cd postgresql-7.3.4
	./configure ; 
	make; 
	pg_ctl stop ; 
	make install

Notice that the database server should be stopped before running make install. And of course, YMMV.

Contributors: jørn T Johansen btj at havleik.no, scott.marlowe scott.marlowe at ihs.com, Freddy Menjívar M. mmfreddy at hotmail.com, Paul Ramsey pramsey at refractions.net

How many records?
[GENERAL] Billions of records? 15-Jul-2003

There is a limit of 16 Terrabytes for a PostgreSQL database. This is limited by the BlockNumber being 32 bits, so you cannot have a table larger than 2 or 4 billion blocks.

(The FAQ is conservatively assumes that the limit is 2 billion blocks; 2G blocks * 8KB block size = 16TB. In principle 4 billion should work, but it needs more rigorous testing. Old code which uses signed arithmetic on the BlockNumber may still exist. Is there anyone willing and able to test a database between 16T and 32T in size?)

Suppose you have a table with 50 fields of 20 characters each. If a table can be 16 TB, how many records could it hold? Let's calculate.

You have 50 fields of 20 char each.
Each field would be 24 bytes, using a 4 byte integer for length.
Each field also have 28 bytes of overhead, assuming the use of OIDS which is the default. Without OIDS, the row overhead is 23.
A tuple size is 50*(24+28)=1228 bytes.
In a page of 8KB, you would get 6 tuples.
In 16TB you can accommodate, 2GB pages
The total number of tuples you can have is approximately 12 billion.

There have been references of databases as big as 300GB. People routinely have more than 10GB databases. You can dig in list archives for the cases.

The 4TB database mentioned in the FAQ belongs to the American Chemical Society (something about scanning all their journals back to the mid-1800s...)

Another large database is the 2-micron sky survey: http://pegasus.astro.umass.edu/ which covers a bit under half a billion stars; it is reportedly about 150GB when loaded into Postgres. The UMass people seem to be happy with the performance they get.

Contributors: John Bercik bercikj at musc.edu, shridhar_daithankar at persistent.co.in, Tom Lane tgl at sss.pgh.pa.us, Jim C. Nasby jim at nasby.net, Bruce Momjian pgman at candle.pha.pa.us

Hexadecimal to decimal conversion
[GENERAL] hexadecimal to decimal 30-Jul-2003

Two functions were proposed, one in plpgsql and the other in plperl.


	create or replace function hex_to_int(char(2)) returns integer as '
	   declare
	      v_ret record;
	   begin
	      for v_ret in execute ''select x'''''' || $1 || ''''''::int as f'' loop 
	         return v_ret.f;
	   end loop;
	   end;
	' language 'plpgsql';

	create or replace function hex_to_int_perl(char(2)) returns integer as '
	   return hex $_[0];
	' language 'plperl';

Both functions produced the result correctly.

	create table foo(f1 char(2));
	insert into foo values ('ff');
	insert into foo values ('fe');
	insert into foo values ('fd');

	select hex_to_int[_perl](f1) from foo;
	 hex_to_int
	------------
	        255
	        254
	        253
	(3 rows)

The discussion branched here to the speed of these two functions in 7.4. Experiments were run with EXPLAIN ANALYZE showed that plperl function is fastest with or without preloading the libraries. In the course of the experiments, a problem was found with the preloading function for plperl. The problem was that if the library was not found or could not be loaded, no ERROR or NOTICE was raised. And the initialization function for plperl was a static library and could not be dynamically loaded anyway. But who knew? Both of these will be fixed in 7.4 by Joe Conway for plperl, plpgsql, pltcl and plpython.

In 7.4 you will be able to preload language libraries, using the postgresql.conf variable preload_libraries.

The hexadecimal to decimal functions are good, but they lack the inverse function as well as the ability to process variable length hexadecimal strings. This is left as an exercise for the reader :-)

Contributors: Claudio Lapidus clapidus at hotmail.com, Joe Conway mail at joeconway.com, Ron Johnson ron.l.johnson at cox.net, Tom Lane tgl at sss.pgh.pa.us

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top