General Bits
By A. Elein Mustain

16-Feb-2004 Issue: 62


General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Fixed Length Records
Char gotcha in 7.4 13-Feb-2004

In 7.4, one of the changes was that trailing spaces are now trimmed when a char(n) value is converted to another string type. This causes a problem for those of us using ::char(n) to create fixed-length fields. Batch files with fixed-length fields are usually required to interface with legacy systems.

In the example below, psql is called from a shell with the tuples-only (-t) and unaligned (-A) options. The result of the selection creates a fixed-width file.

One of the techniques used to create fixed-length fields in 7.3 and earlier was to cast the value to char(n). This, along with the various to_char() functions, was enough to create fixed-length records. For example:

% psql -At > batch.out << END
	select accountid::char(30),
	   to_char( transaction_amount, 'FM000.00'),
	   (lastname || ',' || firstname )::char(40),
	   bankid::char(15),
	   to_char( now(), 'YYYYMMDD')
	from transactions;  -- table name is illustrative
END
In 7.4 this no longer works. The fields created with the cast are no longer fixed length. Instead of using the cast to make fixed length fields, use rpad() and lpad() to do the same thing.
	rpad(string text, length integer  [, fill text])
	lpad(string text, length integer  [, fill text])
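
Note that rpad() and lpad() not only pad but also truncate: a string longer than length is cut to exactly length characters, which is just what fixed-length records require. For example:

	select rpad('abc', 6, '.') as padded,
	       rpad('abcdef', 4) as truncated;

	 padded | truncated
	--------+-----------
	 abc... | abcd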

The previous selection should now be written as follows.

% psql -At > batch.out << END
	select rpad(accountid, 30),
	   to_char( transaction_amount, 'FM000.00'),
	   rpad( lastname || ',' || firstname, 40),
	   rpad(bankid, 15),
	   to_char( now(), 'YYYYMMDD')
	from transactions;  -- table name is illustrative
END
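
To see the behavior change directly, compare the cast with rpad() in 7.4; the brackets show where the trailing spaces went:

	select '[' || 'abc'::char(10) || ']' as cast_version,
	       '[' || rpad('abc', 10) || ']' as rpad_version;

	 cast_version | rpad_version
	--------------+--------------
	 [abc]        | [abc       ]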

Contributors: elein at varlena.com
Trigger recursion danger
[GENERAL] pl/pythonu 11-Feb-2004

With triggers you have the opportunity to change the inserted or updated record before it gets written, by accessing the tuple directly. It is not necessary to execute a separate INSERT INTO the table, and in fact that is the wrong thing to do.

Suppose, for example, you have an insert trigger in which you want to set a field to a value. You do not want to execute INSERT INTO table VALUES... because that will fire your trigger again. This means your trigger will be called recursively until postgres complains.

In pl/python this trigger defined on table t1 will be called recursively

	CREATE FUNCTION testing() RETURNS trigger AS '

	   plan = plpy.prepare(''INSERT INTO t1 values ($1)'', [''text''])
	   plpy.execute(plan, [''blah''])
	   return ''MODIFY''

	' LANGUAGE plpythonu;
and generate the following error:
	DETAIL: exceptions.RuntimeError: maximum recursion depth exceeded.

To modify the tuple, change the NEW record and then return "MODIFY". In pl/pgsql the NEW record is called NEW and its fields can be accessed as NEW.column_name. In pl/python, NEW is accessed via the trigger dictionary as TD["new"], and its fields are accessed as TD["new"]["column_name"].
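
For comparison, here is a minimal pl/pgsql sketch of the same idea (the function and column names are illustrative):

	create or replace function set_modtime() returns trigger as '
	begin
	   -- modify the row by assigning to NEW; the returned row is what gets written
	   NEW.modtime := now();
	   return NEW;
	end;
	' language plpgsql;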

The following example shows an update trigger on a table. The trigger sets a modification timestamp. If anything in the tuple has changed, the previous value of the val column is saved in the prevval column. In either case, the notes column is updated with information about the last update's effect.

	drop table pytrig cascade;
	create table pytrig (
	   val   integer,
	   prevval  text,
	   modtime  timestamp,
	   notes text);
	
	drop trigger pytrig on pytrig;
	create or replace function pytrig()
	returns TRIGGER as '
	
	# compare old and new before stamping modtime,
	# otherwise the two rows would never be equal
	changed = TD["new"] != TD["old"]
	x = plpy.execute( "select now() as modtime;", 1 )
	TD["new"]["modtime"] = x[0]["modtime"]
	if changed:
	   TD["new"]["prevval"] = TD["old"]["val"]
	   TD["new"]["notes"] = "last update: old val <> new val"
	else:
	   TD["new"]["notes"] = "last update: old val == new val"
	return "MODIFY"
	' language 'plpythonu';
	
	create trigger pytrig BEFORE UPDATE on pytrig for each row execute procedure pytrig();
	
	insert into pytrig values (1,NULL, now(), NULL );
	insert into pytrig values (2,NULL, now(), NULL );
	select * from pytrig;
	
	update pytrig set val = val + 1;
	select * from pytrig;
	
	update pytrig set val = val;
	select * from pytrig;

You can run this script to see the results of the updates. This type of trigger can also be used as a general model for updating and logging changes.

Contributors: C G csgcsg39 at hotmail.com, Mike Mascari mascarm at mascari.com, scott.marlowe scott.marlowe at ihs.com, Barbara Lindsey blindsey at cog.ufl.edu, elein at varlena.com
Interesting schema issue
[GENERAL] large crontab database design 03-Feb-2004

The design of a database version of crontab raises several interesting questions. First let us look at the table layout. Indexes are also created on each field of the schedule_detail table; a sketch of those follows the table definitions.

A "multiple items" syntax like '1,2,3' in a crontab time field will be presented with several records in the schedule_detail table. An "every N" syntax like '*/5' will be represented by a negative number '-5' in the database field. (For a quick review of crontab entries see % man 5 crontab)

 create table schedule (
   id      int not null primary key,
   command text
 );

 create table schedule_detail(
   schedule_id int      not null references schedule(id),
   minute      smallint not null,
   hour        smallint not null,
   day         smallint not null,
   month       smallint not null,
   year        smallint not null,
   dayofweek   smallint not null
 );
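
The per-field indexes are not shown in the thread; they might look like this (the index names are illustrative):

 create index schedule_detail_minute_idx on schedule_detail(minute);
 create index schedule_detail_hour_idx   on schedule_detail(hour);
 create index schedule_detail_day_idx    on schedule_detail(day);
 create index schedule_detail_month_idx  on schedule_detail(month);
 create index schedule_detail_year_idx   on schedule_detail(year);
 create index schedule_detail_dow_idx    on schedule_detail(dayofweek);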

Here is an example of how a crontab entry maps to records in the tables.


 # crontab
 # every three hours except on weekends (sat/sun)
 0 */3 * * 1,2,3,4,5 CHECK-THE-COPIER-MACHINE


 -- psql
 insert into schedule values (1, 'CHECK-THE-COPIER-MACHINE');
 insert into schedule_detail values (1, 0, -3, -1, -1, -1, 1);
 insert into schedule_detail values (1, 0, -3, -1, -1, -1, 2);
 insert into schedule_detail values (1, 0, -3, -1, -1, -1, 3);
 insert into schedule_detail values (1, 0, -3, -1, -1, -1, 4);
 insert into schedule_detail values (1, 0, -3, -1, -1, -1, 5);
The query will then be:
	select distinct schedule_id, command
	from schedule_detail
	left join schedule on schedule_id = schedule.id
	where
	(year = extract(year from current_date) or
	   (year < 0 and cast(extract(year from current_date) as smallint)
	      % year = 0)) and
	(month = extract(month from current_date) or
	   (month < 0 and cast(extract(month from current_date) as smallint)
	      % month = 0)) and
	(day = extract(day from current_date) or
	   (day < 0 and cast(extract(day from current_date) as smallint)
	      % day = 0)) and
	(hour = extract(hour from current_time) or
	   (hour < 0 and cast(extract(hour from current_time) as smallint)
	      % hour = 0)) and
	(minute = extract(minute from current_time) or
	   (minute < 0 and cast(extract(minute from current_time) as smallint)
	      % minute = 0)) and
	(dayofweek = extract(dow from current_date) or
	   (dayofweek < 0 and cast(extract(dow from current_date) as smallint)
	      % dayofweek = 0));

A question about the performance of this query led to a suggestion by someone who also stores crontab entries in the database. Their alternative is to store the next run time, along with the original crontab string, in the schedule table. Whenever a row is inserted or a job is run, the next run time value is updated. The cron process would then simply poll the schedule table for jobs that are due to run. This solution must lock the table for updates if it is accessed by more than one client.
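
Under this alternative, the schedule table might gain two columns, along these lines (the column names are illustrative):

	alter table schedule add column crontab_entry text;
	alter table schedule add column next_time timestamp;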

This approach also suggests a possible perl function which quickly parses the crontab entry and returns the next run time:

	update schedule
	   set next_time = next_run_time( id, now() )
	   where id = 1;   -- the id of the job that just ran

The logical followup could be to create a crontab entry domain with supporting functions, probably written primarily in plperl. These might get and set the attributes of the crontab entry, implying that the crontab domain is a complex type stored as a string.
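
As a minimal sketch, assuming the crontab entry is stored as a plain string, a plperl "get" accessor for the minute attribute could look like this (the function name is illustrative):

	create or replace function crontab_minute(text) returns text as '
	   # crontab fields are whitespace-separated; minute comes first
	   my @fields = split(" ", $_[0]);
	   return $fields[0];
	' language plperl;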

Contributors: David Garamond lists at zara.6.isreserved.com, Csaba Nagy nagy at ecircle-ag.com, elein at varlena.com
How NULLs are Stored
[GENERAL] How are NULLS stored on disk for PG 7.4? 9-Jan-2004

In 7.4 and in most previous releases information about NULL column values is stored in the tuple header. It is stored as a bitmap of the fields in the row indicating which fields are NULL.

By storing this data in the tuple header as metadata, it can be accessed quickly by the implementation of the IS NULL functionality. However, it is also probably why NULL values are not indexable.

It is rumored that Oracle stores the length of a column with the field's data. A value of zero indicates a NULL. This is possibly why Oracle tends to confuse empty fields and NULL values.
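
In PostgreSQL the two are distinct, which is easy to verify:

	select ''::text is null as empty_is_null,
	       (null::text = '') is null as comparison_is_null;

	 empty_is_null | comparison_is_null
	---------------+--------------------
	 f             | t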

Other databases also use null bit indicators, stored either with the field information or in the tuple header.

Contributors: Arnold Mavromatis A.Mavromatis at bom.gov.au, Bruce Momjian pgman at candle.pha.pa.us, Yannick Warnier ywarnier at beeznest.org, scott.marlowe scott.marlowe at ihs.com, vhikida at inreach.com, Jan Wieck JanWieck at Yahoo.com, elein at varlena.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009
