varlena
PostgreSQL Training, Consulting & Support
General Bits
By A. Elein Mustain

16-Jun-2003 Issue: 30


General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Thirty Issues!
Editorial 15-Jun-2003

This is Issue #30 of General Bits. Our latest addition, a Portuguese translation, has been very successful so far thanks to Juliano S. Ignacio.

Don't forget to subscribe to PostgreSQL General Bits. Support your local open source projects! Some new features are in the works that will be available only to subscribers.

As usual, feedback is requested. Send in your comments, corrections, and suggestions.

And we are all looking forward to 7.4!

Cheers,

Editor: elein at varlena.com
Temporary Tables with pl/pgsql
[GENERAL] Temporary tables inside functions problem 7-Jun-2003

You can create and use temporary tables inside pl/pgsql functions; however, there is a catch. pl/pgsql functions are parsed and planned when they are first run, and subsequent calls reuse the same query plans. If one of those queries accesses a temporary table that is created and dropped in the function, it will work the first time but not the second. On the second call, the cached plan still refers to the first invocation's temporary table, not the newly created one.

The workaround, and the proper way to handle this situation, is to issue the queries that access temporary tables in pl/pgsql functions with EXECUTE. With EXECUTE, the query is reparsed on each call and the proper (new) temp table is accessed.

	-- This will not work
	...
	CREATE TEMP TABLE foo AS SELECT ....;
	FOR r IN SELECT * FROM foo
	LOOP
		....
	END LOOP;
	DROP TABLE foo;

	-- This will work.  Note that inside a single-quoted function body
	-- the quotes around the query string must be doubled, as shown.
	...
	CREATE TEMP TABLE foo AS SELECT ....;
	FOR r IN EXECUTE ''SELECT * FROM foo''
	LOOP
		....
	END LOOP;
	DROP TABLE foo;
	...
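Putting the pieces together, here is a minimal, self-contained sketch of the pattern (not from the original thread); the source table some_source and the row count it returns are placeholders for the example:
	create or replace function temp_demo()
	returns integer as '
	DECLARE
	   r record;
	   total integer := 0;
	BEGIN
	   -- CREATE and DROP are utility commands and are not the problem;
	   -- the cached plan bites the SELECT, so that is run via EXECUTE.
	   CREATE TEMP TABLE foo AS SELECT * FROM some_source;
	   FOR r IN EXECUTE ''SELECT * FROM foo'' LOOP
	      total := total + 1;
	   END LOOP;
	   DROP TABLE foo;
	   RETURN total;
	END;
	' language 'plpgsql';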
Contributors: Avi Schwartz avi at CFFtechnologies.com, Stephan Szabo sszabo at megazone23.bigpanda.com, Ian Barwick barwick at gmx.net, Sean Chittenden sean at chittenden.org
Decimal Marker
[GENERAL] converting from one locale to another 13-Jun-2003

To switch a database from one monetary locale to another, you only need to set the variables LC_MONETARY and LC_NUMERIC. This hints to the database that the decimal marker should be a comma instead of a dot and what the currency symbol should be.

Unfortunately, none of the input and output routines for the number types listen to this hint. According to the SQL specification, the dot is the decimal separator, and PostgreSQL sticks to the spec wherever possible.

The functions that do pay attention to LC_MONETARY and LC_NUMERIC hints are the to_char() and to_number() functions. Capture input correctly using to_number() and send out properly displayed values using to_char().

In this example we have a float column and a numeric column in a table.

	create table to_what ( fl float, num numeric); 
To check what the current values of LC_MONETARY and LC_NUMERIC are, use SHOW. These show we are starting out with US English values.
	=# show LC_MONETARY;
	 lc_monetary 
	-------------
	 en_US

	=# show LC_NUMERIC;
	 lc_numeric 
	------------
	 en_US
We'll insert one row using these values and show the result, which is as expected: dots for decimals.
	=# insert into to_what values ( 1.2, 34.56 );
	INSERT 1622475 1
	=# select * from to_what;
	 fl  |  num  
	-----+-------
	 1.2 | 34.56
We'll change LC_MONETARY and LC_NUMERIC to 'fr_FR' to see how that changes things. One row is inserted using dots as decimals, and another is inserted where the values use a comma as the decimal separator. To use the comma separator, the values must be single quoted and passed to to_number() with an appropriate format.
	=# set LC_MONETARY='fr_FR';
	=# set LC_NUMERIC='fr_FR';
	=# insert into to_what values ( 78.09, 12.34 );
	=# insert into to_what values
		( to_number( '56,78', '99D99'), to_number('90,12','99D99') );
Selecting out all of the values shows that the output routines still use the dot.
	=# select * from to_what;
	  fl   |  num  
	-------+-------
	   1.2 | 34.56
	 78.09 | 12.34
	 56.78 | 90.12
However, the function to_char() can be used to format the output properly for our locale.
	=# select to_char( fl, '99D99'), to_char(num, '99D99') from to_what;
	 to_char | to_char 
	---------+---------
	 F  1,20 | F 34,56
	 F 78,09 | F 12,34
	 F 56,78 | F 90,12
See the docs, Section 6.7, Data Type Formatting Functions, for more information about the symbols you can use in creating your formats. And as a side note, the currency symbol for 'fr_FR' still shows 'F'.
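As a further illustration (not in the original item), the 'L' pattern adds the locale's currency symbol and 'G' the locale's grouping separator, so a format such as the following displays fully localized values:
	=# select to_char( num, 'L9G999D99' ) from to_what;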

Contributors: Tony Grant tony at tgds.net, Tom Lane tgl at sss.pgh.pa.us, elein at varlena.com
Bad Dupes
Identifying and Removing Duplicate Rows 13-Jun-2003

Although we all know that in a relational database all tables have a working primary key, sometimes we end up with a table of data generated by something or someone else which does not have this key. In these cases we generate a primary key if the natural key is not adequate.

In the course of loading data, it sometimes happens that data gets loaded twice, and you end up with duplicate rows in your table; the generated key won't help you a bit. Neither will OIDs, especially if your table does not have them. (One more argument for natural keys...)

Someone, who will go unnamed, has a table that is loaded directly from an apache log file. Someone, that same unnamed person, made some kind of mistake and ended up loading many rows a second time. oops.

To see how bad the problem is, issue this query, which counts the instances of duplicates. The group by lists all of the columns in the table, so a duplicate is identified by two or more rows having all columns equal. The having clause is evaluated last; it throws out all of the rows which have a count of 1.

	select count(*) from
	(select count(*) as howmany from rawlog
		group by
		ip, ldate, ltime, method, doc, qs, code, dsize
		having count(*) > 1) foo;
The subquery can be issued by itself to show you the data for the duplicate records. This is helpful if there are only a few of them, which is why we counted them first.
	select count(*) as howmany, ldate, ltime, doc, ip, method, qs, code, dsize
	from rawlog
	group by ldate, ltime, doc, ip, method, qs, code, dsize
	having count(*) > 1;
So now we know how many there are and which ones they are. But now we want to keep one and only one row of each set that has two or more copies and delete the others. For this we will use a brute force approach. If we had OIDs in our table it could be simpler: we could have used the same technique used for updating duplicate rows shown in Issue #25, Article 5. But alas, we have no OIDs, so brute force must be the way.

This function uses the query above to select out the data from the duplicate rows. It deletes all of them and re-inserts one of them before going to the next row. The deletion and re-insertion should work correctly because there should be no dependencies on this table--it has no keys.

	create or replace function fixdupes()
	returns void as '
	DECLARE
	d record;
	BEGIN
	   -- one row per group of duplicates, carrying all of the columns
	   for d in select ldate, ltime, doc, ip, method, qs, code, dsize
	            from rawlog
	            group by ldate, ltime, doc, ip, method, qs, code, dsize
	            having count(*) > 1
	   loop
	      -- remove every copy of this duplicated row
	      -- (the equality tests assume none of these columns is NULL)
	      delete from rawlog
	      where ldate=d.ldate and ltime=d.ltime and
	            doc=d.doc and ip=d.ip and method=d.method and
	            qs=d.qs and code=d.code and dsize=d.dsize;
	
	      -- and put exactly one copy back
	      insert into rawlog values (d.ip, d.ldate, d.ltime, d.method, d.doc,
	         d.qs, d.code, d.dsize);
	   end loop;
	   RETURN;
	END;
	' language 'plpgsql';
Vacuuming might be a good thing to do after running this function if there were many rows changed. This function was written for this specific table because I have no intention of having the same problem with other tables.
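A usage sketch from psql, with the vacuum suggested above:
	=# select fixdupes();
	=# vacuum analyze rawlog;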

Contributors: elein at varlena.com
Automated ALTER
[GENERAL] full featured alter table? 12-Jun-2003

This thread generated two mostly separate discussions--one about altering tables and the second about recording a display order for columns. So, this thread is split into this item and the one that follows.

The issue was raised (once again) that PostgreSQL does not support changing the definition of a column. What is not supported, in particular, is changing the type of a column, and that is what most of the discussion covered.

The complaint is that one has to manage all of the pieces by hand, including keys, foreign keys, constraints, triggers and indexes.

There are no plans to implement a "full featured" alter table in PostgreSQL at this time. This is a complex task that requires human intervention for semantic interpretation; without that there is an enormous risk of corrupting data. For example, MySQL can convert a character column to a date type, but it silently converts any field it can't understand to "the very wonderful value of 0000-00-00." Since the transaction (what transaction?) didn't roll back, your original data is gone.

The problems that arise from changing the type of a column are many.

  • Extra storage is needed during the conversion.
  • Failure must be clean and safe for everything from pulling the plug to a conversion error.
  • Changing an active database would need to handle any stored or cached plans.
  • Existing plpgsql applications and foreign keys must still work, including functions returning a set of the column.
  • Handling conversion failure requires semantic information when changing the column to something smaller or very different.
  • Indexes need to be dropped and rebuilt. It is possible that the operator class for the new datatype is different than the one for the original and so some indexes may not be able to be recreated exactly as they were.
  • Functional indexes would require that the function be dropped and redefined for the new type and then the index must be rebuilt.
  • Constraints must be changed to match the new datatypes. Any functions in constraints must be changed to take the new datatype.

Most data conversion is handled well by implicit or explicit casts and conversion routines. It is the cases where casts do not exist that will cause data conversion problems. It's up to you to fix the data and try again, or use a process with a more complex conversion function.

The best suggestion, from several people, was that an administration tool like pgAdmin could be built to do this conversion. Tom Lane outlined it, saying, "It could walk you through all these considerations, get your okay on each nontrivial semantic change, and then apply all the operations within a single transaction. We have all the ALTER functionality needed to support such a tool."

Here are some suggested algorithms for doing the basic column changes. To these I would add a checklist of items to review for possible changes:

  • manage data conversion errors
  • indexes
  • triggers
  • rules/views
  • constraints
  • functional indexes
  • foreign keys
This one uses existing ALTER commands:
Rename existing column
	BEGIN;
	ALTER TABLE foo RENAME COLUMN col TO backupcol;
Create new column
	ALTER TABLE foo ADD COLUMN col newtype;
Repopulate from existing column
	UPDATE foo set col=backupcol;
Assuming all data was successfully copied, drop the existing column
	ALTER TABLE foo DROP COLUMN backupcol;
	COMMIT;
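As a concrete sketch of that method, here is what changing a hypothetical text column qty in a hypothetical table orders to integer might look like (the names and the cast are illustrative only):
	BEGIN;
	ALTER TABLE orders RENAME COLUMN qty TO qty_old;
	ALTER TABLE orders ADD COLUMN qty integer;
	-- the explicit cast errors out on any row that cannot be converted,
	-- rolling back the whole transaction; use whatever cast or conversion
	-- function suits the two types involved
	UPDATE orders SET qty = qty_old::integer;
	ALTER TABLE orders DROP COLUMN qty_old;
	COMMIT;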
If you are comfortable with editing the pg_dump script you could use this:
Dump the schema and data separately from that table only.
	pg_dump -a -t tabletochange > tabledata.sql
	pg_dump -s -t tabletochange > tableschema.sql
Edit the schema and make the appropriate changes.
	vi tableschema.sql 
Reload the schema and data.  Handle data loading errors.
	psql testdatabase < tableschema.sql
	psql testdatabase < tabledata.sql
This is one of the create table methods.
Create a table exactly like the old one
except for the column definition change.
	CREATE TABLE new (
	...
	);
Load the new table from the old one.
	INSERT INTO new SELECT ... FROM old;
If that went well, drop the old table and rename the new one.
	DROP TABLE old;
	ALTER TABLE new RENAME TO old;
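And a concrete sketch of the new-table method, using a hypothetical table hits whose nhits column grows from integer to bigint:
	BEGIN;
	CREATE TABLE hits_new ( page text, nhits bigint );
	INSERT INTO hits_new SELECT page, nhits::bigint FROM hits;
	DROP TABLE hits;
	ALTER TABLE hits_new RENAME TO hits;
	COMMIT;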

Contributors: Sven Koehler skoehler at upb.de, Jonathan Bartlett johnnyb at eskimo.com, Stephan Szabo sszabo at megazone23.bigpanda.com, Reynard Hilman reynardmh at lightsky.com, scott.marlowe scott.marlowe at ihs.com, Alvaro Herrera alvherre at dcc.uchile.cl, weigelt at metux.de, Tino Wildenhain tino at wildenhain.de, Tom Lane tgl at sss.pgh.pa.us
Column Presentation Ordering
[GENERAL] full featured alter table? 12-Jun-2003

The second part of this thread dealt with the display ordering of columns. This means the default ordering when performing a SELECT *. It does not mean reordering the storage of the columns on disk.

It was pointed out that the system catalog table pg_attribute is the natural place to keep the column ordering info. Adding an attlognum column, distinct from attnum (the physical column position), was discussed some months ago. Admin tools could use this to determine column display order too.

However, this is a client application function, even if the client is a pg administration tool. pgaccess, for example, stores the ordering separately from the system catalog. Some people don't like this idea because they do not like administration tools to create tables in the database. But administration tools are just one kind of client, and a database should be built for multiple clients. Only SELECT * would be affected by this attribute; nothing else on the server side would be.

Other solutions were presented, including the obvious ones: never use SELECT * anyway, always name your target list explicitly, and, of course, use views.
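For the view workaround, a minimal sketch with a hypothetical table people: the view fixes the presentation order without touching the table itself.
	CREATE VIEW people_display AS
	   SELECT last_name, first_name, id FROM people;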

Still, Tom Lane holds that the request for display ordering is reasonable (though not yet implemented), and you can see the benefit if all clients of the database can draw on a centralized column display ordering for consistency between applications.

Contributors: Tom Lane tgl at sss.pgh.pa.us, Jim C. Nasby jim at nasby.net, Sven Köhler skoehler at upb.de, Guillaume LELARGE gleu at wanadoo.fr, Bruno Wolff III bruno at wolff.to, weigelt at metux.de, Jay O'Connor joconnor at cybermesa.com, Nigel J. Andrews nandrews at investsystems.co.uk, Bruno BAGUETTE pgsql-ml at baguette.net, Randal L. Schwartz merlyn at stonehenge.com , Ernest E Vogelsinger ernest at vogelsinger.at, Mike Mascari mascarm at mascari.com, Tino Wildenhain tino at wildenhain.de


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009
