This is Issue # 30 for General Bits. Our latest addition of a Portuguese translation has been very successful so far thanks to Juliano S. Ignacio.
Don't forget to subscribe to PostgreSQL General Bits. Support your local open source projects! Some new feature are in the works that will only available to subscribers.
As usual, feedback is requested. Send in your comments and corrections and suggestions.
And we are all looking forward to 7.4!
You can create and use temporary tables inside of pl/pgsql functions, however, there is a catch. pl/pgsql functions are parsed when they are first run. Subsequent calls use the same query structures to run. If one of those queries accesses a temporary table created and dropped in the function it will work the first time, but not the second. The second time it is called, it will still be trying to access the first invocation's temporary table, not the newly created one.
The workaround/proper way to handle this situation is to perform queries which access temporary tables in plpgsql functions by using EXECUTE. By using EXECUTE, the query is reparsed and the proper (new) temp table is accessed.
-- This will not work ... CREATE TEMP TABLE foo AS SELECT ....; FOR r IN SELECT * FROM foo LOOP .... END LOOP; DROP TABLE foo; -- This will work ... CREATE TEMP TABLE foo AS SELECT ....; FOR r IN EXECUTE "SELECT * FROM foo" LOOP .... END LOOP; DROP TABLE foo; ...
To convert a database from one monetary unit to another, you only need to set the variables LC_MONETARY and LC_NUMERIC. This will hint to the database that the decimal should be a comma instead of a dot and hint what the currency symbol should be.
Unfortunately, none of the input and output for any of the number types listen to this hint. According to SQL specification, a dot is a decimal divider and PostgreSQL sticks to the specs where ever possible.
The functions that do pay attention to LC_MONETARY and LC_NUMERIC hints are the to_char() and to_number() functions. Capture input correctly using to_number() and send out properly displayed values using to_char().
In this example we have a float and numeric columns in a table.
create table to_what ( fl float, num numeric);To check what the current values of LC_MONETARY and LC_NUMERIC are use SHOW. These show we are starting out with US English values.
=# show LC_MONETARY; lc_monetary ------------- en_US =# show LC_NUMERIC; lc_numeric ------------ en_USWe'll insert one row in using these values and show the result which is as expected, dots for decimals.
=# insert into to_what values ( 1.2, 34.56 ); INSERT 1622475 1 elein=# select * from to_what; fl | num -----+------- 1.2 | 34.56We'll change the values of LC_MONETARY and LC_NUMERIC to 'fr_FR' in order to see how that changes things. A row using dots as decimals is inserted and also inserted is a row where the values have a comma as a decimal separator. In order to use the comma separator, the values must be single quoted and passed to to_number() with an appropriate format.
=# set LC_MONETARY='fr_FR'; =# set LC_NUMERIC='fr_FR'; =# insert into to_what values ( 78.09, 12.34 ); =# insert into to_what values ( to_number( '56,78', '99D99'), to_number('90,12','99D99') );Selecting out all of the values shows that the output routines still use the dot.
=# select * from to_what; fl | num -------+------- 1.2 | 34.56 78.09 | 12.34 56.78 | 90.12However, the function to_char() can be used to format the output to the proper one for our locale.
=# select to_char( fl, '99D99'), to_char(num, '99D99') from to_what; to_char | to_char ---------+--------- F 1,20 | F 34,56 F 78,09 | F 12,34 F 56,78 | F 90,12See the docs, Section 6.7. Data Type Formatting Functions for more information about the symbols you can use in creating your formats. And as a side note, the currency symbol for 'fr_FR' still shows 'F'.
Although we all know that in a relational database, all tables have a functional primary key, sometimes we end up with a table of data generated by something or someone else which does not have this key. In these cases we generate a primary key if the natural key is not adequate.
In the course of loading data, it happens sometimes that data gets loaded twice and you've ended up with duplicate rows in your table and the generated key won't help you a bit. And neither will oids, especially if your table does not have them. (One more argument for natural keys...)
Someone, who will go unnamed, has a table that is loaded directly from an apache log file. Someone, that same unnamed person, made some kind of mistake and ended up loading many rows a second time. oops.
To see how bad the problem is this query can be issued. This counts the instances of duplicates. The group by has all of the columns in the table, so a duplicate is identified by two or more rows having all columns equivalent. The having clause is evaluated last. It will throw out all of the rows which have a count of 1.
select count(*) from (select count(*) as howmany from rawlog group by ip, ldate, ltime, method, doc, qs, code, dsize having count(*) > 1) foo;The subquery can be issued to show you the data for the records which are duplicates. This is helpful if there are only a few of them. That is why we counted them first.
select count(*) as howmany, ldate, ltime, doc, ip, method, qs, code, dsize from rawlog group by ldate, ltime, doc, ip, method, qs, code, dsize having count(*) > 1;So now we know how many there are and which ones there are. But now we want to keep one and only one row of the rows which have two or more copies and delete the others. For this we will use a brute force approach. If we had OIDs in our table it could be simpler. We could have used the same same technique used for updating duplicate rows shown in Issue #25 Article 5. But alas, we have no OIDs, so brute force must be the way.
This function uses the query above to select out the data from the duplicate rows. It deletes all of them and re-inserts one of them before going to the next row. The deletion and re-insertion should work correctly because there should be no dependencies on this table--it has no keys.
create or replace function fixdupes() returns void as ' DECLARE d record; BEGIN for d in select oid from rawlog group by ldate, ltime, doc, ip, method, qs, code, dsize having count(*) > 1 limit 1; loop delete from rawlog where ldate=d.ldate and ltime=d.ltime and doc=d.doc and ip=d.ip and method=d.method and qs=d.qs and code=d.code and dsize=d.dsize; insert into rawlog values (d.ip, d.ldate, d.ltime, d.method, d.doc, d.qs, d.code, d.dsize); end loop; RETURN; END; ' language 'plpgsql';Vacuuming might be a good thing to do after running this function if there were many rows changed. This function was written for this specific table because I have no intention of having the same problem with other tables.
This thread generated two mostly separate discussions--one about altering tables and the second about recording a display order for columns. So, this thread is split into this item and the one that follows.
The issue was raised (once again) that PostgreSQL does not support changing the definitions of a column. What is not supported, particularly, is changing the type of a column and that was what most of the discussion covered.
The complaint is that one has to manage all of the pieces by hand, including keys, foreign keys, constrains, triggers and indexes.
There are no current plans to implement a "full featured" alter table in PostgreSQL at this time. This is a complex task that requires human intervention for semantic interpretation and without that, an enormous risk of corrupting data. For example, MySQL can convert a character column to a date type but it silently converts any field it can't understand to "the very wonderful value of 0000-00-00." Since the transaction (what transaction?) didn't rollback, your original data is gone.
The problems that arise from changing the type of a column are many.
Most data conversion is handled well by implicit or explicit casts and conversion routines. It is those cases where casts do not exist where there will be data conversion problems. It's up to you to fix the data and try again, or use a process with a more complex conversion function.
The best suggestion was from several people who suggested that an administration tool like pgAdmin could be built to do this conversion. Tom Lane outlined it saying, " It could walk you through all these considerations, get your okay on each nontrivial semantic change, and then apply all the operations within a single transaction. We have all the ALTER functionality needed to support such a tool."
Here are some suggested algorithms for doing the basic column changes. To these I would add a check list of items to review for possible changes:
Rename existing column BEGIN ; ALTER TABLE foo RENAME COLUMN col TO backupcol; Create new column ALTER TABLE foo ADD COLUMN col newtype Repopulate from existing column UPDATE foo set col=backupcol; Assuming all data was successfully copied, Drop existing column ALTER TABLE foo DROP COLUMN backupcol; COMMIT;If you are comfortable with editing the pg_dump script you could use this:
Dump the schema and data separately from that table only. pg_dump -a -t tabletochange > tabledata.sql pg_dump -s -t tabletochange > tableschema.sql Edit the schema and make the appropriate changes. vi tableschema.sql Reload the schema and data. Handle data loading errors. psql testdatabase < tableschema.sql psql testdatabase < tabledata.sqlThis is one of the create table methods.
Create a table exactly like the old one except for the column definition change. CREATE TABLE new ( ... ); Load the new table from the old one. INSERT INTO new SELECT ... FROM old; If that went well, drop the old table and rename the new one. DROP TABLE old; ALTER TABLE new RENAME TO old;
The second part of this thread dealt with the display ordering of columns. This means the default ordering when performing a SELECT *. It does not mean reordering the storage of the columns on disk.
It was pointed out that the system catalog table pg_attribute is the natural place to keep the column ordering info. Adding an attlognum, as distinct from attnum the physical column position, was discussed some months ago. Admin tools could use this to determine column display order too.
However, this is a client application function even if the client is a pg administration tool. pgaccess, for example, stores the ordering separately from the system catalog. Some people don't like this idea because they do not like administration tools to create tables in the database. But administration tools are just one kind of client. A database should be built for multiple clients. Only SELECT * would be affected by this attribute and nothing else on the server side would.
Other solutions were presented, including the obvious ones where one should never use SELECT * anyway and always name your target list explicitly and, of course, views.
Still, Tom Lane holds that the request for display ordering is reasonable (but not yet done) and you can see the benefit if all clients of the database can draw on a centralized column display ordering for consistency between applications.
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009