Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

2-May-2004 Issue: 72

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Copying out a select
Using a select to create a file	30-Apr-2004

As described in Issue #46 the shell with psql can be very powerful in helping you to manipulate data.

The ability to COPY the output of a selection to a file has often been requested. One of the reasons it has not quite been implemented yet is that it is possible to do using psql.

The following bash function copies the output of a selection into a file and removes the delimiters. The output is a fixed length record suitable for legacy applications. It does not include all of the columns in the original view, file_footer_output. Notably the column file_footer_output.batch is not included but it is qualified by a shell variable which has been previously defined. $outfile is also a previously set shell variable.

psql -tA -F'|' << OUTPUT_FILE_FOOTER | sed -e 's/|//g' >> $outfile
select
   record_type,                            -- len 1
   to_char(batch_count,'FM000000'),        -- len 6
   to_char(block_count,'FM000000'),        -- len 6
   to_char(entry_count,'FM00000000'),      -- len 8
   to_char(entry_code, 'FM0000000000'),    -- len 10
   to_char(db_amount,  'FM000000000000'),  -- len 12
   to_char(cr_amount,  'FM000000000000'),  -- len 12
   rpad(reserved,12,' ')                   -- len 12
from file_footer_output where batch = $batch;
OUTPUT_FILE_FOOTER

The psql options used are -t (tuples only) and -A (unaligned). The field separator is set to a pipe since there may never be pipes in the data and setting the field separator to an empty string does not work as expected. The input of the psql statement is between the two labels OUTPUT_FILE_FOOTER. The output of the psql statement is piped through a simple sed script to get rid of the field separators and then sent to the file named by the value of $outfile.

When you need a delimited output file, specify your delimiter and remove the sed portion of the shell statement. Interleaving psql and shell commands can be an easy way to copy selection statements out.

Contributors: elein at varlena.com

How to add a sequence to an existing table
Adding Serial Columns 16-Apr-2004

A SERIAL column in a table is implemented using a sequence. It is stored as an integer or bigint. A corresponding SEQUENCE is created with the name 'tablename_columnname_seq' and the default value of the SERIAL column is set to the result of the function nextval('tablename_columnname_seq'). The documentation says:

	CREATE TABLE tablename (
    	colname SERIAL
	);

	is equivalent to specifying:

	CREATE SEQUENCE tablename_colname_seq;
	CREATE TABLE tablename (
    	colname integer DEFAULT nextval('tablename_colname_seq') NOT NULL
	);

Suppose you have a table already defined that now requires a SERIAL column. You cannot (yet) use ALTER TABLE to add the SERIAL column because ALTER TABLE does not (yet) support the addition of columns and DEFAULT in the same statement. So you must set up the sequence and set the default yourself. And, if initial values are required, you must initialize the new SERIAL column of the table.

These are the steps you would use to add a SERIAL column to the table person_aliases. This example shows creating the column and corresponding sequence and then adding the default to the column.

	ALTER TABLE person_aliases 
	   ADD COLUMN alias_id integer;

	CREATE SEQUENCE person_aliases_alias_id_seq;

	ALTER TABLE person_aliases 
	   ALTER alias_id 
	   SET DEFAULT nextval('person_aliases_alias_id_seq'::text);

Then it goes further to initialize the value of the column for the entire table. Note that the initialization of the values is in no particular order. Finally, it adds the constraints NOT NULL and UNIQUE on the column. Since 7.3, SERIAL columns are not guaranteed to be UNIQUE and if they should be unique, you must explicitly state that.


	UPDATE person_aliases SET alias_id = DEFAULT;

	ALTER TABLE person_aliases 
	   ALTER alias_id SET NOT NULL;

	ALTER TABLE person_aliases 
	   ADD CONSTRAINT person_aliases_ukey UNIQUE(alias_id);

Contributors: Andrew Dunston amdunstan at ncshp.org elein at varlena.com

Better Solution for disjoint summation
[SQL] Counting rows from two tables in one query 01-May-2004

In Issue #70 I offered a solution to multiple aggregates in one query. The solution used subqueries and solution does not scale well.

 select tname, matches, won, ttype, area 
	from 
	   (select
	      tname, tykey, akey,
	   ((select count(*) from tmatches where team1 = sname )
	    + (select count(*) from tmatches where team2 = sname ) ) as matches,
	   ((select count(*) from tmatches where team1 = sname and score1 < score2 )
	    + (select count(*) from tmatches where team2 = sname and score2 < score1 ) ) as won
	from teams) foo 
	JOIN ttypes ty USING (tykey)
	JOIN areas USING (akey)
	where matches >< 0
	order by matches desc;

Jeff Boes offered a better solution. When the aggregation is a count, it can be done more simply using a CASE inside of a SUM aggregate. This enables the result to truly be done in one pass of the base table(s). This solution may not work for fancier aggregations, but it depends on the nature of that aggregation.

select sname, sum(case when sname in (team1,team2) then 1 else 0 end) as tmatches, sum(case when sname = team1 and score1 < score2 then 1 when sname = team2 and score2 > score1 then 1 else 0 end) as wins, sum(case when sname = team2 and score1 < score2 then 1 when sname = team1 and score2 > score1 then 1 else 0 end) as losses from teams join tmatches on (team1 = sname or team2 = sname ) group by sname order by sname;

Jeff's solution scales much better because it only runs through the tmatches table once and has no COUNT(*) subqueries. Standing alone, it ran about 75ms faster than the original query. There are a couple of differences in the queries, however. Primarily, the original query joined in the team type and the area the team was from. This is only possible in the second solution by wrapping the query one more level in order to avoid grouping by these values.

There are many ways to do complex queries. Always make sure you know what you want as a result first. Then you can use EXPLAIN ANALYZE to choose the fastest plan. You can use subqueries and JOINs in different ways to produce similar results. And be careful, long, complicated output from EXPLAIN ANALYZE is not a sign that the query is too complex or slow. Read the numbers carefully.

Contributors: Jeff Boes jboes at nexcerpt.com elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top