varlena
varlena
PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

19-Apr-2004 Issue: 71

Archives | General Tidbits | Google General Bits | Docs | Castellano | Português | Subscriptions | Notifications | | Prev | Next

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Top N Rows
Comments on Top N Rows 17-Apr-2004

In Issue #70 an algorithm was described enabling the first N rows in a grouped query. This query returned on N rows for each group.

A much simpler solution to the original Top N with group by in Issue #70 was suggested Alistair Eaves. The SQL code for the original sets up this better solution.

	SELECT *
	FROM table70
	WHERE id IN (
	   SELECT id
	   FROM table70 AS topn
	   WHERE topn.query = table70.query
	   ORDER by score, id
	   LIMIT 2 )
	ORDER BY query

It was also brought to my attention by Troels Arvin that not everyone thinks of "Top N" in this way. In other definitions of Top N, you may have more than N rows when there are tie conditions for the values in the group. For example, from Troel's RDBMS Writings, given the person table, the three youngest people might be four people of two ages.

	SELECT * FROM person ORDER BY age ASC;
	+----------+-------------+-----+
	|PERSON_ID | PERSON_NAME | AGE |
	+----------+-------------+-----+
	|        7 | Hilda       |  12 |
	|        8 | Bill        |  12 |
	|        4 | Joe         |  23 |
	|        2 | Veronica    |  23 |
	|        3 | Michael     |  27 |
	|        9 | Marianne    |  27 |
	|        1 | Ben         |  50 |
	|       10 | Michelle    |  50 |
	|        5 | Irene       |  77 |
	|        6 | Vivian      |  77 |
	+----------+-------------+-----+
Now, we only want the three (n=3) youngest persons displayed, i.e. a result set like this:
	+----------+-------------+-----+
	|PERSON_ID | PERSON_NAME | AGE |
	+----------+-------------+-----+
	|        7 | Hilda       |  12 |
	|        8 | Bill        |  12 |
	|        4 | Joe         |  23 |
	|        2 | Veronica    |  23 |
	+----------+-------------+-----+

The difference depends on exactly what you want in your query. Using ORDER BY with LIMIT N will give you exactly N rows. You will have to write a more complex function to be able to handle the tie situations in a consistent and logical way.

Contributors: elein at varlena.com troels arvin at arvin.dk aeaves at contextworld.com
SQL 2003 Standards Approved
Notes on SQL Standards 17-Apr-2004

As you probably saw in Robert Treat's Weekly news, the SQL2003 was finally published. An overview of some of the changes can be seen SQL:2003 Has Been Published by Andrew Eisenberg, et al. (You may need to purchase this document or be an ACM member to access this document.)

In this article, I will informally review some of the changes reviewed by Eisenberg, et al. to see how the PostgreSQL addresses these standards. The last four items in the list below are not covered in this article. They will be covered in a future article.

Statements about whether PostgreSQL supports or does not support an element is based on 7.4 at this time. Plans for 7.5 are not addressed. Quoted remarks are from the SQL:2003 Has Been Published Document.

Overview Summarized into an Executive Summary of Changes
Change PostgreSQL
BIT and BITVARYING data types removed Supported
BIGINT data type added Supported
Permissions on running Procedures Supported
Parameter syntax in Procedures*Not Supported
Extensions added to CREATE TABLE AS Supported
Sequence Generator Added*Supported
Identity Column type*Supported
Generated Column type*Not Supported
Extensions added to CREATE TABLE LIKE Not Supported
TABLE functions added *Not Supported
MERGE statement added Not Supported
MULTISET data type added Not Supported
XML data type added Not Supported
* Indicates partial or work around support

BIT and BITVARYING data types removed

PostgreSQL supports both bit and bit varying types. The reason the committee stated that these were removed was the lack of support of these types in "existing SQL database engines, and the lack of expected support in the future." This is certainly not the case for PostgreSQL. Below is an example of input support for these types.
	create table bits ( b bit, bv bit varying);
	insert into bits values ( '1'::bit, '1'::bit varying);
	insert into bits values ( '101'::bit, '101'::bit varying);
	insert into bits values ( 'xF'::bit, 'xF'::bit varying);
	elein=# select * from bits;
 	b |  bv  
	---+------
 	1 | 1
 	1 | 101
 	1 | 1111
	(3 rows)

BIGINT data type added

The BIGINT type may be a representation of a 64 bit or a 32 bit integer depending on the implementation and platform. PostgreSQL supports BIGINT. It is equivalent to an int8 in PostgreSQL.

Permissions on running Procedures

This item adds the ability to define a function so that it runs with either the privileges of its definer or the privileges of its invoker. This feature is supported in PostgreSQL with the CREATE FUNCTION options [EXTERNAL] SECURITY INVOKER and [EXTERNAL] SECURITY DEFINER. The default is to run the function with the permissions of the invoker of the function.

Parameter syntax in Procedures

This SQL2003 feature allows the use of function_name.parameter_name inside of procedures in order to avoid name conflicts when columns referenced inside of the procedure are the same as the parameter name.

PostgreSQL does not support named parameters in its function definitions so it does not support this syntax which uses named parameters. However, the underlying problem of conflicting names is solved with PostgreSQL with the use of the ALIAS declaration of the parameters in plpgsql. The example below shows the different ways this is used. Both functions are only minimally defined in order to show the use of the named parameter and alias.
SQL2003PostgreSQL
CREATE FUNCTION getdepnameCREATE FUNCTION depname
(deptno char(3))(char(3))
RETURNS varchar(20)RETURNS varchar(20)
 AS '
 DECLARE;
 dno as alias for $1;
 BEGIN
select deptnameselect deptname
from deptfrom dept
where deptno=getdepname.depname;where deptno=dno;
 END;
 ' language 'plpgsql';

Extensions added to CREATE TABLE AS

This extension enables renaming of columns when using the CREATE TABLE AS statement. PostgreSQL supports this. It is handy for rearranging and renaming columns.
	elein=# select * from yyy;
	 one | two | data 
	-----+-----+------
	   2 |     |    3
	   3 |     |    4
	   4 |     |    5
	(3 rows)
	
	elein=# create table xxx (a, b, c) as (select * from yyy);
	SELECT
	elein=# select * from xxx;
	 a | b | c 
	---+---+---
	 2 |   | 3
	 3 |   | 4
	 4 |   | 5
	(3 rows)

An interesting note in the SQL2003: Published document is that

the syntax of the CREATE TABLE AS statement is similar to what is known as Materialized Query Tables (MQTs) or Materialized Views in many commercial products, the semantics are different.
The semantics are different in that there is no dependency between the tables. Materialized views have an implied relationship to the underlying tables.

Sequence Generator Added

SQL2003 added Sequences to the specification. PostgreSQL already supports this with two extra options. The options specified by the spec are:
  • STARTS WITH n
  • INCREMENT BY n
  • MINVALUE n
  • MAXVALUE n
  • [NO] CYCLE
The two additional options in PostgreSQL are the ability to create a temporary sequence and to specify how many values are preallocated. The specification does not make mention of whether the sequences may or may not have gaps. The additional statement ALTER SEQUENCE and DROP SEQUENCE are supported. The function the 2003 specification specifies as NEXT VALUE FOR sequence is not supported. The equivalent function is nextval(sequence). There is no mention of syntax equivalent to the PostgreSQL function currval(sequence).

Identify Column type

The equivalent of an IDENTITY column in PostgreSQL is a SERIAL or SERIAL8. This feature creates a sequence associated with a column in a table. As specified the sequence can also have its parameters set (see above) in the column definition like this:
	CREATE TABLE parts (
	   partnum INTEGER GENERATED ALWAYS AS IDENTITY
		( START WITH 100 INCREMENT BY 10 MIN VALUE 10),
		partname	VARCHAR(40)
	); 
The PostgreSQL equivalent requires that you ALTER the sequence after creation if you want options different from the default options. And this requires that you know the sequence name. In PostgreSQL the default sequence name is tablename_columnname_seq.
	CREATE TABLE parts (
	   partnum SERIAL,
	   partname	VARCHAR(40)
	);
	ALTER SEQUENCE 'parts_partnum_seq' 
		START WITH 100 INCREMENT BY 10 MIN VALUE 10;

PostgreSQL allows you to say SERIAL or SERIAL8 where the spec requires that you declare the column as INTEGER AS IDENTITY or BIGINT AS IDENTITY.

An interesting limitation in the specifications is that one table can have at most one IDENTITY column. In PostgreSQL you may have as many SERIAL columns as you like in a table.

There is one more option on the 2003 specification which is GENERATED ALWAYS or GENERATED BY DEFAULT. The first raises an error if any value inserted into the column other than DEFAULT. The second option will allow assignment of the value in an UPDATE or INSERT statement. PostgreSQL supports the second option by default only and does not support the GENERATED ALWAYS option.

Generated Column type

Generated column types are the ability to define a column in a table based on an expression using other columns in a row. Like the IDENTITY column, the option GENERATED ALWAYS is specified. For example:
CREATE TABLE employees (
	emp_id integer,
	salary decimal(7,2),
	bonus decimal(7,2),
	total_com GENERATED ALWAYS
	   AS (SALARY + BONUS)
	);
The value of total_com is stored when a row is INSERTED into the table. The overview did not say whether the value of the generated column is updated when the base columns are updated. It only addressed the issue of INSERT. It specifically gave an example where the option GENERATED BY DEFAULT is not supported. When a row is inserted, the value of the generated column must not have a value. It must be designed DEFAULT.
	insert into employees values (1234, 75000, 1500, DEFAULT );

This behavior is similar to virtual columns. Virtual columns had more or less the same declaration, but the value was not stored. It was calculated on the fly for each selection. Virtual columns were supported in Illustra. PostgreSQL does not support virtual columns.

PostgreSQL does not support generated column types in the CREATE TABLE definitions. The same behavior can be created using a TRIGGER function. With a TRIGGER function, the functionality is less limiting.

Extensions added to CREATE TABLE LIKE

The CREATE TABLE LIKE statement has been extended to include the addition of GENERATED columns and IDENTITY columns. PostgreSQL does not support the syntax in CREATE TABLE so it does not support it in CREATE TABLE LIKE.

It is important to note that in PostgreSQL when creating a table LIKE another table that has a serial column, a new sequence is not created. That must be done by a separate CREATE SEQUENCE and ALTER TABLE to set the default for the column to be the nextval() of the column.

Contributors: elein at varlena.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009

Top
Google
Search General Bits & varlena.com Search WWW