Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

12-Apr-2004 Issue: 70

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

OSCON 2004 Registration Is Open
O'Reilly Open Source Convention 2004	9-Apr-2004

Registration is now open for the 2004 O'Reilly Open Source Convention July 26-30, 2004 in Portland, OR. For those of us in or near the States, this is the PostgreSQL conference to attend.

PostgreSQL people will be there in full force. You will have a chance to meet your favorite developers and contributors face to face.

PostgreSQL will have both a table and a birds-of-a-feather session. Please contact me if you are planning on coming and want to know what we are planning and how you can join rest of us. You may also want to tune into the pgsql-advocacy list to hear some of the public discussions of PostgreSQL at OSCON.

The PostgreSQL Track will be very full of great presentations by a good variety of impressive people. I recommend all of them without hesitation. But don't miss the panel discussion of Replication.

The tutorials will be held on the Monday and Tuesday before the open conference. They are listed at the end of this table and include sessions by Joe Conway, Greg Sabino Mullane and yours truly.

Speakers Wednesday, July 28

Topic/Speaker Speaker Time

Setting up a Large PostgreSQL Server: A Case Study Salon B 4:30pm - 5:15pm
Vivek Khera, Khera Communications, Inc.

Applying Advanced Database Features Salon B 5:20pm - 6:05pm
Robert Treat

Tsearch2: PostgreSQL Full Text Search Extension Salon H 10:45am - 11:30am
George Essig, Web Developer, Federal Reserve Bank of St. Louis

Speakers Thursday, July 29

Topic/Speaker Room Time

Flexible Data Acquisition and Analysis Salon H 11:35am - 12:20pm
Joseph Conway

State of PostgreSQL Salon F 1:45pm - 2:30pm
Bruce Momjian, Software Research Associates

Structuring Structural Biology with PostgreSQL Salon C 2:35pm - 3:20pm
Reece Hart, Scientist, Genentech, Inc.

PostgreSQL Replication Panel Salon A 4:30pm - 6:05pm
Jan Wieck, PostgreSQL Project, Robert Kaye, Mayhem and Chaos Coordinator, MusicBrainz, Andrew Sullivan, Afilias / Liberty RMS

Speakers Friday, July 30

Topic/Speaker Room Time

Build Your Own MapQuest! Adding Spatial Smarts with PostGIS Salon G 10:45am - 11:30am
Paul Ramsey, Senior Analyst, Refractions Research

The Joy of Index Salon B 11:35am - 12:20pm
Josh Berkus, Core Team, PostgreSQL.org

Tutorials Monday July 26

Topic/Speaker Room Time

Introduction to PostgreSQL Eugene 8:45am - 12:15pm
A. Elein Mustain, Varlena, LLC

Power PostgreSQL: Extending the Database with C Eugene 1:45pm - 5:15pm
Joseph Conway

Tutorials Tuesday July 27

Topic/Speaker Room Time

How to Play Together Nicely: Strategies for DBA and Application Developers Willamette 1:45pm - 5:15pm
Greg Sabino Mullane

Note that times, names and places may change anytime up to the conference date.

Contributors: elein at varlena.com

NLUUG May 2004
NLUUG Open Source for Business 10-Apr-2004

The Netherland's NLUUG is sponsoring an Open Source for Business Conference in Ede at the De Reehorst Hotel on May 27, 2004. For more information on the conference see: http://www.nluug.nl/events/vj04/

Elein Mustain will be speaking on PostgreSQL Extensibility. Many talks will be in English and most people will also speak English.

Contributors: elein at varlena.com

Subselect your disjoint aggregates
[SQL] Counting rows from two tables in one query 07-Apr-2004

In order to select the results from two separate aggregates, those aggregates must be in subselects. For example, in order to count the rows for each value of the foreign key main_id in Table A and Table B this is your query:

	SELECT id,
	  (SELECT COUNT(*) FROM sub_a WHERE sub_a.main_id=main.id),
	  (SELECT COUNT(*) FROM sub_b WHERE sub_b.main_id=main.id)
	FROM main ORDER BY id;

The aggregate queries are run in the target list. Each subquery is run once for each distinct id in table main. This is not a speedy query, but it will do the job.

For small tables, you can use this technique for more complex queries. For example, given tables containing rugby matches and a teams table, we can aggregate how many games each team played and how many each team won in the same query. In this query, the team could be in the match table as team1 or as team2. We do the aggregation in the smallest possible query against teams and join in the lookup tables for team types and areas on the result. But at the end we do want to eliminate those teams who never played a game and we'd like to order by the number of matches played.

	select tname, matches, won, ttype, area 
	from (
	   (select
	      tname, tykey, akey,
	   ((select count(*) from tmatches where team1 = sname )
	    + (select count(*) from tmatches where team2 = sname ) ) as matches,
	   ((select count(*) from tmatches where team1 = sname and score1 > score2 )
	    + (select count(*) from tmatches where team2 = sname and score2 > score1 ) ) as won
	from teams) foo 
	JOIN ttypes ty USING (tykey)
	JOIN areas USING (akey)
	where matches <> 0
	order by matches desc;

	                  tname             | matches | won |       ttype       |        area         
	------------------------------------+---------+-----+-------------------+----------------
	 Berkeley All Blues                 |      23 |  15 | Club              | Pacific Coast
	 USA U23 National Team              |      21 |   6 | National/All Star | All
	 Women's All Star 7s Pacific        |      16 |  14 | National/All Star | Pacific Coast
	 Women's All Star 7s Atlantic       |      16 |   9 | National/All Star | Mid-Atlantic
	 Women's All Star 7s Northeast      |      16 |   7 | National/All Star | Northeast
	 Women's All Star 7s West           |      16 |   6 | National/All Star | West
	 Women's All Star 7s So California  |      15 |   2 | National/All Star | So California
	 Women's All Star 7s Midwest        |      15 |   8 | National/All Star | Midwest
	 Beantown                           |      14 |   8 | Club              | Northeast
	 Minnesota Valkyries                |      14 |   8 | Club              | Midwest
	 New York                           |      13 |   7 | Club              | Northeast
	 Twin Cities Amazons                |      11 |   5 | Club              | Midwest
	 Women's All Star 7s South          |      10 |   4 | National/All Star | South
	 Atlanta Harlequins                 |       8 |   3 | Club              | South
	 Washington Furies                  |       8 |   3 | Club              | Mid-Atlantic
	 Emerald City Mudhens               |       8 |   1 | Club              | Pacific Coast
	 UCLA                               |       8 |   1 | Club              | So California

Still this is only helpful on tables small enough to wait for the results. And this example also points out the benefit of well defined views. Clearly the nested SQL can be difficult to read. That would be simplified by a view.

It is possible to do a wide variety of reporting type queries with this technique. You could select into the final target list the total number of teams, the total number of games played with the matches and games won, you can calculate the percentages of matches played and won by each team.

When the queries are against larger tables, you may want to consider warehousing the counts of various items in an aggregate table. This could be populated by a trigger on the original table. In that case, you have the aggregates ready to be selected out and joined into your final query for further aggregation or calculations.

Contributors: Stefan Weiss spaceman-75e1f-20040422 at ausgehaucht.sensenmann.at, Greg Sabino Mullane greg at turnstep.com, Bruno Wolff III bruno at wolff.to elein at varlena.com

Selecting the Top N rows
[SQL] SQL challenge--top 10 for each key value? 08-Apr-2004

To get a one level TopN query in SQL, simply order descending by the column and limit the result by N.

	elein=# select id from table70 order by id desc limit 5;
	 id 
	----
	 19
	 18
	 17
	 16
	 15
	(5 rows)

This query gets a lot trickier when what you'd really like to do is to get the TopN grouped by another field. The discussion on this ran from UNIONs, four level subselects, JOINS and the use of the ROW_NUMBER() function.

(The ROW_NUMBER() function is simply an output row counter. It can be written in plpython (see Issue #34 containing my plpython talk) or other pl languages, but PostgreSQL does not support it directly. Rod Taylor pointed out is in the spec 6.10 of SQL200N, but Tom Lane countered that "If the spec doesn't even have a year number yet, you can hardly expect real implementations to support it ;-)." )

The problem is set up like this: Given this table:

	  Column  |     Type     | Modifiers 
	----------+--------------+-----------
	 id       | integer      | PRIMARY KEY
	 query    | integer      | 
	 checksum | character(2) | 
	 score    | integer      |

show the top N scores grouped by query. The output would look like this where the rows containing the top 2 scores for each query are displayed.

	 id | query | checksum | score 
	----+-------+----------+-------
	  1 |     1 | aa       |    10
	 11 |     1 | bb       |    10
	  5 |     2 | cc       |    11
	 12 |     2 | cc       |    11
	  3 |     3 | cc       |    11
	  9 |     3 | bb       |    11
	(6 rows)

Greg Sabino Mullane provided a detailed example using sequences, and Rod Taylor provided the solution using ROW_NUMBER(). I decided to try to create the result set using a function that returned SETS of ROWS.

The type must be created before hand. The function was created so that you could pass in N, the number of results you wanted for each group. As I had done in the previous article I chose to use nested selects. I used nested select loops to do this in this case. I used the ordinary select ... order by ... DESC LIMIT N technique to fetch only the number of rows required.

For each query, we select top N rows.

	CREATE TYPE topscores AS
	   (id integer, query integer, checksum char(2), score integer);
	
	CREATE OR REPLACE FUNCTION topscores(integer) RETURNS SETOF topscores AS
	'
	DECLARE
	   t topscores%ROWTYPE;
	   r RECORD;
	   q RECORD;
	   n alias for $1;
	BEGIN
	   FOR q IN SELECT distinct query from table70 order by query LOOP
	      FOR t IN SELECT id , query, checksum, score
	         FROM table70
	         where query = q.query
	         ORDER BY query, score DESC LIMIT n LOOP
	            RETURN NEXT t;
	      END LOOP;
	   END LOOP;
	   RETURN;
	END;
	' language 'plpgsql';

The full SQL for this example is available.

Contributors: Jeff Boes jboes at nexcerpt.com, Greg Stark gsstark at mit.edu, Rod Taylor pg at rbt.ca, Tom Lane tgl at sss.pgh.pa.us, Josh Berkus josh at agliodbs.com, Greg Sabino Mullane greg at turnstep.com, elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top