|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Registration is now open for the 2004 O'Reilly Open Source Convention July 26-30, 2004 in Portland, OR. For those of us in or near the States, this is the PostgreSQL conference to attend. PostgreSQL people will be there in full force. You will have a chance to meet your favorite developers and contributors face to face. PostgreSQL will have both a table and a birds-of-a-feather session. Please contact me if you are planning on coming and want to know what we are planning and how you can join rest of us. You may also want to tune into the pgsql-advocacy list to hear some of the public discussions of PostgreSQL at OSCON. The PostgreSQL Track will be very full of great presentations by a good variety of impressive people. I recommend all of them without hesitation. But don't miss the panel discussion of Replication. The tutorials will be held on the Monday and Tuesday before the open conference. They are listed at the end of this table and include sessions by Joe Conway, Greg Sabino Mullane and yours truly.
The Netherland's NLUUG is sponsoring an Open Source for Business Conference in Ede at the De Reehorst Hotel on May 27, 2004. For more information on the conference see: http://www.nluug.nl/events/vj04/ Elein Mustain will be speaking on PostgreSQL Extensibility. Many talks will be in English and most people will also speak English.
In order to select the results from two separate aggregates, those aggregates must be in subselects. For example, in order to count the rows for each value of the foreign key main_id in Table A and Table B this is your query: SELECT id, (SELECT COUNT(*) FROM sub_a WHERE sub_a.main_id=main.id), (SELECT COUNT(*) FROM sub_b WHERE sub_b.main_id=main.id) FROM main ORDER BY id; The aggregate queries are run in the target list. Each subquery is run once for each distinct id in table main. This is not a speedy query, but it will do the job. For small tables, you can use this technique for more complex queries. For example, given tables containing rugby matches and a teams table, we can aggregate how many games each team played and how many each team won in the same query. In this query, the team could be in the match table as team1 or as team2. We do the aggregation in the smallest possible query against teams and join in the lookup tables for team types and areas on the result. But at the end we do want to eliminate those teams who never played a game and we'd like to order by the number of matches played. select tname, matches, won, ttype, area from ( (select tname, tykey, akey, ((select count(*) from tmatches where team1 = sname ) + (select count(*) from tmatches where team2 = sname ) ) as matches, ((select count(*) from tmatches where team1 = sname and score1 > score2 ) + (select count(*) from tmatches where team2 = sname and score2 > score1 ) ) as won from teams) foo JOIN ttypes ty USING (tykey) JOIN areas USING (akey) where matches <> 0 order by matches desc; tname | matches | won | ttype | area ------------------------------------+---------+-----+-------------------+---------------- Berkeley All Blues | 23 | 15 | Club | Pacific Coast USA U23 National Team | 21 | 6 | National/All Star | All Women's All Star 7s Pacific | 16 | 14 | National/All Star | Pacific Coast Women's All Star 7s Atlantic | 16 | 9 | National/All Star | Mid-Atlantic Women's All Star 7s Northeast | 16 | 7 | National/All Star | Northeast Women's All Star 7s West | 16 | 6 | National/All Star | West Women's All Star 7s So California | 15 | 2 | National/All Star | So California Women's All Star 7s Midwest | 15 | 8 | National/All Star | Midwest Beantown | 14 | 8 | Club | Northeast Minnesota Valkyries | 14 | 8 | Club | Midwest New York | 13 | 7 | Club | Northeast Twin Cities Amazons | 11 | 5 | Club | Midwest Women's All Star 7s South | 10 | 4 | National/All Star | South Atlanta Harlequins | 8 | 3 | Club | South Washington Furies | 8 | 3 | Club | Mid-Atlantic Emerald City Mudhens | 8 | 1 | Club | Pacific Coast UCLA | 8 | 1 | Club | So California Still this is only helpful on tables small enough to wait for the results. And this example also points out the benefit of well defined views. Clearly the nested SQL can be difficult to read. That would be simplified by a view. It is possible to do a wide variety of reporting type queries with this technique. You could select into the final target list the total number of teams, the total number of games played with the matches and games won, you can calculate the percentages of matches played and won by each team. When the queries are against larger tables, you may want to consider warehousing the counts of various items in an aggregate table. This could be populated by a trigger on the original table. In that case, you have the aggregates ready to be selected out and joined into your final query for further aggregation or calculations.
To get a one level TopN query in SQL, simply order descending by the column and limit the result by N. elein=# select id from table70 order by id desc limit 5; id ---- 19 18 17 16 15 (5 rows) This query gets a lot trickier when what you'd really like to do is to get the TopN grouped by another field. The discussion on this ran from UNIONs, four level subselects, JOINS and the use of the ROW_NUMBER() function. (The ROW_NUMBER() function is simply an output row counter. It can be written in plpython (see Issue #34 containing my plpython talk) or other pl languages, but PostgreSQL does not support it directly. Rod Taylor pointed out is in the spec 6.10 of SQL200N, but Tom Lane countered that "If the spec doesn't even have a year number yet, you can hardly expect real implementations to support it ;-)." ) The problem is set up like this: Given this table: Column | Type | Modifiers ----------+--------------+----------- id | integer | PRIMARY KEY query | integer | checksum | character(2) | score | integer |show the top N scores grouped by query. The output would look like this where the rows containing the top 2 scores for each query are displayed. id | query | checksum | score ----+-------+----------+------- 1 | 1 | aa | 10 11 | 1 | bb | 10 5 | 2 | cc | 11 12 | 2 | cc | 11 3 | 3 | cc | 11 9 | 3 | bb | 11 (6 rows) Greg Sabino Mullane provided a detailed example using sequences, and Rod Taylor provided the solution using ROW_NUMBER(). I decided to try to create the result set using a function that returned SETS of ROWS.
The type must be created before hand. The function was created so
that you could pass in N, the number of results you wanted for each
group. As I had done in the previous article I chose to use nested selects. I
used nested select loops to do this in this case. I used the ordinary
For each query, we select top N rows.
CREATE TYPE topscores AS (id integer, query integer, checksum char(2), score integer); CREATE OR REPLACE FUNCTION topscores(integer) RETURNS SETOF topscores AS ' DECLARE t topscores%ROWTYPE; r RECORD; q RECORD; n alias for $1; BEGIN FOR q IN SELECT distinct query from table70 order by query LOOP FOR t IN SELECT id , query, checksum, score FROM table70 where query = q.query ORDER BY query, score DESC LIMIT n LOOP RETURN NEXT t; END LOOP; END LOOP; RETURN; END; ' language 'plpgsql'; The full SQL for this example is available.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009 |