varlena
varlena
PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

5-Jan-2004 Issue: 56

Archives | General Tidbits | Google General Bits | Docs | Castellano | Português | Subscriptions | Notifications | | Prev | Next

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Survey Time
PostgreSQL Survey 05-Jan-2004

It is time again for another survey. This survey is about how you use PostgreSQL. Please cut and paste the questions into your editor to answer the questions and email your reply to generalbits@varlena.com. The responses will be anonymously compiled and republished if possible.

1. What PostgreSQL Client API do you use the most?

2. To what extent do you or your company use PostgreSQL for its object or object relational features? For example, user defined functions, user defined data types, inheritance, pre-defined non-traditional datatypes such as the geometric types, various indexing methods such as GiST, RTREE, tsearch2.

3. What is your favorite new feature of PostgreSQL 7.4?

4. Should there be a standard worldwide PostgreSQL training course?

5. Does your business utilize a PostgreSQL support company? If so, which one?

6. What version of PostgreSQL do you currently use? If you do not use 7.4 yet what are the technical and/or business reasons that you have not upgraded yet,

7. What features would make PostgreSQL more competitive with Oracle or DB2?

8. Would you or your company utilize a company specializing in PostgreSQL hosting?

9. Are you or your company using (PostgreSQL with) Open Office?

10. What procedural language do you use with PostgreSQL the most?

11. How important is PostgreSQL certification to you and/or your company?

12. What database did you use before PostgreSQL? What other databases do you currently use?

13. Do you or your company use MySQL because it was designed for speed?

14. What feature in PostgreSQL do you leverage the most?

  • reliability
  • extensibility/object-relational design
  • ACID compliance
  • ability to handle large amounts of data
  • Cost

15. What platform do you primarily run PostgreSQL on?

16. What do you think would attract the most new PostgreSQL users?

17. What is your preferred PostgreSQL administration tool?

18. What is your preferred PostgreSQL visualization or design tool?

19. What is your backup strategy for production?

20. What is your fail over strategy for production?

Contributors: elein at varlena.com
Union, Intersect and Except Basics
Merging, Intersecting and Excepting Rows in Tables 02-Jan-2004

Using a UNION is like stacking two (or more) tables or selections together. There must be the same number of columns in each selection and they must be compatible types.

Table 1                  
           
           
Table 2            
           

These examples tables of employees and contractors have the same table structure. In the data, note that there is a contractor 'Dave' and an employee 'Dave' that started on the same day. (The Classic Too Many Daves problem :-) (In real life there would be more information to qualify the Daves uniquely, however, to show variations on UNION we'll keep this anomaly.)

   -- drop table employees; drop table contractors;
   CREATE TABLE employees (
      name text,
      startdate date,
      enddate date );

   CREATE TABLE contractors (
      name text,
      startdate date,
      enddate date );
   
   insert into employees values ('Maxine', '1/1/2002',NULL);
   insert into employees values ('Dave', '1/1/2002',NULL);
   insert into employees values ('Edmund', '3/14/2002',NULL);
   insert into employees values ('Maggie', '4/27/2002',NULL);
   insert into employees values ('Andrew', '10/27/2002',NULL);
   insert into contractors values ( 'Andrew', '1/1/2002', '10/25/2002');
   insert into contractors values ( 'Marcel', '2/1/2002', NULL);
   insert into contractors values ( 'Carrie', '3/14/2002', '12/15/2002');
   insert into contractors values ( 'Jacob', '4/27/2002', NULL);
   insert into contractors values ( 'Dave', '1/1/2002', NULL);

A simple UNION of employees and contractors drops one of the Daves. This is because, by definition a UNION of two sets eliminates the duplicates.

   SELECT name, startdate, enddate FROM employees
   UNION 
   SELECT name, startdate, enddate FROM contractors
   
     name  | startdate  |  enddate   
   --------+------------+------------
    Andrew | 2002-01-01 | 2002-10-25
    Andrew | 2002-10-27 | 
    Carrie | 2002-03-14 | 2002-12-15
    Dave   | 2002-01-01 | 
    Edmund | 2002-03-14 | 
    Jacob  | 2002-04-27 | 
    Maggie | 2002-04-27 | 
    Marcel | 2002-02-01 | 
    Maxine | 2002-01-01 | 

Notice that Andrew began as a contractor and then became an employee. There have been no duplicate row issues with Andrew because the rows are not exact duplicates--they have different start and end dates. If the query involved only names instead of names and dates, then Andrew also would have been a duplicate.

In order to see all of the rows from both tables regardless of duplicates, use UNION ALL. This concatenates the selections instead of performing a formal union on them.

   SELECT name, startdate, enddate FROM employees
   UNION ALL
   SELECT name, startdate, enddate FROM contractors;
   
     name  | startdate  |  enddate   
   --------+------------+------------
    Maxine | 2002-01-01 | 
    Dave   | 2002-01-01 | 
    Edmund | 2002-03-14 | 
    Maggie | 2002-04-27 | 
    Andrew | 2002-10-27 | 
    Andrew | 2002-01-01 | 2002-10-25
    Marcel | 2002-02-01 | 
    Carrie | 2002-03-14 | 2002-12-15
    Jacob  | 2002-04-27 | 
    Dave   | 2002-01-01 | 
   (10 rows)

The Too Many Daves problem as it turns out was a mistake. Dave is an employee not a contractor. So now we want to know which rows in employees have equivalent rows in contractors so we can verify and correct the mistake and see if there are any others like it. To do this, use INTERSECT instead of JOIN.

   SELECT name, startdate, enddate FROM employees
   INTERSECT
   SELECT name, startdate, enddate FROM contractors;

    name | startdate  | enddate 
   ------+------------+---------
    Dave | 2002-01-01 | 
   (1 row)

To eliminate this data set you can use the INTERSECT statement nested in the WHERE IN clause of the DELETE. Why are we using the nested subquery SELECT name FROM (SELECT ... INTERSECT ...)? Try it without the extra level to see.

   DELETE FROM contractors 
   WHERE name IN ( 
      SELECT name FROM (
         SELECT name, startdate, enddate FROM employees
         INTERSECT
         SELECT name, startdate, enddate FROM contractors
      ) foo
   );

Back to the original table (except for the extra Dave). Suppose we wanted to see all the employees who were once contractors. INTERSECT would do that. And what about employees who were never contractors? You can use EXCEPT for that question. Notice that we expect all of the employees from the employee table except Andrew. (Remember Andrew who started out as a contractor?)

   SELECT name FROM employees
   EXCEPT
   SELECT name FROM contractors;

     name  
   --------
    Dave
    Edmund
    Maggie
    Maxine
   (4 rows)

INTERSECT and EXCEPT also eliminate duplicate rows, if any, in the result set. If you want to keep duplicates, you can use the ALL qualifier on INTERSECT and EXCEPT as you would with UNION.

Contributors: elein@varlena.com
Join Basics
How to Join Queries and Tables 02-Jan-2004

A JOIN is a way of selecting columns from two or more tables into the same target list, putting the selections side by side instead of stacking their rows as was done for a UNION. Usually, there are overlapping columns in each table which are important as they specify the JOIN criteria--how the rows are constructed. How to specify the JOIN criteria is what you need to know to use JOINS properly.

Table 1   Table 2
              
              
              
              
              

The simplest JOIN is also the simplest one to write by mistake and it is only sometimes useful.

   CREATE TABLE numbers ( one integer, two integer);
   CREATE TABLE letters ( a text, b text, c text);
   INSERT INTO numbers VALUES (1,2);
   INSERT INTO numbers VALUES (3,4);
   INSERT INTO letters VALUES ('a','b');
   INSERT INTO letters VALUES ('c','d');
   INSERT INTO letters VALUES ('e','f');
   
   SELECT numbers.one, numbers.two, letters.a, letters.b, letters.c
   FROM numbers, letters;

By not specifying any JOIN either in the WHERE clause or in the FROM clause, a Cartesian product is created. If the number table had 2 rows and the letter table had 3 rows, then the result of this query would be 2 * 3 rows. Each of the 2 rows in numbers will create 3 rows in the result.

    one | two | a | b | c 
   -----+-----+---+---+---
      1 |   2 | a | b | 
      1 |   2 | c | d | 
      1 |   2 | e | f | 
      3 |   4 | a | b | 
      3 |   4 | c | d | 
      3 |   4 | e | f | 
   (6 rows)

If you really meant to do this type of join, you may want to write it using the explicit CROSS JOIN so that you can document your intentions. Otherwise someone might fix for you.

   SELECT numbers.one, numbers.two, letters.a, letters.b, letters.c
   FROM numbers CROSS JOIN letters;

Usually, the rows in one table are linked to one or more rows in the joined table by designating a column in each that is the same or a condition which is true.

Lets us first create some sample data to look at simple examples of the other kinds of JOINS. We have a table of rugby events, a table of teams and a table of matches played at events. Notice we are entering a match for which an event does not exist and a team with no matches. Normally for this schema, we'd define FOREIGN KEYS to ensure that all matches had teams and events.

   -- DROP TABLE events; DROP TABLE teams; DROP TABLE matches;
   CREATE TABLE events ( e_id INTEGER, e_name TEXT, e_date DATE);
   CREATE TABLE teams ( t_id INTEGER, t_name TEXT, t_loc TEXT);
   CREATE TABLE matches (
      t1_id TEXT,       
      t2_id TEXT,
      t1_score INTEGER,
      t2_score INTEGER,
      e_id INTEGER);
   
   INSERT INTO teams VALUES (1, 'Twin Cities Amazons', 'Minnesota');
   INSERT INTO teams VALUES (2, 'Mudhens', 'Washington');
   INSERT INTO teams VALUES (3, 'Blues', 'California');
   INSERT INTO teams VALUES (4, 'Stingers', 'Maryland');
   INSERT INTO teams VALUES (5, 'Valkaries', 'Minnesota');
   INSERT INTO events VALUES (1, '2002 Division 1', '10/15/02');
   INSERT INTO events VALUES (2, '2003 Division 1', '10/20/03');
   INSERT INTO events VALUES (3, '2004 Division 1', '10/20/04');
   INSERT INTO matches VALUES (1,2, 12, 10, 1);
   INSERT INTO matches VALUES (3,4, 35, 16, 1);
   INSERT INTO matches VALUES (1,3, 11, 25, 1);
   INSERT INTO matches VALUES (1,3, 18, 30, 2);
   INSERT INTO matches VALUES (2,4, 1, 14, 2);
   INSERT INTO matches VALUES (3,4, 45, 2, 2);
   INSERT INTO matches VALUES (3,1, 10, 3, 4);

There are several kinds of JOINs. We've seen the CROSS JOIN. Now let us take a look at the others: INNER and OUTER. All of these join types are both concepts and SQL commands. That is, you can implement the JOINs with WHERE clauses and subselects and UNIONS or with explicit JOIN statements.

The explicit form of JOINS use NATURAL, ON or USING to say how the rows tables in the table should be joined.

An INNER JOIN shows all rows in the first table which have corresponding rows in the second table. If there is a row in the first table without a counter part in the second it is not shown. The same is true for the second table.

An OUTER JOIN is one where if there is not a matching join column in one of the tables, the NULL values are put into the columns that would have represented the other table. There are three variations of outer joins: LEFT, RIGHT and FULL. In a LEFT OUTER JOIN, all of the columns and rows from the left table are shown, but where there is not a corresponding entry in the right table, those columns are set to NULL. In a RIGHT OUTER JOIN, it is the same except that all of the columns and rows in the right table are shown and the left table values may be NULL. A FULL OUTER JOIN is a combination of the LEFT and RIGHT JOIN. All of the columns and rows for both tables are shown and where there is not a corresponding entry in the left or the right table, those columns are set to NULL.
Inner JoinLeft Outer JoinRight Outer JoinFull Outer Join
Table 1   Table 2
              
              
              
              
              
Table 1   Table 2
              
              
              
              
              
Table 1   Table 2
              
              
              
              
              
Table 1   Table 2
              
              
              
              
              
KEY: Table 1Join Column Table 2NULL Column
The following statements show all of the different ways to perform an INNER JOIN. They show how to perform an INNER JOIN using a WHERE clause and using ON, USING and NATURAL. The JOIN links the matches' event id to the event's event id and so brings in the event name and date of the match. Notice that the match for event 3 is not shown. An INNER JOIN only shows rows where there is a match on the join column(s).

With the WHERE clause, just specify equality of the column(s) you wish to join on. With a JOIN USING you specify the common names of columns in both tables. A NATURAL join is a short hand for JOIN USING which specifies that the join should be on all of the commonly named columns. A JOIN ON allows you to specify an expression as the join criteria. This helps when the join column names are different in the two tables or if you need to add more complex logic.

   select m.t1_id, m.t2_id, m.t1_score, m.t2_score, e.e_name, e.e_date
   from matches m, events e
   where m.e_id = e.e_id;
   
   select m.t1_id, m.t2_id, m.t1_score, m.t2_score, e.e_name, e.e_date
   from matches m JOIN events e USING (e_id) ; 
   
   select m.t1_id, m.t2_id, m.t1_score, m.t2_score, e.e_name, e.e_date
   from matches m NATURAL JOIN events e;
   
   select m.t1_id, m.t2_id, m.t1_score, m.t2_score, e.e_name, e.e_date
   from matches m JOIN events e ON (m.e_id = e.e_id) ; 
   
    t1_id | t2_id | t1_score | t2_score |     e_name      |   e_date   
   -------+-------+----------+----------+-----------------+------------
    1     | 2     |       12 |       10 | 2002 Division 1 | 2002-10-15
    3     | 4     |       35 |       16 | 2002 Division 1 | 2002-10-15
    1     | 3     |       11 |       25 | 2002 Division 1 | 2002-10-15
    1     | 3     |       18 |       30 | 2003 Division 1 | 2003-10-20
    2     | 4     |        1 |       14 | 2003 Division 1 | 2003-10-20
    3     | 4     |       45 |        2 | 2003 Division 1 | 2003-10-20
   (6 rows)
You can have more than two table joined in using any JOIN type. (And of course, each table could also be a subquery, but that is a different article.) Because the column names are different for the team ids in the matches table and the teams table, we used a JOIN ON condition. This could also have been a WHERE condition, but it could not have been a USING condition or a NATURAL join.

   select t1.t_name as team_one, t2.t_name as team_two,
   m.t1_score, m.t2_score, e.e_name
   from matches m JOIN events e USING (e_id) 
      JOIN teams t1 ON (m.t1_id = t1.t_id)
      JOIN teams t2 ON (m.t2_id = t2.t_id); 
   
         team_one       | team_two | t1_score | t2_score |     e_name      
   ---------------------+----------+----------+----------+-----------------
    Twin Cities Amazons | Mudhens  |       12 |       10 | 2002 Division 1
    Twin Cities Amazons | Blues    |       11 |       25 | 2002 Division 1
    Blues               | Stingers |       35 |       16 | 2002 Division 1
    Twin Cities Amazons | Blues    |       18 |       30 | 2003 Division 1
    Mudhens             | Stingers |        1 |       14 | 2003 Division 1
    Blues               | Stingers |       45 |        2 | 2003 Division 1
   (6 rows)

An example of a LEFT OUTER JOIN would be to select all of the matches for all of the events. We are using the same query as the one above, however we are specifying a LEFT join between matches and events. This should show us the all of matches, even if there is no qualifying event. In this case, we should see the match between the Blues and the Amazons with event id 4.

   select t1.t_name as team_one, t2.t_name as team_two,
      m.t1_score, m.t2_score, e.e_name
   from matches m LEFT OUTER JOIN events e USING (e_id)
      JOIN teams t1 ON (m.t1_id = t1.t_id)
      JOIN teams t2 ON (m.t2_id = t2.t_id);

         team_one       |      team_two       | t1_score | t2_score |     e_name      
   ---------------------+---------------------+----------+----------+-----------------
    Blues               | Twin Cities Amazons |       10 |        3 | 
    Twin Cities Amazons | Mudhens             |       12 |       10 | 2002 Division 1
    Twin Cities Amazons | Blues               |       11 |       25 | 2002 Division 1
    Twin Cities Amazons | Blues               |       18 |       30 | 2003 Division 1
    Mudhens             | Stingers            |        1 |       14 | 2003 Division 1
    Blues               | Stingers            |       35 |       16 | 2002 Division 1
    Blues               | Stingers            |       45 |        2 | 2003 Division 1
   (7 rows)

A RIGHT OUTER JOIN select all of the rows in the second table and sets columns in the first table to NULL where there is no match. Joins are commutative so the following query should give the same results as the one above by switching LEFT for RIGHT and matches for events. Try it.

   select t1.t_name as team_one, t2.t_name as team_two,
      m.t1_score, m.t2_score, e.e_name
      from events e RIGHT OUTER JOIN matches m USING (e_id)
         JOIN teams t1 ON (m.t1_id = t1.t_id)
         JOIN teams t2 ON (m.t2_id = t2.t_id);

A FULL OUTER JOIN is a combination of a LEFT and RIGHT OUTER JOIN. A FULL OUTER JOIN could be used to show all of the matches and all of the events whether or not there are corresponding events or matches. In this query we expect to see both the match without an event as above and the event without matches (2004 Division 1). Again we will just modify the original query. The difference here is that the two joins on the teams table will also have to be changed to FULL OUTER JOINS in order to see any event rows without matches. If we do not make this change, then even though the matches and events are a full join, the two team joins are INNER JOINS and eliminate rows which do not match. Note also, that by making this multi-join query all use FULL OUTER JOINS the team with no matches will also be shown.

   select t1.t_name as team_one, t2.t_name as team_two,
      m.t1_score, m.t2_score, e.e_name
      from matches m FULL OUTER JOIN events e USING (e_id)
         FULL OUTER JOIN teams t1 ON (m.t1_id = t1.t_id)
         FULL OUTER JOIN teams t2 ON (m.t2_id = t2.t_id);
   
         team_one       |      team_two       | t1_score | t2_score |     e_name      
   ---------------------+---------------------+----------+----------+-----------------
    Blues               | Twin Cities Amazons |       10 |        3 | 
    Twin Cities Amazons | Mudhens             |       12 |       10 | 2002 Division 1
    Twin Cities Amazons | Blues               |       11 |       25 | 2002 Division 1
    Twin Cities Amazons | Blues               |       18 |       30 | 2003 Division 1
    Mudhens             | Stingers            |        1 |       14 | 2003 Division 1
    Blues               | Stingers            |       35 |       16 | 2002 Division 1
    Blues               | Stingers            |       45 |        2 | 2003 Division 1
                        | Valkaries           |          |          | 
    Stingers            |                     |          |          | 
    Valkaries           |                     |          |          | 
                        |                     |          |          | 2004 Division 1
   (11 rows)
Contributors: elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009

Top
Google
Search General Bits & varlena.com Search WWW