varlena
varlena
PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

28-Nov-2005 Issue: 124

Archives | General Tidbits | Google General Bits | Docs | Castellano | Português | Subscriptions | Notifications | | Prev | Next

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Simple and Not So Simple Selects
Simple Selects 26-Nov-2005

The problem at hand is to select the customers delinquent on their payments. So we want to select the customers where the customers' purchases are 30 days old. The customer table and the purchases tables are what we want.

The query that should be in your mind first off follows. Unfortunately, the best query does not always come to mind first.

	SELECT name, address 
	FROM customers c JOIN purchases ON (purchaser_id = customer_id)
	WHERE purchase_date > current_date-30;
This query creates a set of customers and purchases and then filters in the ones with purchases 30 days ago. It leverages the JOIN capability of the database and lets the optimizer do the filtering.

This query does the same thing and gives the same results. But is unnecessarily ugly, complicated and slow.

	SELECT name, address 
	FROM customers 
	where EXISTS (SELECT 1 
		FROM purchases 
		WHERE purchaser_id=customers.id AND purchase_date > current_date-30 
		LIMIT 1);
The statement does this: look at each customer, then see if that customer has a purchase greater than 30 days ago. This type of statement construction is a result of not really understanding and leveraging JOINs. This is a common misunderstanding with people who are used to programming in loops and not in sets.

A variation on the above query uses IN.

	SELECT name, address 
	FROM customers 
	WHERE id IN (SELECT purchaser_id 
		FROM purchases 
		WHERE purchase_date > current_date-30 LIMIT 1);
This query is no better than the previous query. EXISTS and IN are comparable functionality. Both of these functions are correct but are less than optimal queries. They each do a select of purchases for each customer. (Note also that in 7.3 and earlier IN is particularly slower than EXISTS.)

When constructing a SELECT statement, first build the set of data that you want to return as well as the data you want to qualify on. Use the JOIN statement to bring two (or more) tables together to construct that set. The target list will be the data you want to return and the condition should identify the primary condition of the query.

Contributors: elein at varlena.com andrewsn at freenode.net
Constructing Aggregates
Constructing Aggregates 26-Nov-2005

In my family, we are mostly women, kids and pets. This is a picture of the family tree from my mother with three daughters and the sisters' kids and all of our pets. The people are blue the pets are brown. (Names have been changed to protect the innocent :)

The tables we have are these:

What I want to know is how may kids and pets each person has. So that query is easy.

	SELECT people.login, COUNT(pets.name) AS pets, COUNT(kids.name) AS kids
	FROM people JOIN pets ON (login=owner) JOIN kids ON (login = parent1)
	GROUP BY people.login;
	   login   | pets | kids
	-----------+------+------
	 ann       |    3 |    3
	 rachel    |    3 |    3
	 cora      |    1 |    1
	 stephanie |    1 |    1
	(4 rows)
Whoa. That was terribly wrong. Ann and Rachel only have one pet each. But they do each have three kids. For some reason, Cora and Stephanie were right--one pet and one kid each. But Cat and I am missing completely. Oh. That must be because I should have used a left outer join. Let's try again.
	SELECT people.login, COUNT(pets.name) AS pets, COUNT(kids.name) AS kids
	FROM people
	LEFT OUTER JOIN pets ON (login=owner) LEFT OUTER JOIN kids ON (login = parent1)
	group by people.login;
	   login   | pets | kids
	-----------+------+------
	 ann       |    3 |    3
	 cat       |    1 |    0
	 CJ        |    0 |    0
	 cora      |    1 |    1
	 david     |    0 |    0
	 elein     |    4 |    0
	 laura     |    0 |    0
	 rachel    |    3 |    3
	 stephanie |    1 |    1
	(9 rows)
Well, OK. That brought in the rest of the gang. But Ann and Rachel's counts are still wrong. What is going on? Let us look at the data in kids and people separately.
	SELECT login, count(pets.name )
	FROM people LEFT OUTER JOIN pets ON (login=owner)
	GROUP BY login;
	   login   | count
	-----------+-------
	 ann       |     1
	 cat       |     1
	 CJ        |     0
	 cora      |     1
	 david     |     0
	 elein     |     4
	 laura     |     0
	 rachel    |     1
	 stephanie |     1
	
	SELECT login, COUNT(kids.name )
	FROM people LEFT OUTER JOIN kids ON (login=parent1)
	GROUP BY login;
	   login   | count
	-----------+-------
	 ann       |     3
	 cat       |     0
	 CJ        |     0
	 cora      |     1
	 david     |     0
	 elein     |     0
	 laura     |     0
	 rachel    |     3
	 stephanie |     1
These numbers are correct in every case. What is really happening when both kids and pets are brought in together? Let us remove the COUNT and look at the data.
	SELECT login, pets.name AS pet, kids.name AS kid
	FROM people LEFT OUTER JOIN pets on (login=owner)    
	LEFT OUTER JOIN kids ON (login = parent1);
	   login   |    pet    |    kid
	-----------+-----------+-----------
	 ann       | buster    | elein
	 ann       | buster    | rachel
	 ann       | buster    | stephanie
	 cat       | edward    |
	 CJ        |           |
	 cora      | bowwow    | CJ
	 david     |           |
	 elein     | geordie   |
	 elein     | belle     |
	 elein     | bubbajack |
	 elein     | leaf      |
	 laura     |           |
	 rachel    | kiwi      | cat
	 rachel    | kiwi      | laura
	 rachel    | kiwi      | david
	 stephanie | chewy     | cora
	(16 rows)
Oh, my. The pets are repeated for every kid. This is an inadvertent Cartesian product of kids and pets. This accounts for both the correct and incorrect answers above. Ann and Rachel each have one pet but three kids. But the pets are repeated for each kid and it looks like they have more pets than they do.

Because there is no correlation between pets and kids we will have to JOIN them to people separately. Abandoning the JOIN syntax for the where clause we join in the people table twice:

	SELECT p1.login, COUNT(k.name) AS kids, COUNT(pt.name) AS pets
	FROM kids k , pets pt, people p1, people p2
	WHERE k.parent1 = p1.login AND pt.owner = p2.login
	AND p1.login = p2.login
	GROUP BY p1.login;
	   login   | kids | pets
	-----------+------+------
	 ann       |    3 |    3
	 rachel    |    3 |    3
	 cora      |    1 |    1
	 stephanie |    1 |    1
	(4 rows)
Good grief. That was wrong, too! Let us back up a bit. We want to join people to pets and kids separately and then we want the counts in the resulting target list. So let us separate things like that.
	SELECT login, kids, pets
	FROM
	   (SELECT login, COUNT(kids.name) AS kids
	   FROM people LEFT OUTER JOIN kids ON (login=parent1)
	   group by login) k
	JOIN
	   (SELECT login, COUNT(pets.name) AS pets
	   FROM people LEFT OUTER JOIN pets ON (login=owner)
	   GROUP BY login ) p 
	USING (login);
	   login   | kids | pets
	-----------+------+------
	 ann       |    3 |    1
	 cat       |    0 |    1
	 CJ        |    0 |    0
	 cora      |    1 |    1
	 david     |    0 |    0
	 elein     |    0 |    4
	 laura     |    0 |    0
	 rachel    |    3 |    1
	 stephanie |    1 |    1
	(9 rows)
Aha! That is correct. Finally. The two issues we addressed were the inadvertent Cartesian product multiplying the number of pets and the JOIN type leaving out the people with no kids or no pets. We did this by using LEFT OUTER JOIN first to get back the missing people. Then we looked at what the real data was instead of the counts. This told us about the Cartesian product. When we phrased the question correctly, we found the correct methodology.

Test your aggregates, especially if you are using multiple aggregates in one statement. More often than not, aggregates on different things cannot be done without subselects. Use subselects sparingly, but you will find them invaluable for statements with multiple aggregates.

Contributors: elein at varlena.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009

Top
Google
Search General Bits & varlena.com Search WWW