Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

15-Aug-2005 Issue: 116

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Vacuuming, More Than You Wanted To Know.
Why vacuum?	30-Jul-2005

If you ever ask a question about query speed on the postgresql mailing lists (www.postgresql.org) or IRC (#postgresql freenode.net), the first thing that you will be asked is whether or not you have vacuumed recently. Regular vacuums with analyze are the first step in query optimization and it cannot be bypassed. Once proper vacuuming is in place, then detailed query analysis is possible.

PostgreSQL has No Overwrite Storage. This means that when deletes occur, the row is simply marked as deleted and updates are implemented as a delete mark on the old row and the insertion of a new row.

No Overwrite Storage was in the initial design of UCB Postgres and has two benefits and The first benefit is speed. Lazy deletes and fast inserts made the system go faster. The second is/was timetravel. In UCB postgres and Illustra it was possible to query the data at a point in time based on the marked deleted rows. Alas, PostgreSQL has removed this feature. (It can still be done, but not as a built-in feature.)

Other databases do not have no overwrite storage, for example, Informix, Oracle and mysql. These database do inplace updates if possible and delete rows as they go. The benefits of no overwrite storage and vacuum vs. overwrite storage has been fodder for grad student dissertations and can be argued ad infinitum. This has never been a make or break feature for choosing a database, however odd vacuuming seems to people used to other databases.

Vacuuming has three forms. Plain vacuum, vacuum analyze and vacuum full. Analyze can also be run without vacuum. Do not confuse vacuum analyze or analyze alone with explain analyze.

Plain vacuum returns the deleted rows to the free space maps, enabling reuse of the space. Usually run with analyze.

Analyze collects statistics on the tables. The statistic sampling is configurable. Running analyze without vacuum is the exception to the standard operating procedure but sometimes it is helpful.

The primary purpose of gathering statistics is for query optimization. It samples the tables and collects information about the number and widths of rows in a table as well as the volatility of a table. The PostgreSQL query optimizer depends on the table statistics to re-order queries and joins in the best way. One simple case, for example, is that it knows the sequential scan of a small table is faster than an index lookup on the same table.

Vacuum full is a heavier operation that restructures the data pages to eliminate gaps. There is debate about the usefulness of vacuum full. Standard belief is that proper vacuum timing should eliminate the necessity of doing a vacuum full. But people may argue with me on that issue as well. Vacuum full also requires access to table locks. This is not always possible of very active systems.

Vacuum is usually run periodically from a cronjob. The proper timing vacuuming is an art. Databases with high rates of inserts and deletes require vacuuming often. Some very high activity databases vacuum every few minutes. Databases which do not delete records do not need to be vacuumed as often, but do need vacuuming in order for the statistics to be somewhat accurate. They can be vacuumed daily. Readonly databases do not need to be vacuumed except once when the data is loaded. Archival databases can be vacuumed when large amounts of data is added or changed and then left as is.

There is an auntovacuum process which runs continually, in a less intensive operation. The reports of the stability of autovacuum range from wonderful to it's broken! Since I have not had a chance to run my own tests on autovacuum, I do not currently recommend it, but am open to the possibilities.

Contributors: elein at varlena.com

Dropping and Recreating Constraints
Reconstructing Constraints 14-Aug-2005

Sometimes for various reasons you may want to drop and recreate your foreign key constraints. You may want to drop them during a database load and re-create them after the load completes in order to load faster. Or you may want to list and review them. It is also possible someone altered a table mistakenly and a foreign key was added twice. You would want to be able to find and drop these duplicate constraints.

To list the constraints by table you can use the following select:

select n.nspname || '.' || c.relname as table_name,
   con.conname as constraint_name,
   pg_get_constraintdef( con.oid, false) as constraint_def
    from pg_constraint con
         join pg_namespace n on (n.oid = con.connamespace)
         join pg_class c on (c.oid = con.conrelid)
   where con.conrelid != 0 and
     con.contype = 'f';

This query looks at the constraints and joins in the namespace of the constraint (connamespace) and the table name (conrelid). It filters on constraints on tables (conrelid != 0) and on foreign key constraints (contype = 'f'). The contraints on table qualification is redundant, however, it is useful to know about.

Given this query is is extremely easy to write a shell script which creates a file of CREATE CONSTRAINT and DROP CONSTRAINT statements. This script which follows takes a database name and produce two files: one to create constraints and one to drop constraints. The resulting files can then be examined and run as appropriate.

	#!/bin/bash
	# create create/drop foreign key constraints
	# input database name
	# output to db_create_fkeys.out and db_drop_fkeys.out.
	#
	[ -z $1 ] && ( echo "USAGE: constraints.sh db_name"; exit 1; );
	db=$1
	psql -At  $db << END
	\o ${db}_create_fkeys.sql
	select 'ALTER TABLE ' ||
   	n.nspname || '.' || c.relname || ' ' ||
   	'ADD CONSTRAINT ' || con.conname || ' ' ||
   	pg_get_constraintdef( con.oid, false) || ';' as create_constr
    	from pg_constraint con
         	join pg_namespace n on (n.oid = con.connamespace)
         	join pg_class c on (c.oid = con.conrelid)
   	where con.conrelid != 0
     	and n.nspname = 'public'
     	and con.contype = 'f';
	\o ${db}_drop_fkeys.sql
	select 'ALTER TABLE ' ||
   	n.nspname || '.' || c.relname || ' ' ||
   	'DROP CONSTRAINT ' || con.conname || ';'
    	from pg_constraint con
         	join pg_namespace n on (n.oid = con.connamespace)
         	join pg_class c on (c.oid = con.conrelid)
   	where con.conrelid != 0
     	and n.nspname = 'public'
     	and con.contype = 'f';
	END

Warning: It is contrary to all good database form to run the database without the foreign keys defined properly. This enables referential integrity to become corrupt endangering the consistency of your database.

Contributors: elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top