Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

11-July-2005 Issue: 115

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

General Bits
Schedules	9-Jul-2005

Again I extend my apologies for missing two weeks of new General Bits columns. From here to the foreseeable future I expect to miss a Monday publication occasionally. Life and paid work call. I appreciate your patience and encourage your perusal of the archives and Tidbits.

Contributors: elein at varlena.com

Multiple Conditional Aggregates
Originally 70-3 and 72-3 9-Jul-2005

This is a re-edited version of articles 70-3 and 72-3.

In my rugby matches table I track matches played and scores. What I want to find out is this. For each team, how many matches were played and how many matches were won. There is a catch here in that a team many be mentioned in the matches table as either team1 or team2. The matches table looks like this:

                                Table "public.tmatches"
    Column  |  Type   |                          Modifiers 
   ---------+---------+-------------------------------------------------------------
    matid   | integer | not null default nextval('public.tmatches_matid_seq'::text) 
    team1   | text    |
    team2   | text    |
    score1  | integer |
    score2  | integer |
    venue   | text    |
    descr   | text    |

To select the number of matches by team, one would "drive" the query by selecting the team name, sname in this case, from the teams table and plug it into a subquery.

	select sname,
    ( select count(matid) from tmatches m where team1 = sname or team2 = sname ) as matches
	from teams;

To select the number of wins by team, you would use the same strategy with a condition indicating winning.

   select sname,
    ( select count(matid) from tmatches m 
      where (team1 = sname and score1 > score2) or (team2 = sname and score2 > score1)
    ) as wins
   from teams;

Because the queries have the same driver, they can be combined into one query:

	select sname,
    ( select count(matid) from tmatches m where team1 = sname or team2 = sname ) as matches,
    ( select count(matid) from tmatches m 
      where (team1 = sname and score1 > score2) or (team2 = sname and score2 > score1)
    ) as wins
	from teams;

Another way to think of the logic is to push the conditionals into the aggregates using sum instead of count. While we were at it, we added a column to track the losses.

   select sname, 
   sum(case when sname in (team1,team2) then 1 else 0 end) as tmatches,
   sum(case when sname = team1 and score1 > score2 then 1 
            when sname = team2 and score2 > score1 then 1 else 0 end) as wins, 
   sum(case when sname = team2 and score1 > score2 then 1 
            when sname = team1 and score2 > score1 then 1 else 0 end) as losses
   from teams join tmatches on (team1 = sname or team2 = sname ) 
   group by sname order by sname;

In this last query, the driver is a join on teams and matches in order to bring in the matches columns for reference in sum(). Group by is necessary in this case since the sum() are in the same query as the selection of sname.

It would appear to be the case that this last query scales better and for some people it is easier to comprehend. EXPLAIN ANALYZE will tell you, in your cases, whether this strategy is the better.

Contributors: elein at varlena.com, Jeff Boes jboes at nexcerpt.com

Database Administration Checklist
Creating multiple machine installations 9-Jul-2005

It is a critical milestone when a development project goes into production with a database. From this time forward you have a least two, maybe three or more instances of your database to maintain consistently. This article addresses an ad hoc checklist of items to be considered for easier administration of databases and database installations.

You might look at this list and say, "What an order!". Do not be discouraged. You do not have to do everything at once. Trust that when each item is reviewed and acted upon, your overall job will be easier.

If you already have specific techniques to handle the various issues described here, by all means continue to use what works for you.

Configuration

Network connectivity
This small bullet can encompass a lot of work depending on the complexity of the network you are working within.
- Know what machines can accept connections from what machines Make a list. Put it in the documents or on a wiki. Draw a picture. This information will be invaluable.
- Know system logins and passwords to connect from one machine to another via ssh and scp. Do not publish this list. Keep it in a safe location as you do all passwords, right?
- Establish one machine, if at all possible given your security setup, that can connect to every database server. Database administration will be easier if done from a central location.
Machine Usage
Understand which machines are production, testing, development, etc., and who and what are affected by systems running on them. For example, changes may be made on development often with little notice to developers but changes to production machines must be scheduled to synchronize with application updates.
Configuration Files
- Establish a standard pg_hba.conf file.
- Establish a standard postgresql.conf file. You may want to create base postgresql.conf files for machines with different RAM, for example a 1gig and a 4gig RAM postgresql.conf file.
- It may be necessary to have a standard configuration files for your production system(s) different from your development system(s) and different again from your testing system(s).
- Keep copies of standard configuration files in source code control.
- Ensure each installation is running with standard configuration files.
Users
- Establish a list of postgres users, their passwords and roles. Do not publish the passwords :) Ensure all databases have a consistent set of users and known passwords.
Startup and PostgreSQL Logs
- Standardize the system start up scripts for all machines. These are usually the init.d scripts. Create standard ones for each OS being used. (Ideally all are on the same OS but the world is not ideal.)
- Only use the system startup/shutdown/reload to start, stop and reload. This will ensure that postgres is always started in a consistent way. For example the logging will be sent to the standard location instead of stdout or /dev/null.
- Save your log files on a different disk than your data directory. This will require a change to most startup scripts, but will ensure that you can read the logs when a disk goes bad.
- Rotate your log files. This will also require a change to most startup scripts. There are two methodologies for this:
  - pg_ctl start -D $PGDATA | /usr/sbin/rotatelogs $PGDATA/pglog 86400 2>&1
  - Use the log rotation available in 8.0 by setting the following in your postgresql.conf file.
```
		log_destination=stderr
		redirect_stderr=true
		log_directory=/var/log/pg_log
						
```

Maintenance

Vacuuming
- Create a cronjob to vacuum every database on every installation.
  Run these cronjobs as postgres to avoid colliding cronjobs run by various people.
- Document which machines run vacuum and vacuum analyze at what times.
  Getting vacuum timing perfect is an art and you will change the timing occasionally. Document those changes and reasons, too.
Backups without Replication or PITR
- Create a cronjob to dump every important database on every installation.
  Run these cronjobs as postgres to avoid colliding cronjobs run by various people.
  Important means, if the database becomes corrupted then people will want it back. The timing of the backups can be monthly, daily, hourly--depending on the amount of data you are prepared to lose and the system drag time it imposes.
  I recommend backing up to rotating files and depend on the timestamp of the file to get the latest. For example, dump to file1 the first, third, fifth times and then file2 at the second, fourth, sixth times.
- Document which machines run dumps at what times and where the dump files exist.
- Do a test run of restoring a dump file to a database (try development not production :) Notice the exact steps, commands and checks you perform. Document them.
Cleaning up logs
- Create a cronjob to remove old postgres logs.
  Run these cronjobs as postgres to avoid colliding cronjobs run by various people.
  The time and the determination of old depends on your uptime and the number of files you would like to keep around.
Schema Maintenance
This is perhaps one of the more difficult tasks you will be constantly doing as long as your system is both in development and production.
- Establish a baseline schema
  Get the schema of the database as it goes into production. To many this is a lost cause, but put a stake in the ground as soon as possible.
  To get the schema, use pg_dump -s database or use the scripts you have that built the database.
- Additive changes
  From this point on, all schema changes will be additive.
  - Create a separate SQL script for each request. The request may have more than one change in it. If data changes are required for populating new columns or other things relating to the changes, include that in the script.
  - Scripts contain block transactions if possible.
  - Keep the script in source code control.
  - Apply the changes in stages and in sync with developers
    Although many changes may not affect developers at all, sometimes they do affect the middleware and require application bounces. Talk to your developers to schedule application of schema changes.
    Changes should be applied to development databases first for developer testing, then testing for quality assurance testing and finally to production. (The stages depend on your organization's organization.) As sure as you are of your perfect typing and excellent SQL skills, don't do them all at once. You will be sorry if you do.
  - Rigorously track which level of schema update has been applied to which database.
Document, Document, Document
As you can tell, a key component of good administration is documentation. This will ensure that you remember what you did and how things are done, but also communicates that information to others. Communicating with others is crucial for the dead programmer problem. Not communicating makes you obnoxious, not powerful or "job secure".
Use the documentation method that suits you and your organization the best. All of this is internal documentation, not for customers so within your organization use the easiest way to write things down. If it is hard you won't do it. I recommend text files in source code control and wikis. You may prefer .doc files or something else.
Organize your information in a way it is easy to find. You should be able to answer the following questions quickly.
- What update is machine x on?
- Have there been any changes to the postgresql.conf file on machine y?
- Where is the log file for installation a on machine z?
- Where is the backup for database foo?
- Whats my password?
Well, you get the idea. Good Luck!
Contributors: elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top

Configuration

Maintenance

Schema Maintenance

Document, Document, Document