General Bits
|
Schedules
|
9-Jul-2005
| |
Again I extend my apologies for missing two weeks of new
General Bits columns.
From here to the foreseeable future
I expect to miss a Monday publication occasionally.
Life and paid work call.
I appreciate your patience
and encourage your perusal of the
archives and Tidbits.
Contributors:
elein at varlena.com
Multiple Conditional Aggregates
|
Originally 70-3 and 72-3
|
9-Jul-2005
| |
This is a re-edited version of articles
70-3 and 72-3.
In my rugby matches table I track matches played and scores. What I want to find out
is this. For each team, how many matches were played and how many matches were won.
There is a catch here in that a team many be mentioned in the matches table as
either team1 or team2. The matches table looks like this:
Table "public.tmatches"
Column | Type | Modifiers
---------+---------+-------------------------------------------------------------
matid | integer | not null default nextval('public.tmatches_matid_seq'::text)
team1 | text |
team2 | text |
score1 | integer |
score2 | integer |
venue | text |
descr | text |
To select the number of matches by team, one would "drive" the query by selecting
the team name, sname in this case, from the teams table and plug it into a
subquery.
select sname,
( select count(matid) from tmatches m where team1 = sname or team2 = sname ) as matches
from teams;
To select the number of wins by team, you would use the same strategy with a condition
indicating winning.
select sname,
( select count(matid) from tmatches m
where (team1 = sname and score1 > score2) or (team2 = sname and score2 > score1)
) as wins
from teams;
Because the queries have the same driver, they can be combined into one query:
select sname,
( select count(matid) from tmatches m where team1 = sname or team2 = sname ) as matches,
( select count(matid) from tmatches m
where (team1 = sname and score1 > score2) or (team2 = sname and score2 > score1)
) as wins
from teams;
Another way to think of the logic is to push the conditionals into the aggregates using
sum instead of count. While we were at it, we added a column to track the losses.
select sname,
sum(case when sname in (team1,team2) then 1 else 0 end) as tmatches,
sum(case when sname = team1 and score1 > score2 then 1
when sname = team2 and score2 > score1 then 1 else 0 end) as wins,
sum(case when sname = team2 and score1 > score2 then 1
when sname = team1 and score2 > score1 then 1 else 0 end) as losses
from teams join tmatches on (team1 = sname or team2 = sname )
group by sname order by sname;
In this last query, the driver is a join on teams and matches in order
to bring in the matches columns for reference in sum(). Group by is
necessary in this case since the sum() are in the same query as the
selection of sname.
It would appear to be the case that this last query scales better and
for some people it is easier to comprehend. EXPLAIN ANALYZE will tell
you, in your cases, whether this strategy is the better.
Contributors:
elein at varlena.com, Jeff Boes jboes at nexcerpt.com
Database Administration Checklist
|
Creating multiple machine installations
|
9-Jul-2005
| |
It is a critical milestone when a development project
goes into production with a database. From this time
forward you have a least two, maybe three or more
instances of your database to maintain consistently.
This article addresses an ad hoc checklist of items to
be considered for easier administration of databases
and database installations.
You might look at this list and say, "What an order!".
Do not be discouraged. You do not have to do everything at once.
Trust that when each item is reviewed and acted upon,
your overall job will be easier.
If you already have specific techniques to handle the various
issues described here, by all means continue to use what works
for you.
Configuration
- Network connectivity
This small bullet can encompass a lot of work depending
on the complexity of the network you are working within.
- Know what machines can accept connections from what machines
Make a list. Put it in the documents or on a wiki. Draw a picture.
This information will be invaluable.
- Know system logins and passwords to connect from one
machine to another via ssh and scp. Do not publish this
list. Keep it in a safe location as you do all passwords, right?
- Establish one machine, if at all possible given
your security setup, that can connect to every database server.
Database administration will be easier if done from a central location.
- Machine Usage
Understand which machines are production, testing, development,
etc., and who and what are affected by systems running on them.
For example, changes may be made on development often with little
notice to developers but changes to production machines must
be scheduled to synchronize with application updates.
- Configuration Files
- Establish a standard pg_hba.conf file.
- Establish a standard postgresql.conf file. You may want
to create base postgresql.conf files for machines with
different RAM, for example a 1gig and a 4gig RAM postgresql.conf file.
- It may be necessary to have a standard configuration files
for your production system(s) different from your development system(s)
and different again from your testing system(s).
- Keep copies of standard configuration files in source code control.
- Ensure each installation is running with standard configuration files.
- Users
- Establish a list of postgres users, their passwords and roles.
Do not publish the passwords :) Ensure all databases have
a consistent set of users and known passwords.
- Startup and PostgreSQL Logs
- Standardize the system start up scripts for all machines. These are
usually the init.d scripts. Create standard ones for each OS being used.
(Ideally all are on the same OS but the world is not ideal.)
- Only use the system startup/shutdown/reload to start, stop and reload.
This will ensure that postgres is always started in a consistent way.
For example the logging will be sent to the standard location instead
of stdout or /dev/null.
- Save your log files on a different disk than your data directory.
This will require a change to most startup scripts, but will ensure
that you can read the logs when a disk goes bad.
- Rotate your log files. This will also require a change to most
startup scripts. There are two methodologies for this:
Maintenance
- Vacuuming
- Create a cronjob to vacuum every database on every installation.
Run these cronjobs as postgres to avoid colliding cronjobs run
by various people.
- Document which machines run vacuum
and vacuum analyze at what times.
Getting vacuum timing perfect is an art and you will change the timing
occasionally. Document those changes and reasons, too.
- Backups without Replication or PITR
- Create a cronjob to dump every important database on every installation.
Run these cronjobs as postgres to avoid colliding cronjobs run
by various people.
Important means, if the database becomes corrupted then people will want it back.
The timing of the backups can be monthly, daily, hourly--depending on the
amount of data you are prepared to lose and the system drag time it imposes.
I recommend backing up to rotating files and depend on the timestamp of
the file to get the latest. For example, dump to file1 the first, third, fifth
times and then file2 at the second, fourth, sixth times.
- Document which machines run dumps at what times and where the
dump files exist.
- Do a test run of restoring a dump file to a database (try development not
production :) Notice the exact steps, commands and checks you perform.
Document them.
- Cleaning up logs
- Create a cronjob to remove old postgres logs.
Run these cronjobs as postgres to avoid colliding cronjobs run
by various people.
The time and the determination of old depends on your uptime
and the number of files you would like to keep around.
Schema Maintenance
This is perhaps one of the more difficult tasks you will be constantly doing
as long as your system is both in development and production.
- Establish a baseline schema
Get the schema of the database as it goes into production. To many
this is a lost cause, but put a stake in the ground as soon as possible.
To get the schema, use pg_dump -s database or use the
scripts you have that built the database.
- Additive changes
From this point on, all schema changes will be additive.
- Create a separate SQL script for each request. The request may have
more than one change in it. If data changes are required for populating
new columns or other things relating to the changes, include that in the
script.
- Scripts contain block transactions if possible.
- Keep the script in source code control.
- Apply the changes in stages and in sync with developers
Although many changes may not affect developers at all, sometimes they
do affect the middleware and require application bounces. Talk to your
developers to schedule application of schema changes.
Changes should be applied to development databases first for developer
testing, then testing for quality assurance testing and finally to production.
(The stages depend on your organization's organization.) As sure as you
are of your perfect typing and excellent SQL skills, don't do them all at
once. You will be sorry if you do.
- Rigorously track which level of schema update has been applied
to which database.
Document, Document, Document
As you can tell, a key component of good administration is documentation.
This will ensure that you remember what you did and how things are done,
but also communicates that information to others. Communicating with
others is crucial for the dead programmer problem. Not communicating
makes you obnoxious, not powerful or "job secure".
Use the documentation method that suits you and your organization the
best. All of this is internal documentation, not for customers so
within your organization use the easiest way to write things
down. If it is hard you won't do it. I recommend text files in
source code control and wikis. You may prefer .doc files or something
else.
Organize your information in a way it is easy to find. You should be
able to answer the following questions quickly.
- What update is machine x on?
- Have there been any changes to the postgresql.conf file on machine y?
- Where is the log file for installation a on machine z?
- Where is the backup for database foo?
- Whats my password?
Well, you get the idea. Good Luck!
Contributors:
elein at varlena.com
|