varlena
PostgreSQL Training, Consulting & Support
General Bits
By A. Elein Mustain

30-Jan-2006 Issue: 127


General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Favorite Articles
Administrivia 29-Jan-2006

First, as you can probably tell, I am not publishing my column weekly as I had previously done. I intend to keep publishing as time and resources permit. Ideally this will be every other week. I appreciate your patience.

Next, I would like to hear from people about their favorite or most helpful General Bits article. I am creating talks and written collections of General Bits articles and want to use the ones that are really helpful, not just the ones I think are helpful. Send your suggestions to me. Your feedback is always appreciated.

Finally, I have converted all of the article files to php format. You may have to fix your bookmarks; in general you can just delete the html suffix.

Editor: elein at varlena.com
Organizing Database Schema Updates
Practical Database Schema Updates 27-jan-2006

In a production environment it is critical to maintain control over the structure of your databases. If everyone has the ability to alter tables, change keys, and add tables willy-nilly, then the integrity of your data structures will deteriorate. If your site involves more than one database with the same schema, you can easily get the database schemas out of sync, causing both software bugs and data problems. Just think of the chaos if you had a thousand or more databases with the same schema to maintain (some people do). (I use the word schema to mean your database structure, not a namespace within your database.)

I am going to outline a simple, common sense way to manage database updates. Use it as a template and tune it to the needs of your organization, or rewrite it using the languages of your choice. I'll use bash along with psql. Also feel free to change the names of objects, but keep to the spirit of the suggestions.

Note: This model is not directly applicable to Slony clusters. With Slony clusters the DDL is executed via a slonik script and submitted to only one node.

The first requirement is that all production database schemas must be in sync. If they are not in sync you need to bring them into sync. Doing that is cruelly difficult most of the time. You want control over your environment so that you never, ever need to diff schemas. (Bringing your databases into sync is not the subject of this article, but may be the subject of a future article if there is need and interest.)

The first rule is that all schema change statements (DDL) must be edited into a file, say updateN.sql, and applied as a SQL script. They are never issued via a GUI tool or by typing in psql directly. Just forget about those methods; they are not repeatable or scalable. The SQL script should, of course, be written and tested on a development database before being considered for application to production databases.
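As a concrete illustration, an update file is just a plain SQL script; a minimal sketch, with made-up table, column, and index names (PostgreSQL DDL is transactional, so the whole update succeeds or fails as a unit):

      -- update1.sql: add a status column to a hypothetical orders table.
      -- Table, column, and index names here are illustrative only.
      BEGIN;
      ALTER TABLE orders ADD COLUMN status text DEFAULT 'new';
      CREATE INDEX orders_status_idx ON orders (status);
      COMMIT;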

You should have a directory Updates in your source tree. Everything should be under source code control. In that directory, you will have a few objects (a sketch of the layout follows the list).

The .../Updates Directory

  • update[1..N].sql files
      Each file is a separate update. They are numbered and must be run in order if starting from scratch.
  • applyupdate.sh and any helper scripts
      Script to apply the update:
      • to a host::database,
      • log host/database/update information,
      • verify success of update script on each database. Raise notice if something went wrong.
  • schema checkpoints:
      Start with a current schema dump. Name it baselineschema0.sql. The point of having a baseline is that you have a clean slate to which to apply all updates in order for testing. It is important to have a repeatable process, so this baseline will enable you to repeat updates and apply new ones.

      The baselineschema0.sql will be used to create a new database. Updates are applied in order, with the last being the current one. If the current update fails, you can fix it and redo the process until it works correctly.

      Once the process works correctly, capture the output of the successful psql run. Put it in the Canon directory, named updateN.out. More about that shortly.

      As updates roll around, periodically create new baseline schemas: after update 10, for example, and another at update 20. You decide how often it is useful to create baselines.

  • Canon directory
      Test output for comparison. For each update, the test run against a test database should produce a clean output of the psql script. When updates are applied to production, their output is diffed against the canonical output; if there is a difference, then there is a problem that needs human intervention.
  • updatelog.out
      Log of updates on all hosts and databases.
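Put together, the Updates directory might look something like this (a sketch following the naming conventions above):

      Updates/
          baselineschema0.sql      (schema dump used as a clean starting point)
          update1.sql ... updateN.sql
          applyupdate.sh
          updatelog.out
          Canon/
              update1.out ... updateN.out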

The Update Process

Prerequisites:

  • Ensure all production databases are in sync.
  • Create directories .../Updates and .../Updates/Canon under source code control.
  • Pick a production database and dump its schema to .../Updates/baselineschema0.sql (a pg_dump sketch follows).
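The baseline dump in the last step can be taken with pg_dump's schema-only option; a minimal sketch, where the host and database names are placeholders:

      # Dump only the schema (no data) from a chosen production database.
      pg_dump -s -h prodhost proddb > Updates/baselineschema0.sql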

For each updateN.sql

    Test the SQL by hand:
    1. Create a temporary database: createdb tmp_db
    2. Load the baseline schema: psql tmp_db < baselineschema0.sql
    3. Apply updates 1 to (N-1), in order: psql tmp_db < update_n.sql **
    4. Run update_N and save the result: psql tmp_db < update_N.sql > update_N.out 2>&1
    5. Check the output carefully. If it is not perfect, fix it and go back to step 1.
    6. Save the good output in the Canon directory: mv update_N.out Canon/
    ** (write yourself a script; a sketch follows)
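The script suggested for step 3 can be a simple loop; a sketch, assuming the file naming above (ON_ERROR_STOP makes psql exit nonzero on the first SQL error):

      #!/bin/sh
      # Hypothetical helper: apply updates 1..(N-1) to tmp_db in order.
      N=$1
      i=1
      while [ $i -lt $N ] ; do
          psql -v ON_ERROR_STOP=1 tmp_db < update_$i.sql || exit 1
          i=`expr $i + 1`
      done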
    Apply change N to each host/machine using applyupdate.sh. Below is the crux of the work done by applyupdate.sh. See the actual script for details.

    We first apply the change using psql and capture STDOUT and STDERR into an output file. Then we diff the output file with the known good output file. If they are different, raise an error. Log the update to the update log, successful or not.

          # Apply the update, capturing stdout and stderr.
          psql -h $host $db < update_$N.sql > update_$N.out 2>&1

          # Compare the output with the known good output.
          diff update_$N.out Canon/update_$N.out
          if [ $? != 0 ] ; then
             # ERROR updating this host/database.
             # Info available here now: host, database, update_$N.out, timestamp.
             # HANDLE ERROR: log, email, store lists for later handling.
             echo "ERROR: update_$N failed on $host/$db" >&2
          fi

          # Log the update, successful or not.
          echo `date` $host $db update_$N >> updatelog.out
          cat update_$N.out >> updatelog.out
          echo >> updatelog.out
    

There are many variations and niceties that could be added to this basic and ordinary process. A script could be written to update the test db to a particular update or you can keep the last one around if you are confident no one has messed with it. A change or wrapper script to applyupdate.sh should allow you to run a particular update on all of your production databases automatically. This model assumes a central machine has access to all other machines. You can change the method of connectivity to meet your network model.
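Such a wrapper can itself be a short loop over the list of production databases; a sketch, where dblist.txt is a made-up file holding one "host database" pair per line:

      #!/bin/sh
      # Hypothetical wrapper: run update N against every production database.
      N=$1
      while read host db ; do
          ./applyupdate.sh $host $db $N
      done < dblist.txt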

With regard to update files: Sometimes, in spite of best efforts, you will have to back out a change. Do it with a subsequent update script. Don't "fix" a script you have already applied; that is done and you cannot go back. If you find you are having to back out changes often, make it a requirement that each update script have a corresponding undo script. This requirement is a bit excessive; however, it may encourage more care when writing and testing the update script in the first place.
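For example, if the hypothetical status column added earlier had to be removed, the undo would itself be a new, numbered update; a sketch (the update number is illustrative):

      -- update5.sql: back out the change made by update1.sql.
      BEGIN;
      DROP INDEX orders_status_idx;
      ALTER TABLE orders DROP COLUMN status;
      COMMIT;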

Logging can be distributed as you see fit. Perhaps you only want to log the update script output if it is bad. Keep in mind that you need one place to look to see the latest update of all machines. Really. You will want that. It is also possible to have a table in each database recording the timestamp and update number of each update run to validate the update machine by machine.
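A minimal sketch of such a tracking table, with made-up names (each update script would then end by inserting its own row):

      -- One row per update applied to this database; names are illustrative.
      CREATE TABLE applied_updates (
          update_num  integer      PRIMARY KEY,
          applied_at  timestamptz  NOT NULL DEFAULT now()
      );
      -- Each updateN.sql would then end with, e.g.:
      --   INSERT INTO applied_updates (update_num) VALUES (N);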

Error handling is important. If you are watching each update, you can keep the exit on error condition. If you are updating many machines, you may want to collect the errors and address them each afterwards. You can automate the error raising, but you will still have to look at each machine with errors. Luckily you have tested well, so there should be no problems :)

Don't forget to check in your changes.

Contributors: elein at varlena.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009
