First, as you can probably tell, I am not publishing my column weekly as I had previously done. I intend to keep publishing as time and resources permit; ideally this will be every other week. I appreciate your patience.

Next, I would like to hear from people about their favorite or most helpful GeneralBits article. I am creating talks and writings that collect GeneralBits articles, and I want to use the ones that are really helpful to you, not the ones I think are helpful. Send your suggestions to me. Your feedback is always appreciated.

Finally, I have converted all of the article files to PHP format, so you may have to fix your bookmarks. In general you can just delete the .html suffix.
In a production environment it is critical to maintain control over the structure of your databases. If everyone has the ability to alter tables, change keys, and add tables willy-nilly, then the integrity of your data structures will deteriorate. If your site involves more than one database with the same schema, you can easily get the database schemas out of sync, causing both software bugs and data problems. Just think of the chaos if you had a thousand or more databases with the same schema to maintain (some people do). (I use the word schema to mean your database structure, not a namespace within your database.)

I am going to outline a simple, common sense way to manage database updates. Use it as a template and tune it to the needs of your organization, or rewrite it using the languages of your choice; I'll use bash along with psql. Also feel free to change the names of objects, but keep to the spirit of the suggestions. Note: this model is not directly applicable to Slony clusters. With Slony clusters the DDL is executed via a slonik script and only to one node.

The first requirement is that all production database schemas must be in sync. If they are not in sync, you need to bring them into sync. Doing that is cruelly difficult most of the time. You want control over your environment so that you never, ever need to diff schemas. (Bringing your databases into sync is not the subject of this article, but may be the subject of a future article if there is need and interest.)

The first rule is that all schema change statements (DDL) must be edited into a file, say update_N.sql, and applied as a SQL script. They are never issued via a GUI tool or by typing into psql directly. Just forget about those methods; they are not repeatable or scalable. The SQL script should be written and tested on a development database before being considered for application to production databases, of course.

The .../updates Directory

You should have a directory, updates, in your source tree. Everything should be under source code control. In that directory, you will have a few objects:

    update_N.sql      The numbered update scripts themselves, one per schema change.
    applyupdate.sh    The script that applies an update to a host and database.
    Canon/            The known good output files, Canon/update_N.out, one per update script.
    updatelog.out     The running log of all updates applied.
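To make the first rule concrete, here is a minimal sketch of what an update script might look like. The table and column names are hypothetical, my own invention for illustration; the points to imitate are the numbered file name, the transaction wrapper, and the comment recording what the change is for.

    -- update_7.sql (hypothetical example)
    -- Add a preferred_contact column to the customers table.
    BEGIN;

    ALTER TABLE customers ADD COLUMN preferred_contact text;
    UPDATE customers SET preferred_contact = 'email';

    COMMIT;

Because PostgreSQL runs DDL inside transactions, wrapping the whole script in BEGIN/COMMIT means a failed statement leaves the schema untouched rather than half-changed.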
The Update Processes

Pre-Requisites:

    - All production database schemas are in sync.
    - Updates 1 through N-1 have already been applied and logged.
    - update_N.sql has been tested on a development database and its known good output saved as Canon/update_N.out.

For each update_N.sql:
Apply change N to each host/machine using applyupdate.sh. Below is the crux of the work done by applyupdate.sh; see the actual script for details. We first apply the change using psql and capture STDOUT and STDERR into an output file. Then we diff the output file against the known good output file. If they are different, raise an error. Log the update to the update log, successful or not.

    # Apply the update, capturing both stdout and stderr.
    psql -h $host $db < update_$N.sql > update_$N.out 2>&1

    # Compare the output to the known good output.
    diff update_$N.out Canon/update_$N.out
    if [ $? != 0 ] ; then
        # ERROR updating this host/machine.
        # Info available here now: host/machine, update_$N.out, timestamp.
        # HANDLE ERROR: log, email, handle store lists.
        echo "ERROR: update_$N failed on $host $db" >&2
    fi

    # Log the update, successful or not.
    echo `date` $host $db update_$N >> updatelog.out
    cat update_$N.out >> updatelog.out
    echo >> updatelog.out

There are many variations and niceties that could be added to this basic and ordinary process. A script could be written to update the test db to a particular update, or you can keep the last one around if you are confident no one has messed with it. A change or wrapper script for applyupdate.sh should allow you to run a particular update on all of your production databases automatically; a sketch of one follows below. This model assumes a central machine has access to all other machines. You can change the method of connectivity to meet your network model.

With regard to update files: sometimes, in spite of best efforts, you will have to back out a change. Do it with a subsequent update script. Don't "fix" a script you have already applied. That is done and you cannot go back. If you find you are having to back out changes, make it a requirement that each update script have a corresponding undo script. This requirement is a bit excessive; however, it may encourage more care when writing and testing the update script in the first place.

Logging can be distributed as you see fit. Perhaps you only want to log the update script output if it is bad. Keep in mind that you need one place to look to see the latest update of all machines. Really. You will want that. It is also possible to have a table in each database recording the timestamp and update number of each update run, to validate the updates machine by machine; a sketch of such a table also follows below.

Error handling is important. If you are watching each update, you can keep the exit on error condition. If you are updating many machines, you may want to collect the errors and address each of them afterwards. You can automate the error raising, but you will still have to look at each machine with errors. Luckily you have tested well, so there should be no problems :)

Don't forget to check in your changes.
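Here is a minimal sketch of such a wrapper. The hosts.list file, its host/database line format, and the argument order of applyupdate.sh are all assumptions of mine, not part of the original scripts.

    #!/bin/bash
    # runupdate.sh N -- apply update N to every production database.
    # Assumes hosts.list holds one "host db" pair per line (hypothetical file).
    N=$1
    while read host db ; do
        # Assumed argument order for applyupdate.sh: host, database, update number.
        ./applyupdate.sh $host $db $N
    done < hosts.list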
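And here is one way the per-database update record could look; the table and column names are my own suggestion, not from the article.

    -- One row per update applied to this database.
    CREATE TABLE applied_updates (
        update_number integer     PRIMARY KEY,
        applied_at    timestamptz NOT NULL DEFAULT now()
    );

    -- The last statement of each update_N.sql can then record itself:
    INSERT INTO applied_updates (update_number) VALUES (7);

Validating a machine is then a matter of selecting max(update_number) from each database and comparing it against the central update log.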
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009