General Bits
By A. Elein Mustain

23-Feb-2004 Issue: 63


General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Slony Overview
Replication Solution Review 21-Feb-2004

Slony is the Russian plural for elephant. It is also the name of the new replication project being developed by Jan Wieck. The mascot for Slony, Slon, created by Jan, is a nice variation on the usual Postgres elephant mascot.

Slony-I, the first iteration of the project, is an asynchronous replicator of a single master database to multiple slaves, which in turn may have cascaded slaves. It will include all features required to replicate large databases with a reasonable number of slaves. Slony-I is targeted toward data centers and backup sites, implying that all nodes in the network are always available. Cascading slaves over a WAN enables remote replication while minimizing bandwidth use, which in turn improves scalability.

The master is the primary database with which the applications interact. Slaves are replicas, or copies, of the primary database. Since the master database is always changing, data replication is the mechanism by which secondary or slave databases are updated as the master database is updated. In synchronous replication systems, the master and the slave are kept as consistent, exact copies of each other. Asynchronous replication loosens that binding and allows the slave to copy transactions from the master, or roll forward, at its own pace.

Assume you have a primary site with a master server and a slave as its backup, and you create a remote backup center, again with a main server and its own backup slave. You don't want to transfer all of the transactions twice over the WAN to keep both remote servers updated, and you don't want to fail over to the backup location (in case of a full center loss) and start there without a backup already on standby. Instead, the remote primary server is a direct slave replicating from the master over the WAN, while the remote secondary server is a cascaded slave replicating from the remote primary server via the LAN.
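A rough sketch of this topology in Python may make the layout concrete. The node names and the "feeds_from"/"link" fields below are purely illustrative assumptions; they are not Slony-I configuration syntax.

    # Hypothetical layout of the two-site topology described above.
    # Node names and fields are illustrative only, not Slony-I syntax.
    topology = {
        "master":         {"site": "primary", "feeds_from": None,             "link": None},
        "local_slave":    {"site": "primary", "feeds_from": "master",         "link": "LAN"},
        "remote_primary": {"site": "backup",  "feeds_from": "master",         "link": "WAN"},
        "remote_slave":   {"site": "backup",  "feeds_from": "remote_primary", "link": "LAN"},
    }

    # Each block of replicated transactions crosses the WAN exactly once:
    # master -> remote_primary over the WAN, then remote_primary -> remote_slave
    # over the local LAN at the backup site.
    wan_links = [name for name, cfg in topology.items() if cfg["link"] == "WAN"]
    assert wan_links == ["remote_primary"]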

Slony is differentiated from other replication systems by its design goals. It was planned to enable a few key features as the basis for implementing those goals. An underlying theme of the design is to update only that which changes, enabling scalable replication for a reliable failover strategy.

The design goals for Slony are:
1. The ability to install and configure a new slave, let it join the system, and catch up with a running database.

This is required so that both master and slaves can be replaced. It also enables cascading slaves, which in turn adds scalability, reduces bandwidth usage, and allows proper handling of failover situations.
2. If any node fails, its functionality can be resumed by another node.
If a slave that provides data to other slaves fails, those slaves can continue to replicate from another slave or directly from the master.

If the master fails, a slave can be promoted to master and all other slaves will continue to replicate from it. The new master can inherit the old master's other slaves, even though some of them may logically be ahead of its own state. The new master can be rolled forward to the state of the most up-to-date slave. Thus, you have real promotion to "master".

In other replication solutions, this roll forward of the new master is not possible. In those solutions, if a slave is promoted to master, any other slaves that exist must be rebuilt from scratch in order to synchronize with the new master correctly. A failover of a 1 Terabyte database leaves the new master with no failover of its own for quite a while.

The Slony design handles the case where multiple slaves may be at different synchronization points relative to the master, and allows them to resynchronize when any slave is promoted to master. Different slaves could be logically in the future compared to the new master, for example. There must be a way to detect and correct this; otherwise, all you can do is dump and restore the other slaves from the new master to get them resynchronized.

The new master can be rolled forward, if necessary, from other slaves because of the way the replication transactions are packaged and saved. Replication data is packaged into blocks of transactions and sent to each slave. Each slave knows what blocks it has "consumed." Each slave can also pass those blocks along to other servers--this is the mechanism of cascading slaves. A new master may be on transaction block 17 relative to the old master when another slave is on transaction block 20 relative to the old master. The switch to the new master is enabled by having the other slave send to the new master blocks 18, 19 and 20.
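A minimal sketch of this catch-up idea, assuming only that each node tracks the highest transaction block it has consumed. The class and method names below are illustrative; this shows the concept, not Slony-I's actual implementation.

    # A toy model of the catch-up mechanism: each node remembers the highest
    # transaction block it has consumed and can forward blocks to another node.
    class Node:
        def __init__(self, name, consumed_up_to):
            self.name = name
            self.blocks = {}                      # block number -> transactions
            self.consumed_up_to = consumed_up_to  # relative to the old master

        def apply_block(self, number, transactions):
            self.blocks[number] = transactions
            self.consumed_up_to = number

        def send_missing_blocks(self, other):
            # Forward every block the other node has not yet consumed.
            for n in range(other.consumed_up_to + 1, self.consumed_up_to + 1):
                other.apply_block(n, self.blocks.get(n, []))

    # The promoted master is at block 17; another slave is already at block 20.
    new_master = Node("new_master", consumed_up_to=17)
    ahead_slave = Node("slave_b", consumed_up_to=20)
    ahead_slave.blocks = {18: ["..."], 19: ["..."], 20: ["..."]}

    # The more advanced slave sends blocks 18, 19 and 20 to the new master,
    # rolling it forward before it takes over as the replication source.
    ahead_slave.send_missing_blocks(new_master)
    assert new_master.consumed_up_to == 20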

Jan said, "This feature took me a while to develop even in theory."

3. Backup and Point-In-Time capability with a twist.

It is possible, with some scripting, to maintain a delayed slave as a backup which might, for example, be two hours behind the master. This is done by storing the transaction blocks and delaying their application. With this technique it is possible to do a point-in-time recovery to any time within the last two hours on this slave. The time it takes to recover depends only on the point in time to which you choose to recover. Choosing "45 minutes ago" would take about 1 hour and 15 minutes, for example, independent of database size.
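A small sketch of that delayed-apply idea, assuming transaction blocks arrive with a timestamp and are buffered before being applied. The two-hour constant and the function names are assumptions for illustration, not Slony-I interfaces.

    import time

    # Keep the slave a fixed interval behind by buffering transaction blocks
    # and applying them only once they are older than the delay window.
    DELAY_SECONDS = 2 * 60 * 60   # two hours, as in the example above
    pending = []                  # (timestamp, block) pairs, oldest first

    def receive_block(timestamp, block):
        pending.append((timestamp, block))

    def apply_due_blocks(apply_block, now=None):
        # Apply only the blocks that have aged past the configured delay.
        now = time.time() if now is None else now
        while pending and pending[0][0] <= now - DELAY_SECONDS:
            _, block = pending.pop(0)
            apply_block(block)

    # To recover to "45 minutes ago", keep applying buffered blocks until their
    # timestamps reach that point; the work is proportional to the 1 hour and
    # 15 minutes of blocks to replay, not to the size of the database.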
4. Hot PostgreSQL installation and configuration
For failover, it must be possible to put a new master into place and reconfigure the system so that any slaves can be reassigned to the new master or reassigned to cascade from another slave. All of this must be done without taking down the system.

This means that a new slave must be able to be added to the system and synchronized without disruption of the master. Then when the new slave is in place, the switch of masters can be done.

This is particularly useful when the new slave runs a different PostgreSQL version than the master. After adding, for example, a 7.5 slave to a 7.4 master, it should be possible to promote the new slave to master and thus perform a hot upgrade to the new version.
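A rough outline of that hot-upgrade sequence, using a toy model of nodes that know their version and the node they replicate from. The names and steps are illustrative assumptions, not Slony-I commands.

    # A toy model of the switchover: each node has a "provider" pointer to the
    # node it replicates from; none of this is Slony-I's actual API.
    class ReplNode:
        def __init__(self, name, version):
            self.name = name
            self.version = version
            self.provider = None     # node this one replicates from
            self.is_master = False

    def hot_upgrade(old_master, new_slave, other_slaves):
        # 1. Add the new-version node as a slave of the running master and
        #    let it synchronize without disrupting the master.
        new_slave.provider = old_master
        # ... wait for the new slave to catch up ...

        # 2. Switch masters: promote the new slave and repoint the others.
        new_slave.is_master, new_slave.provider = True, None
        old_master.is_master = False
        for slave in other_slaves:
            slave.provider = new_slave

    master_74 = ReplNode("master", "7.4")
    slave_75 = ReplNode("upgrade-slave", "7.5")
    hot_upgrade(master_74, slave_75, other_slaves=[ReplNode("slave1", "7.4")])
    assert slave_75.is_master and not master_74.is_master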

5. Schema changes
Special consideration must be given to schema changes. The replication system must be able to bundle all of the pertinent schema changes together, whether or not they were run in the same transaction. Identifying these change sets is a very difficult problem.

In order to address this issue, Slony-I will have a way to execute SQL scripts in a controlled fashion. This means that it is even more important to bundle and save your schema changes in scripts. Tracking your schema changes in scripts is a key DBA procedure for keeping your system in order and your database recreatable.
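As an illustration of why scripted schema changes matter, the following sketch replays a set of change scripts, in order, against a database with psql. The file names and the helper function are hypothetical; Slony-I's controlled execution mechanism is not shown here.

    import subprocess

    # Hypothetical, ordered schema change scripts kept under version control.
    schema_scripts = [
        "001_add_orders_table.sql",
        "002_add_index_on_customer.sql",
    ]

    def apply_schema_scripts(dbname, scripts):
        # Replay each script in order against the given database with psql.
        # The same sequence can be replayed on the master and on each slave
        # at a controlled point, which is what makes scripted changes safe.
        for script in scripts:
            subprocess.run(["psql", "-d", dbname, "-f", script], check=True)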

The first release of Slony-I also does not address any of the user interface features required to set up and configure the system. Once the core engine of Slony-I becomes available, development of the configuration and maintenance interfaces can begin. There may be multiple interfaces available, depending on who develops them and how.

Jan points out that "Replication will never be something where you type SETUP and all of the sudden your existing enterprise system will nicely replicate in a disaster recovery scenario." Designing how your replication will be set up is a complex problem.

The user interface(s) will be key in clarifying and simplifying the configuration and maintenance of your replication system. Which tables to replicate, and how to coordinate sequences and triggers, are among the issues that must be addressed when configuring a replication system.

The Slony-I release does not address the issues of multi-master replication, synchronous replication, or sporadically synchronizable nodes (the salesperson-on-the-road scenario). However, these issues are being considered in the architecture of the system so that future Slony releases may implement some of them. It is critical that future features be designed into the system from the start; analysis of existing replication systems has shown that it is next to impossible to add fundamental features to an existing replication system.

The primary question one should ask regarding the requirements for a failover system is how much downtime you can afford. Is five minutes acceptable? Is one hour? Must the failover be read/write? Or is it acceptable to have a read-only temporary failover? The second question you must ask is whether you are willing to invest in the hardware required to support multiple copies of your database. A clear cost/benefit analysis is called for, especially for large databases.

(This overview is based on a chat between Elein Mustain and Jan Wieck about Slony on 2/17/04 and Jan Wieck's Slony-I Concepts White Paper. Any misstatements are mine. All the really good explanations are Jan's.) Contributors: elein at varlena.com, Jan Wieck at janwieck at yahoo.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009
