PostgreSQL Warm Fail Over using Write Ahead Logs

A typical fail over method for any application is to have two identical machines with all data stored on a shared SAN. For a database, this falls short of ensuring integrity: do we know that the database was shut down cleanly on the primary node before failing over? If not, data loss can occur and the only recovery method is restoring from backup. The shared storage also introduces a single point of failure for the database.

Luckily, postgres provides a transaction logging mechanism known as write ahead logs (WAL). We can exploit this feature to maintain a live backup of the production system on the fail over node. We'll do this by having the primary node copy its WAL files to the secondary node. The secondary node runs in continuous recovery mode, reading the WAL files and applying them to its own database.
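
Once this is in place, the archive directory on the secondary simply fills with 16 MB WAL segments (the default segment size) as the primary produces them, and the recovering secondary consumes them. The listing below is only an illustration of what to expect; the file names on your system will differ:

# ls /var/lib/pgsql/walfiles
000000010000000000000003
000000010000000000000004
000000010000000000000004.00000020.backup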

Pre-requisites

Before we begin, a few requirements:

  • Both machines must be the same postgresql version. This document assumes postgresql 8.x. Different solutions are available for postgresql 7.x and 9.x.
  • Both machines must be the same architecture (64 or 32 bit).
  • The primary node must be able to access the secondary node via ssh.
  • Install postgresql-contrib on the secondary machine; it provides the pg_standby utility used below.
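
A quick way to confirm the first two requirements is to compare the server version and machine architecture on each node; the commands below are just one way to do that:

# psql -U postgres -c "SELECT version();"
# uname -m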

Configure the primary node

  • Generate an ssh key as the postgres user (the archive_command below runs as postgres) and place the public key in authorized_keys on the secondary
    # ssh-keygen -t dsa
    # ssh-copy-id -i ~/.ssh/id_dsa.pub postgres@secondary
  • Enable WAL archiving by editing /var/lib/pgsql/data/postgresql.conf and setting the following values:

archive_mode = on
archive_command = 'rsync --delete-after -a %p postgres@secondary:/var/lib/pgsql/walfiles/%f'
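
Note that archive_mode is only read at server start, so restart postgresql after editing the file. It is also worth confirming that the postgres user can reach the secondary without a password prompt, since the archive_command runs non-interactively; for example:

# service postgresql restart
# su - postgres -c 'ssh postgres@secondary true'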

Configure the secondary node

  • Create a directory for the WAL files
    # mkdir /var/lib/pgsql/walfiles
  • Create /var/lib/pgsql/recovery.conf with the following contents:

restore_command = '/usr/bin/pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5432 /var/lib/pgsql/walfiles %f %p %r 2>>standby.log'
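
For reference, the pg_standby options used above behave as follows (see the contrib documentation for your exact 8.x release); the breakdown below is only an annotation of the line above:

# -l                          link the archived WAL files into place rather than copying them
# -d                          write debug output to stderr (captured in standby.log above)
# -s 2                        sleep 2 seconds between checks for the next WAL segment
# -t /tmp/pgsql.trigger.5432  end recovery and bring the standby up when this file appears
# %f, %p, %r                  placeholders the server fills in: the WAL file name, the path to
#                             restore it to, and the oldest file still needed for a restart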

Initialize the cluster

  • On the primary run the following:
    Please note that ‘dhreplication’ is an arbitrary tag, and can be set to anything useful to your configuration.
    # psql -U postgres -c "SELECT pg_start_backup('dhreplication');"
    # rsync -avz /var/lib/pgsql/data/* secondary:/var/lib/pgsql/data/
    # psql -U postgres -c "SELECT pg_stop_backup();"
  • On secondary run the following:
    Edit /var/lib/pgsql/data/postgresql.conf and set archive_mode = off (this copy of the file came from the primary, and the standby should not archive WAL itself)
    # ln -s /var/lib/pgsql/recovery.conf /var/lib/pgsql/data/recovery.conf
    A symlink is used above because /var/lib/pgsql/data/recovery.conf is renamed to recovery.done when recovery ends, which leaves the original file available for the next time the standby is initialized.
    # service postgresql start
  • Monitor for problems by watching /var/lib/pgsql/pgsql.log and /var/lib/pgsql/data/standby.log on secondary
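
A simple way to confirm that log shipping and recovery are actually working is to force a WAL switch on the primary and watch the new segment arrive in /var/lib/pgsql/walfiles on the secondary; the commands below are just one example of such a check.

On primary:
# psql -U postgres -c "SELECT pg_switch_xlog();"
On secondary:
# ls /var/lib/pgsql/walfiles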

Testing fail over

  • On primary edit /var/lib/pgsql/data/postgresql.conf, set archive_command = '/bin/true', then reload postgresql to make the change active.
    # service postgresql reload
  • On secondary
    # tail /var/lib/pgsql/pgsql.log /var/lib/pgsql/data/standby.log
    # touch /tmp/pgsql.trigger.5432
  • You may now connect clients to the secondary server.
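
Once the trigger file is created and recovery finishes, the secondary accepts normal read/write connections. A quick sanity check (the table name here is only an example) is:

# psql -U postgres -c "CREATE TABLE failover_test (id integer);"
# psql -U postgres -c "DROP TABLE failover_test;"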

To resume running on primary

  • On secondary
    # psql -U postgres -c "SELECT pg_start_backup('dhreplication');"
    # rsync -avz /var/lib/pgsql/data/* primary:/var/lib/pgsql/data/
    # psql -U postgres -c "SELECT pg_stop_backup();"
  • Perform the steps under "Initialize the cluster" above to initialize the secondary server again.
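
Remember that the fail over test set archive_command to '/bin/true' on the primary. Before putting the primary back into service, restore the archive settings shown in "Configure the primary node" above and reload so that WAL shipping to the secondary resumes:

archive_command = 'rsync --delete-after -a %p postgres@secondary:/var/lib/pgsql/walfiles/%f'

# service postgresql reload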

External sources

http://www.xtuple.org/replication-how-to
http://www.postgresql.org/docs/8.4/static/high-availability.html
http://www.postgresql.org/docs/8.4/static/warm-standby.html
