Chapter 6. Rebooting a node


Before rebooting a server that's running a repository node, first check whether that server is the cluster's current master node:

[root@octopus ~]# stasher

Ready, EOF to exit.
> admin
Connected to, node
Maximum 10 objects, 32 Mb aggregate object size, per transaction.
Maximum 10 concurrent subscriptions.
octopus> status
  Status: master: (uuid dY6XWiGRtf5mV000QEDDK000002LAGW00318n4AS), 1 slaves, timestamp 2012-09-10 08:56:08 -0400

[long output removed]

octopus> resign
Server resigned, new master is

The status command shows the repository's current master node. No further action is needed if another node is the master. In this example, however, the server being rebooted, octopus, is the current master node. The resign command, executed on the master node, transfers master status to another server in the cluster. Afterwards, reboot the server using the usual system administration procedure.

distreboot - reboot all servers in an object repository cluster

The distreboot command comes from a separate package, stasher-distreboot, which gets installed separately, after installing stasher.

distreboot serves as an example of an application that uses the stasher object repository cluster. The stasher-distreboot package gets installed on all servers that run a stasher object repository cluster. Upon request, the distreboot daemons work together to reboot all servers in an orderly fashion, minimizing disruption and making sure that the object repository cluster remains in quorum throughout the entire process.

Once this so-called distributed reboot gets initiated, the distreboot daemons on all nodes work together to perform a coordinated reboot. Servers get rebooted one at a time. After a server reboots and its distreboot daemon starts again (presumably automatically at boot, together with stasher, as is the case with the default stasher installation package), distreboot signals its peer on the next machine in the reboot list to begin rebooting.
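The one-at-a-time handoff described above can be sketched as a small simulation. This is only an illustration of the sequence of events, not distreboot's actual implementation: the function name coordinated_reboot, the log message wording, and the node name squid are hypothetical.

```python
def coordinated_reboot(reboot_list):
    """Simulate a coordinated reboot and return the sequence of events.

    Hypothetical model: each node reboots, its distreboot daemon comes
    back up, and only then does it signal the next node in the list.
    """
    events = []
    for position, node in enumerate(reboot_list):
        events.append(f"{node}: rebooting")
        events.append(f"{node}: distreboot restarted")
        # The last node in the list has no peer left to signal.
        if position + 1 < len(reboot_list):
            next_node = reboot_list[position + 1]
            events.append(f"{node}: signaling {next_node}")
    return events


if __name__ == "__main__":
    for event in coordinated_reboot(["squid", "octopus"]):
        print(event)
```

Because a node signals its successor only after it has come back up, at most one server is down at any point in this model.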

When a distributed reboot gets initiated, distreboot chooses the order in which the servers get rebooted according to the object repository's current master node: the master node is always the last server to get rebooted. distreboot relies on the server's shutdown script to run stasher's default shutdown script, which tries to resign the master node before the shutdown proceeds, so that master status gets transferred to another node in an orderly fashion. Since the master node is always the last server to get rebooted, the master node transitions only once per distributed reboot cycle.
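A minimal sketch of the master-last ordering, and of why it limits master transitions, might look like the following. It is not how distreboot is implemented. The function names and the node names squid and cuttlefish are hypothetical, the order of the non-master nodes is assumed to follow the input list, and the rule that a resigning master hands off to the next node in the list is a deliberately pessimistic assumption made only for this illustration.

```python
def reboot_order(nodes, master):
    """Return the nodes with the current master moved to the end."""
    if master not in nodes:
        raise ValueError(f"{master!r} is not a cluster node")
    # Assumption: non-master nodes keep their relative input order.
    return [node for node in nodes if node != master] + [master]


def count_master_transitions(order, master):
    """Count master handoffs when rebooting nodes in the given order.

    Hypothetical handoff rule: a master that is about to reboot resigns
    in favor of the next node in the list (wrapping around).
    """
    transitions = 0
    current = master
    for position, node in enumerate(order):
        if node == current and len(order) > 1:
            current = order[(position + 1) % len(order)]
            transitions += 1
    return transitions


if __name__ == "__main__":
    nodes = ["octopus", "squid", "cuttlefish"]
    order = reboot_order(nodes, "octopus")
    print(order)
    print(count_master_transitions(order, "octopus"))
    print(count_master_transitions(nodes, "octopus"))
```

Under these assumptions, rebooting the master last yields a single transition, while rebooting it first can cause a new transition each time the freshly elected master is next in line.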