First, work out whether the loss is permanent. If it isn't, it may be worth waiting until the data centre is back. You can shut down the remaining DAG members and wait for a managed recovery, which should let you avoid the risk of split brain. If the loss is permanent, you have a bit of work to do. None of it takes long, but it isn't as simple as running a couple of PowerShell cmdlets. You also have to consider what happens if you cannot manage the recovery of the lost DAG members. At a minimum the database copies are likely to have diverged, possibly beyond what the store can recover from, so you will probably need a full seed rather than an incremental reseed. The steps that worked for me in our test rig are as follows:
- Bring the cluster online: “net start clussvc /forcequorum”
- Evict the lost cluster nodes (I used Failover Cluster Manager)
- Update the DAG membership by removing the failed servers: “Remove-DatabaseAvailabilityGroupServer -Identity <DAG> -MailboxServer <server> -ConfigurationOnly”
- Set up a new witness server and witness directory: “Set-DatabaseAvailabilityGroup -Identity <DAG> -WitnessServer <server>”
- Reboot the remaining DAG members (you might get away with restarting the Cluster service and/or the Information Store service and then mounting the databases)
- The databases should then mount automatically, according to the AutoDatabaseMountDial setting
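The steps above can be kept as an ordered runbook so nothing is skipped under pressure. What follows is a minimal sketch in Python. It assumes you want the commands printed for review and run by hand, not executed. The helper name `build_recovery_runbook` and the example names `DAG1`, `MBX3`, `MBX4` and `HUB1` are hypothetical. Eviction stays a manual comment line because it was done in Failover Cluster Manager rather than from a command.

```python
from typing import List


def build_recovery_runbook(dag: str, lost_servers: List[str], witness_server: str) -> List[str]:
    """Return the recovery steps, in order, as text to review and run by hand.

    Hypothetical helper: it only formats strings. It does not contact the
    cluster or Exchange.
    """
    if not lost_servers:
        raise ValueError("no lost DAG members given; nothing to recover")

    steps: List[str] = []
    # 1. Force the cluster service up on a surviving node despite lost quorum.
    steps.append("net start clussvc /forcequorum")
    # 2. Evict each lost node (a manual step in Failover Cluster Manager).
    for server in lost_servers:
        steps.append(f"# Evict cluster node {server} in Failover Cluster Manager")
    # 3. Remove each lost server from the DAG configuration only.
    for server in lost_servers:
        steps.append(
            f"Remove-DatabaseAvailabilityGroupServer -Identity {dag} "
            f"-MailboxServer {server} -ConfigurationOnly"
        )
    # 4. Point the DAG at a witness server in the surviving site.
    steps.append(f"Set-DatabaseAvailabilityGroup -Identity {dag} -WitnessServer {witness_server}")
    # 5. Restart the survivors; databases then mount per AutoDatabaseMountDial.
    steps.append("# Reboot the remaining DAG members")
    return steps


if __name__ == "__main__":
    for line in build_recovery_runbook("DAG1", ["MBX3", "MBX4"], "HUB1"):
        print(line)
```

Generating text instead of running the cmdlets keeps a human in the loop. That matters here because forcing quorum in the wrong site is exactly how split brain happens.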
The better news is that Exchange Server 2010 SP1 is hopefully going to change the game. As Scott Schnoll writes…
“DAC mode has been extended to support DAGs that have all members deployed in a single Active Directory site, including Active Directory sites that have been extended to multiple locations.”
(http://blogs.technet.com/scottschnoll/archive/2010/04/10/new-high-availability-features-in-exchange-2010-sp1.aspx)
Take notice of the note that accompanies the post, though:
“But a quick note: everything in this post is based on pre-release software and preliminary information that is subject to change. These are things we are working on or are about to work on. The feature names, behaviors and descriptions used below might not be the final names, behaviors and descriptions. The behaviors described may or may not make it into the final shipping version of SP1 or a future version of the product. Standard disclaimers apply regarding pre-Beta software and content.”
Another idea I was given for avoiding split brain, and the need for a full seed on failback, is to mark the databases not to mount at startup (the database's MountAtStartup property). That way, if you cannot manage the startup of the lost DAG members, their databases will not mount. Unfortunately, it also prevents automated failover when there is data loss but the number of lost logs is within AutoDatabaseMountDial: the replicas are not activated and are left in a failed state. Nice idea, but it didn't work in my testing.
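To make the AutoDatabaseMountDial trade-off more concrete, here is a small Python model of the Exchange 2010 dial thresholds. Lossless tolerates no missing logs. GoodAvailability tolerates a copy queue length of up to 6 logs, and BestAvailability up to 12. The model deliberately simplifies things. It assumes the passive copy is otherwise healthy, and it ignores MountAtStartup and the other checks the real activation logic performs. The enum and `mounts_automatically` are hypothetical names, not an Exchange API.

```python
from enum import Enum


class AutoDatabaseMountDial(Enum):
    """Maximum number of missing log files tolerated for an automatic mount."""
    LOSSLESS = 0
    GOOD_AVAILABILITY = 6
    BEST_AVAILABILITY = 12


def mounts_automatically(copy_queue_length: int, dial: AutoDatabaseMountDial) -> bool:
    """Simplified model: after a failover, a passive copy mounts on its own only
    if the number of logs never copied from the lost active copy is within the dial."""
    if copy_queue_length < 0:
        raise ValueError("copy queue length cannot be negative")
    return copy_queue_length <= dial.value


if __name__ == "__main__":
    for lost_logs in (0, 5, 10, 20):
        allowed = [d.name for d in AutoDatabaseMountDial if mounts_automatically(lost_logs, d)]
        print(lost_logs, allowed)
```

In this model, a copy that is 10 logs behind would mount automatically only under BestAvailability. In the testing described above, that kind of automatic activation is what was lost by marking databases not to mount at startup.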