I recently ran into an issue where a Microsoft Exchange 2010 Database Availability Group (DAG) failed over during a Veeam Backup & Replication job. The cause was the snapshot commit process in VMware, which briefly pauses virtual machine I/O. During this pause the DAG member lost sight of the file share witness, which in this case was hosted on the customer's CAS server, and subsequently failed over.
The resolution was to increase the CrossSubnetThreshold and CrossSubnetDelay properties of the cluster. CrossSubnetDelay specifies the heartbeat interval in milliseconds, and CrossSubnetThreshold specifies how many consecutive heartbeats can be missed before the cluster considers the node down and fails over. The values you need for these properties depend on many factors, for example the speed of your underlying storage array or the size of the virtual machine being snapshotted. In my case I needed to set both values to their maximum. This can be done as follows:
1. Navigate to Start -> Administrative Tools and launch Windows PowerShell Modules
2. When the PowerShell window opens, enter the following command:
$cluster = Get-Cluster; $cluster.CrossSubnetThreshold = 10; $cluster.CrossSubnetDelay = 4000
3. Once the command has completed, run the following and check that CrossSubnetDelay and CrossSubnetThreshold are set to 4000 and 10 respectively:
Get-Cluster | fl *
4. Re-run your Veeam backup job and check whether the cluster still fails over. If the backup completes without a failover, you can then gradually reduce the CrossSubnetDelay and CrossSubnetThreshold to find the optimum values.
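When tuning the values back down in step 4, I find it helpful to think of the two settings as one number: the delay multiplied by the threshold gives roughly how long a node can go quiet before the cluster reacts. The snippet below is a minimal sketch of that arithmetic. Get-HeartbeatTolerance is a hypothetical helper of my own, not a built-in cmdlet, and the 1000 ms / 5 starting point is my assumption of the Windows Server 2008 R2 defaults, so check your own cluster for the actual values.

```powershell
# Hypothetical helper (not a built-in cmdlet): approximate number of seconds
# a node can miss heartbeats before the cluster treats it as down.
function Get-HeartbeatTolerance {
    param(
        [int]$DelayMs,    # heartbeat interval in milliseconds (e.g. CrossSubnetDelay)
        [int]$Threshold   # missed heartbeats tolerated (e.g. CrossSubnetThreshold)
    )
    ($DelayMs * $Threshold) / 1000
}

# Assumed Windows Server 2008 R2 defaults
Get-HeartbeatTolerance -DelayMs 1000 -Threshold 5     # 5

# Values used in this post
Get-HeartbeatTolerance -DelayMs 4000 -Threshold 10    # 40
```

If your snapshot commit only stalls the virtual machine for a few seconds, a combination somewhere between these two results may well be enough.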
That’s it, you’re done.
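If you would rather script steps 2 and 3, something along these lines should work. Treat it as a sketch rather than a tested procedure: it assumes you run it on a DAG member with the FailoverClusters module available (the Windows PowerShell Modules shortcut loads it for you), and it prints the existing values first so you have a record to go back to.

```powershell
# Assumes the FailoverClusters module is installed on this cluster node.
Import-Module FailoverClusters

$cluster = Get-Cluster

# Record the current values before changing anything.
$cluster | Format-List Name, CrossSubnetDelay, CrossSubnetThreshold

# Apply the new values (same as step 2).
$cluster.CrossSubnetThreshold = 10
$cluster.CrossSubnetDelay = 4000

# Re-read the cluster and show only the relevant properties,
# instead of scanning the full "fl *" output.
Get-Cluster | Format-List Name, CrossSubnet*
```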