In this blog I am sharing a HANA database disaster recovery exercise using the new option of the takeover command, --suspendPrimary.
It is a much simpler way to fail over from site to site, and it avoids a “split brain” situation, which is relatively complex to fix.
The complete syntax: hdbnsutil -sr_takeover --suspendPrimary
The new option is available when the system version on the primary is HANA 2.0 SPS 04 or later.
Reference: SAP HANA High Availability (New and Changed) | SAP Help Portal
The environment I tested:
Three databases: two run in a cluster for HA (high availability), Site1 and Site2, and one is for disaster recovery, Site3. Site3 is in a remote location away from Site1 and Site2, so its replication mode is asynchronous.
Prerequisite:
Verify that system replication is active and that all services are in sync.
You can check that the column REPLICATION_STATUS in M_SERVICE_REPLICATION has the value ACTIVE for all services.
You can also check this from the O/S level using the Python script systemReplicationStatus.py, located in /usr/sap/SID/HDBXX/exe/python_support (XX is the instance number).
The initial replication site mapping is as follows:
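The check described above can be sketched in Python. This is only an illustration: the query string uses columns I believe exist in M_SERVICE_REPLICATION, the helper names are hypothetical, and the sample rows are made up to show the logic. In practice the rows would come from whatever SQL client you use against the primary.

```python
# Query against the monitoring view named in the text.
# Column selection is an assumption; adjust it to your needs.
STATUS_QUERY = (
    "SELECT HOST, PORT, SECONDARY_SITE_NAME, REPLICATION_STATUS "
    "FROM M_SERVICE_REPLICATION"
)


def services_not_active(rows):
    """Return the rows whose REPLICATION_STATUS is anything other than ACTIVE."""
    return [row for row in rows if row["REPLICATION_STATUS"] != "ACTIVE"]


def replication_is_active(rows):
    """True only if at least one service is listed and all of them are ACTIVE."""
    # An empty result is treated as "not ready" rather than "all good".
    return bool(rows) and not services_not_active(rows)


# Made-up sample rows, shaped like the query result (hypothetical hosts/ports).
sample_rows = [
    {"HOST": "Site1_Host", "PORT": 30001, "SECONDARY_SITE_NAME": "Site2",
     "REPLICATION_STATUS": "ACTIVE"},
    {"HOST": "Site1_Host", "PORT": 30007, "SECONDARY_SITE_NAME": "Site2",
     "REPLICATION_STATUS": "SYNCING"},
]

if __name__ == "__main__":
    if replication_is_active(sample_rows):
        print("All services are ACTIVE, safe to continue.")
    else:
        for row in services_not_active(sample_rows):
            print(f"Not in sync: {row['HOST']}:{row['PORT']} "
                  f"({row['REPLICATION_STATUS']})")
```

Running the same check before every step of the takeover keeps the "make sure the servers are fully synchronized" rule from being skipped by accident.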
Site1 – Primary
Site2 – Secondary, replicating from Site1 synchronously
Site3 – Secondary, replicating from Site2 asynchronously
In a disaster situation, we want Site3 to be the primary database.
Steps to take over on Site3 so that it becomes the primary:
1. Make sure replication is active and all the sites are fully synchronized.
2. Register Site3 to Site1, otherwise you will get the “no consumer error”.
Make sure the servers are still fully synchronized. If synchronization is still in progress, wait until it finishes, or troubleshoot any error which might occur.
Command: hdbnsutil -sr_register --name=Site3 --online --remoteHost=Site1_Host --remoteInstance=00 --replicationMode=sync
3. Register Site2 to Site3.
If this is not done, Site2 will go down when the takeover is executed and cause a cluster error.
Make sure the servers are still fully synchronized. If synchronization is still in progress, wait until it finishes, or troubleshoot any error which might occur.
Command: hdbnsutil -sr_register --name=Site2 --online --remoteHost=Site3_Host --remoteInstance=00 --replicationMode=sync
4. Perform takeover from Site3
Make sure the servers are still fully synchronized. If synchronization is still in progress, wait until it finishes, or troubleshoot any error which might occur.
Command: hdbnsutil -sr_takeover --suspendPrimary
5. Register Site1 to Site3
After the takeover, the suspended primary is unblocked when you register it as the new secondary.
Make sure the servers are still fully synchronized. If synchronization is still in progress, wait until it finishes, or troubleshoot any error which might occur.
Command: hdbnsutil -sr_register --name=Site1 --online --remoteHost=Site3_Host --remoteInstance=00 --replicationMode=sync
6. Status after the takeover
Now Site3 becomes the new primary.
With the new takeover option --suspendPrimary, I did not observe any DB downtime, and no cluster maintenance mode was needed.
To fail back, follow the same steps in reverse. Please let me know if you need the detailed steps.
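To make the order of the steps easier to review, here is a small Python sketch that only builds and prints the commands from the steps above. It does not execute anything. The function names are hypothetical, the host names and instance number are the placeholders used in this blog, and the "run on" site for each command is my reading of the steps (each register command is run on the site being registered, the takeover on Site3). The options are written with the double hyphens that hdbnsutil expects.

```python
def register_command(site_name, remote_host, remote_instance="00",
                     replication_mode="sync"):
    """Build the hdbnsutil -sr_register argument list used in the steps above."""
    return [
        "hdbnsutil", "-sr_register",
        f"--name={site_name}",
        "--online",
        f"--remoteHost={remote_host}",
        f"--remoteInstance={remote_instance}",
        f"--replicationMode={replication_mode}",
    ]


def takeover_plan():
    """Return (site to run on, command) pairs in the order of steps 2 to 5."""
    return [
        ("Site3", register_command("Site3", "Site1_Host")),          # step 2
        ("Site2", register_command("Site2", "Site3_Host")),          # step 3
        ("Site3", ["hdbnsutil", "-sr_takeover", "--suspendPrimary"]),  # step 4
        ("Site1", register_command("Site1", "Site3_Host")),          # step 5
    ]


# Print the plan for review; check synchronization before each command.
for site, command in takeover_plan():
    print(f"[run on {site}] {' '.join(command)}")
```

Keeping the plan as data makes it easy to review with the team before the exercise, and the fail-back plan can be written the same way with the site names swapped.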
Thank you
Welly Sunarko