Split Brain Syndrome

Split Brain Syndrome, In a Oracle RAC environment all the instances/servers communicate with each other using high-speed interconnects on the private network. This private network interface or interconnect are redundant and are only used for inter-instance oracle data block transfers. Now talking about split-brain concept with respect to oracle RAC systems, it occurs when the instance members in a RAC fail to ping/connect to each other via this private interconnect, but the servers are all physically up and running and the database instance on each of these servers is also running. These individual nodes are running fine and can conceptually accept user connections and work independently. So basically due to lack of communication the instance thinks that the other instance that it is not able to connect is down and it needs to do something about the situation. The problem is if we leave these instances running, the same block might get read, updated in these individual instances and there would be data integrity issue, as the blocks changed in one instance, will not be locked and could be over-written by another instance. This situation is termed as Split Brain Syndrome.

I/O Fencing, there will be some situation where the leftover write operations from failed database instances (The cluster function failed on the nodes, but the nodes are still running at OS level) reach the storage system after the recovery process starts. Since these write operations are no longer in the proper serial order, they can damage the consistency of the data stored data. Therefore when a cluster node fails, the failed node needs to be fenced off from all the shared disk devices or disk groups. This methodology is called I/O fencing or failure fencing

Simple Majority Rule, According to Oracle – “An absolute majority of voting disks configured (more than half) must be available and responsive at all times for Oracle Clusterware to operate.” Which means to survive from loss of ‘N’ voting disks, you must configure atleast ‘2N+1′ voting disks.

Now we are in a state to understand the use of voting disk in case of heartbeat failure.

Example 1:- Suppose in a 3 node cluster with 3 voting disks, a network heartbeat fails between Node 1 and Node 3 & Node 2 and Node 3 whereas Node 1 and Node 2 are able to communicate via interconnect, and from the Voting Disk CSSD notices that all the nodes are able to write to Voting Disks thus spli-brain, so the healthy nodes Node 1 & Node 2 would would update the kill block in the voting disk for Node 3. Then when during pread() system call of CSSD of Node 3, it sees a self kill flag set and thus the CSSD of Node 3 evicts itself. And then the I/O fencing and finally the OHASD will finally attempt to restart the stack after graceful shutdown.

Example 2:- Suppose in a 2 node cluster with 3 voting disk, a disk heartbeat fails such that Node 1 can see 2 Voting Disks and Node 2 can see 1 Voting Disk, ( If here the Voting Disk wouldn’t have been odd then both the Nodes would have thought the other node should be killed hence would have been difficult to avoid split-brain), thus based on Simple Majority Rule, CSSD process of Node 1 (2 Voting Disks) sends a kill request to the CSSD process of Node 2 (1 Voting Disk) and thus the Node 2 evicts itself and then the I/O fencing and finally the OHASD will finally attempt to restart the stack after graceful shutdown.