The failover mechanism
In the cluster, shared resources can be seen from all computers, or nodes. Each node automatically senses if another node in the cluster has failed, and processes running on the failed node continue to run on an operational one. To the user, this failover is usually transparent. Depicted in the next figure is a two-node cluster; both nodes have local disks and shared resources available to both computers. The heartbeat connection allows the two computers to communicate and detects when one fails; if Node 1 fails, the clustering software will seamlessly transfer all services to Node 2.
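The heartbeat-and-failover behavior described above can be sketched in a few lines. This is a minimal illustration, not the actual Cluster Service logic; the interval, threshold, and `Node`/`failover` names are all assumptions made for the example.

```python
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats (assumed value)
FAILURE_THRESHOLD = 3      # missed heartbeats before declaring a node failed

class Node:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()
        self.services = []             # workloads currently running here

def record_heartbeat(node):
    """Called whenever a heartbeat arrives over the private network."""
    node.last_heartbeat = time.monotonic()

def is_failed(node, now=None):
    """A node is presumed failed after several missed heartbeats."""
    now = time.monotonic() if now is None else now
    return now - node.last_heartbeat > HEARTBEAT_INTERVAL * FAILURE_THRESHOLD

def failover(failed, survivor):
    """Transfer every service from the failed node to the survivor."""
    survivor.services.extend(failed.services)
    failed.services = []
```

In a real two-node cluster the detection runs symmetrically on both nodes; the sketch only shows the direction Node 1 → Node 2.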
The Windows service associated with failover clustering, called the Cluster Service, has several components:
- Event processor;
- Database manager;
- Node manager;
- Global update manager;
- Communication manager;
- Resource (or failover) manager.
The failover process is as follows:
- The resource manager in failover clustering detects a problem with a specific resource;
- Each resource has a specific number of retries within a specified time window in which it can be brought online, and resources are brought online in dependency order. A resource will be retried until the maximum number of retries within the time window has been reached. If not all resources can be brought online at that point, the group might come online in a partially online state, with the remaining resources marked as failed; any resource with a failed dependency will not be brought online and will remain in a failed state. However, if any failed resource is configured to affect the resource group, the problem is escalated and the failover process for that resource group is initiated via the failover manager. If the failed resources are not configured to affect the group, they are left in a failed state and the group remains partially online;
- If the failover manager is contacted, it determines, based on the configuration of the resource group, which node will be the best owner. The new potential owner is notified, and the resource group is sent to that node to be restarted, beginning the whole process again. If that node also cannot bring the resources online, another node (assuming there are more than two nodes in the cluster) might become the owner. If no potential owner can start the resources, the resource group is left in a failed state across the cluster;
- If an entire node fails, the process is similar, except that the failover manager determines which groups were owned by the failed node, and subsequently figures out which other node(s) to send them to restart.
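The retry, dependency, and escalation rules above can be condensed into a small sketch. This is a simplified model, assuming a pre-sorted dependency order and invented names such as `Resource` and `bring_group_online`; the real Resource Manager is far more involved.

```python
import time

class Resource:
    def __init__(self, name, depends_on=(), max_retries=3,
                 retry_window=60.0, affects_group=True):
        self.name = name
        self.depends_on = list(depends_on)
        self.max_retries = max_retries      # retries allowed in the window
        self.retry_window = retry_window    # seconds
        self.affects_group = affects_group  # escalate to the failover manager?
        self.state = "offline"
        self.attempts = []                  # timestamps of failed attempts

def try_online(resource):
    """Placeholder: a real cluster would start the underlying service."""
    resource.state = "online"
    return True

def attempt_with_retries(res, start):
    """Retry until the per-resource limit within the time window is hit."""
    now = time.monotonic()
    res.attempts = [t for t in res.attempts if now - t < res.retry_window]
    while len(res.attempts) <= res.max_retries:
        if start(res):
            return True
        res.attempts.append(time.monotonic())
    return False

def bring_group_online(resources, start=try_online):
    """Bring resources online in dependency order. Returns False when a
    failed resource is configured to affect the group (failover needed)."""
    for res in resources:                   # assumed topologically sorted
        if any(dep.state != "online" for dep in res.depends_on):
            res.state = "failed"            # failed dependency: stay failed
        elif not attempt_with_retries(res, start):
            res.state = "failed"
        if res.state == "failed" and res.affects_group:
            return False                    # escalate: initiate failover
    return True                             # online, possibly partially
```

A resource that fails but has `affects_group=False` simply stays failed while the rest of the group comes online, matching the "partially online" case described above.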
For best results, use identical hardware in all nodes. Here are some simple rules:
- Each clustered server needs at least two network adapters;
- Carefully plan the requirements of your workload and choose hardware that matches its foreseen demands and can sustain the expected loads. Don't waste money on servers with four quad-core CPUs if your workload requires little CPU power but intensive disk I/O;
- Add excess capacity if deploying a cluster in which each node will run an active workload; otherwise, upon failure of one node, the remaining one might become overloaded.
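The last rule is simple arithmetic: in an active/active cluster, the surviving nodes must absorb the failed node's load. A small sketch (the function name and parameters are invented for illustration):

```python
def required_node_capacity(total_load, nodes, tolerated_failures=1):
    """Capacity each node needs so the cluster still carries total_load
    after `tolerated_failures` nodes fail (N - F survivors share it)."""
    survivors = nodes - tolerated_failures
    if survivors < 1:
        raise ValueError("cluster cannot tolerate that many failures")
    return total_load / survivors

# Two active nodes each running 60% of one server's worth of work:
# after one failure, the survivor must carry the full 120% alone.
print(required_node_capacity(total_load=1.2, nodes=2))  # 1.2
```

In other words, two nodes each loaded above 50% of their own capacity cannot fail over to one another without overload.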
One of the most relevant ways to categorize high availability clusters is how storage is shared. The two predominant architectures for clustering are shared-disk and shared-nothing.
Shared nothing clusters
At any given time, only one node owns a disk; when a node fails, another node in the cluster gains access to the same disk. In a shared-nothing system, each node is solely responsible for an individual data set, and only one of the clustered systems can "own" and access a particular resource, or subset of the data, at a time. A shared-nothing architecture means that each server system has its own private memory and private disk; the data is partitioned in some manner and spread across a set of machines, with each machine having sole access to, and hence sole responsibility for, the data it holds.
The clustered servers communicate by passing messages through a network that interconnects the computers so the requests from clients are automatically routed to the system that owns the resource. Of course, in the event of a failure, resource ownership may be dynamically transferred to another system in the cluster. The main advantage of shared-nothing clustering is scalability. In theory, a shared-nothing multiprocessor can scale up to thousands of processors because they do not interfere with one another – nothing is shared.
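The routing behavior described above can be sketched with simple hash partitioning. This is an illustrative model only; the node names and the hash scheme are assumptions, and real clusters use more sophisticated ownership protocols.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

def owner_of(key, nodes=NODES):
    """Hash-partition keys across nodes; each key has exactly one owner."""
    digest = hashlib.sha256(key.encode()).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]

def route(key, nodes=NODES, failed=frozenset()):
    """Route a request to the owning node; if the owner has failed,
    ownership is transferred to the next surviving node in the list."""
    if all(n in failed for n in nodes):
        raise RuntimeError("no surviving nodes")
    start = nodes.index(owner_of(key, nodes))
    for i in range(len(nodes)):
        candidate = nodes[(start + i) % len(nodes)]
        if candidate not in failed:
            return candidate
```

Because nothing is shared, each node serves its own partition independently, which is exactly why this architecture scales so well in theory.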
Shared disk clusters
All nodes have access to the same storage, and a locking mechanism protects against race conditions and data corruption. Disk ownership can be moved from one node to another; this requires that the disk be shared between the two nodes, so that when Node 2 becomes active it has access to the same data as Node 1.
To allow virtualization of names and IP addresses, a failover cluster provides or requires redundancy of nearly every component: servers, network cards, networks, and so on. This redundancy is the basis of all availability in the failover cluster. However, there is a single point of failure in any failover cluster implementation: the shared cluster disk array, a disk subsystem that is attached to, and accessible by, all nodes of the failover cluster.
Technologies that may be required to implement shared disk clusters include a distributed volume manager, which is used to virtualize the underlying storage for all servers to access the same storage; and the cluster file system, which controls read/write access to a single file system on the shared SAN. In this shared disk architecture where every node can write to every disk, a sophisticated lock mechanism prevents inconsistencies which could arise from concurrent access to the same data.
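The role of the lock mechanism can be illustrated with a toy per-block lock manager. This is a single-process stand-in using thread locks, assuming invented names like `LockManager`; a real cluster lock manager is distributed across nodes.

```python
import threading

class LockManager:
    """Toy stand-in for a cluster lock manager: one lock per disk block.
    Writers must acquire the block's lock before touching shared storage."""
    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def _lock_for(self, block):
        with self._guard:                  # protect the lock table itself
            return self._locks.setdefault(block, threading.Lock())

    def write(self, disk, block, data):
        with self._lock_for(block):        # serialize concurrent writers
            disk[block] = data

# Four "nodes" writing the same block concurrently: the lock ensures
# each write is atomic, so the block ends up with one complete value.
shared_disk = {}
manager = LockManager()
writers = [threading.Thread(target=manager.write,
                            args=(shared_disk, 0, f"node{i}-data"))
           for i in range(4)]
for t in writers: t.start()
for t in writers: t.join()
```

Without the lock, interleaved writes to the same block could leave it torn or inconsistent, which is the corruption the cluster file system's lock mechanism exists to prevent.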
Shared-disk clustering typically does not scale as well as shared-nothing on smaller machines, but with some optimization techniques shared-disk is well suited to large enterprise processing, such as that done in the mainframe environment.
Shared disk is usually viable for applications and services that require only modest shared access to data, as well as for applications or workloads that are very difficult to partition. Applications with heavy data-update requirements are probably better implemented as shared-nothing.
With either architecture (shared disk or shared nothing), from a storage perspective, the disk needs to be connected to the servers in a way that any server in the cluster can access it by means of a simple software operation.
There are other clustering architectures, namely:
- Shared-everything clusters: Not only the file system but also memory and processors are shared, offering the user a single system image (SSI). In this model, applications do not need to be cluster-aware, because processes are launched on any of the available processors; if a server or processor becomes unavailable, the process is restarted on a different one.
- Clusters using mirrored disks: Volume manager software is used to create mirrored disks across all the machines in the cluster. Each server writes to the disks that it owns and to the disks of the other servers that are part of the same cluster.
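The mirrored-disk model can be reduced to a very small sketch: every write goes both to the locally owned disk and to each peer's copy. The function name and the dict-as-disk representation are assumptions made for the example.

```python
def mirrored_write(block, data, local_disk, peer_disks):
    """Synchronously mirror a write: update the locally owned disk first,
    then every peer's copy, so each node holds the full data set."""
    local_disk[block] = data        # write to the disk this node owns
    for disk in peer_disks:         # replicate to every other node's disk
        disk[block] = data
    return 1 + len(peer_disks)      # number of copies written
```

The cost of this design is that every write is multiplied by the number of mirrors, which is why volume-manager mirroring suits small clusters better than large ones.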
In Windows Server 2008, storage devices should expose multiple separate Logical Unit Numbers (LUNs). These LUNs must be set up as basic disks, and it is suggested that they be formatted with NTFS.
Increasingly, clustered servers and storage devices are connected over storage area networks (SANs). These networks use high-speed connections between servers and storage devices, allowing storage to be used over the wire, that is, over the network rather than via cables inside the machine.
The SAN used for cluster storage must support persistent reservations and one set of storage for the cluster must be isolated from all other servers using LUN masking or zoning. SANs allow for the consolidation of storage by using a few highly reliable storage devices instead of many separate, less reliable devices (like individual hard disks in servers) and also allow storage sharing with non-Windows operating systems, improving cross platform cooperation.
As we have seen, each clustered server needs at least two network adapters because a failover cluster has two primary networks:
- A private cluster network: Sometimes known as the intracluster network, or more commonly, the heartbeat network, this is a dedicated network that is segregated from all other network traffic and is used for the sole purpose of running internal processes on the cluster nodes to ensure that the nodes are up and running—and if not, to initiate failover. The private cluster network does not detect process failure.
- A public network: This is the network that connects the cluster to the rest of the network and allows clients, applications, and other servers to connect to the failover cluster.
A few other requirements apply:
- All nodes of a given cluster must run either a 32-bit or a 64-bit version of Windows Server, with no mixing;
- All servers in the cluster must be in the same Active Directory domain and should preferably be member servers rather than domain controllers;
- The servers in the cluster must use the Domain Name System (DNS) for name resolution, and the DNS dynamic update protocol can be used.