
How to use the Virtualization Lab (II)


Picking up from where I left off, it was now time to change the setup into something quite different. The first step was the creation of another VM inside Hyper-V to be used as an alternative source of iSCSI storage. I achieved this by installing Microsoft iSCSI Software Target 3.3 on a new Server 2008 R2 x64 VM. I created this machine with two VHD files: one for the OS and the other for the iSCSI storage.
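For reference, the same VM layout can also be scripted. This is only a minimal sketch assuming the Hyper-V PowerShell module that ships with Windows Server 2012 and later (on the 2008 R2 host used here the VM was created in Hyper-V Manager); the VM name, paths and sizes are illustrative only:

  # Hypothetical names, paths and sizes - adjust to your environment
  New-VHD -Path 'D:\VMs\LAB-ISCSI2\OS.vhd'    -SizeBytes 40GB -Dynamic
  New-VHD -Path 'D:\VMs\LAB-ISCSI2\iSCSI.vhd' -SizeBytes 60GB -Dynamic

  # Create the VM with the OS disk attached
  New-VM -Name 'LAB-ISCSI2' -MemoryStartupBytes 2GB -VHDPath 'D:\VMs\LAB-ISCSI2\OS.vhd'

  # Attach the second disk that will hold the iSCSI virtual disks
  Add-VMHardDiskDrive -VMName 'LAB-ISCSI2' -Path 'D:\VMs\LAB-ISCSI2\iSCSI.vhd'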

I will now show you the steps taken to create three new iSCSI virtual disks:

Creation of the iSCSI target:

iSCSI 1
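Those steps were performed in the Microsoft iSCSI Software Target console. As a rough, scriptable equivalent, the iSCSI Target Server role in Windows Server 2012 R2 and later exposes PowerShell cmdlets; the target name, initiator IQNs, paths and sizes below are illustrative assumptions only:

  # Create the target and restrict it to the cluster nodes' initiators (IQNs are examples)
  New-IscsiServerTarget -TargetName 'LabClusterTarget' `
      -InitiatorIds 'IQN:iqn.1991-05.com.microsoft:lab-node1.lab.local','IQN:iqn.1991-05.com.microsoft:lab-node2.lab.local'

  # Create three virtual disks and map them to the target
  New-IscsiVirtualDisk -Path 'E:\iSCSIVirtualDisks\Quorum.vhdx' -SizeBytes 1GB
  New-IscsiVirtualDisk -Path 'E:\iSCSIVirtualDisks\Data1.vhdx'  -SizeBytes 20GB
  New-IscsiVirtualDisk -Path 'E:\iSCSIVirtualDisks\Data2.vhdx'  -SizeBytes 20GB

  'Quorum','Data1','Data2' | ForEach-Object {
      Add-IscsiVirtualDiskTargetMapping -TargetName 'LabClusterTarget' -Path ("E:\iSCSIVirtualDisks\{0}.vhdx" -f $_)
  }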

How to use the Virtualization Lab (I)


I finished the last post in this series with a fully working cluster installed between two Hyper-V virtual machines (VMs), using a virtual iSCSI solution installed on a VirtualBox VM, as depicted in the next picture:

Virtualization Lab 1 
Before moving on and adding complexity to the lab scenario, don't forget to safeguard your work; although this is just a lab, that doesn't reduce the nuisance of having to reinstall everything in the event of a failure. So, create VM snapshots:
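On a Server 2008 R2 host the snapshots are taken from Hyper-V Manager (with the VMware or VirtualBox equivalents for the other VMs). As a sketch, on hosts with the later Hyper-V PowerShell module the same thing can be scripted; the VM and snapshot names here are only examples:

  # Snapshot every lab VM on this Hyper-V host before changing the setup
  foreach ($vm in 'LAB-DC','LAB-NODE1','LAB-NODE2') {
      Checkpoint-VM -Name $vm -SnapshotName 'Working-iSCSI-cluster'
  }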

High Availability with Failover Clusters

Before moving on to the next chapter on my virtualization lab series, I think this might be a good opportunity to review some of the clustering options available today. I will use Windows Server Failover Clustering with Hyper-V because in today's world the trend is to combine Virtualization with High Availability (HA).

There are many ways to implement these solutions, and the basic design concepts presented here can be adjusted to other virtualization platforms. Some of these designs will not actually guarantee a fault-tolerant solution, but most of them can still be used in specific scenarios (even if only for demonstration purposes).

Two virtual machines on one physical server


In this scenario an HA cluster is built between two (or more) virtual machines on a single physical machine. Here we have a single physical server running Hyper-V and two child partitions where you run Failover Clustering. This setup does not protect against hardware failures because when the physical server fails, both (virtual) cluster nodes will fail. Therefore, the physical machine itself is a single point of failure (SPOF).

Two virtual machines on one physical server

How to Setup a Virtualization Lab (III)

Failover Cluster Networking



The first step in the setup of a failover cluster is the creation of an AD domain, because all the cluster nodes have to belong to the same domain. But before doing so, I changed the network settings again in order to adjust them for this purpose.

LAB-DC:
IP: 192.168.1.10
Gateway: 192.168.1.1 (Physical Router)
DNS: 127.0.0.1
Alternate DNS: 192.168.1.1

LAB-NODE1:
IP: 192.168.1.11
Gateway: 192.168.1.1
DNS: 192.168.1.10 (DC)
Alternate DNS: 192.168.1.1 (Physical Router)

LAB-NODE2:
IP: 192.168.1.12
Gateway: 192.168.1.1
DNS: 192.168.1.10
Alternate DNS: 192.168.1.1

LAB-NODE3:
IP: 192.168.1.13
Gateway: 192.168.1.1
DNS: 192.168.1.10
Alternate DNS: 192.168.1.1

LAB-STORAGE:
IP: 192.168.1.14
Gateway: 192.168.1.1
DNS: 192.168.1.10
Alternate DNS: 192.168.1.1
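For reference, the same static configuration can be applied from an elevated command prompt with netsh (shown here for LAB-NODE1; the /24 mask and the connection name "Local Area Connection" are assumptions that may differ in your VMs):

  netsh interface ipv4 set address name="Local Area Connection" source=static address=192.168.1.11 mask=255.255.255.0 gateway=192.168.1.1
  netsh interface ipv4 set dnsservers name="Local Area Connection" source=static address=192.168.1.10 register=primary
  netsh interface ipv4 add dnsservers name="Local Area Connection" address=192.168.1.1 index=2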

I therefore created a domain composed of five machines: a DC and two member servers as Hyper-V VMs, a member server as a VMware VM, and another member server as a VirtualBox VM.

So far I have demonstrated that virtualized servers running on different platforms, using different virtualization techniques, can be integrated into the same logical infrastructure; in this case we have VMs running on a Type 1 hypervisor (Hyper-V) and on two distinct Type 2 hypervisors (VMware Workstation and VirtualBox).

The option to create a network with static IP addresses is as valid as the alternative of using DHCP. Later on I plan to explore the several options provided by cluster networking in Windows Server 2008, but for the time being I kept my network in a simple, basic configuration in order to proceed with the lab installation.

Failover Clustering (IV)

 

Cluster Node Configurations


The most common size for a high-availability cluster is a two-node cluster, since that is the minimum required to provide redundancy, but many clusters consist of far more nodes, sometimes dozens, and such configurations can be categorized into one of the following models:

Active/Passive Cluster


In an Active/Passive (or asymmetric) configuration, applications run on a primary, or master, server. A dedicated redundant server is present to take over on any failure but apart from that it is not configured to perform any other functions. Thus, at any time, one of the nodes is active and the other is passive. This configuration provides a fully redundant instance of each node, which is only brought online when its associated primary node fails.
The active/passive cluster generally contains two identical nodes. Single instances of the database applications are installed on both nodes, but the database itself is located on shared storage. During normal operation, the database instance runs only on the active node. In the event of a failure of the currently active primary system, the clustering software transfers control of the disk subsystem to the secondary system. As part of the failover process, the database instance on the secondary node is started, thereby resuming the service.
 
Active/Passive Cluster

This configuration is the simplest and most reliable but typically requires the most extra hardware.
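As a hedged illustration of how such a two-node pair is formed and verified with the FailoverClusters PowerShell module (available from Windows Server 2008 R2 onwards), assuming the cluster name and administrative IP shown here:

  Import-Module FailoverClusters

  # Validate the candidate nodes first, then form the cluster
  Test-Cluster -Node LAB-NODE1, LAB-NODE2
  New-Cluster -Name LAB-CLUSTER -Node LAB-NODE1, LAB-NODE2 -StaticAddress 192.168.1.20

  # The owner node of each group shows which node is currently "active" for that workload
  Get-ClusterGroup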

Failover Clustering (III)


Cluster Quorum Configurations


In simple terms, the quorum for a cluster is the number of elements that must be online for that cluster to continue running. Server clusters require a quorum resource to function; like any other resource, it can be owned by only one server at a time, and servers can negotiate for its ownership. In effect, each element can cast one “vote” to determine whether the cluster continues running. The voting elements are nodes or, in some cases, a disk witness or file share witness. The quorum resource is used to store the definitive copy of the cluster configuration so that, regardless of any sequence of failures, the cluster configuration always remains consistent. Each voting element (with the exception of a file share witness) contains a copy of the cluster configuration, and the Cluster service works to keep all copies synchronized at all times.

When network problems occur, they can interfere with communication between cluster nodes. A small set of nodes might be able to communicate together across a functioning part of a network but not be able to communicate with a different set of nodes in another part of the network. This can cause serious issues. In this "split" situation, at least one of the sets of nodes must stop running as a cluster.
Negotiating for the quorum resource allows server clusters to avoid "split-brain" situations, where multiple servers are active and each thinks the others are down.
To prevent the issues that are caused by a split in the cluster, the cluster software requires that any set of nodes running as a cluster must use a voting algorithm to determine whether, at a given time, that set has quorum. Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster knows how many "votes" constitute a majority (that is, a quorum). If the number of votes online drops below the majority, the cluster stops running. Nodes will still listen for the presence of other nodes, in case another node appears on the network again, but they will not begin to function as a cluster until quorum exists again.
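With the FailoverClusters PowerShell module the current quorum model can be inspected and changed; a quick sketch follows, where the witness disk and file share names are assumptions:

  Import-Module FailoverClusters

  # Show the current quorum configuration of the cluster
  Get-ClusterQuorum

  # Two-node clusters typically add a witness so that a majority of votes can survive one failure
  Set-ClusterQuorum -NodeAndDiskMajority 'Cluster Disk 1'
  # ...or, without shared storage for the witness:
  Set-ClusterQuorum -NodeAndFileShareMajority '\\LAB-DC\ClusterWitness'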

Failover Clustering (II)


The failover mechanism


In a cluster, shared resources can be seen by all computers, or nodes. Each node automatically senses whether another node in the cluster has failed, and processes running on the failed node continue to run on an operational one. To the user, this failover is usually transparent. Depicted in the next figure is a two-node cluster; both nodes have local disks, and shared resources are available to both computers. The heartbeat connection allows the two computers to communicate and detects when one of them fails; if Node 1 fails, the clustering software seamlessly transfers all services to Node 2.
 
The failover mechanism
 
The Windows service associated with the failover cluster, named the Cluster Service, has several components:
  • Event processor;
  • Database manager;
  • Node manager;
  • Global update manager;
  • Communication manager;
  • Resource (or failover) manager.
The resource manager communicates directly with a resource monitor that talks to a specific application DLL that makes an application cluster-aware. The communication manager talks directly to the Windows Winsock layer (a component of the Windows networking layer).
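Before walking through the process itself, here is a short sketch of how this transfer can be observed and triggered manually with the FailoverClusters PowerShell module (the group name is only an example):

  Import-Module FailoverClusters

  # List clustered resources and groups, with their current owner nodes
  Get-ClusterResource
  Get-ClusterGroup

  # Move a group to the other node to simulate a planned failover
  Move-ClusterGroup -Name 'Cluster Group' -Node LAB-NODE2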
The failover process is as follows:

Failover Clustering (I)

Clustering Basics


Clustering is the use of multiple computers and redundant interconnections to form what appears to be a single, highly available system. A cluster provides protection against downtime for important applications or services that need to be always available by distributing the workload among several computers in such a way that, in the event of a failure in one system, the service will be available on another.

The basic concept of a cluster is easy to understand; a cluster is two or more systems working in concert to achieve a common goal. Under Windows, two main types of clustering exist: scale-out/availability clusters known as Network Load Balancing (NLB) clusters, and strictly availability-based clusters known as failover clusters. Microsoft also has a variation of Windows called Windows Compute Cluster Server.

When a computer unexpectedly fails or is intentionally taken down, clustering ensures that the processes and services it was running "fail over" to another machine in the cluster. This happens without interruption, or the need for immediate admin intervention, providing a high-availability solution in which critical data is available at all times.
Failover Cluster