Networks and Servers

High Availability – Networks (II)

Redundant Protocols

If you read the previous posts, your network already has redundant links, and now you must decide how packets will select their paths and avoid loops. This isn't a new problem; redundant paths have long been handled by protocols such as Spanning Tree Protocol (STP) at Layer 2 and routing protocols such as Open Shortest Path First (OSPF) at Layer 3. But with default timers these protocols can take 40 seconds or more to converge, which is unacceptable for critical networks, especially those carrying real-time applications like VoIP and video.

STP is a link management protocol that provides path redundancy while preventing undesirable loops in the network. For an Ethernet network to function properly, only one active path can exist between two stations. Use this protocol when you want redundant links but not loops. Redundant links matter as much as backups do: when something fails, they are what the network fails over to. A failure of your primary router activates the backup links so that users can continue to use the network. Without STP on the bridges and switches, those redundant links can form a loop.

To provide path redundancy, STP defines a tree that spans all switches in an extended network and forces certain redundant data paths into a standby (blocked) state. If a network segment in the spanning tree becomes unreachable, or if STP costs change, the spanning-tree algorithm recalculates the topology and restores connectivity by activating the standby path.
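
The blocking behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not the real STP algorithm: it assumes numeric bridge IDs, at most one link per switch pair, and uses a shortest-path search in place of the BPDU exchange that real switches perform. The names spanning_tree and TRIANGLE are made up for this example.

```python
import heapq

def spanning_tree(links):
    """Return (root, forwarding, blocked) for a list of (switch_a, switch_b, cost) links.

    Simplified sketch: the lowest switch ID becomes the root, every other
    switch keeps its least-cost path to the root, and all remaining links
    are treated as blocked (standby).
    """
    switches = {s for a, b, _ in links for s in (a, b)}
    root = min(switches)  # lowest bridge ID wins the root election

    neighbors = {s: [] for s in switches}
    for a, b, cost in links:
        neighbors[a].append((b, cost))
        neighbors[b].append((a, cost))

    # Shortest-path search from the root (Dijkstra).
    parent = {}
    done = set()
    queue = [(0, root, root)]  # (path cost, switch, switch we came from)
    while queue:
        cost, node, via = heapq.heappop(queue)
        if node in done:
            continue
        done.add(node)
        if node != root:
            parent[node] = via
        for nxt, link_cost in neighbors[node]:
            if nxt not in done:
                heapq.heappush(queue, (cost + link_cost, nxt, node))

    forwarding = {frozenset((node, via)) for node, via in parent.items()}
    blocked = {frozenset((a, b)) for a, b, _ in links} - forwarding
    return root, forwarding, blocked

# Hypothetical topology: three switches connected in a triangle, equal costs.
TRIANGLE = [(1, 2, 4), (1, 3, 4), (2, 3, 4)]

if __name__ == "__main__":
    print(spanning_tree(TRIANGLE))
    # Simulate losing the 1-3 link: the formerly blocked 2-3 link takes over.
    print(spanning_tree([link for link in TRIANGLE if link != (1, 3, 4)]))
```

In the triangle, the link between switches 2 and 3 ends up blocked; once the 1-3 link is removed, recomputing the tree puts the 2-3 link into the forwarding set, which is the "activate the standby path" step.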

An upgraded version of STP called Rapid Spanning Tree Protocol (RSTP, IEEE 802.1w) cuts the convergence time to about one second. RSTP (like STP) has two disadvantages. First, only one of the redundant links can be active at a time, in an "active/standby" configuration. Second, when STP moves the active path to another router, the gateway addresses of the clients must change as well. To avoid the gateway problem, you can run Virtual Router Redundancy Protocol (VRRP) alongside STP or RSTP on your routers; it presents one virtual router address for the core routers and takes about three seconds to fail over.
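
To see where the "about three seconds" comes from, here is a small sketch of the backup router's timer as defined for VRRPv2 in RFC 3768: a backup declares the master down after three missed advertisements plus a priority-based skew. The function name master_down_interval is just an illustration, and vendor implementations may let you tune these timers differently.

```python
def master_down_interval(priority, advertisement_interval=1.0):
    """Seconds a VRRPv2 backup waits before taking over as master.

    RFC 3768: Skew_Time = (256 - Priority) / 256
              Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    """
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256
    return 3 * advertisement_interval + skew_time

if __name__ == "__main__":
    # Default priority (100) and default 1-second advertisements.
    print(master_down_interval(100))
    # A higher-priority backup has a smaller skew and takes over slightly sooner.
    print(master_down_interval(254))
```

With the defaults (priority 100, one advertisement per second) the interval works out to roughly 3.6 seconds, and a higher-priority backup gets closer to 3 seconds.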

The advantage of using VRRP is that you gain a higher availability for the default path without requiring configuration of dynamic routing or router discovery protocols on every end host. VRRP routers viewed as a "redundancy group" share the responsibility for forwarding packets as if they "owned" the IP address corresponding to the default gateway configured on the hosts. One of the VRRP routers acts as the master and others as backups; if the master router fails, a backup router becomes the new master. In this way, router redundancy is always provided, allowing traffic on the LAN to be routed without relying on a single router.
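
The master/backup behaviour can be sketched as a simple election: among the routers that are still alive, the highest priority wins, and a tie goes to the higher IP address (the tie-break rule from RFC 3768). This is only a toy model, assuming every router sees every other router instantly and ignoring preemption settings; VrrpRouter, elect_master and the addresses below are hypothetical names and values chosen for the example.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class VrrpRouter:
    name: str
    priority: int      # higher wins; 255 is reserved for the address owner
    ip: str            # the router's own (primary) interface address
    alive: bool = True

def elect_master(routers):
    """Pick the master among live routers: highest priority, then highest IP."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None  # nobody left to answer for the virtual address
    return max(candidates, key=lambda r: (r.priority, ipaddress.ip_address(r.ip)))

if __name__ == "__main__":
    core_a = VrrpRouter("core-a", priority=200, ip="192.0.2.2")
    core_b = VrrpRouter("core-b", priority=100, ip="192.0.2.3")
    print(elect_master([core_a, core_b]).name)  # core-a has the higher priority

    # core-a fails: the backup becomes the new master for the same virtual address.
    failed_a = VrrpRouter("core-a", priority=200, ip="192.0.2.2", alive=False)
    print(elect_master([failed_a, core_b]).name)
```

The hosts never notice the change, because they keep sending to the same virtual gateway address; only the router answering for it is different.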

But because VRRP and RSTP work independently, it's possible that VRRP designates one router as master while RSTP selects the path to the backup router as the preferred path. In the worst case, the backup VRRP router receives the traffic and immediately forwards it to the master router for processing, adding a router hop.

Redundant Devices

Today's networks are sophisticated and usually high speed. Common to most Wide Area Network (WAN) designs is the need for a backup that takes over if your main link fails for any reason. A simple scenario: you have a single T1 connection from your core site to each remote or branch office. What if that link went down? How would you continue your operations if it did?

Adding redundancy is the most common way to increase your uptime. First, make sure there's redundancy within your core router; redundant CPU cards, power supplies and fans usually can be added to chassis-based routers and switches, and some router and switch vendors have equipment with dual backplanes. With redundant CPU cards, you can force a failover to one card while you upgrade the second one, instead of having to bring the whole router down for the upgrade.

The goal of redundant topologies is to eliminate network downtime caused by a single point of failure. All networks need redundancy for enhanced reliability. This is achieved through reliable equipment and through network designs that tolerate failures and faults. Networks should also be designed to reconverge rapidly so that the fault is bypassed.

Network redundancy is a simple concept to understand. If you have a single point of failure and it fails, you have nothing to fall back on. If you put in a secondary (or tertiary) method of access, then when the main connection goes down, you still have a way to connect to resources and keep the business operational.

The critical point is that highly reliable network equipment is expensive because it is designed not to break. It typically includes features such as dual power supplies, watchdog processors and redundant disk systems.

A highly available system may be built out of less expensive network products but these components may lack the redundant power supplies or other features of high-reliability equipment, and therefore, they may fail more often than the more expensive equipment. However, if the overall network design takes into account the fact that equipment may fail, then end users will still be able to access the network even if something goes wrong.
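
A quick calculation shows why this trade-off can work. The sketch below assumes the redundant devices fail independently and that any one of them is enough to carry the traffic; the 99% figure is an illustrative number, not a measurement of any real product, and the function names are made up for the example.

```python
def parallel_availability(availability, copies):
    """Availability of `copies` independent redundant units where one is enough."""
    return 1 - (1 - availability) ** copies

def downtime_hours_per_year(availability):
    """Expected hours of downtime in a 365-day year for a given availability."""
    return (1 - availability) * 365 * 24

if __name__ == "__main__":
    single = 0.99                            # one inexpensive device (assumed figure)
    pair = parallel_availability(single, 2)  # two of them side by side
    print(f"single: {single:.4%}, about {downtime_hours_per_year(single):.1f} h/year down")
    print(f"pair:   {pair:.4%}, about {downtime_hours_per_year(pair):.1f} h/year down")
```

Under these assumptions, one 99% device is down about 87.6 hours a year, while two of them in parallel reach about 99.99%, which is under an hour a year. Real failures are rarely fully independent (shared power, shared software bugs), so treat the result as an upper bound.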

High Availability – Solutions

Companies are under increased pressure to keep their systems up and running and make data continuously available. Now that they are held to a higher standard for application and data availability, the trick is to design server and storage systems that are highly available and almost bullet-proof against unplanned downtime. To achieve the highest levels of availability, a company has to implement a complete solution that addresses all the possible points of failure. But what are the available options if you want to design a high availability solution?

The chart shows the main solutions for the three areas to be addressed (storage, services and networks):

In the following posts I will explain these solutions in further detail. Keep reading, ok?
