High Availability – Networks (II)

Redundant Protocols

If you read the previous posts, your network already has redundant links, and now you must decide how packets on the network will select their paths and avoid loops. This isn't a new problem; redundant paths have long been handled by protocols like Spanning Tree Protocol (STP) at Layer 2 and routing protocols like Open Shortest Path First (OSPF) at Layer 3. But these protocols can take 40 seconds or more to converge, which is unacceptable for critical networks, especially those carrying real-time applications like VoIP and video.
STP is a link management protocol that provides path redundancy while preventing undesirable loops in the network. For an Ethernet network to function properly, only one active path can exist between two stations, so STP is the protocol to use when you want redundant links but not loops. Redundant links matter to a network in the same way backups do: when the primary link or router fails, the backup links are activated so that users can continue to use the network. Without STP on the bridges and switches, those same redundant links can result in a loop.
To provide path redundancy, STP defines a tree that spans all switches in an extended network and forces certain redundant data paths into a standby (blocked) state. If a network segment in the spanning tree becomes unreachable, or if STP costs change, the spanning-tree algorithm reconfigures the topology and reestablishes connectivity by activating the standby path.
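
The idea of electing a root, keeping one loop-free tree, and blocking the leftover links can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every link has the same cost, the lowest bridge ID becomes the root, and the function name `spanning_tree` is made up for this post. Real STP exchanges BPDUs and uses port costs and port IDs as tie-breakers.

```python
from collections import deque


def spanning_tree(bridge_ids, links):
    """Return (root, forwarding_links, blocked_links) for an undirected topology.

    Simplification: all links cost the same, so a breadth-first search
    from the root stands in for STP's lowest-cost path selection.
    """
    root = min(bridge_ids)  # lowest bridge ID wins the root election
    neighbors = {bridge: [] for bridge in bridge_ids}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)

    forwarding = set()
    visited = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for nxt in sorted(neighbors[current]):
            if nxt not in visited:
                visited.add(nxt)
                forwarding.add(frozenset((current, nxt)))
                queue.append(nxt)

    # Any link that is not part of the tree is held in the blocked state.
    blocked = {frozenset(link) for link in links} - forwarding
    return root, forwarding, blocked


# Three switches wired in a triangle: one link has to be blocked.
triangle = [(1, 2), (2, 3), (1, 3)]
root, forwarding, blocked = spanning_tree([1, 2, 3], triangle)

# Link 1-2 fails: recomputing the tree puts the standby link 2-3 to work.
_, forwarding_after, blocked_after = spanning_tree([1, 2, 3], [(2, 3), (1, 3)])
```

In the triangle, the link between switches 2 and 3 is the one left blocked; after link 1-2 fails, the recomputed tree uses it and nothing remains blocked.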
An upgraded version of STP called Rapid Spanning Tree Protocol (RSTP, IEEE 802.1w) cuts the convergence time of STP to about one second. One disadvantage of RSTP (and STP) is that only one of the redundant links can be active at a time, in an active/standby configuration. Another is that when STP moves the active path to another router, the gateway address of the clients must change as well. To avoid the gateway problem, you can run Virtual Router Redundancy Protocol (VRRP) on your routers alongside STP or RSTP; VRRP emulates one virtual router address for the core routers and takes about three seconds to fail over.
The advantage of using VRRP is that you gain higher availability for the default path without having to configure dynamic routing or router discovery protocols on every end host. VRRP routers, viewed as a "redundancy group", share the responsibility for forwarding packets as if they "owned" the IP address corresponding to the default gateway configured on the hosts. One of the VRRP routers acts as the master and the others as backups; if the master router fails, a backup router becomes the new master. In this way, router redundancy is always provided, allowing traffic on the LAN to be routed without relying on a single router.
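
A minimal sketch of the master election may help here. It assumes the VRRPv2 rules from RFC 3768: the highest priority wins, ties go to the higher primary IP address, and a backup declares the master down after three advertisement intervals plus a priority-based skew time. The class and function names, router names, and addresses are invented for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class VrrpRouter:
    name: str
    priority: int  # 1-254 for backups; higher is preferred
    ip: str        # primary IP address, used only to break ties
    alive: bool = True


def elect_master(routers):
    """Pick the live router with the highest priority (ties: higher IP)."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (r.priority, int(ipaddress.ip_address(r.ip))),
    )


def master_down_interval(priority, advert_interval=1.0):
    """Seconds a backup waits without advertisements before taking over."""
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time


group = [
    VrrpRouter("core-a", priority=200, ip="192.0.2.2"),
    VrrpRouter("core-b", priority=100, ip="192.0.2.3"),
]
first_master = elect_master(group)

group[0].alive = False            # the master fails
second_master = elect_master(group)
takeover_delay = master_down_interval(group[1].priority)
```

With the default one-second advertisement interval and a priority of 100, the backup waits a little over three and a half seconds, which is in line with the failover time of about three seconds mentioned earlier.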
But because VRRP and RSTP work independently, it's possible that VRRP will designate one router as master while RSTP selects the path to the backup router as the preferred path. In the worst case, traffic arrives at the backup VRRP router, which immediately forwards it to the master router for processing, adding a router hop.

Common Address Redundancy Protocol (CARP) is an alternative to the VRRP standard and another tool for achieving system redundancy. Multiple computers share a single virtual network interface, so that if any machine fails, another can respond instead. CARP can also provide a degree of load sharing between systems.
Another router redundancy option is to run OSPF on the core router as well as on the aggregator switches. OSPF is a routing protocol that lets routers dynamically learn routes from other routers and advertise routes to them. An OSPF router keeps track of the state of all the network connections (links) between itself and the network it is trying to reach, which is why it is considered a link-state protocol. If one of the links goes down, it usually fails over in less than one second. You don't need VRRP with OSPF if you don't have redundant aggregator switches, because the clients would use the single aggregator switch as their gateway address.
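
Because every OSPF router holds a map of link states, rerouting after a failure comes down to rerunning a shortest-path calculation on the updated map. The sketch below shows that idea with Dijkstra's algorithm. The topology, node names, and link costs are made up for illustration, and it leaves out everything else OSPF does (hello packets, flooding of link-state advertisements, areas).

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra's algorithm: lowest total link cost from source to each node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, cost in graph.get(node, {}).items():
            new_dist = d + cost
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist


def without_link(graph, a, b):
    """Return a copy of the link-state map with the a-b link removed."""
    return {
        node: {n: c for n, c in nbrs.items() if {node, n} != {a, b}}
        for node, nbrs in graph.items()
    }


# Hypothetical topology: one core router, two aggregators, one access switch.
topology = {
    "core": {"agg1": 10, "agg2": 10},
    "agg1": {"core": 10, "agg2": 10, "access": 10},
    "agg2": {"core": 10, "agg1": 10, "access": 20},
    "access": {"agg1": 10, "agg2": 20},
}

cost_before = shortest_paths(topology, "core")["access"]
cost_after = shortest_paths(without_link(topology, "core", "agg1"), "core")["access"]
```

Here the best path from the core to the access switch costs 20 through agg1; once the core-to-agg1 link is withdrawn, the recalculation finds a path of cost 30 through agg2.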
Most OSPF router and switch implementations now support Equal Cost Multipath (ECMP), a routing feature (not a separate version of OSPF) that load balances traffic across links of equal cost. With two such links, both are always active in an active/active configuration and, if there is a failure, only half the traffic is affected. Load balancing also means that, theoretically, the total bandwidth of both links is available. But depending on both links to meet your bandwidth requirements is not full redundancy.
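
The point about bandwidth can be made concrete with a little arithmetic. The sketch below picks the next hops that tie for the lowest cost, then checks whether the offered load would still fit if one of the equal links were lost. Function names, router names, and traffic figures are assumptions chosen for the example.

```python
def equal_cost_next_hops(routes):
    """routes: list of (next_hop, cost). Keep the hops that share the lowest cost."""
    best = min(cost for _, cost in routes)
    return [hop for hop, cost in routes if cost == best]


def survives_single_failure(link_capacity_mbps, link_count, offered_load_mbps):
    """True if the load still fits after losing one of the equal links."""
    remaining_capacity = link_capacity_mbps * (link_count - 1)
    return offered_load_mbps <= remaining_capacity


# Two routes tie at cost 20, so both are used; the cost-30 route is not.
next_hops = equal_cost_next_hops([("r1", 20), ("r2", 20), ("r3", 30)])

# Two 1000 Mbps links: 800 Mbps of traffic fits on one link, 1500 Mbps does not.
light_load_ok = survives_single_failure(1000, len(next_hops), 800)
heavy_load_ok = survives_single_failure(1000, len(next_hops), 1500)
```

If the second check fails, the pair of links is providing capacity rather than redundancy: a single failure leaves the network congested even though it is still connected.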
A simple way to add redundancy between any two switches is to use the IEEE 802.3ad (link aggregation) protocol. This trunking protocol takes multiple connections and combines them into one virtual pipe, increasing both bandwidth and redundancy for higher availability. Traffic is load-balanced across the connections, so if one of them goes down, traffic is directed to the remaining connection or connections.
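
One way to picture the load balancing: the switch hashes some fields of each flow and uses the result to choose a member link, so a given flow stays on one link while different flows spread out. The sketch below assumes a flow is identified by a plain string and uses SHA-256 only because it is in the Python standard library; real switches hash MAC addresses, IP addresses, or ports in hardware, and `pick_member` is a hypothetical name.

```python
import hashlib


def pick_member(flow, members):
    """Map a flow identifier onto one of the active member links."""
    active = sorted(members)
    if not active:
        raise ValueError("no active links left in the aggregate")
    digest = hashlib.sha256(flow.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(active)
    return active[index]


flows = [f"10.0.0.{i}->10.0.1.1" for i in range(100)]

# Both links up: flows are spread across eth0 and eth1.
before = {flow: pick_member(flow, ["eth0", "eth1"]) for flow in flows}

# eth0 goes down: every flow is directed to the remaining link.
after = {flow: pick_member(flow, ["eth1"]) for flow in flows}
```

Hashing per flow rather than per packet keeps the packets of one conversation in order, at the cost of a split that is only roughly even.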
The Hot Standby Router Protocol (HSRP) provides a mechanism designed to support non-disruptive failover of IP traffic in certain circumstances. In particular, the protocol protects against the failure of the first-hop router when the source host cannot learn the IP address of the first-hop router dynamically. HSRP is not intended as a replacement for existing dynamic router discovery mechanisms, however, and those protocols should be used instead whenever possible.
