Networks and Servers: High Availability

Definition of High Availability

High Availability (HA) is the ability of a system to perform its function continuously (without interruption) for a significantly longer period of time than the reliabilities of its individual components would suggest. HA, then, is a trade-off between the cost of downtime and the cost of the protective measures that are available to avoid or reduce downtime.
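To make the first sentence of this definition concrete, here is a minimal sketch of how redundancy can lift a system's availability above that of its individual components. It assumes that components fail independently of one another, which is rarely exactly true in practice. The function names and the 99% figure are illustrative assumptions, not taken from any particular product.

```python
def series_availability(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of redundant copies where any single working copy suffices.

    Assumes the copies fail independently of each other.
    """
    return 1.0 - (1.0 - component_availability) ** copies


# Hypothetical server that is up 99% of the time.
single = 0.99
chained = series_availability(single, single)   # two servers, both required
mirrored = parallel_availability(single, 2)     # two servers, either one suffices

print(f"single server:    {single:.4%}")
print(f"two in series:    {chained:.4%}")
print(f"two in parallel:  {mirrored:.4%}")
```

Under these assumptions the chained pair is less available than a single server, while the mirrored pair is considerably more available. That gap is the benefit the protective measures buy, and it has to be weighed against the cost of the second server.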

The term High Availability, when applied to computer systems, means that the application or service in question is available all the time, regardless of time of day, location and other factors that can influence the availability of such an application.

In general, HA is the ability to continue a service for extremely long durations without interruption. In practice it is both a system design approach and an associated service implementation that ensures a prearranged level of operational performance is met during a contractual measurement period.
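A "prearranged level" over a "contractual measurement period" is usually stated as an availability percentage, which translates directly into a downtime budget. The sketch below shows that arithmetic. The function name, the default 365-day period, and the sample percentages are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_days: float = 365.0) -> float:
    """Downtime budget, in minutes, for a target availability over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_percent / 100.0)


# Illustrative targets over a one-year measurement period.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> about {minutes:.1f} minutes of downtime per year")
```

For example, a 99.9% target over a year leaves roughly 525.6 minutes, or a little under nine hours, of permitted downtime.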

HA systems should protect companies from two kinds of failure: system failures and site failures. Though true HA solutions should guard against both, HA systems are usually regarded as protection against system failure, while protection against site failure is typically provided by a Disaster Recovery (DR) system.

High availability can also be seen as the characteristic of a system to protect against or recover from minor outages in a short time frame with largely automated means. It does not matter whether the failures that cause minor outages are in the systems themselves, in the environment, or the result of human error. In the case of such a failure, highly available systems have the option to abort current sessions, i.e., the user will notice the failure. But they are expected to make the service available again within a short time frame.

Different businesses tolerate different levels of risk with regard to loss of data and potential downtime. A variety of technical solutions can be used to provide various levels of protection with respect to these business needs. The ideal solution would have no downtime and allow no data to be lost. Such solutions do exist, but they are expensive, and their costs must be weighed against the potential impact of a disaster on the business.
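The weighing described above can be sketched as a simple comparison of total yearly cost: what a solution costs plus the expected cost of the downtime it still allows. Every figure and option name below is hypothetical and chosen only to illustrate the arithmetic. Real estimates would come from the business itself.

```python
def total_annual_cost(solution_cost: float,
                      expected_downtime_hours: float,
                      downtime_cost_per_hour: float) -> float:
    """Yearly cost of a solution plus the expected cost of remaining downtime."""
    return solution_cost + expected_downtime_hours * downtime_cost_per_hour


# Hypothetical business that loses 10,000 per hour of downtime.
COST_PER_HOUR = 10_000

# Hypothetical options: (yearly solution cost, expected downtime hours per year)
options = {
    "no protection": total_annual_cost(0, 40, COST_PER_HOUR),
    "clustered servers": total_annual_cost(50_000, 4, COST_PER_HOUR),
    "fully redundant site": total_annual_cost(300_000, 0.5, COST_PER_HOUR),
}

cheapest = min(options, key=options.get)
for name, cost in options.items():
    print(f"{name}: {cost:,.0f}")
print("lowest total cost:", cheapest)
```

With these made-up numbers the middle option has the lowest total cost. A business with a much higher hourly downtime cost would find the most expensive option justified, which is why the right level of protection differs from one business to the next.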

Typical technologies for HA include redundant power supplies and fans for servers, RAID (redundant array of inexpensive/independent disks) configuration for disks, clusters for servers, multiple network interface cards and redundant routers for networks.

Adding to the challenge is the globalization of businesses, which means there is no longer any "quiet time" or "out of office hours" in which to carry out the maintenance these computer systems require. Hence, businesses' computer systems -- the life blood of the organization -- must be available at all times: day or night, weekday or weekend, local holiday or workday. The term "24×7×forever" effectively describes business computer system availability and is so popular that it is now used in everyday language to describe non-computer-based entities such as 911 call centers and other emergency services.

It is a popular belief that High Availability means 24×7 and is required only for businesses that operate 24×7, but this is not always true. Some businesses do not operate 24×7 but still require their systems to be available at all times during their core business hours.

Importance of High Availability

Increased demands on the internet from business critical applications

The importance of high availability varies among applications. Databases and the Internet have enabled worldwide collaboration and information sharing by extending the reach of database applications throughout organizations and communities. This reach emphasizes the importance of HA in data management solutions. Both small businesses and global enterprises have users all over the world who require access to data 24 hours a day. Without this data access, operations can stop, and revenue is lost.

The need to deliver increasing levels of availability continues to accelerate as enterprises re-engineer their solutions to gain competitive advantage. Most often, these new solutions rely on immediate access to critical business data. When data is not available, the operation can cease to function, and the downtime can lead to lost productivity, lost revenue, damaged customer relationships, bad publicity and even lawsuits.

Users, who have become more dependent upon their solutions, now demand service-level agreements (SLAs) from their Information Technology (IT) departments and solution providers.

With the benefits of this infrastructure has come an increasing dependence on it. If a critical application becomes unavailable, the business can be in jeopardy. Revenue and customers can be lost, penalties can be owed, and bad publicity can have a lasting effect on customers and a company's stock price. It is important to examine the factors that determine how your data is protected and to maximize availability to your users.

Computers are working faster and faster, and the businesses that depend on them are placing more and more demands on them. The various interconnections and dependencies in the computing fabric consisting of different components and technologies are becoming more complex every day. The availability of worldwide access via the Internet is placing extremely high demands on businesses and the IT departments and administrators that run and maintain these computers in the background.

Minimize lost revenue due to downtime

It seems that when things are running smoothly, we hardly notice these advanced systems in our everyday activities. Yet, when these systems fail to perform their expected functions, they get our immediate attention. A system failure could result in just an inconvenience, but some system failures result in loss of revenue and, at the worst, loss of life.

Lost work hours are an indirect indicator of lost revenue. When 1000 office workers cannot work for 2 hours because a server is down, or when goods cannot be delivered, the sales that could have been made in that time span might be lost forever.
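The office-worker example can be turned into a rough estimate with a few lines of arithmetic. The worker count and outage length come from the example above. The hourly cost per worker is a hypothetical figure used only for illustration, and the result covers idle labor only, not the lost sales themselves.

```python
def lost_work_hours(workers: int, outage_hours: float) -> float:
    """Total work hours lost when `workers` people are idle for `outage_hours`."""
    return workers * outage_hours


def idle_labor_cost(workers: int, outage_hours: float,
                    hourly_cost_per_worker: float) -> float:
    """Cost of idle labor only; lost sales would come on top of this."""
    return lost_work_hours(workers, outage_hours) * hourly_cost_per_worker


hours = lost_work_hours(1000, 2)      # figures from the example above
cost = idle_labor_cost(1000, 2, 50)   # 50 per hour is an assumed figure

print(f"lost work hours: {hours:,.0f}")
print(f"idle labor cost at 50 per hour: {cost:,.0f}")
```

A two-hour outage in this scenario costs 2000 work hours before any lost sales are counted.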

Lost revenue may be directly attributed to an IT outage. This is the most important business consequence, and although it is the hardest to measure directly, it can be estimated. If our point-of-sale systems are down, or if our company sells goods or services on the Internet, any outage in these systems will cause customers to go to competitors whose systems are still working. In the long term, repeated customer-visible system outages will damage our reputation and result in loss of clients.

Availability means money in today’s global, competitive business environment. Many organizations need almost continuous availability of their mission-critical server resources. Loss of service (sometimes called an outage) of an important server often translates directly into lost revenue or, even worse, lost customers.

Increasingly, availability is measured in dollars, euros, and yen, not just in time and convenience.

2 comments:

Unknown said...: Well put. With the increased stress on the infrastructure caused by the business's dependency on technology, there is definitely a need for alternate streams and redundancy checks. This not only ensures that the system does not go down at critical times, especially during high-volume peak hours, but can potentially increase productivity due to the bigger capacity and avenues of information that the infrastructure can handle.

Doug Leven; March 12, 2013 at 12:37 PM
Unknown said...: When you have to monitor information from your networks at any time, High Availability (HA) will be a great help in accessing your documents easily. Certain transactions need to be done urgently and bring in significant income when the needed documents can be accessed readily and without any backlog. This will be of help to many offices and businesses that use large networks for their daily transactions.

Metroffice.com; June 10, 2013 at 12:19 PM