High Availability - Objectives

The main objective in designing any High Availability (HA) and Disaster Recovery (DR) strategy is Business Continuity. Each business has its own tolerance for system failures and outages, and a suitable strategy can be planned and implemented around that tolerance. If a business can accept 90% system availability, then there is no need to build any HA infrastructure.
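
To make the tolerance concrete, an availability percentage can be converted into the downtime it permits per year. The sketch below is illustrative only: the function name and the 365-day year are assumptions, not part of any standard. At 90% it works out to roughly 876 hours (about 36.5 days) of allowed downtime a year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year permitted by an availability target.

    `availability_percent` is a hypothetical parameter name: the share of the
    year, from 0 to 100, during which the system must be usable.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Compare a few common targets side by side.
    for target in (90.0, 99.0, 99.9, 99.99):
        hours = annual_downtime_hours(target)
        print(f"{target}% availability allows {hours:.2f} hours of downtime per year")
```

Comparing the 90% figure with stricter targets shows why the amount of HA infrastructure needed depends so heavily on the tolerance the business sets.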

Although HA solutions are frequently discussed in business environments, these considerations apply to any type of organization (defense, educational, or non-profit) where HA is required. For organizations other than businesses, the costs associated with downtime are hard to calculate. In these organizations, HA and DR systems become a requirement from the service standpoint rather than from the perspective of cost associated with downtime.

Availability of the systems should be seen from the end user's perspective. Any time a user cannot connect to the system is considered downtime. This does not necessarily mean that the main computer system has gone down: in many cases, a poorly performing system is also considered an unavailable system.
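
One way to express this end-user view is to count a request as "available" only if it both succeeded and came back quickly enough. The following is a minimal sketch under assumed names: the `Probe` record, the `user_perceived_availability` function, and the 2-second threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Probe:
    """One end-user request sample (hypothetical record layout)."""
    succeeded: bool          # did the user get a valid response at all?
    latency_seconds: float   # how long the user waited for it


def user_perceived_availability(
    probes: Sequence[Probe], max_latency_seconds: float = 2.0
) -> float:
    """Return the percentage of probes that a user would call 'available'.

    A probe counts only if it succeeded AND finished within the latency
    threshold, so a slow system lowers availability even when it is up.
    """
    if not probes:
        raise ValueError("at least one probe is required")
    usable = sum(
        1 for p in probes
        if p.succeeded and p.latency_seconds <= max_latency_seconds
    )
    return 100.0 * usable / len(probes)


if __name__ == "__main__":
    samples = [
        Probe(True, 0.3),   # fast success: available
        Probe(True, 5.0),   # success, but too slow: counted as unavailable
        Probe(False, 0.0),  # connection failure: unavailable
        Probe(True, 1.0),   # fast success: available
    ]
    print(user_perceived_availability(samples))
```

The threshold is a policy decision; the point of the sketch is only that slow responses and failed connections are both charged against availability.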

High Availability therefore does not mean building redundancy into only one system, database, or application; it is a combination of redundancies built into all areas of the process. For every business or organization, the database plays an important role, and everything is built around it. Hence, most High Availability efforts are concerned with making the database "Highly Available."

More specifically, a high availability architecture should have the following traits:
  • Tolerate failures such that processing continues with minimal or no interruption;
  • Be transparent to (or tolerant of) system, data, or application changes;
  • Provide built-in preventative measures;
  • Provide proactive monitoring and fast detection of failures;
  • Provide fast recoverability;
  • Automate detection and recovery operations;
  • Protect the data so that there is minimal or no data loss;
  • Implement the operational best practices to manage your environment;
  • Achieve the goals set in SLAs (for example, the RTO and the RPO) for the lowest possible total cost of ownership.
System designers often build reliability into their platforms by adding correction mechanisms for the latent faults that concern them. When correctable, these faults do not produce errors or failures, since they fall within the design margins built into the system. They should still be monitored to measure their occurrence against the designers' anticipated frequency, because excessive occurrence of some correctable faults often indicates a more catastrophic underlying latent fault.
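
The monitoring described above can be sketched as a comparison between the observed rate of correctable faults and the rate the designers anticipated. This is an illustrative sketch only: the function name, its parameters, and the default tolerance factor of 2.0 are assumptions, and a real system would take its counts from whatever fault counters the platform exposes.

```python
def fault_rate_exceeded(
    observed_count: int,
    window_hours: float,
    expected_per_hour: float,
    tolerance_factor: float = 2.0,
) -> bool:
    """Return True when correctable faults occur more often than anticipated.

    observed_count    -- correctable faults seen during the window
    window_hours      -- length of the observation window, in hours
    expected_per_hour -- the designers' anticipated fault rate (assumed known)
    tolerance_factor  -- how far above the expected rate is still acceptable
                         (hypothetical default; tune per system)
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if observed_count < 0 or expected_per_hour < 0:
        raise ValueError("counts and rates must not be negative")
    observed_rate = observed_count / window_hours
    return observed_rate > expected_per_hour * tolerance_factor


if __name__ == "__main__":
    # 3 faults in 24 hours is 0.125 per hour, under the 0.2 per hour limit.
    print(fault_rate_exceeded(3, 24, expected_per_hour=0.1))
    # 12 faults in 24 hours is 0.5 per hour, well over the limit.
    print(fault_rate_exceeded(12, 24, expected_per_hour=0.1))
```

A result of True does not mean the system has failed; it is a prompt to investigate before the underlying fault stops being correctable.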

Reliability, recoverability, timely error detection, and continuous operations are primary characteristics of a highly available solution.