A hardware load-balancing device, also known as a layer 4-7 router, is a computer appliance that is used to split network load across multiple servers based on factors such as CPU processor utilization, the number of connections or the overall server performance.
The use of an this kind of appliances minimizes the probability that any particular server will be overwhelmed and optimizes the bandwidth available to each computer or terminal. In addition, the use of an hardware load-balancing device can minimize network downtime, facilitate traffic prioritization, provide end-to-end application monitoring, provide user authentication, and help protect against malicious activity such as Denial-of-Service (DoS) attacks.
The basic principle is that network traffic is sent to a shared IP called a virtual IP (VIP), or listening IP and this address is attached to the load balancer. Once the load balancer receives a request on this VIP it will need to make a decision on where to send it and this decision is normally controlled by a load balancing algorithm, a server health check or a rule set.
The request is then sent to the appropriate server and the server will produce a response that, depending on the type of load balancer in use, will be sent either back to the load balancer, in the case of a Layer 7 device, or more typically with a Layer 4 device, directly back to the end user (normally via its default gateway). In the case of a proxy based load balancer, the request from the web server can be returned to the load balancer and manipulated before being sent back to the user. This manipulation could involve content substitution or compression and some top end devices offer full scripting capability.
Load Balancing Algorithms
Load balancers use different algorithms to control traffic and with the specific goal of intelligently distribute load and/or maximize the utilization of all servers within the cluster.
Random Allocation
In a random allocation, the traffic is assigned to any server picked randomly among the group of destination servers. In such a case, one of the servers may be assigned many more requests to process while the other servers are sitting idle. However, on average, each server gets an approximately equal share of the load due to the random selection. Although simple to implement it can lead to the overloading of one server or more while under-utilization of others.
Before we go any deeper into the abyss of all the techniques and algorithms used in the load balancing world it is important to clarify some concepts and notions and take a look at the most used load balancing terminology. The target audience of this blog is supposed to know what the OSI Model is and therefore I won’t even bother to explain what the layers are...
Server health checking
Server health checking is the ability of the load balancer to run a test against the servers to determine if they are providing service:
Ping: This is the most simple method, however it is not very reliable as the server can be up whilst the web service could be down;
TCP connect: This is a more sophisticated method which can check if a service is up and running like a service on port 80 for web. i.e. try and open a connection to that port on the real server;
HTTP GET HEADER: This will make a HTTP GET request to the web server and typically check for a header response such as 200 OK;
HTTP GET CONTENTS: This will make a HTTP GET and check the actual content body for a correct response. Can be useful to check a dynamic web page that returns 'OK' only if some application health checks work i.e. backend database query validates. This feature is only available on some of the more advanced products but is the superior method for web applications as its will check that the actual application is available.
Layer-2 Load Balancing
Layer-2 load balancing (also referred as link aggregation, port aggregation, ether channel or gigabit ether channel port bundling) is to bond two or more links into a single, higher-bandwidth logical link. Aggregated links also provide redundancy and fault tolerance if each of the aggregated links follows a different physical path.
It might be easier to make the client code and resources highly available and scalable than to do so for the servers; serving non-dynamic content requires fewer server resources. Before going into the details, let us consider a desktop application that needs to connect to servers on the internet to retrieve data. If our theoretical desktop application generates more requests to the remote server than it can handle, we will need a load balancing solution.
Instead of letting the client know of only one server from which to retrieve data, we can provide many servers—s1.mywebsite.com, s2.mywebsite.com, and so on. The desktop client randomly selects a server and attempts to retrieve data. If the server is not available, or does not respond in a preset time period, the client can select another server until the data is retrieved. Unlike web applications—which store the client code (JavaScript code or Flash SWF) on the same server that provides data and resource—the desktop client is independent of the server and able to load balance servers from the client side to achieve scalability for the application.