Load Balancing (IV)

Hardware based load balancing

A hardware load-balancing device, also known as a layer 4-7 router, is an appliance that splits network load across multiple servers based on factors such as CPU utilization, the number of connections, or overall server performance.

Using this kind of appliance minimizes the probability that any particular server will be overwhelmed and optimizes the bandwidth available to each computer or terminal. In addition, a hardware load-balancing device can minimize network downtime, facilitate traffic prioritization, provide end-to-end application monitoring, provide user authentication, and help protect against malicious activity such as Denial-of-Service (DoS) attacks.

The basic principle is that network traffic is sent to a shared IP address, called a virtual IP (VIP) or listening IP, which is attached to the load balancer. Once the load balancer receives a request on this VIP, it must decide where to send it. This decision is normally controlled by a load balancing algorithm, a server health check, or a rule set.

The request is then sent to the appropriate server, which produces a response. Depending on the type of load balancer in use, the response is sent either back to the load balancer (in the case of a Layer 7 device) or, more typically with a Layer 4 device, directly back to the end user (normally via the server's default gateway).
In the case of a proxy-based load balancer, the response from the web server can be returned to the load balancer and manipulated before being sent back to the user. This manipulation could involve content substitution or compression, and some top-end devices offer full scripting capability.
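
The decision step described above can be sketched roughly as follows. This is only an illustration: the `Backend` class, the `choose_backend` and `first_available` functions, and the addresses are hypothetical names, and the health check is reduced to a boolean flag.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Backend:
    address: str          # real server IP (hypothetical values below)
    healthy: bool = True  # result of the most recent health check


def choose_backend(backends: List[Backend],
                   algorithm: Callable[[List[Backend]], Backend]) -> Backend:
    """Pick a server for a request that arrived on the VIP."""
    # Health check: servers that failed their last check are never candidates.
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backend available")
    # The load balancing algorithm picks among the remaining servers.
    return algorithm(candidates)


def first_available(candidates: List[Backend]) -> Backend:
    # Placeholder algorithm: always take the first healthy server.
    return candidates[0]


pool = [Backend("10.0.0.11", healthy=False),
        Backend("10.0.0.12"),
        Backend("10.0.0.13")]
print(choose_backend(pool, first_available).address)
```

Any of the algorithms described in the next section could be passed in place of `first_available`.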

Load Balancing Algorithms

Load balancers use different algorithms to control traffic, with the specific goal of intelligently distributing load and/or maximizing the utilization of all servers within the cluster.

Random Allocation

With random allocation, each request is assigned to a server picked at random from the group of destination servers. On average, each server gets an approximately equal share of the load, but at any given moment one server may be assigned many more requests than the others, some of which may sit idle. Although simple to implement, this approach can overload one or more servers while leaving others under-utilized.
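
A minimal sketch of random allocation, assuming three hypothetical servers and a fixed seed so the run is repeatable. It compares a short burst of requests with a long run to show the difference between the momentary and the average distribution.

```python
import random
from collections import Counter


def random_allocation(servers, rng):
    """Return one server chosen uniformly at random."""
    return rng.choice(servers)


servers = ["srv-a", "srv-b", "srv-c"]  # hypothetical server names
rng = random.Random(42)                # fixed seed, for repeatability only

# Over a short burst the split can be lopsided ...
short_run = Counter(random_allocation(servers, rng) for _ in range(10))
# ... while over many requests each share approaches one third.
long_run = Counter(random_allocation(servers, rng) for _ in range(30000))

print(short_run)
print({s: round(n / 30000, 3) for s, n in long_run.items()})
```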

Round-Robin Allocation

In a round-robin algorithm, traffic is sent to the destination servers on a rotating basis in an attempt to distribute the load equally to each server, regardless of the current number of connections or the response time. The first request is allocated to a server picked randomly from the group of destination servers; for subsequent requests, the algorithm follows the circular order in which the remaining servers are listed. Once a server is assigned a request, it is moved to the end of the list and the next server is chosen for the following request, so all servers are assigned requests equally.

Round-robin is suitable only when all the servers in the cluster have equal processing capabilities; otherwise, some servers may receive more requests than they can process while others use only part of their resources. It improves on random allocation because requests are divided equally among the available servers in an orderly fashion, but it is not good enough if the technical specifications of the servers in the destination group differ greatly, since the load each server can handle then differs greatly too.
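
The rotation described above can be sketched with a double-ended queue. The `RoundRobin` class and server names are hypothetical; the random starting point follows the description in the text.

```python
import random
from collections import deque


class RoundRobin:
    def __init__(self, servers, rng=random):
        self._queue = deque(servers)
        # Start at a randomly chosen server, as described above.
        self._queue.rotate(-rng.randrange(len(servers)))

    def next_server(self):
        server = self._queue[0]
        # Move the chosen server to the end of the list.
        self._queue.rotate(-1)
        return server


rr = RoundRobin(["srv-a", "srv-b", "srv-c"])
assignments = [rr.next_server() for _ in range(6)]
print(assignments)
```

Whichever server comes first, six requests over three servers always give each server exactly two.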

Weighted Round-Robin Allocation

Weighted Round-Robin is an advanced version of Round-Robin that accounts for the different processing capabilities of each server. A performance weight is manually assigned to each server in the destination group, and a scheduling sequence is automatically generated according to the server weights. Requests are then directed to the different servers according to this sequence.

For example, if the server group consists of two servers and one is capable of handling twice as much load as the other, the powerful server gets twice the weight factor. In that case, the load balancer assigns two requests to the powerful server for each request assigned to the weaker one. This algorithm has the advantage of taking the capacity of each server into account, but it still ignores more advanced load balancing criteria such as the processing time of each individual request.
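
The two-server example above can be sketched as follows. This is a simplified illustration that builds the scheduling sequence by repeating each server according to its weight; the `build_schedule` function and the server names are hypothetical, and real devices may interleave the sequence differently.

```python
from itertools import cycle


def build_schedule(weights):
    """Expand {server: weight} into a repeating scheduling sequence."""
    sequence = []
    for server, weight in weights.items():
        sequence.extend([server] * weight)
    return sequence


weights = {"powerful": 2, "weaker": 1}  # manually assigned weights
schedule = cycle(build_schedule(weights))
assignments = [next(schedule) for _ in range(6)]
print(assignments)
```

Out of six requests, four go to the powerful server and two to the weaker one.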

Least Connections

A least-connection algorithm sends requests to the server currently serving the fewest connections. The load balancer keeps track of the number of connections each server has and sends the next request to the server with the fewest.
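
A minimal sketch of the bookkeeping involved, assuming the load balancer sees both the opening and the closing of every connection. The `LeastConnections` class and its method names are hypothetical.

```python
class LeastConnections:
    def __init__(self, servers):
        # Active connection count per server, as tracked by the load balancer.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Choose the server with the fewest active connections
        # (ties go to the server listed first).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection closes.
        self.active[server] -= 1


lb = LeastConnections(["srv-a", "srv-b"])
first = lb.acquire()    # srv-a (tie, listed first)
second = lb.acquire()   # srv-b
lb.release(second)      # srv-b finishes quickly
third = lb.acquire()    # srv-b again, since srv-a is still busy
print(first, second, third)
```

Because a server tied up with a long-running request keeps its connection open, it is passed over for new requests until it finishes.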

Server Agent

In this case a client (agent) is installed on the server to communicate with the load balancer. This is sometimes required when using a basic load balancer with direct server return: because such a load balancer does not receive the responses from the servers, it does not know how many connections each server actually has or how well it is responding.

Load Balancing Methods

I will now illustrate some of the load balancing methods available in modern load balancers. Please keep in mind that I’m not showing any real examples and not considering network design specifications like servers in a DMZ or in an Intranet. My goal is just to show the basic concepts, ok?

Direct Routing

The direct routing (DR) mode is a high-performance solution that requires little change to the existing infrastructure. It works by changing the destination MAC address of the incoming packet on the fly, which is very fast.

However, it means that when the packet reaches the real server, it is still addressed to the Virtual IP (VIP). The real server must therefore accept traffic for the VIP but must not respond to ARP requests for it. Direct routing mode enables servers on a connected network to access either the VIPs or the real IPs without the need for any extra subnets or routes, but the real server must be configured to respond to both the VIP and its own IP address.
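
The MAC rewrite can be modelled with a toy frame structure. This is only a conceptual sketch, not real packet handling: the `Frame` class, the `direct_route` function, and all addresses are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str


VIP = "192.168.1.100"                  # hypothetical virtual IP
REAL_SERVER_MAC = "02:00:00:00:00:0b"  # hypothetical real server NIC


def direct_route(frame: Frame, real_server_mac: str) -> Frame:
    # Only the destination MAC changes; the IP header is left untouched.
    return replace(frame, dst_mac=real_server_mac)


incoming = Frame(dst_mac="02:00:00:00:00:01", src_ip="203.0.113.7", dst_ip=VIP)
forwarded = direct_route(incoming, REAL_SERVER_MAC)
print(forwarded)
```

The forwarded frame still carries the VIP as its destination IP, which is why the real server has to accept traffic for that address.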

Network Address Translation

Sometimes it is not possible to use DR mode, either because the application cannot bind to the real IP (RIP) and the VIP at the same time or because the host operating system cannot be modified to handle the ARP issue. In this event it’s possible to use the Network Address Translation (NAT) mode, which is also a fairly high-performance solution but requires an infrastructure with an internal and an external subnet to carry out the translation.

In this mode the load balancer translates all requests from the external virtual server to the internal real servers, and the real servers must have their default gateway configured to point at the load balancer. If real servers must be accessible on their own IP address for non-load-balanced services, e.g. SMTP, individual firewall rules must be set up for each real server.
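
The translation in both directions can be sketched with a toy packet model. The `Packet` class, the two functions, and the addresses are hypothetical, and server selection is reduced to a single fixed real server.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


VIP = "203.0.113.10"       # external virtual server (hypothetical)
REAL_SERVER = "10.0.0.11"  # internal real server (hypothetical)


def nat_inbound(packet: Packet) -> Packet:
    # Request: the destination VIP is rewritten to the chosen real server.
    return replace(packet, dst_ip=REAL_SERVER)


def nat_outbound(packet: Packet) -> Packet:
    # Reply: it passes back through the load balancer (the server's default
    # gateway), which restores the VIP as the source address.
    return replace(packet, src_ip=VIP)


request = Packet(src_ip="198.51.100.7", dst_ip=VIP)
to_server = nat_inbound(request)
reply = Packet(src_ip=to_server.dst_ip, dst_ip=to_server.src_ip)
to_client = nat_outbound(reply)
print(to_server, to_client)
```

The reply is only rewritten because it travels back through the load balancer, which is why the default gateway setting matters in this mode.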

Source Network Address Translation

If the application requires the load balancer to handle cookie insertion, then the Source Network Address Translation (SNAT) configuration is needed, which does not require any changes to the application servers. However, as the load balancer is acting as a full proxy, it doesn't have the same raw throughput as the previous methods.

The load balancer proxies the application traffic to the servers so that the source of all traffic becomes the load balancer.
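
A toy model of what the real server sees in this mode. The `Packet` class, the function names, and the addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


LB_IP = "10.0.0.1"         # load balancer's own address (hypothetical)
REAL_SERVER = "10.0.0.11"  # chosen real server (hypothetical)


def snat_forward(client_packet: Packet) -> Packet:
    # The proxy opens its own connection to the real server, so the
    # client's address is replaced by the load balancer's.
    return Packet(src_ip=LB_IP, dst_ip=REAL_SERVER)


def server_reply(received: Packet) -> Packet:
    # The server simply answers whoever sent the packet.
    return Packet(src_ip=received.dst_ip, dst_ip=received.src_ip)


from_client = Packet(src_ip="198.51.100.7", dst_ip="203.0.113.10")
seen_by_server = snat_forward(from_client)
reply = server_reply(seen_by_server)
print(seen_by_server, reply)
```

The reply goes back to the load balancer without any gateway change on the server, but the server no longer sees the client's address.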

Transparent SNAT

If the source address of the client is a requirement, then the load balancer can be forced into transparent mode. This requires that the real servers use the load balancer as their default gateway (as in NAT mode), and it only works for directly attached subnets (also as in NAT mode).

SSL Termination or Acceleration

All of the Layer 4 and Layer 7 load balancing methods can handle SSL traffic in pass-through mode, i.e. the backend servers do the decryption and encryption of the traffic. However, to inspect HTTPS traffic in order to read or insert cookies, the SSL traffic must be decoded (terminated) on the load balancer. This is achieved by importing the private key and signed certificate to the load balancer, giving it the authority to decrypt traffic.
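
As a rough software analogy, a proxy written in Python would load the imported certificate and key into a server-side TLS context before accepting HTTPS connections. This sketch uses the standard `ssl` module; the function name and file paths are hypothetical, and it does not represent how any particular appliance is configured.

```python
import ssl


def build_termination_context(cert_file: str, key_file: str) -> ssl.SSLContext:
    """Create a server-side TLS context from an imported certificate and key."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Raises an OSError subclass if the files cannot be loaded.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


# Usage (paths are placeholders):
# context = build_termination_context("site.crt", "site.key")
# tls_socket = context.wrap_socket(listening_socket, server_side=True)
```

Once the traffic is decrypted on the proxy, the HTTP headers (including cookies) are readable before the request is passed to a backend server.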

1 comment:

Hiren Khatri said...

Your explanations are really clear and it's really helping me on my project on load balancing approaches and algorithms. I have a question and if you can answer it would be really appreciated. All these algorithms distribute load based on requests for a page or file. But the time it will take for the server to complete these requests will not be the same. For example one server is completing a request for a 20MB file which can take much longer compared to a 200KB file. So servicing the 20MB file request will take a much longer processing time and will cause a build up of subsequent requests. Can any of the algorithms do balancing based on packet size as well and not just by requests? Also what happens with uneven packet sizes?