The fundamentals of System Design

Load Balancing

Documenting my learning journey

Kumar Swapnil
5 min read · Jan 31, 2021


In the pursuit of designing highly available systems, or of scaling your systems cost-effectively, it's often best to add or remove servers as traffic increases or decreases.

Load balancers distribute incoming client requests to these servers in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance.

After the chosen server handles a request, the load balancer returns its response to the appropriate client. In this way, a load balancer provides the following benefits:

  • Distributes client requests or network load efficiently across multiple servers.
  • Ensures high availability and reliability by sending requests only to servers that are online.
  • Provides the flexibility to scale up or scale down as demand dictates.

Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.
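As an illustration, a minimal HAProxy configuration that spreads HTTP traffic across two servers might look like the sketch below. The server names and addresses are made up for the example; a real setup would use your own hosts and health-check settings.

```
frontend http_in
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    # "check" enables health checks, so traffic only goes to servers that are up
    server web1 10.0.0.11:8080 check
    server web2 10.0.0.12:8080 check
```

The `balance roundrobin` directive implements the sequential distribution described above, and `check` is what keeps offline servers out of the rotation.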

Failover

Load balancers ensure that servers are highly available and reliable. Having said that, what happens if the load balancer itself fails? Since the load balancer is then a single point of failure, its failure would make the whole system unavailable. To prevent this, it's common to set up multiple load balancers, either in active-passive or active-active mode.

  • Active-Active : In active-active mode, both load balancers manage traffic, spreading the load between them.

In the given diagram, there are two VIPs: for VIP 192.168.254.100 the first load balancer acts as the master, and for 192.168.254.200 it acts as the slave (and vice versa for the second load balancer). One point to note here is that DNS can hold multiple records for the same domain name and return a list of IP addresses for it. When a web browser requests a website, it tries these IP addresses one by one until it gets a response. The key is to use round-robin DNS so that load gets distributed between both load balancers.

  • Active-Passive : In the diagram, you can see that both the active and passive load balancers share the same VIP. In such an architecture, the two servers agree on which one will route requests (active) and which will remain on standby (passive).

With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over and resumes service.
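The heartbeat-driven takeover can be sketched in a few lines of Python. This is an illustrative model, not a real product: the node names, the 3-second timeout, and the promotion logic are all assumptions for the example.

```python
import time

class FailoverPair:
    """Active-passive pair: the passive node promotes itself when
    heartbeats from the active node stop arriving (illustrative sketch)."""

    HEARTBEAT_TIMEOUT = 3.0  # assumed: seconds of silence before takeover

    def __init__(self):
        self.active = "lb-1"
        self.passive = "lb-2"
        self.last_heartbeat = time.monotonic()

    def receive_heartbeat(self):
        # Called whenever the passive node hears from the active one.
        self.last_heartbeat = time.monotonic()

    def check(self, now=None):
        # The passive node's watchdog: promote itself if the active is silent.
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > self.HEARTBEAT_TIMEOUT:
            self.active, self.passive = self.passive, self.active
            self.last_heartbeat = now  # the new active starts a fresh window
        return self.active

pair = FailoverPair()
print(pair.check())                          # heartbeat fresh: lb-1 stays active
print(pair.check(pair.last_heartbeat + 5))   # heartbeat missed: lb-2 takes over
```

In a real deployment the heartbeat usually travels over a dedicated link, and the takeover also involves claiming the shared VIP (e.g. via gratuitous ARP).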

While active-active mode uses both load balancers and can therefore increase the capacity of your load-balanced cluster, the setup is more complex and comes with some caveats and downsides.

DNS Cache ( ipconfig /displaydns )

The first is the DNS cache. Before the browser issues a request to the outside network, the computer looks up the domain name in its DNS cache, a database of recently accessed domain names and the addresses DNS returned for them when they were first requested. This makes the load distribution uneven, because until the TTL expires a given client keeps sending requests to the same load balancer. That forces you to use a shorter TTL, which can be expensive since the DNS server must be queried more frequently. Second, if a single node in an active-active pair fails, capacity is cut in half.
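A small simulation makes the caching problem concrete. The two VIPs are taken from the text above; the 30-second TTL and client names are assumptions for the example. Round-robin DNS rotates the answer per lookup, but each client caches its first answer until the TTL expires, so short-term load is uneven.

```python
import itertools

LB_IPS = ["192.168.254.100", "192.168.254.200"]  # the two VIPs from the text
rotation = itertools.cycle(LB_IPS)               # round-robin DNS answers

def resolve(cache, client, now, ttl=30):
    """Return the cached IP if still fresh, else do a 'DNS lookup'."""
    ip, expires = cache.get(client, (None, 0))
    if now < expires:
        return ip                      # cache hit: same LB until TTL expires
    ip = next(rotation)                # fresh round-robin answer
    cache[client] = (ip, now + ttl)
    return ip

cache = {}
# A client sending one request per second keeps hitting the same VIP:
hits = [resolve(cache, "client-a", t) for t in range(5)]
print(hits)                            # same IP five times
print(resolve(cache, "client-a", 31))  # TTL expired: next IP in rotation
```

Only after the TTL lapses does the client see the other load balancer, which is why a long TTL skews the distribution and a short one raises DNS query load.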

Types of Load Balancers

  • L4 load balancer : Layer 4 load balancing operates at the intermediate transport layer, which deals with delivery of messages with no regard to the content of the messages. Transmission Control Protocol (TCP) is the Layer 4 protocol for Hypertext Transfer Protocol (HTTP) traffic on the Internet. Layer 4 load balancers simply forward network packets to and from the upstream server without inspecting the content of the packets. They can make limited routing decisions by inspecting the first few packets in the TCP stream.
  • L7 load balancer : Layer 7 load balancing operates at the high‑level application layer, which deals with the actual content of each message. HTTP is the predominant Layer 7 protocol for website traffic on the Internet. Layer 7 load balancers route network traffic in a much more sophisticated way than Layer 4 load balancers, particularly applicable to TCP‑based traffic such as HTTP. A Layer 7 load balancer terminates the network traffic and reads the message within. It can make a load‑balancing decision based on the content of the message (the URL or cookie, for example). It then makes a new TCP connection to the selected upstream server (or reuses an existing one, by means of HTTP keepalives) and writes the request to the server.

Layer 7 load balancing is more CPU‑intensive than packet‑based Layer 4 load balancing, but rarely causes degraded performance on a modern server. Layer 7 load balancing enables the load balancer to make smarter load‑balancing decisions, and to apply optimizations and changes to the content (such as compression and encryption). It uses buffering to offload slow connections from the upstream servers, which improves performance.
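The contrast between the two layers can be sketched as two routing functions. Everything here is illustrative: the backend names and the `/api/` routing rule are assumptions, and a real L4 balancer hashes packet headers in the kernel or in hardware rather than in application code.

```python
import hashlib

BACKENDS = ["app-1", "app-2", "api-1"]  # hypothetical upstream pool

def l4_pick(src_ip, src_port):
    """L4: route on packet headers only -- here a hash of the client's
    address and port. The payload is never inspected."""
    key = f"{src_ip}:{src_port}".encode()
    return BACKENDS[int(hashlib.sha256(key).hexdigest(), 16) % len(BACKENDS)]

def l7_pick(request_line):
    """L7: terminate the connection, read the HTTP request, route on content."""
    path = request_line.split(" ")[1]       # e.g. "GET /api/users HTTP/1.1"
    if path.startswith("/api/"):
        return "api-1"                      # content-based rule
    return "app-1"

print(l4_pick("10.1.2.3", 54321))            # stable for a given connection
print(l7_pick("GET /api/users HTTP/1.1"))    # api-1
print(l7_pick("GET /index.html HTTP/1.1"))   # app-1
```

The L4 function is cheap and deterministic per connection but content-blind; the L7 function must parse the request, which is the extra CPU cost mentioned above, but can route by URL, cookie, or header.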

Ways of Request Routing

Different load balancing algorithms provide different benefits; the choice of load balancing method depends on your needs:

  • Random — Requests are assigned to a server chosen at random.
  • Round Robin — Requests are distributed across the group of servers sequentially.
  • Session/cookies — Requests carrying the same session ID or cookie are consistently routed to the same server.
  • Least Connections — A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections.

The first (and perhaps most intuitive) approach is to always prefer the least loaded backend. In theory, this approach should result in the best end-user experience because requests are always routed to the least busy machine. Unfortunately, this logic breaks down quickly in the case of stateful protocols, which must use the same backend for the duration of a session. This requirement means that the balancer must keep track of all connections sent through it in order to make sure that all subsequent packets are sent to the correct backend. The alternative is to use some part of a packet to create a connection ID, possibly by applying a hash function to some information from the packet (such as session cookies), and to use that connection ID to select a backend.
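The connection-ID alternative can be sketched as a stateless hash of the session cookie: because the hash is deterministic, every request in the same session lands on the same backend without the balancer keeping a connection table. The backend names and cookie format are assumptions for the example.

```python
import hashlib

BACKENDS = ["backend-0", "backend-1", "backend-2"]

def pick_backend(session_cookie):
    """Derive a stable connection ID from the session cookie and map it
    onto the backend pool -- no per-connection state on the balancer."""
    digest = hashlib.md5(session_cookie.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]

# The same session always maps to the same backend:
print(pick_backend("session=abc123"))
print(pick_backend("session=abc123"))  # identical to the line above
```

The trade-off is that a plain modulo hash reshuffles most sessions when the backend pool changes size; schemes like consistent hashing exist to soften that.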
