Load Balancing

Scaling to a Distributed System

Once you decide to scale horizontally by adding more servers, you need a mechanism to distribute incoming traffic across them. This is the job of a load balancer.

A load balancer is a device or service that acts as a "traffic cop" for your servers. It sits in front of your application servers and distributes incoming client requests across every server capable of handling them, maximizing throughput and capacity utilization while ensuring that no single server is overworked.

Why Use a Load Balancer?

  1. Scalability & Performance: By distributing the workload, a load balancer prevents any single server from becoming a bottleneck, improving the overall performance and responsiveness of your application. It allows you to scale out by simply adding more servers to the pool.
  2. High Availability & Fault Tolerance: Load balancers perform regular health checks on the servers in the pool. If a server becomes unresponsive, the load balancer automatically stops sending traffic to it. This ensures that user requests are only sent to healthy servers, making your system resilient to individual server failures.
  3. Flexibility & Maintenance: With a load balancer in place, you can perform maintenance on individual servers (like deploying new code or applying security patches) without impacting the application's availability. You can take a server out of the pool, update it, and then add it back in.
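The health-check behavior described above can be sketched in a few lines. This is a minimal illustration, not a real load balancer: the server names and the `record_probe` method are invented for the example, and a production system would probe servers over the network on a timer.

```python
class ServerPool:
    """Tracks which backend servers are healthy (illustrative sketch)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)

    def record_probe(self, server, ok):
        # Mark a server healthy or unhealthy based on its latest probe result.
        if ok:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def available(self):
        # Only healthy servers are eligible to receive traffic.
        return [s for s in self.servers if s in self.healthy]

pool = ServerPool(["app1", "app2", "app3"])
pool.record_probe("app2", ok=False)   # app2 failed its health check
print(pool.available())               # ['app1', 'app3']
```

Taking a server out of the pool for maintenance is the same operation: mark it unhealthy, update it, then mark it healthy again.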

Common Load Balancing Algorithms

The load balancer needs a strategy to decide which server to send a request to. Here are some of the most common algorithms:

  1. Round Robin: This is the simplest algorithm. The load balancer cycles through the list of servers and sends each new request to the next server in the list.

    • Pros: Very simple and easy to implement.
    • Cons: It assumes all servers are equal. If one server is more powerful than the others, or if some requests are more resource-intensive than others, it can lead to an uneven distribution of load.
  2. Least Connections: The load balancer keeps track of how many active connections each server has and sends the next request to the server with the fewest active connections.

    • Pros: It's a more intelligent algorithm that adapts to the current load on each server. It's a good choice when requests have varying levels of complexity.
    • Cons: It's slightly more complex to implement than Round Robin.
  3. Least Response Time: This algorithm sends the request to the server that has both the fewest active connections and the lowest average response time.

    • Pros: It's even more sophisticated and takes into account not just the number of connections but also the server's performance.
    • Cons: Requires more monitoring and calculation on the part of the load balancer.
  4. IP Hash: The load balancer calculates a hash of the client's IP address and uses this hash to determine which server to send the request to.

    • Pros: This ensures that requests from a specific user will always be sent to the same server. This is useful for applications that require session persistence or "sticky sessions" (i.e., storing user-specific data in the server's memory).
    • Cons: It can lead to an uneven distribution of load if some IP addresses send many more requests than others. It also makes it harder to remove servers from the pool without disrupting user sessions.
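Three of these algorithms are simple enough to sketch directly. The server names and connection counts below are made up for illustration; a real balancer would track connections itself and would typically use a consistent-hashing scheme rather than a plain modulo for IP Hash, so that removing a server remaps fewer clients.

```python
import hashlib
import itertools

servers = ["app1", "app2", "app3"]

# Round Robin: cycle through the server list in order.
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Least Connections: pick the server with the fewest active connections
# (counts here are hard-coded for the example).
active = {"app1": 5, "app2": 2, "app3": 7}
def least_connections():
    return min(active, key=active.get)

# IP Hash: hash the client's IP so the same client always lands
# on the same server (modulo is the simplest, least robust choice).
def ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print([round_robin() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
print(least_connections())                # 'app2'
print(ip_hash("203.0.113.9") == ip_hash("203.0.113.9"))  # True: sticky
```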

Layers of Load Balancing

Load balancing can happen at different layers of the network stack.

  • Layer 4 (Transport Layer) Load Balancer: This type of load balancer makes its routing decisions based on information from the transport layer (TCP/UDP). It looks at the source and destination IP addresses and ports, but it doesn't inspect the content of the requests.

    • Pros: Very fast and efficient because it doesn't need to understand the application-level data.
    • Examples: AWS Network Load Balancer (NLB), many hardware load balancers.
  • Layer 7 (Application Layer) Load Balancer: This is a more sophisticated type of load balancer that can inspect the content of the requests, such as HTTP headers, URLs, and cookies.

    • Pros: Allows for much more intelligent routing decisions. For example, you can route requests to different pools of servers based on the URL path (/api/videos goes to the video processing servers, while /api/images goes to the image processing servers). It can also handle things like SSL termination (decrypting HTTPS traffic).
    • Cons: It's slower and more CPU-intensive than a Layer 4 load balancer because it has to do more work.
    • Examples: AWS Application Load Balancer (ALB), Nginx, HAProxy.
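The path-based routing a Layer 7 balancer performs can be reduced to a prefix lookup. This sketch uses invented pool names and only inspects the URL path; real balancers like Nginx or ALB match on hostnames, headers, and cookies as well.

```python
# Map URL path prefixes to backend server pools (names are illustrative).
POOLS = {
    "/api/videos": ["video1", "video2"],
    "/api/images": ["image1", "image2"],
}
DEFAULT_POOL = ["web1", "web2"]

def route(path):
    # Return the first pool whose prefix matches the request path.
    for prefix, pool in POOLS.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

print(route("/api/videos/upload"))  # ['video1', 'video2']
print(route("/about"))              # ['web1', 'web2']
```

A Layer 4 balancer cannot do this at all: by the time the URL is visible, you are already reading application-layer data.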

Load Balancer Redundancy

A load balancer can itself become a single point of failure. If your load balancer goes down, your entire application is inaccessible.

To prevent this, you typically set up a pair of load balancers in a high-availability (HA) configuration.

  • One load balancer is active, handling all the traffic.
  • The other is passive (or standby), monitoring the active one.
  • If the active load balancer fails a health check, the passive one automatically takes over its IP address and begins handling traffic.

This ensures that your load balancing layer is also fault-tolerant.
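The active-passive failover described above can be sketched as a pair of nodes swapping roles when a heartbeat fails. The node names and `on_health_check` callback are invented for the example; in practice the takeover is done with a protocol like VRRP (e.g. via keepalived), where the standby claims a shared virtual IP.

```python
class FailoverPair:
    """Active-passive load balancer pair sharing one virtual IP (sketch)."""

    def __init__(self):
        self.active = "lb-primary"
        self.standby = "lb-secondary"

    def on_health_check(self, active_alive):
        # The standby promotes itself when the active stops responding.
        if not active_alive:
            self.active, self.standby = self.standby, self.active

pair = FailoverPair()
pair.on_health_check(active_alive=False)  # primary fails its heartbeat
print(pair.active)  # 'lb-secondary' now handles traffic
```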