From Gigenet Articles
In this article, we look at how load balancing is implemented on the web.
Overview of web load balancing
Resource management is a difficult problem for any company: how many employees to hire, how many tables a restaurant should have, or how many rooms a hotel should have. The problem is not estimating the initial demand (though that is riddled with problems as well), but determining how to respond to changes in demand.
The same problem carries over into the IT world. When setting up your dedicated server, it's important not only to estimate how much traffic your site or application is going to receive, but also to determine how you're going to respond to increased demand in the future. That is the aim of web load balancing. In our earlier article on How to choose a Server, we saw that some servers come with the possibility of adding additional processors to the motherboard. That is an example of using available capacity to its maximum. But what do we do when we need yet another processor? The same goes for Internet traffic. How do we ensure that we will be able to constantly, gracefully, and above all cost-efficiently meet growing traffic demands on our web applications?
Web load balancing is a set of techniques for configuring servers and other hardware so that the workload is distributed evenly amongst the servers. This reduces the dependency on any single server and makes it easy to add resources so that performance scales with the resources added.
Goals of Web Load Balancing
Depending on the goals, the configuration of the servers and hardware will vary a bit. Web load balancing has two potential goals: scalability and availability. The two goals are not exclusive, though. Being scalable will usually increase availability and vice versa. Your web hosting provider can help you determine which of the two matters more for your needs.
Web Load Balancing - Scalability of Infrastructure
Scalability of a web application refers to the ability of a website to smoothly handle increasing traffic and requests. Sometimes adding more resources to a single server doesn't work when the workload keeps on increasing. For example, adding more RAM to a single server can certainly help, but after a point, the performance suffers from diminishing returns. The same goes for CPU speed.
A web application is said to be scalable if it is easy to add resources in such a way that an increasing workload is handled without diminishing returns and without exponential expenditure. Coming back to our restaurant example, adding more waiters to handle increased customer flows works only up to a point. Soon there will be no more space to seat the patrons. This illustrates an important point in scalability: the system is only as scalable as its least scalable component. In the case of the restaurant, the least scalable component is the space available, so adding more waiters has no effect until that is taken care of. The least scalable component therefore becomes the bottleneck.
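The bottleneck idea can be sketched in a few lines of Python. This is only an illustration: the function name `find_bottleneck` and the customers-per-hour figures are made up for the restaurant example, not taken from any real system.

```python
from typing import Dict, Tuple


def find_bottleneck(capacities: Dict[str, int]) -> Tuple[str, int]:
    """Return the least-capable component and the throughput it allows."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Hypothetical customers-per-hour figures for the restaurant example.
restaurant = {"waiters": 80, "kitchen": 60, "seats": 40}
before = find_bottleneck(restaurant)

# Doubling the waiters leaves the limit where it was: the seats.
restaurant["waiters"] = 160
after = find_bottleneck(restaurant)

print(before, after)
```

Both calls report the seats as the limiting component, which is why adding waiters alone does not help.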
There are several components to a web server system, and it is important to identify the bottlenecks in the system while designing the infrastructure. A network is said to be linearly scalable when the performance increases in direct proportion to the resources added. This means that doubling the resources also doubles the capacity. To achieve this, the least scalable component, the bottleneck, must itself scale linearly.
In practice however, linear scalability can never really be achieved. This is due to management and synchronization load. More resources are more difficult to manage and this takes a small bite out of the performance.
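A toy model makes the gap between ideal and actual scaling concrete. The assumption here, purely for illustration, is that every server spends a fixed fraction of its capacity coordinating with each of its peers. The function name, the 1% overhead, and the capacity of 100 requests per second are hypothetical, not measurements.

```python
def cluster_capacity(servers: int,
                     per_server: float = 100.0,
                     overhead_per_peer: float = 0.01) -> float:
    """Toy model: each server loses `overhead_per_peer` of its capacity
    for every other server it has to coordinate with."""
    useful_fraction = max(0.0, 1.0 - overhead_per_peer * (servers - 1))
    return servers * per_server * useful_fraction


for n in (1, 2, 4, 8):
    ideal = n * 100.0  # what perfectly linear scaling would give
    print(n, ideal, round(cluster_capacity(n), 1))
```

Under these assumptions the shortfall against the ideal figure widens as servers are added, which is the "small bite" described above.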
High availability
High availability means that the network stays available even when individual components fail. Since network failure is a very real possibility for many reasons, all solutions that focus on high availability rely on backup systems. These are meant to take over from the original in case of failure. Disaster recovery is very important for a business, and a good colocation center will provide advanced techniques to achieve this. Load balancing provides solutions that implement these goals. In this article, we will focus more on scalability than availability, though as mentioned above, the two incorporate each other to a certain degree.
Scalability concepts in web load balancing
Scalability is achieved in web applications by creating a server cluster or a server farm. These are collections of servers that distribute the workload amongst themselves. The solution is designed to make adding more servers easy and efficient in order to handle increasing workloads. This is far preferable to adding more resources to a single server, for reasons of cost, reliability and efficiency.
A typical web application has two layers or tiers. The first layer is the application layer (usually on the same server as the web server), and the other is the data storage layer. These are separated because their functions are so different. The application layer has the responsibility for interacting with the client who is sending the request and the data layer has the responsibility of storing and serving up the data efficiently.
Web applications can have the same physical server for both the (logical or virtual) application and data servers. This is a very typical arrangement and is suitable only for small applications. It suffers from what is called a "Single Point of Failure", meaning that if that one server goes down, the entire application is lost. In addition, adding more resources to a single server is very difficult and becomes cost inefficient very quickly.
The solution is to use what is called a "Load Balanced Cluster".
Creating a basic Load balancing cluster
To handle web load balancing, the accepted solution is to spread the load amongst many different servers. In the example cited above, the first step is to separate the application server from the database server. As the load on each increases, we find that the requirements of an application server are somewhat different from those of a database server, and separating the two yields scalability and efficiency benefits.
In addition, with increasing workload we might need to expand this simple cluster to include backup application and database servers, and this is messy if they're both sitting on the same machine. In this article, we focus more on the application server aspect of web load balancing.
Introducing Load Balancers
So say you decide to share the load of your application server amongst many different servers. How do you decide which server handles what? In order to make the whole system work efficiently, you need a fair means by which you can distribute the workload amongst them.
Load balancers like the Zeus are specific pieces of hardware that accept incoming requests from clients on the Internet and route them to one of the servers in the cluster. Remember that the end user never knows that he or she is dealing with more than one server. The load balancer listens on a single port, on which all the requests arrive from the Internet, and internally allocates each request to one server or another.
When the chosen server responds, the load balancer has to decide whether the next request from the same client should go to the same server or to a different one. This simple question becomes quite complex due to something called Session State Management, which we will look at a bit later.
Hardware or software web load balancing?
One can implement web load balancing using software or hardware. Software load balancing requires the installation of software on all the different servers. This software synchronizes the various instances over the network and manages the server resources. Software load balancing involves a steep performance price. This is logical since the application consumes resources and these have to come from the servers themselves. In addition, installing a new server is a bit more complex.
Hardware load balancing, on the other hand, places a piece of dedicated hardware between the incoming requests and the servers. This device does all the heavy work of deciding which packet goes where and how to manage the whole set. This solution is more elegant than the software solution because it encapsulates a complex task on a dedicated resource, which has a much smaller impact on the rest of the system. In addition, this piece of hardware can double up as a firewall, as it is physically sitting in front of the server farm. The benefits are efficiency, easier troubleshooting, security, and more scalability.
Techniques of using Load Balancers in web load balancing
There are several different methods that a load balancer can use to decide which server to send a particular request to. Depending on the specific network you are using, as well as the type of content your application is hosting, any of these techniques may be appropriate.
The Round Robin Technique
The Round Robin technique simply takes each request and assigns it to the next server in its list. The servers are arranged sequentially and each gets its turn. This technique allocates the load equally to all servers. It assumes that each server is equally powerful and equally busy at all times. If your network consists of a large number of similar servers, then the Round Robin technique might be your choice because it's also the easiest to implement.
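A minimal sketch of the rotation described above, in Python. The class name `RoundRobinBalancer` and the server names are hypothetical. A real load balancer would also forward the request and track server health, which is left out here.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed, repeating order."""

    def __init__(self, servers):
        self._servers = cycle(list(servers))

    def pick(self):
        # Each call returns the next server in the list, wrapping around.
        return next(self._servers)


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)  # ['web1', 'web2', 'web3', 'web1', 'web2']
```

Note that the balancer knows nothing about how busy each server is. That is exactly the assumption of equal power and equal load mentioned above.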
Weighted Round Robin
The weighted Round Robin technique takes into account the different capabilities of each server by assigning a number to each. The algorithm then develops a scheduling sequence that reflects this variance. For example, a particular server with a lower capability can receive a request once every two rounds instead of once every round. This is better than the simple round robin technique, as it more accurately implements the purpose of web load balancing: graceful and efficient scalability.
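One simple way to build such a scheduling sequence is to repeat each server in the rotation as many times as its weight. The sketch below takes that approach. The function name and the weights are hypothetical, and production load balancers often interleave the turns more smoothly than this.

```python
from itertools import cycle


def weighted_schedule(weights):
    """Expand {server: weight} into a repeating sequence in which each
    server appears `weight` times per cycle."""
    sequence = [server
                for server, weight in weights.items()
                for _ in range(weight)]
    return cycle(sequence)


# The stronger server gets two turns for every turn of the weaker one.
schedule = weighted_schedule({"big": 2, "small": 1})
first_six = [next(schedule) for _ in range(6)]
print(first_six)  # ['big', 'big', 'small', 'big', 'big', 'small']
```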
Least Connection Technique
The load balancer takes each request and sends it to the server that currently has the fewest connections. Again, this technique doesn't take into account the fact that different servers have different capabilities, and that a powerful server might perform better under a higher load than a weak server under a lower load.
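The bookkeeping behind this technique can be sketched as a counter per server. The class and method names below are hypothetical. Ties are broken by list order, which is a choice made for this sketch only.

```python
class LeastConnectionsBalancer:
    """Track open connections and pick the server with the fewest."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the client's connection closes.
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["web1", "web2"])
first = lb.acquire()    # web1 (both idle, first in the list wins)
second = lb.acquire()   # web2 (web1 now has one connection)
lb.release(first)       # web1's connection closes
third = lb.acquire()    # web1 again, since it is idle
print(first, second, third)
```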
Load Based technique
This technique examines the load on each server and makes the choice based on that. An interesting question arises as to what load means, and how a load balancer determines whether one server is more heavily loaded than another. One possible technique is to monitor the availability of various resources on each server to determine which server has the most resources available.
Alternatively, load could be measured as the percentage of resources that are available on each server. This takes into account the fact that large servers need more free resources than smaller ones. However the load is measured, the load based technique is, in principle, the best implementation of load balancing because it is the fairest. In practice, determining the availability of resources on all servers on a continuous basis isn't easy, can itself degrade performance, and requires a more sophisticated load balancer.
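The two ways of measuring load described above can give different answers for the same cluster. The following sketch compares them on a made-up snapshot. The function names and the memory figures are hypothetical, and a real balancer would consider more than one resource.

```python
def pick_by_free_amount(servers):
    """servers maps name -> (used, total). Choose the most free capacity."""
    return max(servers, key=lambda name: servers[name][1] - servers[name][0])


def pick_by_free_fraction(servers):
    """Choose the server with the largest share of its capacity free."""
    def free_fraction(name):
        used, total = servers[name]
        return (total - used) / total
    return max(servers, key=free_fraction)


# Hypothetical memory usage in GB: (used, total).
snapshot = {"small": (2, 8), "large": (40, 64)}

print(pick_by_free_amount(snapshot))    # large: 24 GB free versus 6 GB
print(pick_by_free_fraction(snapshot))  # small: 75% free versus 37.5%
```

Which measure is appropriate depends on the workload. The sketch only shows that the definition of load changes the routing decision.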
Gigenet uses the Zeus Load Balancer, which provides advanced and sophisticated algorithms for efficient web load balancing.
The problem of Session State Management
How many times do you think it's necessary for a web site to "know" who you are? Consider an elaborate sign up process that has many forms - say you're filing your income tax returns. During each "page load" that takes you to the next form on a new page, the server has to remember what you filled into the previous form. When you make a new request to the server for a new form by clicking the "next" button, the server has to know that your requests are related.
However, suppose there is a load balancer sitting in front of 10 servers. What happens if, in the middle of your form-filing process (say at form 4), the load balancer decides that the server your requests have been going to until now has too much load, and sends your request for the 5th form to a new server in the cluster? This new server doesn't know about the previous four pages that you have filled in! Does this mean that the information you have entered so far has been lost?
Most of the time the server only needs to keep track of who you are between requests, and in such cases, the new server will simply ask you to validate yourself again (probably with a username/password combo) which may not be a big deal, but in scenarios like the above, it's unacceptable. This information that needs to be "retained" about you between requests is called Session Information, and it is one of the goals of web load balancing to ensure that session information is not lost between requests - even if each request goes to a different server.
Various solutions to the Session State Problem
There are different ways to deal with the problem of session state management and, as with all such solutions, a balance needs to be struck with regard to scalability, availability, efficiency, and security.
Storing Session State on the Client
This approach relies on storing the persistent information on the client's machine. While this is quite effective and resource friendly, there are serious problems when the session information is more than just a few items. These problems mainly concern security. Users and browsers don't look too kindly on programs that make heavy use of the client's resources, as this is a common tactic used by spyware and malware. At most, it is acceptable to store a cookie or two containing basic information. In addition, information stored on a client's computer is vulnerable to misuse, as client machines are not secured the way servers are.
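One common way to reduce the tampering risk, which is an addition here and not something this article prescribes, is for the server to sign the cookie so that any modification on the client can be detected. The sketch below uses Python's standard `hmac` module. `SECRET_KEY` and the function names are hypothetical, and the key would have to be shared by every server in the cluster.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # hypothetical; known to the servers only


def encode_cookie(session: dict) -> str:
    """Serialize the session and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(session, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def decode_cookie(cookie: str):
    """Return the session dict, or None if the signature does not match."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = encode_cookie({"user": "alice", "step": 4})
print(decode_cookie(cookie))
```

Signing only detects modification. The payload is still readable by the client, so sensitive data should not be stored this way.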
Storing Session State on a Dedicated server
This approach makes use of a separate server to store session state. Whenever any server receives a request from a client, it pulls the session information from this centralized server, which is available to all the servers in the farm. This ensures that the user experiences a smooth transition when requests go to different servers. Such a solution must be configured for high availability. If the single server responsible for storing the session state of all the others goes down, then the entire server farm will lose its session information! Backup and failover servers are the norm in this scenario.
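The flow can be sketched with two application servers sharing one store. In this illustration the "dedicated server" is just an in-memory Python object, and the class names `SessionStore` and `AppServer` are hypothetical. In practice the store would be reached over the network.

```python
class SessionStore:
    """Stands in for the dedicated session server."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)


class AppServer:
    """An application server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, form_name, form_data):
        state = self.store.load(session_id)   # fetch what other servers saved
        state[form_name] = form_data
        self.store.save(session_id, state)    # make it visible to the farm
        return state


store = SessionStore()
web1, web2 = AppServer("web1", store), AppServer("web2", store)

web1.handle("abc", "form4", {"income": 50000})
state = web2.handle("abc", "form5", {"deductions": 1200})
print(sorted(state))  # ['form4', 'form5']
```

The second server sees the form submitted to the first one, so the user's progress survives the switch.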
Storing Session state in the Database
Most applications already use a centralized database for storing long term data. This approach makes use of that existing infrastructure to store session state in the database. The disadvantage is an increased load on the database, which is now being queried for trivial information. Also, databases are optimized for long term data storage, so this solution is a bit inelegant.
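As a rough illustration, session state can be kept in a single table keyed by session ID. The sketch uses an in-memory SQLite database from Python's standard library so that it runs on its own. The table layout and function names are assumptions, and a real cluster would point every server at the shared database instead.

```python
import json
import sqlite3

# In-memory database for illustration only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, state TEXT NOT NULL)")


def save_session(session_id, state):
    """Insert or overwrite the JSON-encoded state for this session."""
    db.execute(
        "INSERT OR REPLACE INTO sessions (session_id, state) VALUES (?, ?)",
        (session_id, json.dumps(state)))
    db.commit()


def load_session(session_id):
    """Return the stored state, or an empty dict for unknown sessions."""
    row = db.execute(
        "SELECT state FROM sessions WHERE session_id = ?",
        (session_id,)).fetchone()
    return json.loads(row[0]) if row else {}


save_session("abc", {"form4": "done"})
print(load_session("abc"))
```

Every request now costs at least one extra query, which is the additional database load mentioned above.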
Keeping the same server between requests
One obvious solution to the session state problem is to ensure that once a server receives a request from a client, all further requests from that client are directed to the same server, which maintains the session state for that client. This is called server affinity and is one of the easiest approaches to implement. However, should that server crash, it takes with it the session information of all the users connected to it, and is thus a single point of failure for those instances of the application.
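One way to get this behaviour without the load balancer remembering anything is to derive the server from a stable client identifier. The sketch below hashes a hypothetical client ID. The function name is an assumption, and real load balancers typically use a cookie or the source address as the identifier.

```python
import hashlib


def sticky_server(client_id: str, servers: list) -> str:
    """Map a client to the same server on every request."""
    digest = hashlib.sha256(client_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["web1", "web2", "web3"]
first = sticky_server("client-42", servers)
second = sticky_server("client-42", servers)
print(first == second)  # True: the client always lands on the same server
```

With this simple modulo scheme, adding or removing a server changes the mapping for many clients, so their sessions would be lost just as they would be in a crash.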
An alternative approach is to replicate session state amongst all the servers continuously, so that if any one server crashes, the others can gracefully take over. This is an elegant solution, but it stops scaling once the number of servers becomes large, because replicating all the session state on every server is then no longer feasible.
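The cost of full replication is easy to see in a sketch: every session update has to be written once per server. The class names below are hypothetical, and real replication would happen over the network, with all the failure handling that implies.

```python
class ReplicatedServer:
    def __init__(self, name):
        self.name = name
        self.sessions = {}


class Cluster:
    """Copy every session update to every server in the cluster."""

    def __init__(self, names):
        self.servers = [ReplicatedServer(name) for name in names]
        self.copies_written = 0

    def save(self, session_id, state):
        for server in self.servers:
            server.sessions[session_id] = dict(state)
            self.copies_written += 1


cluster = Cluster(["web1", "web2", "web3"])
cluster.save("abc", {"form4": "done"})
print(cluster.copies_written)  # 3: one update, one write per server

# If web1 crashes, the remaining servers still hold the session.
survivors = cluster.servers[1:]
print(all("abc" in server.sessions for server in survivors))  # True
```

The number of writes per update grows with the number of servers, which is the scalability limit described above.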
The architecture of the network will have to be determined according to the chosen solution.
Creating a Tiered Network
Now that we have understood the theory involved in web load balancing and scalability, it is time to look at the architecture by means of which these goals can be achieved. By architecture, we mean how exactly the resources are arranged and the technique by which scalability is achieved. For this, we need to introduce the concept of a Tier.
Understanding Tiers
A tier is a set of servers that share common characteristics amongst themselves and benefit from being grouped together. For example, a set of web servers share their resource consumption profile which is very different from that of Database servers. Web servers are network intensive and consume network resources whereas Database servers consume a whole lot of disk space and perform many more file operations.
By physically grouping Web servers together, it is possible to optimize their operations. When you group a bunch of web servers together, you can maximize the flow of network resources to that group which needs it most, ensuring that it will be used efficiently. No point sending precious resources to servers that don't need it. Similarly, a group of database servers can share the valuable hard disk space that they all need. For example we can use an Internap FCP network to optimize all the web servers. This arrangement also makes it easier to add more servers to the tier when necessary as all the major infrastructure is already in place.
Also, different groups of servers can share different goals. Web servers for example have the goal of scalability, whereas Database servers have the goal of high availability. By grouping servers having similar goals together, management of the servers becomes easier.
Specific Tiered Solutions to implement web load balancing
In this section, we examine the various ways to implement tiered hosting and give solutions for single tiered, two tiered, three tiered, and n-tiered architectures.
Single Tiered solution
Most businesses start out with just one server when hosting applications over the Internet: one big server which runs the web server, the database server, the mail server and the file server.
This approach is very easy to implement and is very efficient for small businesses that don't require guarantees like high availability and for which a little downtime is acceptable.
Two Tiered Solution
This typically involves separating the database server from the other servers. This happens because the database server is typically used by more than just one application. In addition, most if not all applications use the database server. The separation of the database server yields three benefits:
1. It allows scalability. A separate database server can easily be extended and given more resources if necessary.
2. If the non-database server crashes, other applications can continue using the database server without interruption.
3. Since the database usually hosts very sensitive data, it can be made more secure by keeping it more than one step away from the public Internet.
In a standard web setup, we now have a web server (which also hosts the email server, file server and the application server) and the database sitting below that. A typical two-tiered solution.
Three Tiered Solution
In a three tiered solution, the web server is generally separated from the application server, thereby improving performance. Because of this separation, one can change any aspect of either tier without affecting the other. This provides additional flexibility and scalability to the entire system.
n-Tiered solution
The basic principles can be extended to group together any set of servers that perform a distinct task and therefore have similar resource utilization profiles. For example, you can have a separate set of email servers or a separate set of file servers, depending on your needs. You can even have a layer dedicated to security, such as a firewall or a dedicated DDoS protection system.
Summary of Web Load Balancing and Scalability
To summarize, the basic purpose of implementing scalability in a web-based system is to ensure that you can gracefully increase performance by adding new components. Web load balancing makes this possible by using the various tiers explained in this article. We hope you found it useful.