Web applications can be quite complex, depending on multiple components like caches, databases and proxies to deliver the final product to the user. Spotting performance issues in such an infrastructure can be difficult, so benchmarking is used as a first step in ensuring the user experience stays within expectations.
An introduction to siege
The siege program is a long-time favorite for benchmarking the performance of HTTP-based applications and servers. It differs from other tools in how it handles benchmarking, focusing on simulating real-world usage rather than issuing plain HTTP requests. It supports cookies to simulate user sessions, hits specified URLs at random, and will also parse HTML responses and request linked dependencies like images, CSS and JavaScript files.
Before you run a benchmark
Before executing a benchmark against a web application, make sure you have the right to do so: just renting a server from a provider is typically not enough to run a benchmark against it. Check the contract's terms of service to see whether benchmarking is allowed at all. Even when the contract permits benchmarking the server, your traffic has to traverse more network infrastructure to reach it (for example your ISP's network), which may also prohibit sending such traffic, or throttle/block your connection upon detecting it - because legitimate benchmarking traffic is almost impossible to distinguish from hostile DDoS attack traffic.
Running benchmarks on a local machine avoids these legal concerns, but is also less reliable, because requests to localhost allow modern kernels to bypass most of the TCP/IP stack, showing much better performance than would be achievable over a real network.
The best way is to run benchmarks within controlled virtual infrastructure (virtual machines with virtual networking, all running on a single host machine so traffic does not spill into outside networks). Setting this up can be automated with tools like VirtualBox or Vagrant, but takes a little more effort to get going.
You should also ensure the machine running the benchmark is not doing any other work (like concurrent tasks for a CI/CD pipeline) to reduce interference.
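For quick local experiments, any minimal HTTP server will do as a benchmark target. The sketch below uses only Python's standard library to serve a small static response; the port and response body are arbitrary choices for illustration:

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Serves a fixed plain-text response to every GET request."""

    def do_GET(self):
        body = b"hello benchmark"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet while siege hammers the server

def serve(port: int = 8080) -> None:
    # blocks; run this in one terminal, then benchmark in another with:
    #   siege -c 10 -t 1m http://localhost:8080/
    ThreadingHTTPServer(("localhost", port), HelloHandler).serve_forever()
```

Remember that results against such a localhost target are only useful for relative comparisons, for the reasons above.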
Running a simple benchmark
To run a basic benchmark against a web server, you need two options: -c to specify how many clients make requests concurrently, and -t to define the duration of the benchmark:
siege -c 10 -t 1m http://localhost/
The example above will maintain 10 clients making requests concurrently for one minute, then produce the benchmark results:
{ "transactions": 191176,
"availability": 100.00,
"elapsed_time": 59.83,
"data_transferred": 27994.81,
"response_time": 0.00,
"transaction_rate": 3195.32,
"throughput": 467.91,
"concurrency": 9.26,
"successful_transactions": 191176,
"failed_transactions": 0,
"longest_transaction": 0.03,
"shortest_transaction": 0.00
}
The returned values are averages over the test duration, including some key metrics to focus on:
availability is the percentage of successful requests. If this is lower than 100, some requests ran into errors which you should look into.
response_time is the average response time in seconds, while concurrency shows the average number of concurrent requests. Large response times, or concurrency values that heavily mismatch the configured number of clients, can hint at problems with processing requests in parallel within a reasonable time frame.
transaction_rate is the number of requests served per second, useful for comparing multiple test results and finding changes in performance between software versions. You also need throughput, the bandwidth usage in MB/s, to estimate where slowdowns come from: if transactions per second go down but throughput remains the same, the responses may simply have gotten larger (making the network the bottleneck); if not, the application is likely performance-bound outside the networking area.
Understanding these key metrics allows you to gain a rough picture of how well your web application is performing during a stress testing scenario.
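Evaluating these metrics can also be automated by parsing siege's JSON output. A minimal Python sketch following the reasoning above (the derived average-response-size heuristic and the returned field names are illustrative choices, not siege features):

```python
import json

def summarize(results_json: str) -> dict:
    """Extract the key metrics discussed above from siege's JSON output."""
    r = json.loads(results_json)
    # throughput is MB/s and transaction_rate is requests/s, so their
    # ratio approximates the average response size in MB per request
    avg_response_mb = r["throughput"] / r["transaction_rate"]
    return {
        "ok": r["availability"] == 100.00,
        "requests_per_second": r["transaction_rate"],
        "avg_response_time_s": r["response_time"],
        "avg_response_size_kb": avg_response_mb * 1024,
    }

# trimmed-down sample matching the output shown above
sample = """{ "transactions": 191176, "availability": 100.00,
  "response_time": 0.00, "transaction_rate": 3195.32, "throughput": 467.91 }"""
print(summarize(sample))
```

Tracking the average response size across runs makes the "network vs. application" distinction from above easy to spot.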
Stress testing multiple urls at once
Since a meaningful stress test needs to simulate users browsing through different pages of the web application, a benchmark against a single URL rarely paints a complete picture of application performance. Instead of supplying a single URL on the command line, you can write a file containing multiple URLs, one per line:
urls.txt
http://localhost/offers
http://localhost/about
http://localhost/contact
http://localhost/team
Then provide the file to siege with the -f flag:
siege -c 10 -t 1m -f urls.txt
By default, siege will now request the URLs from the file sequentially, in order; add the -i flag to have each client hit them in random order instead, which better simulates real user traffic.
Benchmarking with authorization
Since many web applications require users to log in before accessing the important logic, a stress test may need to do this as well. You have two options here:
Option #1: Provide a cookie for the requests:
Simply log into the web application manually, then grab the resulting session cookie from your browser and pass its contents to siege as a request header with the -H flag:
siege -c 10 -t 1m -H "Cookie: session_id=abcdef0123; username=bob" http://localhost/
The cookie values shown are just examples; use the actual session cookie issued by your application.
The session cookie is reused by all clients, essentially turning the stress test into the session of one single user. If you need more realistic stress testing, the second option is a better fit for you.
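If you script the login instead of copying the cookie from a browser, the standard library can turn a Set-Cookie response header into the header value siege expects. A small sketch (the cookie names are made up for illustration; the result is meant for siege's -H/--header option):

```python
from http.cookies import SimpleCookie

def cookie_header(set_cookie: str) -> str:
    """Convert a Set-Cookie response header into a Cookie request header
    suitable for passing to siege via -H/--header."""
    cookie = SimpleCookie()
    cookie.load(set_cookie)  # attribute flags like Secure/HttpOnly are dropped
    pairs = "; ".join(f"{k}={v.value}" for k, v in cookie.items())
    return f"Cookie: {pairs}"

# example: header captured from a manual login (values are made up)
header = cookie_header("session_id=abcdef0123; Secure; HttpOnly; Path=/")
print(header)  # Cookie: session_id=abcdef0123
# siege -c 10 -t 1m -H "Cookie: session_id=abcdef0123" http://localhost/
```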
Option #2: Using the automatic login feature
While barely documented, siege can dynamically send authentication requests for its clients before starting the stress test. This is achieved by supplying the login-url parameter in a config file:
siege.rc
login-url = http://localhost/login POST name=bob&pass=123
login-url = http://localhost/login POST name=john&pass=secret
login-url = http://localhost/login POST name=jane&pass=password
The example provides logins for three users by sending HTTP POST requests to http://localhost/login with their credentials. You may need to adjust this to your specific login form, and possibly disable security features like CSRF tokens for it to work properly.
During the test, tell siege to use the custom config file with -R:
siege -c 10 -t 1m -R siege.rc http://localhost/
The login-url parameter may be specified any number of times, with each client grabbing the next login in line before starting the stress test. If you specify fewer logins than concurrent clients, they will simply wrap around to the top of the list.
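When the list of test users lives elsewhere (a fixture file, a secrets store), generating this config is straightforward. A sketch that writes one login-url line per credential pair (the login URL and form field names mirror the example above and must match your actual login form):

```python
def write_siege_rc(path: str, login_url: str, users: list[tuple[str, str]]) -> None:
    """Write a siege config file with one login-url line per test user."""
    with open(path, "w") as f:
        for name, password in users:
            # field names (name/pass) are assumptions; adjust to your form
            f.write(f"login-url = {login_url} POST name={name}&pass={password}\n")

write_siege_rc("siege.rc", "http://localhost/login",
               [("bob", "123"), ("john", "secret"), ("jane", "password")])
```

The generated file reproduces the three-user example above and can be passed to siege with -R siege.rc.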
Interpreting benchmark results
The results of running a benchmark are subject to many different contextual factors, like hardware and networking capabilities of the server hosting the web application, potential backends involving databases or storage, and caches being empty or outdated during runs. Performance can vary even when re-running the same test in the same environment with the same settings, simply due to background processes like regular cleanup tasks or thermal throttling.
A benchmark is a good tool to find performance issues, but it is by no means an indicator where the issue comes from, or if the issue is even real at all. Finding bad performance on a single benchmark should first and foremost compel you to re-run the test and ensure the issue is reproducible, then find out what is causing the slowdown.
Many things can cause a genuine performance bottleneck in modern applications, from hardware issues or aging to network security (firewalls, rate limiting etc.), the application's dependencies (caches, proxies, databases, ...) or the application itself (request handling, blocking calls in HTTP handlers etc.). Identifying the cause of a bottleneck is beyond the abilities of a benchmarking tool, but it has its uses in testing different versions of the same software.
One such use case is regression testing within a CI/CD pipeline, where each new version of an application is benchmarked automatically and decreased performance in a new release raises an alert, so developers can look into why the system suddenly became slower and whether that is a problem. Since these tests only prompt further investigation and only compare results against previous benchmarks on the same setup, they are reliable enough to provide value without their drawbacks becoming an issue.
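Such a regression gate can be as simple as comparing the transaction rate of the current run against a stored baseline, with some tolerance for normal run-to-run variance. A sketch (the 10% tolerance is an arbitrary choice for illustration):

```python
def check_regression(baseline_rate: float, current_rate: float,
                     tolerance: float = 0.10) -> bool:
    """Return True if the current transaction rate is acceptable,
    i.e. not more than `tolerance` below the baseline."""
    return current_rate >= baseline_rate * (1 - tolerance)

# baseline from a previous release vs. the rate measured for the new build
assert check_regression(3195.32, 3100.0)      # within 10% of baseline: pass
assert not check_regression(3195.32, 2500.0)  # more than 10% slower: alert
```

A failed check should trigger a re-run and human investigation, not an automatic verdict, for the reasons discussed above.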
When running benchmarks from within a CI/CD pipeline, make sure the runner executing the job isn't executing other pipeline parts in parallel, as this could seriously interfere with the stress test and render the results useless.