Serving a half billion requests per day with Rust + CGI

In my previous post, “Serving 200 million requests per day with a cgi-bin”, I did some quick performance testing of CGI using a program written in Go.

Go works excellently for CGI programs, for many of the same reasons it works so well for CLI programs and system daemons.

But, out of curiosity, I decided to do a bit more CGI testing with other languages.

CGI is good technology, actually

There’s a misconception that because CGI is old or because many CGI scripts had security vulnerabilities, CGI itself is somehow insecure or bad.

That’s just not the case. CGI is a simple protocol that works very well.

It’s not any more or less difficult to write secure CGI programs than it is to write any other kind of HTTP handler code.
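To make the "simple protocol" point concrete, here is a minimal sketch of a CGI program in Python. It is not one of the guestbook programs from the benchmark; the `handle` function and the `name` parameter are made up for illustration. The server passes request metadata in environment variables (`REQUEST_METHOD`, `QUERY_STRING`, `CONTENT_LENGTH`), sends the request body on stdin, and reads the response (headers, a blank line, then the body) from stdout.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: metadata in env vars, body on stdin, response on stdout."""
import html
import os
import sys
from urllib.parse import parse_qs


def handle(environ, body):
    """Build a full CGI response (headers, blank line, body) as a string.

    `handle` and the `name` parameter are hypothetical, for illustration only.
    """
    method = environ.get("REQUEST_METHOD", "GET")
    # GET parameters come from QUERY_STRING; POST form data comes from the body.
    raw = body if method == "POST" else environ.get("QUERY_STRING", "")
    params = parse_qs(raw)
    name = params.get("name", ["world"])[0]
    # Escape user input before echoing it into HTML, as in any HTTP handler.
    page = f"<p>Hello, {html.escape(name)}!</p>\n"
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + page


if __name__ == "__main__":
    # CONTENT_LENGTH tells the program how many body bytes to read from stdin.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    request_body = sys.stdin.buffer.read(length).decode("utf-8") if length else ""
    sys.stdout.write(handle(os.environ, request_body))
```

The escaping step is the same care you would take in a FastCGI or reverse-proxied handler; nothing about it is specific to CGI.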

The common alternatives to CGI, FastCGI and reverse proxies, aren’t a free lunch and have their own security complications.

The benchmarking server

This time I used an AMD Genoa-based 60 vCPU / 240 GB RAM virtual machine to serve as a reasonable medium-sized machine.

Running benchmarks in VMs isn’t ideal because of noisy neighbor problems and other sources of variable performance.

However, when doing macrobenchmarking, it’s less of a concern and the results are fairly consistent. This is even more true when using a larger VM, where there are fewer neighbors on the host.

Still, I do always prefer bare metal, but sometimes you leave your servers behind for other people to enjoy.

I miss “my” beautiful servers but they’re in good hands and at least I can still post to them and visualize the disk and network IO and CPU usage, which isn’t creepy to do, it’s actually perfectly normal.
I do not miss Nandos in DC and their unrefrigerated sauces!

— Jake Gold (@jacob.gold) November 12, 2024 at 8:11 PM

Standard benchmarking disclaimer

Benchmarking of any kind is fraught, and it’s easy to make mistakes, which is why there’s no substitute for real-world testing in your own environment.

The CGI programs written in each language are broadly similar but their implementations do vary. Some use well-tested libraries while others do manual parsing and are minor abominations.
The HTTP load testing tool vegeta was used this time for improved accuracy.
Only the gohttpd web server was used this time, because getting Apache to stop being the bottleneck proved somewhat difficult.
The updated code and Dockerfiles are on GitHub: https://github.com/Jacob2161/cgi-bin

Benchmarking Bash `guestbook-sh.cgi`

No one should ever run a Bash script under CGI. It’s almost impossible to do so securely, and performance is terrible.

But it’s kind of funny to see that it does actually work.

Bash reached just 40 requests per second before saturating all available CPUs.

Requests      [total, rate, throughput]         600, 40.07, 36.34
Duration      [total, attack, wait]             16.509s, 14.975s, 1.534s
Latencies     [min, mean, 50, 90, 95, 99, max]  838.76ms, 1.908s, 1.924s, 2.48s, 2.547s, 2.655s, 2.77s
Bytes In      [total, mean]                     6756600, 11261.00
Bytes Out     [total, mean]                     31200, 52.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:600  
Error Set:
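As a rough sanity check on the saturation claim, the numbers in the report can be run through Little's law (average requests in flight = arrival rate × mean latency). This is a back-of-the-envelope sketch, and it assumes the 60 vCPUs really were fully busy for the whole run.

```python
# Figures copied from the vegeta report for guestbook-sh.cgi.
rate_rps = 40.07         # offered request rate
mean_latency_s = 1.908   # mean latency
vcpus = 60               # size of the benchmark VM

# Little's law: average requests in flight = arrival rate x mean time in system.
in_flight = rate_rps * mean_latency_s

# Assumption: every vCPU is busy, so each request costs at most
# roughly vcpus / rate CPU-seconds.
cpu_seconds_per_request = vcpus / rate_rps

print(f"requests in flight: {in_flight:.1f}")                     # 76.5
print(f"CPU-seconds per request: {cpu_seconds_per_request:.2f}")  # 1.50
```

Roughly 76 requests in flight on 60 vCPUs, at about 1.5 CPU-seconds each, is consistent with a machine that has no headroom left.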

Benchmarking Perl `guestbook-pl.cgi`

Perl shows decent performance for a scripting language, managing 500 requests per second. Latency is very consistent: the median is 98 ms and the maximum only 114 ms.

Requests      [total, rate, throughput]         7500, 500.04, 497.25
Duration      [total, attack, wait]             15.083s, 14.999s, 84.166ms
Latencies     [min, mean, 50, 90, 95, 99, max]  72.106ms, 96.842ms, 98.021ms, 102.438ms, 103.292ms, 104.728ms, 113.681ms
Bytes In      [total, mean]                     81585000, 10878.00
Bytes Out     [total, mean]                     390000, 52.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:7500  
Error Set:

Benchmarking JavaScript `guestbook-js.cgi`

JavaScript with Node.js surprised me by performing much better than I expected in a CGI environment, hitting 600 requests per second with very consistent latencies.

Requests      [total, rate, throughput]         9000, 600.07, 597.56
Duration      [total, attack, wait]             15.061s, 14.998s, 62.961ms
Latencies     [min, mean, 50, 90, 95, 99, max]  57.999ms, 76.306ms, 76.271ms, 78.824ms, 79.563ms, 80.983ms, 84.569ms
Bytes In      [total, mean]                     96858000, 10762.00
Bytes Out     [total, mean]                     468000, 52.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:9000  
Error Set:

Benchmarking Python `guestbook-py.cgi`

Python manages 700 requests per second, which seems respectable.

Requests      [total, rate, throughput]         10500, 700.11, 695.36
Duration      [total, attack, wait]             15.1s, 14.998s, 102.49ms
Latencies     [min, mean, 50, 90, 95, 99, max]  44.186ms, 66.602ms, 62.544ms, 78.77ms, 93.006ms, 142.416ms, 590.895ms
Bytes In      [total, mean]                     113001000, 10762.00
Bytes Out     [total, mean]                     546000, 52.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:10500  
Error Set:

Benchmarking Go `guestbook-go.cgi`

Even though Go is a very fast compiled language, it does have a runtime that must be initialized on startup.

Despite this initialization overhead, Go reached 3,400 requests per second with low latencies, which still places it in the “very fast” tier of languages.

Requests      [total, rate, throughput]         51000, 3399.39, 3396.04
Duration      [total, attack, wait]             15.017s, 15.003s, 14.786ms
Latencies     [min, mean, 50, 90, 95, 99, max]  10.456ms, 21.817ms, 20.458ms, 29.03ms, 33.001ms, 43.833ms, 202.566ms
Bytes In      [total, mean]                     559062000, 10962.00
Bytes Out     [total, mean]                     2652000, 52.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:51000  
Error Set:

Benchmarking Rust `guestbook-rs.cgi`

Rust hits nearly 5,700 requests per second!

The tail latency is oddly high (possibly SQLite database contention), but the median latency is extremely good.

Requests      [total, rate, throughput]         85493, 5699.52, 5660.27
Duration      [total, attack, wait]             15.104s, 15s, 103.997ms
Latencies     [min, mean, 50, 90, 95, 99, max]  4.35ms, 26.28ms, 15.223ms, 47.883ms, 79.299ms, 186.667ms, 1.444s
Bytes In      [total, mean]                     928624966, 10862.00
Bytes Out     [total, mean]                     4445636, 52.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:85493  
Error Set:
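If the tail latency really is SQLite contention, it would come from concurrent CGI processes queuing for the database write lock. The sketch below shows, in Python, the two settings that usually matter in that situation: a busy timeout and WAL journaling. It is only an illustration. The `entries` table and both function names are hypothetical, and this post does not say how the guestbook programs actually configure SQLite.

```python
import sqlite3


def add_entry(db_path, name, message, busy_timeout_s=5.0):
    """Insert one row, waiting up to busy_timeout_s for the write lock.

    The table name `entries` is hypothetical, chosen for this sketch.
    """
    # timeout= is how long SQLite waits on a lock held by another
    # connection before raising "database is locked".
    conn = sqlite3.connect(db_path, timeout=busy_timeout_s)
    try:
        # WAL mode lets readers proceed while a single writer is active;
        # writers still take turns, which is where waiting can pile up.
        conn.execute("PRAGMA journal_mode=WAL")
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS entries (name TEXT, message TEXT)"
            )
            conn.execute(
                "INSERT INTO entries (name, message) VALUES (?, ?)",
                (name, message),
            )
    finally:
        conn.close()


def count_entries(db_path):
    """Return the number of rows in the hypothetical entries table."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    finally:
        conn.close()
```

Under CGI every request is a fresh process opening its own connection, so any wait for the write lock shows up directly as request latency.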

Benchmarking C `guestbook-c.cgi`

C performance is very similar to Rust, just slightly better, which is the natural order of things.

Requests      [total, rate, throughput]         87000, 5799.88, 5750.31
Duration      [total, attack, wait]             15.13s, 15s, 129.309ms
Latencies     [min, mean, 50, 90, 95, 99, max]  3.741ms, 26.052ms, 14.375ms, 47.567ms, 84.977ms, 196.932ms, 1.547s
Bytes In      [total, mean]                     946125000, 10875.00
Bytes Out     [total, mean]                     4524000, 52.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:87000  
Error Set:

My takeaways

It’s clear that CGI is fast enough with compiled languages that it can be used for real work, even if it’s almost never going to be the highest performance option.

It was also very fun to see the relative performance of the different languages play out in the now-uncommon environment of CGI.
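For a sense of scale, the per-second rates from the vegeta reports can be converted to requests per day and compared against Bash. This is simple arithmetic on the numbers above and assumes the measured rate could be sustained for a full 24 hours, which a 15-second test does not prove.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Request rates (per second) copied from the vegeta reports above.
rates = {
    "Bash": 40.07,
    "Perl": 500.04,
    "JavaScript": 600.07,
    "Python": 700.11,
    "Go": 3399.39,
    "Rust": 5699.52,
    "C": 5799.88,
}


def per_day(rate_rps):
    """Requests per day if the measured rate were sustained for 24 hours."""
    return rate_rps * SECONDS_PER_DAY


for language, rate in rates.items():
    millions = per_day(rate) / 1e6
    speedup = rate / rates["Bash"]
    print(f"{language:<11}{millions:8.1f} M/day {speedup:7.1f}x Bash")
```

Rust works out to about 492 million requests per day, which is where the "half billion" in the title comes from.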

I love elegant, simple, and powerful technologies like CGI!
