He will be up for surprise. HTTP with connection: keep-open can serve 100k req/s...

trishume · on Jan 7, 2023

I agree most HTTP server benchmarks are highly misleading in that way, and mention in my post how disappointed I am at the lack of good benchmarks. I also agree that typical HTTP servers would fall over at much lower new connection loads.

I'm talking about a hypothetical HTTPS server that used optimized kernel-bypass networking. Here's a kernel-bypass HTTP server benchmarked doing 50k new connections per core second while re-using nginx code: https://github.com/F-Stack/f-stack. But I don't know of anyone who's done something similar with HTTPS support.

jck · on Jan 8, 2023

I once built a quick and dirty load testing tool for a public facing service we built. The tool was pretty simple - something like https://github.com/bojand/ghz but with traffic and data patterns closer to what we expected to see in the real world. We used argo-workflows to generate scale.

One thing which we noticed was that there was a considerable difference in performance characteristics based on how we parallelized the load testing tool (multiple threads, multiple processes, multiple kubernetes pods, pods forced to be distributed across nodes).

I think that when you run non-distrubuted load tests you benefit from bunch of cool things which happen with http2 and Linux (multiplexing, resource sharing etc) which might make applications seem much faster than they would be in the real world.

lossolo · on Jan 7, 2023

TLS handling would dominate your performance, kernel bypassing would not help here unless you would also do TLS NIC offloading, you still need to process new TLS sessions from OP example and they would dominate your http processing time (excluding application business logic processing).

sayrer · on Jan 7, 2023

Userspace networking is pretty common. The chair of the IETF even wrote one: https://github.com/NTAP/quant

"Quant uses the warpcore zero-copy userspace UDP/IP stack, which in addition to running on on top of the standard Socket API has support for the netmap fast packet I/O framework, as well as the Particle and RIOT IoT stacks. Quant hence supports traditional POSIX platforms (Linux, MacOS, FreeBSD, etc.) as well as embedded systems."

pixl97 · on Jan 7, 2023

And I would say real life Twitter involves mostly cell phone use where we see companies like Google try to push HTTP/3 to deal with head of line issues on lossy connections. Serving at the millions of hits per day on lossy networks is going to leave you with massive numbers of connections that have been abandoned but you don't know it yet. Or connections that are behaving like they are tar pitted and running at bits per second.

hinkley · on Jan 8, 2023

Vertical scaling doesn't have to be a single machine. You can do a lot with a half dozen machines split for different responsibilities, like we did in the 90's and 00's. Database, web servers, reverse proxy.

ilyt · on Jan 8, 2023

That's so low overhead compared to everything else needed that it is near-irrelevant

saagarjha · on Jan 8, 2023

I don’t believe Twitter ever got around to rolling out HTTP/3 to their clients.

lossolo · on Jan 7, 2023

> Use HTTPS and watch it fall down to only 400 req / sec under load test [ without connection: keep-alive ].

I'm running about 2000 requests/s in one of my real-world production systems. All of the requests are without keep-alive and use TLS. They use about one core for TLS and HTTP processing.

habibur · on Jan 8, 2023

Fascinating. Any special optimization you are using, or is it from off the shelf software and with standard configuration?

kijin · on Jan 8, 2023

Sounds totally off-the-shelf.

I have a basic LAMP server running on a 4-core VM on a laptop. I just threw ApacheBench at it (not the fastest benchmarking tool, either -- it eats up 1 core all by itself), and it handles 1200 req/s TLS with no keepalive, and 3400 req/s with keepalive. This stuff scales linearly with core count, so I wouldn't be surprised to see much higher numbers in real servers.

tpetry · on Jan 8, 2023

Are these all new TLS connections? Because most benchmarks use TLS resumption which means the TLS handshake was only done once!

lossolo · on Jan 8, 2023

In my case, all of TLS connections are new and from real clients (not a benchmark/test)

kijin · on Jan 8, 2023

AFAIK apachebench is so old it doesn't support TLS resumption, but I might be wrong.

brazzledazzle · on Jan 8, 2023

Is AB running on the same machine during the tests?

kijin · on Jan 8, 2023

Same machine, pegging a CPU core the whole time. So the server side only has 3 cores left to use.

lossolo · on Jan 8, 2023

For offloading SSL I use haproxy with some custom settings and a few non standard kernel settings.

brazzledazzle · on Jan 8, 2023

Is this static or dynamic content? Are the simulated load test clients requesting the exact same pages/resources?

lossolo · on Jan 8, 2023

All dynamic content, all hitting data storage. There are no simulated clients, this is all real traffic from real clients, a lot of requests do writes, some do only reads.

JanisErdmanis · on Jan 8, 2023

I guess that the biggest chunk of a slowdown from TLS comes due to group operations alone. So wouldn't it be practical to configure TLS for session resumption and limit the number of handshakes per second it could do?

ilyt · on Jan 8, 2023

using what ? That numbers are on low side even for my old desktop