
I don't know.

My understanding is that the highest-performance web server is nginx, and it uses async IO internally.

IMO, virtual threads are a better general-purpose language feature because they avoid function coloring and are generally easier to reason about, but they may not result in the highest-performance Java web server.
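A minimal sketch of what "no function coloring" means in practice, assuming Java 21+ (the class and method names here are made up for illustration): one ordinary blocking method can be handed to a virtual-thread executor unchanged, with no separate async variant of it needed.

```java
import java.util.concurrent.Executors;

public class Coloring {
    // One ordinary blocking method; there is no separate async-colored variant.
    static String fetch() throws InterruptedException {
        Thread.sleep(100); // stands in for a blocking network call
        return "response";
    }

    public static void main(String[] args) throws Exception {
        // The same method runs unchanged on a virtual thread.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var future = executor.submit(Coloring::fetch);
            System.out.println(future.get()); // prints "response"
        }
    }
}
```

The point is that the caller never has to know whether `fetch` will block; with futures/callbacks, that property leaks into every signature on the call path.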



NGINX is a native C implementation, so it has to be carefully written to use the OS's native high-performance IO and native OS threads.

The purpose of Project Loom is to abstract that away from Java application code. The runtime can use the most efficient IO mechanism for the given platform (ideally io_uring on Linux or IOCP on Windows, for example) even when the application code calls an old blocking API like Files.write(). The application can then use simple APIs and code patterns, but still get massive performance.
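As a hedged illustration of that idea (assuming Java 21+; the file content is invented), the sketch below calls the plain blocking Files.writeString/Files.readString APIs from a virtual thread; the intent of Loom is that the runtime parks the virtual thread while the IO is in flight rather than tying up an OS thread.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class BlockingIo {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("loom-demo", ".txt");
        // The application uses the plain blocking API; on a virtual thread the
        // runtime is free to park this thread instead of blocking an OS thread.
        Thread t = Thread.ofVirtual().start(() -> {
            try {
                Files.writeString(tmp, "hello from a virtual thread");
                System.out.println(Files.readString(tmp));
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        t.join();
        Files.deleteIfExists(tmp);
    }
}
```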

With Loom, you can easily have 20,000 virtual threads servicing 20,000 concurrent HTTP requests, each "blocked" in IO, while only using, say, 100 OS threads that are polling an IOCP. A typical Linux box, with default limits, can only comfortably run on the order of a thousand threads across all running processes.
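For a rough sense of scale (assuming Java 21+), the sketch below starts 20,000 virtual threads that all block at once in Thread.sleep; the JDK multiplexes them over a small pool of carrier threads, so the whole program completes in roughly a second rather than needing 20,000 OS threads.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class ManyThreads {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    // Parks the virtual thread; the carrier OS thread is released
                    // to run other virtual threads in the meantime.
                    Thread.sleep(Duration.ofSeconds(1));
                } catch (InterruptedException ignored) {}
            }));
        }
        for (Thread t : threads) t.join();
        System.out.println("done");
    }
}
```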


Servicing 20,000 concurrent requests on a single box where threads are somehow the bottleneck: is that not a problem that approximately no one has?


Most application webservers (by default) handle one request per thread. For mostly IO bound stuff (which many projects are), it makes sense to me that threads become a bottleneck in relatively ordinary scenarios.
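A small sketch of why the thread-per-request model caps concurrency (the pool size and timings here are arbitrary): with a fixed pool of 10 platform threads, 100 sleeping "requests" of 100 ms each must run in at least 10 waves, so total time is at least 1000 ms even though each request does only 100 ms of "work".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        // Classic model: a fixed pool of platform threads, one request per thread.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        CountDownLatch done = new CountDownLatch(100);
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                try { Thread.sleep(100); } catch (InterruptedException ignored) {}
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        long ms = (System.nanoTime() - start) / 1_000_000;
        // 100 requests / 10 threads = at least 10 waves of 100 ms each.
        System.out.println("elapsed >= 1000 ms: " + (ms >= 1000));
    }
}
```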


The scenario where your IO could handle way more than a thousand concurrent requests if only the thread overhead was reduced? When does that ever happen?


Each OS thread costs memory. With the version of Java I have, the default is to allocate 1 MB of stack for each thread, so 10,000 threads would reserve about 10 GB of RAM even if we configured ulimit to allow that many threads. In contrast, asking the kernel to do buffered reads of 10,000 files in parallel requires much less memory, especially if most of those are actually the same physical file. Of course, they won't be read fully in parallel.
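For comparison (assuming Java 21+; the 256 KB figure is arbitrary), platform threads reserve a fixed stack up front, tunable per thread via the builder API or globally with -Xss, while virtual thread stacks live on the heap and grow on demand.

```java
public class StackSize {
    public static void main(String[] args) throws InterruptedException {
        // Platform thread with an explicitly reduced stack reservation
        // (the default is often 1 MB, set globally with -Xss).
        Thread small = Thread.ofPlatform()
                .stackSize(256 * 1024) // request a 256 KB stack for this thread
                .start(() -> System.out.println("platform thread ran"));
        small.join();

        // Virtual thread: no fixed stack reservation; its stack frames are
        // stored on the heap and grow/shrink as needed.
        Thread virt = Thread.ofVirtual()
                .start(() -> System.out.println("virtual thread ran"));
        virt.join();
    }
}
```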

For example, this program:

  import java.io.IOException;
  import java.nio.file.FileSystems;
  import java.nio.file.Files;

  public class Test {
    public static void main(String[] args) throws InterruptedException {
      var threads = new Thread[20000];
      for (int i = 0; i < 20000; i++) {
        threads[i] = Thread.ofVirtual().start(() -> {
          try {
            // blocking copy of abc.txt to stdout; the virtual thread parks while waiting
            Files.copy(FileSystems.getDefault().getPath("abc.txt"), System.out);
          } catch (IOException e) {
            System.err.println("Error writing file");
            e.printStackTrace();
          }
        });
      }
      for (int i = 0; i < 20000; i++) {
        threads[i].join();
      }
    }
  }
Run as `java Test > ./cde.txt`, this takes about 4.5 s on my 2-core WSL2 system, writing a 2 GB file (abc.txt being 100 KB). Even that would be within a typical HTTP timeout, though users would certainly not be happy. I'm pretty sure a native Linux system on a machine beefy enough to be used as a web server would have no problem serving even larger files over a network like this.


1. You are not solving a real problem. The use case you describe (basically a CDN) is already exotic, and the scenario where such a system would have been implemented with Java and its basic blocking IO seems implausible.

2. You did not compare against fewer threads to see if threads are actually the bottleneck rather than IO. Also, all your threads are competing for stdout.



