krizhanovsky · 2025-12-03T19:11:13 1764789073

Most open-source L7 DDoS mitigation and bot-protection approaches rely on challenges (e.g., CAPTCHA or JavaScript proof-of-work) or static rules based on the User-Agent, Referer, or client geolocation. These techniques are increasingly ineffective, as they are easily bypassed by modern open-source impersonation libraries and paid cloud proxy networks.

We explore a different approach: classifying HTTP client requests in near real time using ClickHouse as the primary analytics backend.

We collect access logs directly from Tempesta FW (https://github.com/tempesta-tech/tempesta), a high-performance open-source hybrid of an HTTP reverse proxy and a firewall. Tempesta FW implements zero-copy per-CPU log shipping into ClickHouse, so the dataset growth rate is limited only by ClickHouse bulk ingestion performance - which is very high.

WebShield (https://github.com/tempesta-tech/webshield/), a small open-source Python daemon:

* periodically executes analytic queries to detect spikes in traffic (requests or bytes per second), response delays, surges in HTTP error codes, and other anomalies;

* upon detecting a spike, classifies the clients and validates the current model;

* if the model is validated, automatically blocks malicious clients by IP, TLS fingerprints, or HTTP fingerprints.
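The detection step in the list above can be sketched roughly as follows. This is a minimal illustration, not WebShield's actual code: the function name `is_spike`, the z-score style threshold, and the `min_sigma` floor are all assumptions, and the per-second request counts are assumed to come from an aggregate query over the access-log table.

```python
from statistics import mean, stdev


def is_spike(history, current, threshold=3.0, min_samples=5, min_sigma=1.0):
    """Return True if `current` is far above the recent baseline.

    `history` holds recent per-second request counts (hypothetical input,
    e.g. the result of an aggregate query); `current` is the newest value.
    """
    if len(history) < min_samples:
        # Not enough data to estimate a baseline yet.
        return False
    baseline = mean(history)
    # Floor the deviation so a perfectly flat baseline does not make
    # every tiny fluctuation look like a spike.
    sigma = max(stdev(history), min_sigma)
    return current > baseline + threshold * sigma


history = [100, 102, 98, 101, 99, 100]
print(is_spike(history, 101))  # small fluctuation
print(is_spike(history, 500))  # sudden surge
```

The same check could be applied to bytes per second, response delays, or error-code counts, each with its own history window.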

To simplify and accelerate classification — whether automatic or manual — we introduced a new TLS fingerprinting method.

WebShield is a small and simple daemon, yet it is effective against multi-thousand-IP botnets.

krizhanovsky · 2025-10-23T21:14:36 1761254076

It's useful to store web server access logs in an analytics database, e.g. to fight bot attacks. We store structured access logs in ClickHouse, which already works well, but the compression and data ordering described in the post may improve performance even more; we'll try this.

The thing is that a web server, especially under DDoS, may produce many more records than ClickHouse can ingest. But there is good news: for Nginx, if you build a fast pipeline to feed access logs to ClickHouse, you can increase performance, I'd say by up to 2x, thanks to faster access logging.
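A minimal sketch of the batching idea behind such a pipeline, assuming only that ClickHouse ingests large bulk inserts far more efficiently than many small ones. The class `LogBatcher` and the `sink` callable are hypothetical; in a real pipeline `sink` would wrap whatever bulk-insert call your ClickHouse client provides.

```python
class LogBatcher:
    """Accumulate access-log rows and hand them to `sink` in bulk."""

    def __init__(self, sink, max_rows=10000):
        self.sink = sink          # hypothetical callable taking a list of rows
        self.max_rows = max_rows  # flush threshold, an arbitrary example value
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        # Swap the buffer out before calling the sink so that rows added
        # during a slow insert land in a fresh batch.
        if self.rows:
            batch, self.rows = self.rows, []
            self.sink(batch)


batches = []
batcher = LogBatcher(batches.append, max_rows=3)
for i in range(7):
    batcher.add({"status": 200, "request_id": i})
batcher.flush()  # push the remaining partial batch
print([len(b) for b in batches])
```

A production version would also flush on a timer so that rows do not sit in the buffer during quiet periods.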

krizhanovsky · 2025-10-21T11:52:36 1761047556

uRPF prevents the IP spoofing used in volumetric DDoS attacks. However, it seems that uRPF on its own is vulnerable to route hijacking.

krizhanovsky · 2025-10-15T14:00:32 1760536832

This is quite insightful, thank you.

This particular project, WebShield, is simple and didn't take long to develop. Basically, with this project we're trying to figure out what can be built with fingerprints and traffic characteristics stored in an analytics database. It seems easy to make PoCs with these features.

For now, if this tool can stop some dumb bots, we'll be happy. We definitely need more development and more sophisticated algorithms to fight paid scraping proxies.

It's more or less simple to classify DDoS bots because they have a clear impact: system performance degrades. For some bots we can also define a target, shared by the bots and the protection system, e.g. the booked slots for visa appointments. For some scrapers this is harder.

Another opportunity is to dynamically generate classification features and verify resulting models, build web page transition graphs and so on.

This is a good point about possible blocking of ~50% of the Internet. For DDoS we _mitigate_ an attack, not _block_ it, so probably for bots we should do the same - just rate limit them instead of full blocking.
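To illustrate rate limiting instead of full blocking, here is a minimal token-bucket sketch. It is not how Tempesta FW or WebShield implement limits; the class name and parameters are assumptions, and in practice one bucket would be kept per client key (IP, TLS fingerprint, or HTTP fingerprint).

```python
import time


class TokenBucket:
    """Allow up to `rate` requests per second with bursts up to `burst`."""

    def __init__(self, rate, burst, now=None):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        # `now` is injectable so the logic can be exercised without sleeping.
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=1, burst=2, now=0.0)
print([bucket.allow(now=0.0) for _ in range(3)])  # burst, then throttled
print(bucket.allow(now=1.0))                      # one token refilled
```

A suspicious client is then slowed down rather than cut off, which limits the damage of a false positive.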

Technically, we can implement verification of client-side certificates, but, yes, the main problem is adoption on the client side.

krizhanovsky · 2025-10-15T12:59:10 1760533150

That's good advice, thank you.

In our approach we do our best not to affect the user experience. E.g., consider a company website with a blog. The company does its best to attract a larger audience to its blog, products, and so on. I guess a considerable part of that audience would be lost if a website they are seeing for the first time required authentication.

However, for returning, and especially regular, clients I think that is a really simple and good solution.

krizhanovsky · 2025-10-15T12:48:41 1760532521

Hi,

thank you for the reply!

You can read about JA5 at https://tempesta-tech.com/knowledge-base/Traffic-Filtering-b... .

But the thing is that the hashes were only inspired by the work of John Althouse; there is no other relation.

Unfortunately, we didn't realize what "JA" stands for at the time we were designing the feature. We will rename it https://github.com/tempesta-tech/tempesta/issues/2533 .

Sorry for the confusion.

krizhanovsky · on Nov 4, 2020

The benchmark at https://github.com/ncm/computed-goto/blob/master/benchmarks/... is not applicable to this discussion because it compares state machines that are _too_ small. I reference my talk and presentation once more: http://www.tempesta-tech.com/research/http_str.pdf - slide 23 discusses that the goto FSM makes sense for _hundreds_ of states.

ncmncm · on Nov 5, 2020

The number of states is irrelevant. An optimized tail call is achieved with, exactly, a single branch instruction.

krizhanovsky · on Nov 2, 2020

Thank you, I'm glad that you enjoyed the article!

Regarding computed and simple goto, I'd like to reference our early article discussing the parser written with standard goto: https://natsys-lab.blogspot.com/2014/11/the-fast-finite-stat... . My recent talk at SCALE, https://www.socallinuxexpo.org/scale/17x/presentations/fast-... (watch the video at https://www.youtube.com/watch?v=LQc4er8ng64&feature=youtu.be... and see the slides at http://www.tempesta-tech.com/research/http_str.pdf ), discusses the parser with the compiler extensions.

We also found that the Ragel parser generator (which uses a goto state machine) generated somewhat faster large state machines than Bison (a switch-driven state machine). However, as pointed out in the SCALE talk, goto works better only on big enough state machines.

krizhanovsky · on Nov 2, 2020

While I addressed safety in a separate section of the article, I wouldn't argue about that: it seems the Rust designers did excellent work on safety. However, C++ is moving in this direction, but there is still the "gap" described in the cited talk from CppCon 2020.

However, the article is about FAST programming languages. This means, and I stated this explicitly at the beginning of the article, that the main factor for the article is the speed of the generated code.

This is why I compared the single Rust implementation with 3 C/C++ implementations. The question was whether Rust does something unreachable for C/C++, and the answer is "NO". Also please keep in mind that all the benchmark programs, while using the same algorithms, are still coded in slightly different ways, and the differences impact performance significantly. I analyzed two programs, in Rust and C, to show the differences.

virgilp · on Nov 3, 2020

What is a "FAST" programming language? It can't be "it's theoretically possible to write the absolute fastest code using this" because then the answer to everything must be assembly.

Look e.g. at Rich Hickey's interviews - when building actual systems, the "theoretically fastest implementation" doesn't even matter that much in practice[1] - actual practical C++ code (written by experts!) can be slower than even Lisp code. I've witnessed this first hand, too; you've witnessed it too - in a micro-benchmark, no less - where Rust surprised you by being significantly faster than 3 state-of-the-art C++ compilers. But instead of concluding that Rust (the language) maybe did something terribly good, you conclude "meh, it's not that the compiler is better, it just got lucky in this case".

[1] I know there are a few domains where this matters (e.g. embedded systems). I'm not saying that "Lisp is better than C++" - I'm saying that your conclusions about Rust are... surprising, at least to me.

krizhanovsky · on Nov 2, 2020

I still don't see any misconceptions. The reason why the kernel uses SIMD with FPU save/restore is to optimize context switches. We addressed the topic in https://netdevconf.info/0x12/session.html?kernel-tls-handsha... . I guess early versions of WireGuard used the same approach: save the FPU context at the beginning of the softirq, process many packets with SIMD in one shot, and restore the FPU state.

There are also other issues with kernel code, and I addressed in the article why it doesn't make sense (while still being possible) to use C++ in kernel code.