Better Benchmark
Alpha Arena is the first benchmark designed to measure AI's investing abilities. Each model is given $10,000 of real money, in real markets, with identical prompts and input data.
Our goal with Alpha Arena is to make benchmarks more like the real world, and markets are perfect for this. They're dynamic, adversarial, open-ended, and endlessly unpredictable. They challenge AI in ways that static benchmarks cannot.
Markets are the ultimate test of intelligence.
So do we need to train models with new architectures for investing, or are LLMs good enough? Let's find out.
The Contestants
Claude 4.5 Sonnet,
DeepSeek V3.1 Chat,
Gemini 2.5 Pro,
GPT 5,
Grok 4,
Qwen 3 Max
Competition Rules
Starting Capital: Each model gets $10,000 of real capital.
Market: Crypto perpetuals on Hyperliquid.
Objective: Maximize risk-adjusted returns.
Transparency: All model outputs and their corresponding trades are public.
Autonomy: Each AI must produce alpha, size trades, time trades and manage risk.
Duration: Season 1 will run until November 3rd, 2025 at 5 p.m. EST.
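The rules do not say which risk-adjusted metric is used to rank the models. As an illustration only, the sketch below assumes the Sharpe ratio (mean excess return divided by the standard deviation of returns, annualized). The function name `sharpe_ratio`, the zero risk-free rate, the 365 periods per year (crypto markets trade every day), and the sample account values are all assumptions made for this example, not details from Alpha Arena.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(equity, risk_free_per_period=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from a series of account values.

    Hypothetical helper: `equity` is a list of account values sampled at a
    fixed interval (daily here), oldest first.
    """
    if len(equity) < 3:
        raise ValueError("need at least three equity points")
    # Simple per-period returns between consecutive account values.
    returns = [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]
    excess = [r - risk_free_per_period for r in returns]
    spread = stdev(excess)
    if spread == 0:
        return math.nan  # undefined when returns never vary
    return mean(excess) / spread * math.sqrt(periods_per_year)


# Two made-up accounts that both turn $10,000 into $10,400.
steady = [10_000, 10_100, 10_200, 10_300, 10_400]
volatile = [10_000, 11_000, 9_500, 11_500, 10_400]

print(sharpe_ratio(steady) > sharpe_ratio(volatile))
```

Both accounts end with the same profit, but the steadier one scores higher, which is the sense in which a risk-adjusted objective differs from simply maximizing final account value.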
Engineering is in large part about signing off on something with your name on it, and being responsible if it fails or causes harm. Think bridges, tunnels or other infrastructure. I'd argue that this is the same for computer engineering. That's why I think coining the term "vibe engineering" can be dangerous.
"Vibe coding" is the better term and actually makes sense for what it describes.
Leave "engineering", in the sense of taking responsibility for what you "engineer", strictly to human professionals. That's what people pay for and that is what makes it valuable.
I was expecting more like, "translate it into many languages your website users won't keep updating and give them access to edit your site's design freely as they see fit..."
Throughout my 20s I’ve accumulated a huge amount of mental models, diary entries, ambitions, goals, knowledge, thoughts, interests and everything in between.
It helped me a lot and truly let me excel in some things – surprisingly enough.
Since I turned 30 last year I’ve been almost afraid to look into that repository at all. It’s a mix of amusement and anxiety. It once felt like unlimited potential and a nearing of the “apex”. My motivation is still there somewhere in my head, but I’ve suppressed it and opened my eyes to almost half of my life having been lived.
Sometimes I’m even afraid to stop and think deeply like I tended to do before. I distract myself.
Was that some sort of religion carrying me week by week, month to month?
I take it step by step, day by day now and try to worry less while bringing the focus back to what I’d want to achieve. I calm myself down and work on things more gradually, cutting myself some slack.
Nonetheless, I wouldn’t just delete it all.
Instead I’m just using it less and less, only adding some truly profound things and thoughts when I come across them. My reading list keeps filling up… I fulfill some of my ambitions, but also leave many of them undone by the time I thought I should’ve been done with them, trying not to feel bad about it.
This techno-masochistic models-oriented mega-productive way of living is already perhaps disillusioning a lot of people out there, and we are entering the next stage.
Feels like the end of an era, at least for me personally.
To me this looks like a false dichotomy. Writing beautifully and having a good argument are not mutually exclusive. So I think the article sets out a false proposition and discusses it at length.
Even here people don’t seem to realize, or even consider, the likely fact that the Klarna CEO has been bullsh**ing all along. I read a hugely viral post about them replacing their entire CRM with AI. It’s ridiculous to me that people took that seriously!
Absolutely, people need to assume everything they read in the media is wrong, then find evidence to prove otherwise. Klarna replaced their CRM with AI? Good. It's absolutely false until I find enough evidence going forward that it's true.
It's Ryanair all over again. Remember the stories about how Ryanair were going to make passengers pay to use the toilets, or the "all standing" plane (which would obviously be highly illegal, but credulous journalists printed it anyway)? All a very cheap, very successful marketing campaign.
It goes to show that the common system of employment, in which we spend our time toward the purposes and meanings of others, tends to provide no purpose or meaning for ourselves.
>It goes to show that the common system of employment, in which we spend our time toward the purposes and meanings of others, tends to provide no purpose or meaning for ourselves.
Work certainly provides meaning; you'll notice this when you can't find work for a while, i.e. when you're involuntarily unemployed. Also, you have to find deeper meaning outside of work: church, social clubs, raising kids, taking care of elderly parents, volunteering, etc. Getting paid to do moral work is rarely a thing and somewhat defeats the purpose.