* Yes, I clearly know what tcpdump is / how to capture network traffic
* It has been several years since I have looked at a pcap
* I don't have wireshark installed on this computer
* I've done the thing where you decrypt TLS with wireshark exactly once, years ago, and I found it frustrating for reasons I can't remember[1]. Wasn't sure if I could do this with ssh
* When I started investigating this, I didn't remotely think that ssh was the root cause. I thought it was a quirk of my game
* I *did* make a client that printed out all the data it was receiving, but it was useless because it was operating at the wrong layer (e.g. it connected over SSH and logged the bytes SSH handed it)
* I'm experimenting with Claude Code a lot because it has a lot of hype and I would like to form an opinion
* Looking up flags is annoying
* Being able to tell an agent "look at this pcap and tell me what you see" is *cool*
So idk. I'm sure that you would have solved this much more quickly than I did! I'm not sure that (for me) opening up the pcap in Wireshark would have solved this faster. Maybe reading the SSH spec would have, but debugging also just didn't take that long.
And the big leap here was realizing that this was my SSH client and not a quirk of my game. The time at which I would have read the SSH spec was after I captured traffic from a regular SSH session and observed the same pattern; before that I was thinking about the problem wrong.
I don't think that this is unfortunate. In fact, I think I got what I wanted here (a better sense of Claude Code's strengths and weaknesses). You're right that an alternative approach would have taught me different things, and that's a worthy goal too.
[1] I suspect this is because I was doing it for an old job and I had to figure out how to run some application with keys I controlled? It would have been easier here. I don't remember.
Thanks for taking the time to respond, and apologies for the contentiousness. I'm a jaded old man suffering from severe LLM fatigue, so I may have come off a bit harsh. Your write-up was a good read, and while I might be critical of your methodology, what you did clearly worked, and that's what matters in the end. Best of luck with your project, especially the Go lib fork.
> Or you could use anycasting to terminate SSH sessions on the moral equivalent of one of a number of geography based reverse proxies and then forward the packet over an internal network to the app server over a link tuned for low latency.
I've been thinking about some stuff like this! Not being able to put my game behind Cloudflare[1] is a bummer. Substantial architectural overhead though.
> The idea of letting Claude loose on my crypto[graphy] implementation is about the most frightening thing I've heard of in a while [though libnss is so craptastic, I can't see how it would hurt in that case.]
I hear you, but FWIW the patch I was reverting was trivial (and it's also in the go crypto library, which is pretty easy to read). It's a couple-of-line change[2], and Claude did almost exactly what I would have done (I was tired and would have forgotten to shrink the handshake payload).
[1] This isn't strictly true: Cloudflare Spectrum exists, but its pricing was an insane $1/GB last I checked.
Nice, but shouldn't the behaviour change be behind a config setting? And it's not clear what the intent of the change is. Implementing PING/PONG seems different from what you said you were trying to do. And it's section 1.8 of the OpenSSH [PROTOCOL] reference, not section 1.9.
But... before you think I'm trying to be negative... good on you. I wish you well. Getting crypto/security code into open source projects can be a slog as people frequently come out of the woodwork, so don't get discouraged.
And the more I think about this... there's plenty of examples out there about doing HTTP based reverse proxying, but essentially zero for SSH proxying, so if you do that, it would make a great blog post.
And of course it totally doesn't work if the client doesn't have JavaScript at all. I read the HN front-page through an AI summary and it also got censored when it scraped the article.
Yes! While this post was written entirely by me, I wouldn't be surprised if I had "smoking gun" ready to go because I spent so much time debugging with Claude last night.
Serious question though, since AI seems to be so all-capable and intelligent: why wouldn't it be able to tell you the exact reason that I could tell you just by reading the title of this post on HN? It is failing even at the one thing it could probably do decently, which is being a search engine.
Oh wow - I've never heard of TCP_CORK before. Without disabling pings I'd still pay the cost of receiving way more packets, but maybe that'd be tolerable if I didn't have to send so many pongs. This is super handy; excited to play around with it.
I am aware of TCP_NODELAY (funny enough I recently posted about TCP_NODELAY to HN[1] when I was thinking about it for the same game that I wrote about here). But I think the latency hit from disabling it just doesn't work for me.
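For anyone following along, here is a rough sketch of the two socket options mentioned above, using Python's standard `socket` module. This is not the game's code; `make_low_latency_socket` and `set_cork` are hypothetical helper names, and `TCP_CORK` is assumed to be available only on Linux (the helper returns `False` elsewhere).

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with TCP_NODELAY set (Nagle's algorithm off)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # With TCP_NODELAY on, small writes go out immediately instead of being
    # held back to coalesce with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def set_cork(sock, enabled):
    """Toggle TCP_CORK where the platform exposes it (assumption: Linux only).

    While corked, the kernel holds back partial frames so that several small
    writes can leave in fewer packets; uncorking flushes what is queued.
    Returns False if the option is not available on this platform.
    """
    if not hasattr(socket, "TCP_CORK"):
        return False
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1 if enabled else 0)
    return True


if __name__ == "__main__":
    s = make_low_latency_socket()
    print("TCP_NODELAY on:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    print("TCP_CORK supported:", set_cork(s, True))
    s.close()
```

The idea would be to cork before writing a burst of small replies and uncork afterwards, so the burst leaves in fewer packets. Whether that helps depends on how the SSH library writes to the socket.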
I missed that thread originally. The post and the comments were a good read, thank you for sharing.
I got a kick out of this comment [0]. "BenjiWiebe" made a comment about the SSH packets you stumbled across in that thread. Obviously making the connection between what you were seeing in your game and this random off-hand comment would be insane (if you had seen the comment at all), but I got a smile out of it.
But no, the python output is correct (although I do round the values). It's counterintuitive but these are two different questions:
1. What are the odds that both players lie? (4%)
2. Given that both players say tails, what are the odds that the coin is heads? (~6%)
Trivially, the answer for question (1) is 0.2 * 0.2 = 4%
The answer for question (2) is 0.02 / 0.34 ~= 6%
One way of expressing this is Bayes' Rule: we want P(coin is heads | both say tails):
* we can compute this as (P(both say tails | coin is heads) * P(coin is heads)) / P(both say tails)
* P(both say tails | coin is heads) = 0.04 (both must lie)
* P(coin is heads) = 0.5
* P(both say tails) = 0.04 * 0.5 + 0.64 * 0.5 = 0.34
This gives us (0.04 * 0.5) / 0.34 = 0.02 / 0.34 ~= 6%
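The same arithmetic can be checked exactly with fractions. This is only a sketch: the function name is made up for illustration, and it assumes a fair coin and players who each lie independently 20% of the time (the 0.2 used above).

```python
from fractions import Fraction


def p_heads_given_all_say_tails(n_players=2,
                                p_lie=Fraction(1, 5),
                                p_heads=Fraction(1, 2)):
    """Bayes' rule for P(coin is heads | every player says tails)."""
    # If the coin is heads, everyone must lie to say tails.
    likelihood_heads = p_lie ** n_players
    # If the coin is tails, everyone must tell the truth to say tails.
    likelihood_tails = (1 - p_lie) ** n_players
    # P(all say tails), summed over both coin outcomes.
    evidence = likelihood_heads * p_heads + likelihood_tails * (1 - p_heads)
    return likelihood_heads * p_heads / evidence


if __name__ == "__main__":
    p = p_heads_given_all_say_tails()
    print(p, f"{float(p):.1%}")  # 1/17, a bit under 6%
```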
I think that might not be convincing to you, so we can also just look at the results for a hypothetical simulation with 2000 flips:
* of those 2000 flips, 1000 are tails
* 640 times both players tell the truth
* 40 times both players lie
* 680 times (640 + 40) both players *agree*
* 320 times the players disagree
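We're talking about "the number of times they both lie divided by the number of times that they agree" (I'm counting within the 1000 tails flips here; the 1000 heads flips look the same by symmetry):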
40 / 680 ~= 6%
We go from 4% to 6% because the denominator changes. For the "how often do they both lie" case, our denominator is "all of our coin flips." For the "given that they both said tails, what are the odds that the coin is heads" case, our denominator is "all of the cases where they agreed" - a substantially smaller denominator!
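If you'd rather see the two denominators side by side, here is a small Monte Carlo sketch. It is not the simulation from the post; `simulate` and `other` are hypothetical names, and it assumes a fair coin and two players who each lie independently 20% of the time.

```python
import random


def other(side):
    """Return the opposite face of the coin."""
    return "tails" if side == "heads" else "heads"


def simulate(n_flips=200_000, p_lie=0.2, seed=0):
    """Return (P(both lie), P(coin is heads | both say tails)) as estimates."""
    rng = random.Random(seed)
    both_lied = 0
    both_said_tails = 0
    heads_and_both_said_tails = 0
    for _ in range(n_flips):
        coin = rng.choice(("heads", "tails"))
        a_lies = rng.random() < p_lie
        b_lies = rng.random() < p_lie
        a_says = other(coin) if a_lies else coin
        b_says = other(coin) if b_lies else coin
        if a_lies and b_lies:
            both_lied += 1  # denominator for this rate: every flip
        if a_says == "tails" and b_says == "tails":
            both_said_tails += 1  # denominator: only the "both say tails" flips
            if coin == "heads":
                heads_and_both_said_tails += 1
    return both_lied / n_flips, heads_and_both_said_tails / both_said_tails


if __name__ == "__main__":
    p_both_lie, p_heads_given_tails = simulate()
    print(f"both lie: {p_both_lie:.1%}")
    print(f"heads given both say tails: {p_heads_given_tails:.1%}")
```

The first number should land near 4% and the second near 6%, because the second count is divided by a much smaller pool of flips.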
The three players example is just me rounding 89.6% to 90% to make the output shorter (all examples are rounded to two digits, otherwise I found that the output was too large to fit on many screens without horizontal scrolling).
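For reference, 89.6% is what you get if you go with the majority of three players who each tell the truth 80% of the time. Reading the three-player figure as a majority vote is my assumption here, and `majority_correct` is a hypothetical helper, not the code from the post.

```python
from fractions import Fraction
from math import comb


def majority_correct(n_players, p_truth=Fraction(4, 5)):
    """Probability that a strict majority of players tells the truth.

    Meant for odd n_players, so that ties cannot happen.
    """
    needed = n_players // 2 + 1
    return sum(
        comb(n_players, k) * p_truth ** k * (1 - p_truth) ** (n_players - k)
        for k in range(needed, n_players + 1)
    )


if __name__ == "__main__":
    p = majority_correct(3)
    print(p, float(p), f"{float(p):.0%}")  # 112/125, i.e. 0.896, shown as 90%
```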
Ah! Right after submitting 1000 Blank White Cards to HN I thought of Finchley Central[1] which (I think) is the game that Mornington Crescent comes from. I learned about Finchley Central a few years ago while reading about the history of The Game[2] (sorry).
I've had ideas off and on for the last 2 years about how to translate Finchley Central into something you could play over the internet with strangers but I've never quite figured out how to make it work; I think a key aspect of these games is having shared context with your friends and trying to make them laugh.
Anyway, fun that your mind went to the same place!
I think there's an annoying thing where by saying "hey, here's this neat problem, what's the answer" I've made you much more likely to actually get the answer!
What I really wanted to do was transfer the experience of writing a simulation for a related problem, observing this result, assuming I had a bug in my code, and then being delighted when I did the math. But unfortunately I don't know how to transfer that experience over the internet :(
(to be clear, I'm totally happy you wrote out the probabilities and got it right! Just expressing something I was thinking about back when I wrote this blog)
I, erroneously, thought that "when Alice and Bob agree there's a 96% chance of them being correct, then surely you can leverage this to get above the 80% chance. What if we trust them both when they agree and trust Alice when they disagree?" Did some (erroneous) napkin math and went to write a simulation.
As I was writing the simulation I realized my error. I finished the simulation anyway, just because, and it has the expected 80% result on both of them.
My error: when we trust "both" we're also trusting Alice, which means that my case was exactly the same as just trusting Alice.
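A tiny exact enumeration makes the same point. This is a sketch rather than the simulation described above; the function names are made up, and it assumes a fair coin with Alice and Bob each telling the truth 80% of the time, independently.

```python
from fractions import Fraction
from itertools import product

P_TRUTH = Fraction(4, 5)


def other(side):
    """Return the opposite face of the coin."""
    return "tails" if side == "heads" else "heads"


def agree_else_alice(alice, bob):
    """Trust both when they agree; trust Alice when they disagree."""
    if alice == bob:
        return bob  # they agree, so this is also Alice's answer
    return alice


def just_alice(alice, bob):
    """Ignore Bob entirely."""
    return alice


def accuracy(guess):
    """Exact probability that guess(alice_says, bob_says) matches the coin."""
    total = Fraction(0)
    for coin, a_truth, b_truth in product(("heads", "tails"), (True, False), (True, False)):
        prob = Fraction(1, 2)
        prob *= P_TRUTH if a_truth else 1 - P_TRUTH
        prob *= P_TRUTH if b_truth else 1 - P_TRUTH
        alice = coin if a_truth else other(coin)
        bob = coin if b_truth else other(coin)
        if guess(alice, bob) == coin:
            total += prob
    return total


if __name__ == "__main__":
    print(accuracy(agree_else_alice), accuracy(just_alice))  # both 4/5
```

Both strategies return Alice's answer in every case, so the accuracy cannot move off 80%.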
PS as I was writing the simulation I did a small sanity test of 9 rolls: I rolled heads 9 times in a row (so I tried it again with 100 million and it was a ~50-50 split). There goes my chance of winning the lottery!