
Storing a chess position in 26 bytes.

Storing a game is also interesting. The number of legal moves varies depending on the position. You could try to define a variable-length encoding by giving more likely moves a shorter encoding, but the ordering would need to be deterministic so it could be decoded (running Stockfish for a second isn't deterministic).



Surprisingly, storing a game (with all moves) can take less space than encoding a single board. This is because you can effectively encode a move in a single byte (as there are fewer than 256 legal moves possible in any position). Applying compression to the resulting binary string allows you to reduce the space even further.
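
A minimal sketch of this scheme, assuming the python-chess library for legal-move generation (sorting by UCI string gives the deterministic ordering the parent comment asks for, with no engine involved):

    import chess

    def encode_game(moves):
        # One byte per half-move: each move becomes its index in a
        # deterministically sorted list of legal moves.
        board = chess.Board()
        out = bytearray()
        for move in moves:
            legal = sorted(board.legal_moves, key=lambda m: m.uci())
            out.append(legal.index(move))
            board.push(move)
        return bytes(out)

    def decode_game(data):
        # Decoding replays the game, regenerating the same sorted
        # move list at every step.
        board = chess.Board()
        moves = []
        for index in data:
            legal = sorted(board.legal_moves, key=lambda m: m.uci())
            move = legal[index]
            moves.append(move)
            board.push(move)
        return moves

No known position has more than 218 legal moves, so each index fits comfortably in one byte.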

Check out this great blog post from Lichess for more information: https://lichess.org/blog/Wqa7GiAAAOIpBLoY/developer-update-2...

And shameless plug: using this encoding, I'm storing millions of games on https://www.chessmonitor.com/ There you can link your Chess.com or Lichess account and view all kinds of statistics about your games.


That is a beautifully designed website. I don't think I've ever used an OAuth authentication flow as smooth as the one that your site uses to access my Lichess credentials (same as my HN name, btw, in case you want to beat me at a correspondence game).

ChessMonitor is practically a work of art!


I created an account just to see that flow, and it did not disappoint! How's it this fast?


Thanks for your feedback!

I guess OAuth is relatively fast as I don't have a middleman (like Auth0) in there. It's just Passport.js.


ChessMonitor looks very nice, I like how you can link to a particular opening:

https://www.chessmonitor.com/u/XqaFNTHcR61WpiMfOhEY/games?po...


Heh, I guess there are board states that are not reachable through any valid sequence of moves, but otherwise it shouldn't be possible, by definition, for games to be more compressible than positions, since any valid board state can be represented as a sequence of moves.

This does raise the question of how efficiently you can reverse-engineer a minimal sequence of moves that produces a given board state.


Are you saying that, to encode a valid position, ignoring the cost to encode both the starting position and the move definitions, you can encode the move sequence using less information than encoding a single position?


Yes, ignoring compression each (half-)move takes up one byte. So if you want to store a sequence of 10 (half-)moves it would take only 10 bytes.


Average game length seems to be around 80 half-moves, though.

https://chess.stackexchange.com/questions/2506/what-is-the-a...

I'd expect that ordering moves by popularity at each half-move index, using, say, the above dataset, would let you select lower-indexed values at each step, allowing a nice context-based arithmetic coder to really shrink them.
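
A rough sketch of that idea, again assuming python-chess. The popularity function here is a hypothetical stand-in for frequencies mined from a games database, and zlib stands in for the arithmetic coder:

    import chess
    import zlib

    def popularity(board, move):
        # Hypothetical: in practice this would look up how often
        # `move` was played from `board` in a large games database.
        return 0

    def rank_encode(moves):
        # The most popular move gets rank 0, so ranks skew toward
        # small values and compress well.
        board = chess.Board()
        ranks = bytearray()
        for move in moves:
            legal = sorted(board.legal_moves,
                           key=lambda m: (-popularity(board, m), m.uci()))
            ranks.append(legal.index(move))
            board.push(move)
        return zlib.compress(bytes(ranks))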


Yes, in contrast to storing the board (which can be a fixed size), this depends on the length of the game of course.

That said, there are some optimizations you can apply that will compress moves even further (for example the order of the moves as explained in the Lichess blog post is important). In the end it's a tradeoff between (de)serializing the game fast and reducing the size (even more).


It seems, then, that if you wanted to efficiently store lots of positions, it would be smart to store a large list of openings and a large list of following sequences; your positions could then be stored as lists of sequence ids.
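
A toy sketch of that layout (the sequences and ids are made up for illustration):

    # Hypothetical shared dictionary of common move sequences.
    SEQUENCES = {
        1: ["e4", "e5", "Nf3", "Nc6", "Bb5"],  # Ruy Lopez
        2: ["d4", "d5", "c4"],                 # Queen's Gambit
    }

    # A position is then just a sequence id plus a ply count.
    position = (1, 4)  # the Ruy Lopez line after 4 half-moves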


That suggestion is very reminiscent of the notion that optimally compressing Wikipedia is the same problem as what generative LLMs are doing: predicting the next letter/word/segment with high precision lets you compress it extremely well.

I think for chess you could get most of the benefit with a relatively naive engine. ChessBase has a weak engine built in where you can hit the space bar and it plays a move, which is incredibly useful since there's just one obvious move in a lot of positions anyway. (And if the move is especially tricky/non-obvious, then it's also not what you would want when hitting the space bar to just predictably proceed.)


I suspect it would be hard to beat PGN run through a good compression algorithm.

Or, similarly, essentially a binary version of PGN. Probably the optimalist of optimal is a binary format specifying the starting square (6 bits), followed by a minimal-width field for the specified piece that indexes into a standard set of move offsets.

So, for instance, a knight has (ignoring potential exposed checks, board boundaries, etc.) 8 possible moves, so to fully encode a knight move you need 6+3=9 bits. 8 also works for pawns (and, annoyingly due to e.p., 4 doesn't). Bishops, queens, and rooks would need a 4/5 bit field. Encode castling as starting from the rook, as rooks have 'spare' moves in their bit set and kings don't. Encode the end state at the beginning.

This is going to use 2 bits for the end state, and then either 9, 10, or 11 bits per move.
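
A sketch of that bit layout; the per-piece widths follow the proposal above (and are disputed in the replies), so treat them as assumptions rather than a worked-out spec:

    class BitWriter:
        def __init__(self):
            self.bits = []

        def write(self, value, width):
            # Append `value` as `width` bits, most significant first.
            for i in reversed(range(width)):
                self.bits.append((value >> i) & 1)

        def to_bytes(self):
            padded = self.bits + [0] * (-len(self.bits) % 8)
            return bytes(
                int("".join(map(str, padded[i:i + 8])), 2)
                for i in range(0, len(padded), 8)
            )

    # Offset-field width per piece, per the scheme above: 9, 10, or
    # 11 bits per move once the 6-bit origin square is included.
    OFFSET_WIDTH = {"N": 3, "P": 3, "K": 3, "B": 4, "R": 4, "Q": 5}

    def encode_move(writer, from_square, piece, offset_index):
        writer.write(from_square, 6)  # origin square, 0..63
        writer.write(offset_index, OFFSET_WIDTH[piece])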


> Bishops, queens, and rooks would need a 4/5 bit field.

Ignoring board boundaries, a queen has 56 possible moves (8 directions × up to 7 squares), requiring 6 bits. At any given position on the board, most of those aren't possible because the board is too small, but cramming that into 5 bits will make the encoding much more annoying.

The same goes for rooks: ignoring board boundaries there are 28 possible moves, but within the board boundaries there are always exactly 14 (7 along the rank plus 7 along the file). You can fit that in four bits, but you pick up some context sensitivity.

> 8 also works for pawns (and, annoyingly due to e.p., 4 doesn't)

I don't see the problem? A pawn can move forward two spaces, it can move forward one space, it can capture diagonally to the left, or it can capture diagonally to the right. Those are the only possibilities and they fit into two bits.

En passant enables a pawn to capture a piece that isn't on the destination square, and you need to know the state of the board on the previous turn (or, equivalently, what the previous move was) in order to know whether en passant is a legal move... but to encode that it happened, you don't need anything you didn't already have.


You're forgetting pawns capturing en passant.


What?


You only need 5b to encode the piece to move: you know whose turn it is. Also, as the game progresses, you can reindex to reduce the number of bits needed to specify which piece is moving.


> You only need 5b to encode the piece to move: you know whose turn it is.

If you want to encode the piece (rather than the square), and you're comfortable not counting "whose turn is it?" against the information requirement, you don't need 5 bits. Each player has only 16 pieces, so you can give them all four-bit names.

You won't know what kind of piece they are, though.


> I suspect it would be hard to beat PGN run through a good compression algorithm.

I once[0] tried to spread this message :)

[0] https://stackoverflow.com/a/1831841/61938


A standard archiver like 7zip is probably a loss due to the metadata/format headers etc., when your plain text is only a few hundred bytes to start with...

I suspect the winner would be something like an LZMA derivative with a fixed dictionary. I doubt not using an adaptive encoder would be a big loss, as PGN (excluding the metadata) is quite far from random bytes.
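
A sketch of the fixed-dictionary idea. Python's lzma module doesn't expose preset dictionaries, so zlib's zdict stands in here; the dictionary itself is a made-up placeholder where a real one would be trained on a large PGN corpus:

    import zlib

    # Hypothetical preset dictionary of common PGN substrings.
    PGN_DICT = b"1. e4 e5 2. Nf3 Nc6 3. Bb5 O-O Nxe4 Qxd8+ 1-0 0-1 1/2-1/2"

    def compress_pgn(pgn_text):
        comp = zlib.compressobj(level=9, zdict=PGN_DICT)
        return comp.compress(pgn_text.encode()) + comp.flush()

    def decompress_pgn(data):
        decomp = zlib.decompressobj(zdict=PGN_DICT)
        return (decomp.decompress(data) + decomp.flush()).decode()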


Ok, we've positioned the title above. Thanks!


Thank you!


Finally, a good use case for blockchain technology!


I can't tell if this is in jest or not, but blockchains are prolific in software.

Nearly every company uses them to prove the authenticity of deployments in production at one or more layers.


I feel like signatures and maybe even Merkle trees are used, but not blockchain.


Nearly every company, what is your source for that? I've never even heard of this before.


It was in jest because blockchain is a solution looking for a problem, but as others have pointed out what you say isn't really the case. I'm glad it generated some discussion. Hashing previous steps for verification isn't blockchain.


No, no they do not.


IIUC a blockchain is a structure where each node has a content-addressable hash that includes the hash of the previous node in the chain. If you change any historical node, every subsequent hash changes.

This structure is used inside your .git directory, your docker manifest, etc.

Blockchains are incredibly useful structures.
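
A minimal sketch of the chained-hash structure described here:

    import hashlib

    def chain(entries):
        # Each hash covers the entry's content plus the previous
        # hash, so editing any historical entry changes every
        # later hash.
        prev = b""
        hashes = []
        for entry in entries:
            prev = hashlib.sha256(prev + entry).digest()
            hashes.append(prev)
        return hashes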


That's a Merkle tree. Blockchains are a Merkle tree plus some form of consensus algorithm, often a trustless one. It's the usefulness of the consensus algorithm which is usually called into question by critics (generally because it's often expensive and cannot reference anything outside the system).


IIUC a merkle tree is a data structure where the leaf nodes are the hashes of data. The inner nodes of the tree are hashes of the child hashes up to a root hash.

Merkle trees are helpful for quickly validating the integrity of a chunked file. Both IPFS and BitTorrent use merkle trees to validate files.

I do not understand how git could represent its history using a merkle tree.
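
For concreteness, a minimal sketch of the merkle construction described above (an unpaired node is simply hashed alone here; conventions vary):

    import hashlib

    def merkle_root(chunks):
        # Leaves are hashes of the data chunks.
        level = [hashlib.sha256(c).digest() for c in chunks]
        # Inner nodes hash the concatenation of their children,
        # halving the node count until one root hash remains.
        while len(level) > 1:
            level = [
                hashlib.sha256(b"".join(level[i:i + 2])).digest()
                for i in range(0, len(level), 2)
            ]
        return level[0]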


git commits themselves are merkle trees (or, more precisely, DAGs) that contain filesystem trees, a set of parent commits, and metadata

each parent commit in turn contains their own parents etc, until you reach an initial commit

the funny thing here is that parent commits don't have references to children; it's children that have references to parents (like the union-find data structure), so the relationship here is inverted. that's because git objects are immutable


In context, blockchain refers to a distributed set of hash structures plus a consensus algorithm.



