
Storing a chess position in 26 bytes.

Storing a game is also interesting. The number of legal moves varies depending on the position. You could try to define a variable-length encoding by giving more likely moves a shorter encoding, but the ordering would need to be deterministic so it could be decoded (running Stockfish for a second isn't deterministic).



Surprisingly, storing a game (with all moves) can take less space than encoding a single board. This is because you can effectively encode a move in a single byte (as there are fewer than 256 legal moves possible in any position). Applying compression to the resulting binary string allows you to reduce the space even further.
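
A minimal sketch of this scheme, assuming the python-chess library for legal-move generation (sorting by UCI string gives the deterministic ordering the parent comment asks for, with no engine involved):

    import chess

    def encode_game(moves):
        # One byte per half-move: each move becomes its index in a
        # deterministically sorted list of legal moves.
        board = chess.Board()
        out = bytearray()
        for move in moves:
            legal = sorted(board.legal_moves, key=lambda m: m.uci())
            out.append(legal.index(move))
            board.push(move)
        return bytes(out)

    def decode_game(data):
        # Decoding replays the game, regenerating the same sorted
        # move list at every step.
        board = chess.Board()
        moves = []
        for index in data:
            legal = sorted(board.legal_moves, key=lambda m: m.uci())
            move = legal[index]
            moves.append(move)
            board.push(move)
        return moves

No known position has more than 218 legal moves, so each index fits comfortably in one byte.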

Check out this great blog post from Lichess for more information: https://lichess.org/blog/Wqa7GiAAAOIpBLoY/developer-update-2...

And shameless plug: using this encoding, I'm storing millions of games on https://www.chessmonitor.com/ There you can link your Chess.com or Lichess account and view all kinds of statistics about your games.


That is a beautifully designed website. I don't think I've ever used an OAuth authentication flow as smooth as the one that your site uses to access my Lichess credentials (same as my HN name, btw, in case you want to beat me at a correspondence game).

ChessMonitor is practically a work of art!


I created an account just to see that flow, and it did not disappoint! How's it this fast?


Thanks for your feedback!

I guess OAuth is relatively fast as I don't have a middleman (like Auth0) in there. It's just Passport.js.


ChessMonitor looks very nice, I like how you can link to a particular opening:

https://www.chessmonitor.com/u/XqaFNTHcR61WpiMfOhEY/games?po...


Heh, I guess there are board states that are not reachable through any valid sequence of moves, but otherwise it shouldn't be possible, by definition, for games to be more compressible than positions, since any valid board state can be represented as a sequence of moves.

This does raise the question of how efficiently you can reverse-engineer a minimal sequence of moves that produces a given board state.


Are you saying that, to encode a valid position, ignoring the cost to encode both the starting position and the move definitions, you can encode the move sequence using less information than encoding a single position?


Yes, ignoring compression each (half-)move takes up one byte. So if you want to store a sequence of 10 (half-)moves it would take only 10 bytes.


Average game length seems to be around 80 half-moves, though.

https://chess.stackexchange.com/questions/2506/what-is-the-a...

I'd expect that ordering moves by popularity at each half-move index, using, say, the above dataset, would let you select lower-indexed values at each step, allowing a nice context-based arithmetic coder to really shrink them.
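
A rough sketch of that idea, again assuming python-chess. The popularity function here is a hypothetical stand-in for frequencies mined from a games database, and zlib stands in for the arithmetic coder:

    import chess
    import zlib

    def popularity(board, move):
        # Hypothetical: in practice this would look up how often
        # `move` was played from `board` in a large games database.
        return 0

    def rank_encode(moves):
        # The most popular move gets rank 0, so ranks skew toward
        # small values and compress well.
        board = chess.Board()
        ranks = bytearray()
        for move in moves:
            legal = sorted(board.legal_moves,
                           key=lambda m: (-popularity(board, m), m.uci()))
            ranks.append(legal.index(move))
            board.push(move)
        return zlib.compress(bytes(ranks))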


Yes, in contrast to storing the board (which can be a fixed size), this depends on the length of the game of course.

That said, there are some optimizations you can apply that will compress moves even further (for example the order of the moves as explained in the Lichess blog post is important). In the end it's a tradeoff between (de)serializing the game fast and reducing the size (even more).


It seems, then, that if you wanted to efficiently store lots of positions, it would be smart to store a large list of openings and a large list of following sequences; your positions could then be stored as lists of sequence ids.
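
A toy sketch of that layout (the sequences and ids are made up for illustration):

    # Hypothetical shared dictionary of common move sequences.
    SEQUENCES = {
        1: ["e4", "e5", "Nf3", "Nc6", "Bb5"],  # Ruy Lopez
        2: ["d4", "d5", "c4"],                 # Queen's Gambit
    }

    # A position is then just a sequence id plus a ply count.
    position = (1, 4)  # the Ruy Lopez line after 4 half-moves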


That suggestion is very reminiscent of the notion that optimally compressing Wikipedia is the same problem as what generative LLMs are doing: predicting the next letter/word/segment with high precision lets you compress it extremely well.

I think for chess you could get most of the benefit with a relatively naive engine. ChessBase has a weak engine built in where you can hit the space bar and it plays a move, which is incredibly useful since there's just one obvious move in a lot of positions anyway. (And if the move is especially tricky/non-obvious, then it's also not what you would want when hitting the space bar to just predictably proceed.)


I suspect it would be hard to beat PGN run through a good compression algorithm.

Or, similarly, essentially a binary version of PGN. Probably the optimalist of optimal is a binary format specifying the starting square (6 bits), followed by a minimal-width field for the specified piece that indexes into a standard set of move offsets.

So, for instance, a knight has (ignoring potential exposed checks, board boundaries, etc.) 8 possible moves, so to fully encode a knight move you need 6+3=9 bits. 8 also works for pawns (and, annoyingly due to e.p., 4 doesn't). Bishops, queens, and rooks would need a 4/5 bit field. Encode castling as starting from the rook, as rooks have 'spare' moves in their bit set and kings don't. Encode the end state at the beginning.

This is going to use 2 bits for the end state, and then either 9, 10, or 11 bits per move.
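
A sketch of that bit layout; the per-piece widths follow the proposal above (and are disputed in the replies), so treat them as assumptions rather than a worked-out spec:

    class BitWriter:
        def __init__(self):
            self.bits = []

        def write(self, value, width):
            # Append `value` as `width` bits, most significant first.
            for i in reversed(range(width)):
                self.bits.append((value >> i) & 1)

        def to_bytes(self):
            padded = self.bits + [0] * (-len(self.bits) % 8)
            return bytes(
                int("".join(map(str, padded[i:i + 8])), 2)
                for i in range(0, len(padded), 8)
            )

    # Offset-field width per piece, per the scheme above: 9, 10, or
    # 11 bits per move once the 6-bit origin square is included.
    OFFSET_WIDTH = {"N": 3, "P": 3, "K": 3, "B": 4, "R": 4, "Q": 5}

    def encode_move(writer, from_square, piece, offset_index):
        writer.write(from_square, 6)  # origin square, 0..63
        writer.write(offset_index, OFFSET_WIDTH[piece])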


> Bishops, queens, and rooks would need a 4/5 bit field.

Ignoring board boundaries, a queen has 56 possible moves (8 directions × up to 7 squares), requiring 6 bits. At any given position on the board, most of those aren't possible because the board is too small, but cramming that into 5 bits will make the encoding much more annoying.

The same goes for rooks: ignoring board boundaries there are 28 possible moves, but within the board boundaries there are always exactly 14 (7 along the rank plus 7 along the file). You can fit that in four bits, but you pick up some context sensitivity.

> 8 also works for pawns (and, annoyingly due to e.p., 4 doesn't)

I don't see the problem? A pawn can move forward two spaces, it can move forward one space, it can capture diagonally to the left, or it can capture diagonally to the right. Those are the only possibilities and they fit into two bits.

En passant enables a pawn to capture a piece that isn't on the destination square, and you need to know the state of the board on the previous turn (or, equivalently, what the previous move was) in order to know whether en passant is a legal move... but to encode that it happened, you don't need anything you didn't already have.


You're forgetting pawns capturing en passant.


What?


You only need 5b to encode the piece to move: you know whose turn it is. Also, as the game progresses, you can reindex to reduce the number of bits needed to specify which piece is moving.


> You only need 5b to encode the piece to move: you know whose turn it is.

If you want to encode the piece (rather than the square), and you're comfortable not counting "whose turn is it?" against the information requirement, you don't need 5 bits. Each player has only 16 pieces, so you can give them all four-bit names.

You won't know what kind of piece they are, though.


> I suspect it would be hard to beat PGN run through a good compression algorithm.

I once[0] tried to spread this message :)

[0] https://stackoverflow.com/a/1831841/61938


A standard archiver like 7zip is probably a loss due to the metadata/format headers etc., when your plain text is only a few hundred bytes to start with...

I suspect the winner would be something like an LZMA derivative with a fixed dictionary. I doubt not using an adaptive encoder would be a big loss, as PGN (excluding the metadata) is quite far from random bytes.
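
A sketch of the fixed-dictionary idea. Python's lzma module doesn't expose preset dictionaries, so zlib's zdict stands in here; the dictionary itself is a made-up placeholder where a real one would be trained on a large PGN corpus:

    import zlib

    # Hypothetical preset dictionary of common PGN substrings.
    PGN_DICT = b"1. e4 e5 2. Nf3 Nc6 3. Bb5 O-O Nxe4 Qxd8+ 1-0 0-1 1/2-1/2"

    def compress_pgn(pgn_text):
        comp = zlib.compressobj(level=9, zdict=PGN_DICT)
        return comp.compress(pgn_text.encode()) + comp.flush()

    def decompress_pgn(data):
        decomp = zlib.decompressobj(zdict=PGN_DICT)
        return (decomp.decompress(data) + decomp.flush()).decode()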


Ok, we've positioned the title above. Thanks!


Thank you!


Finally, a good use case for blockchain technology!


I can't tell if this is in jest or not, but blockchains are prolific in software.

Nearly every company uses them to prove the authenticity of deployments in production at one or more layers.


I feel like signatures and maybe even Merkle trees are used, but not blockchain.


Nearly every company, what is your source for that? I've never even heard of this before.


It was in jest because blockchain is a solution looking for a problem, but as others have pointed out what you say isn't really the case. I'm glad it generated some discussion. Hashing previous steps for verification isn't blockchain.


No, no they do not.


IIUC a blockchain is a structure where each node has a content-addressable hash that includes the hash of the previous node in the chain. If you change any historical node, every subsequent hash changes.

This structure is used inside your .git directory, your docker manifest, etc.

Blockchains are incredibly useful structures.
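
A minimal sketch of the chained-hash structure described here:

    import hashlib

    def chain(entries):
        # Each hash covers the entry's content plus the previous
        # hash, so editing any historical entry changes every
        # later hash.
        prev = b""
        hashes = []
        for entry in entries:
            prev = hashlib.sha256(prev + entry).digest()
            hashes.append(prev)
        return hashes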


That's a Merkle tree. Blockchains are a Merkle tree plus some form of consensus algorithm, often a trustless one. It's the usefulness of the consensus algorithm which is usually called into question by critics (generally because it's often expensive and cannot reference anything outside the system).


IIUC a merkle tree is a data structure where the leaf nodes are the hashes of data. The inner nodes of the tree are hashes of the child hashes up to a root hash.

Merkle trees are helpful for quickly validating the integrity of a chunked file. Both IPFS and BitTorrent use merkle trees to validate files.

I do not understand how git could represent its history using a merkle tree.
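
For concreteness, a minimal sketch of the merkle construction described above (an unpaired node is simply hashed alone here; conventions vary):

    import hashlib

    def merkle_root(chunks):
        # Leaves are hashes of the data chunks.
        level = [hashlib.sha256(c).digest() for c in chunks]
        # Inner nodes hash the concatenation of their children,
        # halving the node count until one root hash remains.
        while len(level) > 1:
            level = [
                hashlib.sha256(b"".join(level[i:i + 2])).digest()
                for i in range(0, len(level), 2)
            ]
        return level[0]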


git commits themselves are merkle trees (or, more precisely, DAGs) that contain filesystem trees, a set of parent commits, and metadata

each parent commit in turn contains their own parents etc, until you reach an initial commit

the funny thing here is that parent commits don't have references to children; it's children that have references to parents (like the union-find data structure), so the relationship here is inverted. that's because git objects are immutable


In context, blockchain refers to a distributed set of hash structures plus a consensus algorithm.



