I'd expect that ordering moves by popularity at each half-move index, using say the above dataset, would mean you mostly pick low-indexed values at each step, so a nice context-based arithmetic coder could really shrink them well.
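To make that concrete, here's a rough sketch of the idea, not a real implementation. It assumes games are plain lists of SAN strings, uses only the half-move index as context (a real coder would probably condition on the position too), and uses a tiny made-up sample in place of an actual dataset. All the function names are mine. It also has no escape code for moves that never appear in the table, which you'd need in practice.

```python
import math
from collections import Counter, defaultdict

def build_tables(games):
    """Count how often each move occurs at each half-move index (ply)."""
    tables = defaultdict(Counter)
    for game in games:
        for ply, move in enumerate(game):
            tables[ply][move] += 1
    return tables

def rank_moves(counter):
    """Most popular move first; ties broken alphabetically for determinism."""
    return sorted(counter, key=lambda m: (-counter[m], m))

def encode(game, tables):
    """Replace each move with its popularity rank at that ply."""
    return [rank_moves(tables[ply]).index(move) for ply, move in enumerate(game)]

def decode(ranks, tables):
    """Invert encode() using the same tables."""
    return [rank_moves(tables[ply])[r] for ply, r in enumerate(ranks)]

def ideal_bits(game, tables):
    """Bits an ideal arithmetic coder would spend given the per-ply frequencies."""
    bits = 0.0
    for ply, move in enumerate(game):
        counts = tables[ply]
        bits += -math.log2(counts[move] / sum(counts.values()))
    return bits

# Hypothetical toy sample standing in for a real game database.
games = [
    ["e4", "e5", "Nf3", "Nc6"],
    ["e4", "e5", "Nf3", "Nf6"],
    ["e4", "c5", "Nf3", "d6"],
    ["d4", "d5", "c4", "e6"],
]
tables = build_tables(games)
ranks = encode(games[0], tables)
```

The popular line encodes to all-low ranks, and `ideal_bits` gives a lower bound on what an arithmetic coder driven by the same counts would emit for that game. Decoder and encoder have to share the exact same tables, which is part of the speed vs. size tradeoff.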
Yes, in contrast to storing the board (which can be a fixed size), the size of a move list depends on the length of the game, of course.
That said, there are some optimizations you can apply that will compress moves even further (for example, the order of the moves is important, as explained in the Lichess blog post). In the end it's a tradeoff between (de)serializing the game quickly and shrinking it even more.
https://chess.stackexchange.com/questions/2506/what-is-the-a...