In a game of perfect information and no randomness, there are ultimately only two kinds of moves: those that preserve your current best forcible outcome (win or draw in chess*), and those that blunder that into a worse result given continued perfect play by the opponent.
Everything else, like a positional score, centipawns, or even classic material points, is an abstraction we use to summarize because we don't have unbounded or sufficient computing power to solve all possible continuations. That score apparently going down is only an artifact of our limited ability to evaluate it; the only real scores are 0/½/1 for lose/draw/win. If you make mistakes, your score will eventually drop by those quantizations; we just typically don't know exactly when, except in endgame situations pared down enough to be computationally tractable.
And it's impossible to raise your estimated score, because that estimation assumes you continue to play perfectly. There's no such concept as a better-than-perfect move to raise your expectation over what was already calculated, since that calculation already includes all your best possible moves.
* (Other gradations between win/lose/draw are possible in such a game. Chess doesn't have them, but imagine playing Go for a dollar per point, where nuances smaller than swinging a win or draw still matter.)
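To make the win/draw/loss point concrete, here's a toy sketch using tic-tac-toe rather than chess, since it's small enough to actually solve (this is just an illustration, assuming nothing beyond the Python standard library): an exhaustive negamax returns only -1/0/+1 for every position, and a move can only preserve that value or drop it, never raise it.

```python
# Exhaustive negamax over tic-tac-toe: the only values that exist are
# -1 (loss), 0 (draw), +1 (win) for the player to move. There is no
# "better than perfect" move that pushes the value above what the
# current position already guarantees.
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Game-theoretic value for `player` to move: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    if "." not in board:
        return 0
    opponent = "O" if player == "X" else "X"
    best = -1
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + player + board[i + 1:]
            best = max(best, -value(child, opponent))
    return best

if __name__ == "__main__":
    empty = "." * 9
    print(value(empty, "X"))  # 0: tic-tac-toe is a draw with best play
    # Every opening move here also prints 0; no move raises the value,
    # and in a less forgiving game a bad move would print -1 instead.
    for i in range(9):
        child = empty[:i] + "X" + empty[i + 1:]
        print(i, -value(child, "O"))
```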
> And it's impossible to raise your estimated score,
No, what I was saying is that it's absolutely possible to raise your estimated score, because your estimated score is only an estimation of who has the best position.
If AlphaZero is better than Stockfish, then by definition it will sometimes make moves that raise its estimated score, because Stockfish is only as good as its ability to estimate the score of a position. So Stockfish must occasionally underestimate a position, and then later (after another move or two) be forced to reevaluate (because while it's worse, it's not stupid).
AlphaZero wins because, and precisely because, it believes some positions are more favorable than Stockfish does. You can almost see it as an arbitrage between the two estimations. That's what I was finding cool, and the point of my post.
You're right, of course. What we're really talking about is the fallibility of estimations (and arbitraging between them) - you can't raise your score as projected by an omniscient computing power, but you can raise it as estimated by real, fallible engines (and AlphaZero is less fallible).
Mostly I'm pointing out that these estimations represent the best guess of an ultimately limited engine. People tend to treat those engine evaluations as actual numbers, like scores in a sport like baseball or some such, but they're not.
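You can watch that fallibility directly. Here's a minimal sketch, assuming python-chess and a local Stockfish binary (the path, the fixed depth, and the sample moves are all just illustrative choices): replay a game at a fixed depth and log how the centipawn evaluation gets revised from move to move. An omniscient evaluator could only ever step through win/draw/loss; a real engine's number drifts as it corrects its earlier estimates.

```python
# Sketch: log a benchmark engine's evaluation after every move of a game.
# Big jumps between consecutive plies mean the earlier number was an
# over- or underestimate (or the move was a genuine blunder).
import chess
import chess.engine

ENGINE_PATH = "/usr/bin/stockfish"   # assumption: point this at your own binary
DEPTH = 12                           # assumption: fixed, modest search depth

def eval_revisions(moves_san):
    """Return a list of (ply, centipawn eval from White's point of view)."""
    board = chess.Board()
    out = []
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    try:
        for ply, san in enumerate(moves_san, start=1):
            board.push_san(san)
            info = engine.analyse(board, chess.engine.Limit(depth=DEPTH))
            cp = info["score"].white().score(mate_score=100000)
            out.append((ply, cp))
    finally:
        engine.quit()
    return out

if __name__ == "__main__":
    # Illustrative opening moves only; any game you care about works.
    for ply, cp in eval_revisions(["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"]):
        print(f"ply {ply}: {cp:+} cp")
```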
There is a way to objectively distinguish moves within the same category, though (drawing ones or losing ones). It's similar to Kolmogorov complexity. Let's say moves A and B both draw, but the shortest algorithm that draws against A is much longer than the one that draws against B. We can say A is objectively the better move.
In practice, instead of a formal definition, we could use a benchmark engine: what are the minimum CPU time/RAM requirements for an engine to draw the resulting positions (or convert them to a win, in the case of losing moves)?
A move that requires serious hardware to defend against is better than one a ten-year-old laptop can hold a draw against.
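As a rough, practical stand-in for that idea (a sketch, not a formal definition, and assuming python-chess plus a local Stockfish binary; the draw band and depth cap are arbitrary choices): for each candidate move, find the shallowest search depth at which a benchmark engine still judges the resulting position holdable for the defender, then rank moves by how much depth that takes.

```python
# Sketch: rank candidate moves by how much search the defender needs to
# see the resulting position as holdable. Deeper requirement = "harder" move.
import chess
import chess.engine

ENGINE_PATH = "/usr/bin/stockfish"   # assumption: adjust to your install
DRAW_BAND_CP = 50                    # assumption: within half a pawn counts as holdable
MAX_DEPTH = 18                       # assumption: search budget cap

def min_depth_to_hold(board, move, engine):
    """Smallest depth at which the defender's eval is within the draw band after `move`.

    Returns None if no depth up to MAX_DEPTH looks holdable, i.e. the move is
    hard (or impossible) to defend against within this budget.
    """
    board.push(move)
    try:
        for depth in range(1, MAX_DEPTH + 1):
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            # Score from the defender's (side to move) point of view.
            cp = info["score"].relative.score(mate_score=100000)
            if cp >= -DRAW_BAND_CP:
                return depth
        return None
    finally:
        board.pop()

def rank_by_defensive_effort(fen):
    """Rank legal moves so the hardest-to-defend-against come first."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    try:
        scored = [(move, min_depth_to_hold(board, move, engine))
                  for move in list(board.legal_moves)]
    finally:
        engine.quit()
    # None (never holdable within the budget) sorts as hardest of all.
    scored.sort(key=lambda r: r[1] if r[1] is not None else MAX_DEPTH + 1,
                reverse=True)
    return scored
```

This only probes single positions at fixed depths; the proposal above would really be measured over the whole game (or with CPU time/RAM budgets), but the ranking idea is the same.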
In a world with perfect play this is true, but in the real world there could still be moves that are good assuming imperfect play by the opponent. That's where things get really interesting.
For example, I remember that the original AlphaZero model, which had been trained specifically against Stockfish, would often take a material sacrifice for some advantage that Stockfish couldn't see (e.g., sacrifice a pawn in return for locking Stockfish's bishop out of the game). I don't know if these moves were objectively good given perfect play, but in practice they could be the only way to win (chess is very drawish at the top computer level).