
That quote was intended to mean --

"artificial" maybe I should have said "synthetic"? I mean the computer can teach itself.

"constrained" the game has rules that can be evaluated

And as to the other point -- I don't know what to tell you; I don't think anything I said is inconsistent with the quotes below.

It's clearly not just a generic LLM, and generating a billion training examples by having it play against itself is only possible because synthetic data is valid here. That synthetic data contains training examples no human has ever produced, which is why it's not at all surprising that it tried things humans never would. An LLM would, at best, try patterns that are published in human-generated Go game histories, or synthesized from them. I think that inherently limits how much of the game space it can explore, and makes it much less likely to generate novel moves.
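To make the self-play point concrete, here's a minimal sketch using Nim as a stand-in for Go (the real system is nothing this simple): because the rules can be evaluated in code, you can label unlimited positions that no human ever played.

    import random

    # Minimal sketch of self-play data generation. Nim stands in for Go;
    # the point is only that evaluable rules let you label unlimited
    # synthetic positions for free, none of which a human produced.

    def legal_moves(sticks):
        return [n for n in (1, 2, 3) if n <= sticks]

    def self_play_game(policy):
        sticks, player, history = 21, 0, []
        while sticks > 0:
            move = policy(sticks)
            history.append((sticks, player, move))
            sticks -= move
            player ^= 1
        winner = player ^ 1  # whoever took the last stick wins
        return [(s, p, m, winner) for (s, p, m) in history]

    random_policy = lambda s: random.choice(legal_moves(s))
    data = [ex for _ in range(1000) for ex in self_play_game(random_policy)]
    print(len(data), "labeled examples, all synthetic")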

https://en.wikipedia.org/wiki/AlphaGo

> As of 2016, AlphaGo's algorithm uses a combination of machine learning and tree search techniques, combined with extensive training, both from human and computer play. It uses Monte Carlo tree search, guided by a "value network" and a "policy network", both implemented using deep neural network technology.[5][4] A limited amount of game-specific feature detection pre-processing (for example, to highlight whether a move matches a nakade pattern) is applied to the input before it is sent to the neural networks.[4] The networks are convolutional neural networks with 12 layers, trained by reinforcement learning.[4]
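For anyone unfamiliar, the "networks guiding tree search" bit works roughly like the PUCT selection rule below. This is an illustrative sketch, not DeepMind's code; the constant and class layout are assumptions.

    import math

    # Sketch of PUCT-style child selection: the policy network supplies
    # priors, the value network scores leaves (backed up into value_sum),
    # and search balances exploitation (Q) against exploration (U).

    C_PUCT = 1.5  # exploration constant; illustrative value

    class Node:
        def __init__(self, prior):
            self.prior = prior        # P(s, a) from the policy network
            self.visits = 0
            self.value_sum = 0.0      # backed-up value-network estimates
            self.children = {}        # move -> Node

    def select_child(node):
        total = sum(c.visits for c in node.children.values())
        def puct(child):
            q = child.value_sum / child.visits if child.visits else 0.0
            u = C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visits)
            return q + u
        return max(node.children.values(), key=puct)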

> The system's neural networks were initially bootstrapped from human gameplay expertise. AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves.[21] Once it had reached a certain degree of proficiency, it was trained further by being set to play large numbers of games against other instances of itself, using reinforcement learning to improve its play.[5] To avoid "disrespectfully" wasting its opponent's time, the program is specifically programmed to resign if its assessment of win probability falls beneath a certain threshold; for the match against Lee, the resignation threshold was set to 20%.[64]
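The resignation rule in that quote is mechanically simple; something like:

    RESIGN_THRESHOLD = 0.20  # the 20% figure quoted for the Lee Sedol match

    def should_resign(estimated_win_probability):
        # resign once the value network's win estimate drops below threshold
        return estimated_win_probability < RESIGN_THRESHOLD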



Of course it's not an LLM; I was just referring to AI technology in general, and to the point that goal functions can be complicated and non-obvious even for a game world with known rules and outcomes.
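A concrete version of that goal-function point: even with Go's rules fully known, you still have to choose what to optimize, and the choice isn't obvious. AlphaGo optimizes win probability rather than score margin, which is why it famously plays "slack" moves once a win looks secure. A toy illustration:

    # Two plausible goal functions for the same fully-specified game.
    # Optimizing the first yields safe play that banks narrow wins;
    # optimizing the second rewards big margins and riskier play.

    def reward_win_only(final_score_diff):
        return 1.0 if final_score_diff > 0 else -1.0

    def reward_margin(final_score_diff):
        return final_score_diff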

I was misremembering the order in which things happened.

AlphaGo Zero, another iteration after the famous matches, was trained without human data.

"AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version without human data and stronger than any previous human-champion-defeating version.[52] By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.[53]"



