> Though definitely not directly comparable, the dataset for GPT-2 XL is 8 million web pages.
This is irrelevant. You can train GPT3 on a smaller dataset, or a smaller model on the same dataset as GPT3.
> What I mean to say is that this is clearly deep learning.
It's been clear that neural-network models are superior since AlphaGo. There's no "Deep Learning vs <something else>" anymore, because the <something else> isn't competitive and no one is really working on it.
It's actually really small, mostly because a bigger network takes longer to evaluate, which slows down the search, makes it shallower, and ends up with a less clever engine overall.
NNUE is a 4-layer (1 input + 3 dense) integer-only neural network.
It's just over 82,000 parameters.[1]
That's a very shallow, small NN - by comparison, something like EfficientNet-B1[2] has 7.8M parameters, and that's considered a small network.
Though definitely not directly comparable, the dataset for GPT-2 XL is 8 million web pages. What I mean to say is that this is clearly deep learning.
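
For a sense of scale, here's a rough, self-contained sketch of what a network with that shape could look like: 1 input layer feeding 3 small dense layers, integer-only weights, on the order of 80k parameters. The layer widths below are made-up placeholders (the real sizes are in [1]), so treat it as an illustration of the structure, not Stockfish's actual NNUE.

```python
# Toy sketch of a "1 input + 3 dense" integer-only net in the spirit of NNUE.
# All layer widths here are invented placeholders, NOT the real NNUE sizes;
# they're just picked so the parameter count lands around ~80k.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, H1, H2, H3 = 768, 96, 32, 32  # hypothetical sizes

# int8 weights, int32 accumulators: everything stays integer.
W_in = rng.integers(-64, 64, size=(N_FEATURES, H1), dtype=np.int8)  # input layer
W1   = rng.integers(-64, 64, size=(H1, H2), dtype=np.int8)          # dense 1
W2   = rng.integers(-64, 64, size=(H2, H3), dtype=np.int8)          # dense 2
W3   = rng.integers(-64, 64, size=(H3, 1),  dtype=np.int8)          # dense 3 (output)

def clipped_relu(x):
    # Clamp to a small integer range instead of using a float activation.
    return np.clip(x, 0, 127)

def evaluate(features):
    """features: 0/1 vector of board features; returns an integer score."""
    x = clipped_relu(features.astype(np.int32) @ W_in.astype(np.int32))
    x = clipped_relu(x @ W1.astype(np.int32))
    x = clipped_relu(x @ W2.astype(np.int32))
    return (x @ W3.astype(np.int32)).item()

# Example: a sparse board encoding with ~30 active features.
board = np.zeros(N_FEATURES, dtype=np.int8)
board[rng.choice(N_FEATURES, size=30, replace=False)] = 1
print(evaluate(board))

# Parameter count: 768*96 + 96*32 + 32*32 + 32*1 = 77,856 - the same ballpark
# as the ~82k quoted above, and tiny next to EfficientNet-B1's 7.8M.
```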