> Though definitely not directly comparable, the dataset for GPT-2 XL is 8 million web pages.
This is irrelevant. You can train GPT3 on a smaller dataset, or a smaller model on the same dataset as GPT3.
> What I mean to say is that this is clearly deep learning.
It's been clear that neural-network models are superior since AlphaGo. There's no "Deep Learning vs <something else>" anymore, because the <something else> isn't competitive and no one is really working on it.
It's actually really small, mostly because a bigger network takes longer to evaluate, which slows down the search, makes it shallower, and ends up with a less clever engine overall.
NNUE is a 4-layer (1 input + 3 dense) integer-only neural network.
It's just over 82,000 parameters.[1]
That's a very shallow, small NN - by comparison, something like EfficientNet-B1[2] has 7.8M parameters, and that's considered a small network.
Though definitely not directly comparable, the dataset for GPT-2 XL is 8 million web pages. What I mean to say is that this is clearly deep learning.
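
For a sense of scale, here's a rough, self-contained sketch of what a network with that shape could look like: 1 input layer feeding 3 small dense layers, integer-only weights, on the order of 80k parameters. The layer widths below are made-up placeholders (the real sizes are in [1]), so treat it as an illustration of the structure, not Stockfish's actual NNUE.

```python
# Toy sketch of a "1 input + 3 dense" integer-only net in the spirit of NNUE.
# All layer widths here are invented placeholders, NOT the real NNUE sizes;
# they're just picked so the parameter count lands around ~80k.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, H1, H2, H3 = 768, 96, 32, 32  # hypothetical sizes

# int8 weights, int32 accumulators: everything stays integer.
W_in = rng.integers(-64, 64, size=(N_FEATURES, H1), dtype=np.int8)  # input layer
W1   = rng.integers(-64, 64, size=(H1, H2), dtype=np.int8)          # dense 1
W2   = rng.integers(-64, 64, size=(H2, H3), dtype=np.int8)          # dense 2
W3   = rng.integers(-64, 64, size=(H3, 1),  dtype=np.int8)          # dense 3 (output)

def clipped_relu(x):
    # Clamp to a small integer range instead of using a float activation.
    return np.clip(x, 0, 127)

def evaluate(features):
    """features: 0/1 vector of board features; returns an integer score."""
    x = clipped_relu(features.astype(np.int32) @ W_in.astype(np.int32))
    x = clipped_relu(x @ W1.astype(np.int32))
    x = clipped_relu(x @ W2.astype(np.int32))
    return (x @ W3.astype(np.int32)).item()

# Example: a sparse board encoding with ~30 active features.
board = np.zeros(N_FEATURES, dtype=np.int8)
board[rng.choice(N_FEATURES, size=30, replace=False)] = 1
print(evaluate(board))

# Parameter count: 768*96 + 96*32 + 32*32 + 32*1 = 77,856 - the same ballpark
# as the ~82k quoted above, and tiny next to EfficientNet-B1's 7.8M.
```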