Paper seems interesting but I don't like the question title. I think the answer to the question would just be that tabular data is not fully in the "big data" regime yet, so there is no reason a priori to expect deep NNs to do better. Factor in the computational simplicity of tree-based models and I think the deck is stacked against deep learning from the start.
I've worked on models trained on ultra-large tabular data. It still took substantial effort to beat tree models (a custom architecture built specifically for that domain, something I haven't seen out in the open elsewhere).
When tabular data is mentioned, one of the unspoken applications is finance. There, my guess is that one of the issues is that data is not very IID and thus latent "events" are fairly sparse. Combine that with the humongous amount of raw data, and you get models that overfit.
I think there are certain types of tabular data that lend themselves naturally to tree models. But when you're talking about tabular data for finance, I guarantee you very few hedge funds are running tree models for trading strategies. When your data is the past X quarters of all stock prices and trade volumes, you have enough to fit an NN, and there are a number of techniques you can use to reduce overfitting (good regularization, dropout, etc.).
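For concreteness, a minimal sketch (assuming PyTorch; the layer sizes are placeholders, not anyone's production architecture) of the regularization knobs I mean:

```python
# Hedged sketch: dropout between layers plus weight decay (L2) in the
# optimizer, two standard overfitting controls for tabular NNs.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(), nn.Dropout(0.3),  # dropout zeroes random activations during training
    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 1),
)
# weight_decay penalizes large weights on every update step
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```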
> But when you're talking about tabular data for finance I guarantee you very few hedge funds are running tree models for trading strategies
What do you base this on? Running only neural nets on tabular data is mostly done out of convenience, since neural nets are much easier to set up, not because they perform better even with large amounts of data. In general you want both, since they are good at finding different kinds of patterns.
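For example, a simple blend (a sketch using scikit-learn; the toy data and the 50/50 weight are illustrative, not a recipe):

```python
# Hedged sketch of "use both": average a gradient-boosted tree model with
# an MLP, since each tends to pick up different structure in the data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1000)

gbt = GradientBoostingRegressor().fit(X, y)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000).fit(X, y)

def blended_predict(X_new, w=0.5):
    # A 50/50 average is the simplest ensemble; in practice you'd tune w
    # (or stack the two models) on a validation set.
    return w * gbt.predict(X_new) + (1 - w) * mlp.predict(X_new)
```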
Do you know of any (families of) examples of tabular datasets of any size (you can choose what "big" means) where deep learning convincingly outperforms traditional methods? I would love some quality examples of this nature to use in my teaching.
Regression targets where extrapolation may be needed. Decision tree methods cannot extrapolate: the predictions have to be a mean of some subgroup of the training data.
Consider: Predicting how much a customer might pay by end of month, with information we have at the start of the month.
In this example, if a customer has a record $10m of open invoices due by EoM, but the largest payment received in any prior month was $5m, the decision tree cannot possibly predict a payment of ~$10m, even when the best feature indicates the payment will be $10m.
There are some hacks/techniques that can reduce this issue somewhat, but they don't always work.
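To make the extrapolation point concrete, a toy sketch (scikit-learn assumed): a tree trained on y = 2x is capped at the largest training target, while even a plain linear model extrapolates.

```python
# Toy demonstration: tree predictions are means of training subgroups,
# so they can never exceed the largest target seen during training.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()  # perfectly linear target

tree = DecisionTreeRegressor().fit(X, y)
lin = LinearRegression().fit(X, y)

X_new = np.array([[200.0]])  # far outside the training range
print(tree.predict(X_new))   # ~198: stuck at the max leaf mean
print(lin.predict(X_new))    # ~400: extrapolates the trend
```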
What? Can you explain the mechanism by which an NN can “extrapolate” an invoice where a tree model couldn't? This is all just how the modeler builds the features.
Also, all models are a “mean of a subgroup of the data.” The prediction is by definition an estimate of the conditional mean as a function of the input values.
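For instance, one common workaround along those lines is to have the tree predict a ratio rather than a dollar amount, then scale back up. A hedged sketch, where the open-invoice feature and helper names are hypothetical:

```python
# Target engineering sketch: train on payment / open_invoices so the tree
# sees a bounded ratio, then multiply back at prediction time; the raw
# dollar prediction can then exceed anything in the training set.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_ratio_tree(X, open_invoices, payments):
    ratio = payments / np.maximum(open_invoices, 1e-9)  # guard divide-by-zero
    return DecisionTreeRegressor(max_depth=5).fit(X, ratio)

def predict_payment(model, X_new, open_invoices_new):
    # A predicted ratio near 1.0 times $10m of open invoices gives ~$10m,
    # even if no prior payment ever exceeded $5m.
    return model.predict(X_new) * open_invoices_new
```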
Recommendation engines: search, feeds (tiktok / youtube shorts / etc), ads, netflix suggestions, doordash suggestions, etc etc. Also happens to be my specialty.
I worked with search and ads models at Google, and for most things tree models were better. What evidence do you have that neural nets are better there? I worked with large parts of Google search ranking, so I know what I'm talking about: some parts want a neural net, but most of the work is done by tree models and similar, which both perform better and run faster.
I'm not sure that is true. I think inference speed is often the bottleneck for the use cases stated, as is the need for frequent retraining. As a result, algorithms like CatBoost are very popular in those domains. I think CatBoost was actually invented by Yandex.
PS: It's weird that you are being downvoted. I think your opinion is reasonable.
Inference speed: more sophisticated stacks use multiple stages. The early stage might be a sublinear vector search, and the heavy-hitting neural nets only rerank the remainder. ByteDance has a paper on their fairly fancy sublinear approach.
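Roughly like this (a toy sketch: brute-force numpy stands in for a real sublinear ANN index such as FAISS or ScaNN, and a dot product stands in for the heavy reranker):

```python
# Two-stage retrieve-then-rerank sketch.
import numpy as np

rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(100_000, 64))  # catalog embeddings
user_vec = rng.normal(size=64)              # query/user embedding

# Stage 1: cheap candidate retrieval by inner product. Real systems use a
# sublinear ANN index here; argpartition just illustrates the cut.
scores = item_vecs @ user_vec
candidates = np.argpartition(-scores, 500)[:500]

# Stage 2: the expensive model scores only the 500 survivors. A second
# random vector stands in for the heavy neural net reranker.
rerank_vec = rng.normal(size=64)
order = np.argsort(-(item_vecs[candidates] @ rerank_vec))
print(candidates[order][:10])  # final top-10 recommendations
```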
Retraining - online training solves this for the most part.
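E.g. with any model that supports incremental updates (a sketch using scikit-learn's partial_fit; a production recsys would use a streaming trainer, but the shape is the same):

```python
# Online training sketch: nudge the model on each fresh batch of
# interactions instead of scheduling full retrains.
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate="constant", eta0=0.01)

def on_new_batch(X_batch, y_batch):
    # Called whenever a new batch of (features, label) pairs arrives;
    # weights are updated without revisiting historical data.
    model.partial_fit(X_batch, y_batch)

# Simulate a stream of batches from a fixed underlying relationship.
rng = np.random.default_rng(1)
true_w = rng.normal(size=16)
for _ in range(20):
    X = rng.normal(size=(32, 16))
    on_new_batch(X, X @ true_w)
```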
Frameworks - the only battle-tested, batteries-included one I've seen is Vespa. No one else publishes any of the interesting bits. KDD is the most relevant conference if you're interested in the field. IIRC Xiaohongshu has some papers on things that can only really be done with NNs.