I didn’t see it linked, but a paper by my coworker at Netflix, Michael Lindon, seems relevant: Anytime-Valid Inference for Multinomial Count Data. https://arxiv.org/abs/2011.03567
It becomes a regression problem: you use ML to estimate the conversion rate for each item. You still have to do feature engineering, but fast regressors like XGBoost make the modeling much easier. https://improve.ai is one such scorer/ranker built on XGBoost.
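To make the "estimate a conversion rate per item, then rank" idea concrete, here is a rough sketch. It is not how improve.ai or XGBoost work internally: to keep it dependency-free it swaps in a tiny hand-rolled logistic regression for the gradient-boosted model, and the item names, one-hot features, and conversion rates are all made-up assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights (bias stored last) by batch gradient descent on log loss.

    Stand-in for a real regressor such as XGBoost; assumption for illustration.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict_rate(w, x):
    """Predicted conversion probability for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def rank_items(w, items):
    """Return item names sorted by predicted conversion rate, best first."""
    return sorted(items, key=lambda name: predict_rate(w, items[name]), reverse=True)

# Hypothetical items with one-hot features; real features would be engineered.
items = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 1.0]}
# Made-up "true" conversion rates, used only to simulate click logs.
true_rate = {"a": 0.30, "b": 0.10, "c": 0.02}

random.seed(0)
X, y = [], []
for name, feats in items.items():
    for _ in range(500):
        X.append(feats)
        y.append(1 if random.random() < true_rate[name] else 0)

w = fit_logistic(X, y)
ranking = rank_items(w, items)
rates = {name: predict_rate(w, feats) for name, feats in items.items()}
```

With a real library you would replace `fit_logistic` and `predict_rate` with the model's own fit and predict calls; the scoring and sorting step stays the same.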