One promising approach is to encode each feature key and feature value as embedding vectors, concatenate them into "feature tokens", then feed them into a Transformer (without positional encodings). This takes advantage of column-order invariance. See:
https://arxiv.org/abs/2403.01841 (ICLR 2024 spotlight)