> To mitigate auto-regressive drift during inference, we corrupt context frames by adding Gaussian noise to encoded frames during training. This allows the network to correct information sampled in previous frames, and we found it to be critical for preserving visual stability over long time periods.
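Here is my rough mental model of the training-side trick, written as a minimal sketch. Everything in it is my own assumption and not from the paper: the function names, representing each encoded frame as a flat list of floats, sampling the noise scale uniformly per example, and returning that scale so the model could be conditioned on it. The idea is that only the context (history) frames get noised, while the target frame stays clean.

```python
import random
from typing import List, Tuple

Latent = List[float]  # one encoded frame, flattened (hypothetical representation)


def corrupt_context(
    context: List[Latent],
    max_noise_std: float,
    rng: random.Random,
) -> Tuple[List[Latent], float]:
    """Add Gaussian noise to the context frames of one training example.

    The noise scale is drawn per example from [0, max_noise_std], so the model
    sees history ranging from clean to heavily corrupted. At inference the
    history is the model's own imperfect output, so training on noisy history
    is (as I understand it) meant to make the model correct it rather than
    copy its errors forward. Returning sigma is an assumption: it could be fed
    to the model as an extra conditioning input.
    """
    sigma = rng.uniform(0.0, max_noise_std)
    # Build new lists so the original clip is not modified.
    noisy = [[x + rng.gauss(0.0, sigma) for x in frame] for frame in context]
    return noisy, sigma


def make_training_example(
    frames: List[Latent],
    context_len: int,
    max_noise_std: float,
    rng: random.Random,
) -> Tuple[List[Latent], Latent, float]:
    """Split a clip into (noisy context, clean target, noise scale)."""
    context = frames[:context_len]
    target = frames[context_len]  # the frame to predict is left clean
    noisy_context, sigma = corrupt_context(context, max_noise_std, rng)
    return noisy_context, target, sigma


# Toy usage: 5 "latents" of size 4, predict the 5th from the first 4.
rng = random.Random(0)
clip = [[float(t)] * 4 for t in range(5)]
ctx, tgt, sigma = make_training_example(clip, context_len=4, max_noise_std=0.5, rng=rng)
```

With `max_noise_std=0.0` this degenerates to ordinary teacher forcing on clean history, which (if I read the quote right) is exactly the setup that leaves the model unprepared for its own mistakes at inference time.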
I get this (mostly), but would any kind soul care to elaborate? What is this "drift" they are trying to avoid, and how does (AFAIU) adding noise help?