> When doing HMC MCMC we typically don't start sampling right away (or, more precisely, we throw out those early samples) because we may be initializing the sampler in a part of the distribution with pretty low probability density.
And how does that apply to LLMs, since they don't do MCMC?