Nit: the author says that supervised fine tuning is a type of RL, but it is not. RL is about delayed reward. Supervised fine tuning is not in any way about delayed reward.
RL is about getting numerical feedback (a reward) on outputs, in contrast to supervised learning, where there are examples of what the output should be. There are many RL problems with no delayed rewards, e.g. multi-armed bandits.
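To make the point concrete, here is a minimal epsilon-greedy bandit sketch: every pull gets an immediate numerical reward, so it is RL with no delayed reward at all. The arm probabilities and the exploration rate are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hidden reward probability of each arm
n_arms = len(true_means)

counts = np.zeros(n_arms)                # pulls per arm
values = np.zeros(n_arms)                # running estimate of each arm's mean reward
epsilon = 0.1                            # exploration rate

for t in range(1000):
    # explore with probability epsilon, otherwise exploit the current best estimate
    if rng.random() < epsilon:
        arm = rng.integers(n_arms)
    else:
        arm = int(np.argmax(values))

    reward = float(rng.random() < true_means[arm])   # immediate 0/1 reward

    # incremental update of the pulled arm's mean-reward estimate
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print(values)   # estimates should approach [0.2, 0.5, 0.8]
```

Note there is never a labeled "correct" arm, only a scalar reward per action, which is the distinction from supervised learning.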
Well, they can be used together in some contexts, so while they are different, you could also say RL can complement supervised fine-tuning for further optimization (e.g. an SFT stage followed by an RL stage).
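A rough sketch of how the two objectives differ when used back to back (not any particular library's API; the tensors below are just placeholders): SFT minimizes cross-entropy against reference outputs, while a REINFORCE-style RL step reweights the log-probability of the model's own sample by a scalar reward.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, target_ids):
    # supervised fine-tuning: we are told exactly what the output should be
    return F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

def reinforce_loss(logprob_of_sample, reward):
    # RL: we only get a number saying how good the sampled output was
    return -(reward * logprob_of_sample)

# toy usage with random tensors, just to show the shapes involved
logits = torch.randn(4, 10, 50)                  # (batch, seq_len, vocab)
targets = torch.randint(0, 50, (4, 10))
print(sft_loss(logits, targets))

sample_logprob = torch.tensor(-12.3, requires_grad=True)  # log p(sampled output)
print(reinforce_loss(sample_logprob, reward=0.7))
```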