
The original transformer, from the "Attention Is All You Need" paper, had an encoder-decoder architecture because it was designed for language translation, where you are mapping one sequence to another and can use both the preceding and following context of words when performing that mapping. The encoder is the part that makes use of the forward context.
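
A minimal sketch of what that means in practice (not code from the paper, just an illustration): in the encoder, the attention mask allows every position to attend to every other position, so both preceding and following tokens are visible.

    import numpy as np

    # Encoder-style attention mask over a 4-token source sentence.
    # A 1 at (i, j) means "position i may attend to position j".
    # The encoder is fully bidirectional: every token sees every other token.
    seq_len = 4
    encoder_mask = np.ones((seq_len, seq_len), dtype=int)
    print(encoder_mask)
    # [[1 1 1 1]
    #  [1 1 1 1]
    #  [1 1 1 1]
    #  [1 1 1 1]]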

In contrast to seq2seq use, generative language models such as ChatGPT only have access to the preceding (not forward) context when deciding what to generate next, so the encoder part of the architecture is not applicable and a decoder-only transformer is used.
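
The decoder-only case corresponds to a causal mask, which is just the lower triangle of the matrix above: each position can attend only to itself and earlier positions, never to future ones. A quick sketch along the same lines:

    import numpy as np

    # Causal (decoder-only) attention mask, as used by generative models.
    # np.tril keeps only the lower triangle, so position i attends to
    # positions 0..i -- preceding context only, no forward context.
    seq_len = 4
    causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=int))
    print(causal_mask)
    # [[1 0 0 0]
    #  [1 1 0 0]
    #  [1 1 1 0]
    #  [1 1 1 1]]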


