Yes. All of them are poisoned metrics, just in different ways.
GPT-4o's endless sycophancy was great for retention; GPT-5's habit of ending every response with a question is great for engagement.
Are those desirable traits though? Doubt it. They look like cheap tricks and reek of reward hacking - and A/B testing does indeed reward them. Optimizing for those metrics directly is even worse. Combining the two is ruinous.
Mind, I'm not saying that those metrics are useless. Radioactive materials aren't useless. You've just got to keep their unpleasant properties in mind at all times - or suffer the consequences.