The importance of having good metrics cannot be overstated.
On the "applying X" problem - this almost feels to me like another argument against fine tuning? Because it seems like Applying can be a surprisingly broad skill, and frontier lab AIs are getting good at Applying in a broad fashion.
On the "applying X" problem - this almost feels to me like another argument against fine tuning? Because it seems like Applying can be a surprisingly broad skill, and frontier lab AIs are getting good at Applying in a broad fashion.