It’s literally addressed in his first bullet point…
“ Response: While there is evidence that some tasks that appear emergent under exact match have smoothly improving performance under another metric, I don’t think this rebuts the significance of emergence, since metrics like exact match are what we ultimately want to optimize for many tasks.”
“ Response: While there is evidence that some tasks that appear emergent under exact match have smoothly improving performance under another metric, I don’t think this rebuts the significance of emergence, since metrics like exact match are what we ultimately want to optimize for many tasks.”