Because AI systems are good at converging on whatever scores highest, regardless of the test's intent. For most problems, "ask_hooman" (or especially the plural) would be far more effective. So the degenerate case would dominate and tell you precisely nothing about the AI's intelligence. If a specific "tool" is more capable than the "AI", then "choose tool" will always be the correct answer. But I agree, a tight time constraint would help.