The inherently chaotic nature of system makes stable results very difficult. Com...

The inherently chaotic nature of system makes stable results very difficult. Combine that with the non deterministic nature of all the major production models. Then you have the fact that new models are coming out every few months, and we have no objective metrics for measuring software quality.

Oh and benchmarks for functional performance measurement tend to leak into training data.

Put all those together and I’d bet half of my retirement accounts that the we’re still in the reading chicken entrails phase 20 years from now.