We can't know for every possible program whether it halts, but the complexity of the programs we can decide is increasing as tools and techniques get better.
That "clearly" may be doing some heavy lifting. It is assumed that the trivial answer wasn't what was intended for the problem, but unless someone asked Erdos, I don't think we know.
Considering that he did some work on the problem, tackling non-trivial cases, I think we do know. There's no way he wouldn't have noticed the trivial solutions at some point.
You missed a whole section - a person creates a Lean formalization of #1 and Lean promptly says the AI proof is wrong because it doesn’t prove that formal problem statement.
The question is about the person (or AI) creating the formal problem statement: how do you know it represents the problem the proof is supposed to be for? And the answer is that for people in the field, in this case, formalizing the problem and verifying the formalization is easy. It is like generating a public key versus factoring it.
I thought that's what I was trying to express between lines 1 and 2 above, but I may have failed to get it across. My understanding is that the danger is that the LLM will create a proof that is correct but isn't about what the person thinks he's proving?
Compiling isn’t sufficient because it doesn’t tell you whether the program matches the specification. A program that always says the temperature is 80 F will compile, but it is a terrible solution to "what is the temperature outside at this location right now?"
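The same failure mode exists in Lean. A minimal sketch (the theorem names and statements here are made up for illustration): both theorems type-check, i.e. "compile", but only one states the intended claim; the checker can't tell you which formalization matches your intent.

```lean
-- Intended claim: every natural number is at most its successor.
theorem intended (n : Nat) : n ≤ n + 1 := Nat.le_succ n

-- A mis-formalization that also type-checks: it only talks about 0,
-- yet a reader skimming the name might assume the general claim is proven.
theorem misformalized : (0 : Nat) ≤ 1 := Nat.le_succ 0
```

Lean guarantees each proof proves its stated theorem; it can't guarantee the stated theorem is the one you meant.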
And I would say there is no way to ask that question in good faith. (Tedious proof by cases left as an exercise for readers.)
The correct question would have been, does anyone else agree with the statement.
In this particular case, the amount of knowledge needed (of, e.g., the Lean language, math, and Erdos problems) means any credible statement about the difficulty requires an expert.
It doesn't require an expert, though. That was kind of my point. If you've taken an intro proof class (the intro class on the topic for math majors), and if you've fiddled with Lean a bit (e.g. played some of the Natural Number Game another commenter linked), you'll know it's easy (source: I did math in my undergrad and have fiddled with Lean a bit). Honestly, I expect intro proof classes will start to be centered around something like Lean soonish, if some aren't already incorporating it, and we'll see math majors more explicitly making the connection between program and proof.
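For a sense of scale, here is the kind of first exercise such a course might open with (the Natural Number Game starts with material like this); the theorem name is mine, primed to avoid clashing with the library's own `Nat.zero_add`:

```lean
-- 0 + n = n is not true by definition (Nat.add recurses on its
-- second argument), so it takes a short induction: a typical first proof.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```

If you can work through a handful of these, you can read and sanity-check a formal problem statement, which is the easy side of the asymmetry discussed above.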
Like if someone were incredulous that we could reasonably analyze running time and memory usage of something like merge sort and I said that's a standard example in an intro algorithms course, presumably people would be like "yeah it is".
Carrot does this, but Dark Sky was more than just a repeat of weather data from other sources. They built their own geographic models with feedback to create their hyperlocal forecasts. Apple has that technology but their implementation hasn’t seemed to be as good (possibly because their feedback loop is slow).