
Lean is awesome and this is an impressive new feature, but I can't help but notice that the proof is significantly longer and more complex than the program itself. I wonder how well this will scale to real-world programs.
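
For a sense of the ratio, here's a toy Lean 4 sketch (my own illustration, not from the article): even for a one-line function, the statement plus proof is already longer than the program itself.

    -- A one-line program...
    def double (n : Nat) : Nat := n + n

    -- ...and a proof about it that already outweighs it.
    theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
      unfold double
      omega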


Real-world programs can be verified by formally proving properties of a small part of the code (called the kernel) in a way that transitively guarantees those properties for the remaining code.

For example, Rust's borrow checker guarantees* memory safety of any code written in Rust, even a 10M+ LOC project. Another example is seL4, a formally verified microkernel (https://sel4.systems/About/seL4-whitepaper.pdf).

* Technically not; even if the code doesn't use `unsafe`, not only is Rust's borrow checker not formally verified, there are soundness holes (https://github.com/rust-lang/rust/issues?q=is%3Aopen%20is%3A...). However, in theory it's possible to formally prove that a subset of Rust can only encode memory-safe programs, and in practice Rust's borrow checker is so effective that a 10M+ LOC project without unsafe still probably won't have memory issues.
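
As a minimal illustration of the kind of guarantee in question (my own sketch, not from the thread): the borrow checker statically rejects any program in which a reference could outlive the data it points to.

    fn main() {
        let s = String::from("scoped");
        let r = &s; // fine: `s` outlives the borrow
        println!("{r}");
        // Invalidating `s` while `r` is still live is rejected
        // at compile time:
        // drop(s);
        // println!("{r}"); // error[E0505]: cannot move out of `s`
    }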


What's a memory issue?

If I access beyond the end of an array in Rust, the panic handler runs and starts unwinding my stack. If I access beyond the end of an array in C++ with .at() the exception handler runs and starts unwinding my stack. If I access beyond the end of an array in C the SIGSEGV handler may (*) run and I could, if I wanted to, start unwinding my stack.

Ah, but in C, sometimes if I access the wrong memory, I get garbage instead of a panic.

Sure, and if I store my data in a Rust array and store indexes into that array around the place as sort of weak references (something I've seen Rust programmers use and talk about all the time), I can easily fetch the wrong data too.
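
For instance (a sketch of the pattern being described, not code from the thread): an index survives mutations that would invalidate a real reference, so it can silently point at the wrong element.

    fn main() {
        let mut users = vec!["alice", "bob", "carol"];
        let bob = 1;     // index stored as a weak "reference"
        users.remove(0); // silently invalidates the meaning of `bob`
        // No compile error, no panic: just the wrong data.
        assert_eq!(users[bob], "carol");
    }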

Rust provides a robust type system and a borrow checker which avoids a lot of common problems at the expense of adhering to a particular programming style. That's fine. That's worth advocating for.

But it's no panacea. Not even close.

My favorite memory about this is a programmer lambasting Go's strings (which are basically immutable byte vectors) for not enforcing UTF-8, like Rust strings.

He then said that this means that in Go you can print filenames to the screen that can break your terminal session if they contain invalid UTF-8, which Rust forces you to escape explicitly. The irony, of course, is that the characters that can break your terminal session are perfectly valid UTF-8.

Rust's type safety convinced this guy that his Rust program was immune to a problem that it was simply not immune to.
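
The irony is easy to demonstrate (my own sketch, not part of the original exchange): the escape sequences that restyle a terminal are plain ASCII, so they pass any UTF-8 validity check and a Rust String carries them happily.

    fn main() {
        // ESC (0x1b), '[', '3', '1', 'm' are all ASCII bytes,
        // so this is valid UTF-8 by construction.
        let red = String::from_utf8(vec![0x1b, b'[', b'3', b'1', b'm']).unwrap();
        println!("{red}this renders red on most terminals");
    }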


> Sure, and if I store my data in a Rust array and store indexes into that array around the place as sort of weak references (something I've seen Rust programmers use and talk about all the time), I can easily fetch the wrong data too.

Maybe, but you'll do so in an understandable way that you can catch in testing. You won't suddenly start fetching different wrong data when someone builds your program with a newer version of the compiler, which is a very real risk with C. To say nothing of the risk of arbitrary code execution if your program is operating on attacker-supplied data.

> The irony, of course, is that the characters that can break your terminal session are perfectly valid UTF-8.

Terminals that can't handle the full UTF-8 range are a problem with those terminals IMO. And terminals implemented in Rust probably don't have that problem :).


> Terminals that can't handle the full UTF-8 range are a problem with those terminals IMO. And terminals implemented in Rust probably don't have that problem :).

No, it isn't, and yes, they would. The problem is that the terminal accepts certain valid UTF-8 characters (typically from the ASCII subset) as output control characters. This is how you get things like programs that can output colored text.

This is a part of the fundamental design of how a terminal device is supposed to work: its input is defined to be a single stream of characters, and certain characters (or sequences of characters) represent control sequences that change the way other characters are output. The problem here is with the design of POSIX in general and Linux in particular - the fact that, despite knowing most interaction will be done through a terminal device with no separate control and data channels, they chose to allow control characters as part of file names.

As a result of this, it is, by design, impossible to write a program that can print out any legal file name to a terminal without risking putting the terminal in a display state that the user doesn't expect. The best you could do is recognize terminal control sequences in file names, recognize whether your output device is a terminal, and in those cases print out escaped versions of those character sequences.
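
A minimal sketch of that mitigation (`escape_controls` is a hypothetical helper I'm inventing for illustration, not a std API; real filenames also need non-UTF-8 handling):

    use std::io::{self, IsTerminal, Write};

    // Escape C0 control characters so they display instead of execute.
    fn escape_controls(name: &str) -> String {
        name.chars()
            .map(|c| if c.is_control() {
                format!("\\u{{{:x}}}", c as u32)
            } else {
                c.to_string()
            })
            .collect()
    }

    fn main() -> io::Result<()> {
        let name = "evil\u{1b}[31m.txt";
        let stdout = io::stdout();
        if stdout.is_terminal() {
            // Terminal: print a defanged version.
            writeln!(stdout.lock(), "{}", escape_controls(name))
        } else {
            // Pipe/file: pass the name through untouched.
            writeln!(stdout.lock(), "{name}")
        }
    }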


> the terminal accepts certain valid UTF-8 characters (typically from the ASCII subset) as output control characters. This is how you get things like programs that can output colored text.

The terminal should not allow such a sequence to break it. Yes, being able to output colour is desirable, but it shouldn't come at the cost of breaking, and doesn't need to. (Whereas it is much less unreasonable for a terminal to break when it's sent invalid UTF-8).

> This is a part of the fundamental design of how a terminal device is supposed to work: its input is defined to be a single stream of characters, and certain characters (or sequences of characters) represent control sequences that change the way other characters are output.

"Design" and "supposed to" are overstating things. It's a behaviour that some terminals and some programs have accreted.

> it is, by design, impossible to write a program that can print out any legal file name to a terminal without risking putting the terminal in a display state that the user doesn't expect

I would not say by design, and I maintain that the terminal should handle it.


I believe you're misunderstanding the problem. The terminal doesn't "break" in the sense that it crashes or does something undefined for those cases. The terminal is doing something that is completely meaningful and well defined and probably has some realistic use cases, such as switching to a different character encoding.

The only problem is that it's not what the user wanted to happen. For a simple example, if a file name contains the control sequence for starting a block of red text, and you print that file name as-is in a terminal, you'll (1) see a truncated file name (that is, copying the text from the terminal will not give you the actual file name, since the control characters will be entirely missing), and (2) see all future text turn red.

The terminal has done nothing wrong in this case: it used its normal logic for turning text red. The file name is not in any way wrong - it's a completely valid Linux and ext4 file name. The program is not necessarily doing anything wrong - perhaps it was never designed to print to a terminal. But the overall interaction produces the wrong results.


> I believe you're misunderstanding the problem. The terminal doesn't "break" in the sense that it crashes or does something undefined for those cases. The terminal is doing something that is completely meaningful and well defined and probably has some realistic use cases, such as switching to a different character encoding.

I'm aware of the details, but I think sometimes that knowledge leads people to miss the forest for the trees. If the user perceives the terminal as having "broken", that's a case of poor UX design at a minimum. Given that users can readily distinguish between legitimate coloured output etc. and terminals getting into a poor state, it really shouldn't be too hard for the terminal itself to do so. (E.g. it's pretty normal for today's terminals to display some kind of visible warning, complete with a resume button, when you press Ctrl-S, rather than simply silently stopping.) And while this is a much fuzzier and more contentious claim, I think the Rust community's mentality (as seen in e.g. their approach to compiler errors) nudges people towards such approaches.


This is an entirely new claim - that the Terminal should try to understand what its input means. You can go ahead and try to implement this - I think you'll easily see that it's extremely hard to do so.

A terminal is basically a function foo(char_stream) = formatted_char_stream. It has no idea whatsoever what the input means, or what the output is supposed to mean. Your Ctrl-S example is completely in line with this: it's one control code that the terminal chooses to display/interpret in a certain way (older terminals would just stop, newer terminals display some warning text and wait for user input). Recognizing that the start-red-output control sequence should not be interpreted as a control sequence if it's coming from the output of `ls` is a fundamentally different type of change.
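
To make that concrete, here is a toy sketch of that function (mine, not the commenter's): a renderer that sees only bytes and cannot tell a filename from `ls` formatting.

    // A toy model of foo(char_stream) = formatted_char_stream:
    // ESC starts a control sequence; 'm' (as in SGR) ends it.
    fn render(stream: &[u8]) -> String {
        let mut out = String::new();
        let mut in_escape = false;
        for &b in stream {
            match (in_escape, b) {
                (false, 0x1b) => in_escape = true, // control seq begins
                (true, b'm') => in_escape = false, // SGR sequence ends
                (true, _) => {}                    // swallow parameters
                (false, _) => out.push(b as char),
            }
        }
        out
    }

    fn main() {
        // A filename containing an escape sequence is silently
        // truncated; the renderer has no idea it was a filename.
        assert_eq!(render(b"evil\x1b[31m.txt"), "evil.txt");
    }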

Would it be nice to have a different concept, a CmdDisplayer that takes as input (commands, command text, control text) and outputs formatted text while understanding its input? Maybe. But it wouldn't be a terminal, and it would require a fundamental redesign of every single program that wants to meaningfully interact with it - especially all shells and any TUI program that might make the most use of such a tool.


> This is an entirely new claim - that the Terminal should try to understand what its input means.

I don't see it as a different claim. My position is that the terminal should not move into a bad state for any valid input, and that a state that the user understands as being "broken" is probably a bad state. To my mind this should not require detailed understanding of what each program is doing, which I agree is difficult, but just some more basic things like making it clearer to the user why their terminal is in a funny state (and how to undo it), or perhaps ensuring that the terminal reverts to a good state whenever a program that had changed its state exits.


Having a way to return to the default would be nice, but it wouldn't fix the base problem that a program can't simply print a file name to stdout if it thinks stdout might be connected to a terminal. Even if the terminal didn't "break", it would still not display the file name in a useful way (e.g. a user couldn't copy paste it from there).


> it wouldn't fix the base problem that a program can't simply print a file name to stdout if it thinks stdout might be connected to a terminal. Even if the terminal didn't "break", it would still not display the file name in a useful way (e.g. a user couldn't copy paste it from there).

Well, I'm all for having something like a "raw copy" function in the terminal, but if a filename contains characters that can't reasonably be copy-pasted then there's not really anything that can be done about that (other than "don't give your files silly names"). I think having the terminal get into a bad state is a far worse problem; weird/corrupted filenames are something that happens and people generally figure out a way to deal with them.


I don't think it's Linux so much as it is any given filesystem implementation. As I understand it, validation is entirely up to the filesystem itself. I could be mistaken, but I don't believe there's anything stopping you from implementing a filesystem that uses raw binary data for filenames.

There's also the question of what happens if the data structures on disk become corrupted. The filesystem driver might or might not validate the "string" it reads back before returning it to you.


Linux itself exposes various syscalls that operate on filenames; userland programs can't interact directly with the FS driver. But Linux chose to implement only two restrictions at the syscall level (slash is used to separate elements of the path, and NUL marks the end of the input). The kernel will resolve the path to a particular file system and send a subset of the path to the corresponding FS driver exactly as it received it, and the FS can choose whether to accept or reject it. Most Linux FSs don't apply any extra restrictions either. The main exceptions are FSs written to interface with other systems, such as CIFS or SMB, which additionally apply DOS/Windows filename restrictions by necessity.
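
This is straightforward to observe (a sketch, assuming a typical Linux filesystem such as ext4): any byte other than '/' and NUL is accepted in a path component, including ESC.

    use std::ffi::OsStr;
    use std::fs::File;
    use std::os::unix::ffi::OsStrExt;

    fn main() -> std::io::Result<()> {
        // 0x1b (ESC) is neither '/' nor NUL, so the kernel and
        // most Linux filesystems accept it in a filename.
        let name = OsStr::from_bytes(b"\x1b[31mred.txt");
        File::create(name)?;
        Ok(())
    }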

If Linux had chosen to standardize file names at the syscall level to a safe subset of UTF-8 (or even ASCII), FS writers would never have even seen file names that contain invalid sequences, and we would have been spared a whole set of terminal issues. Of course, UTF-8 was nowhere close to as universally adopted at the time Linux was developed as it is today, so it's just as likely they might have standardized on some subset of UTF-16 or US-ASCII and we might have had other kinds of problems; arguably they made the right decision for the time.


It's even worse than that. If a newer version of the compiler is able to leverage knowledge of the array bounds, it could "optimize" away an entire chunk of your program. It probably won't do that, because compiler authors supposedly aren't openly hostile towards compiler users, but it isn't so easy to write an algorithm that will flag such "obviously" wrong things.

The control characters are themselves valid (but unprintable) UTF-8. They are also, against all common sense and reason, permitted within filenames by many filesystems. Rust won't save you here. https://www.compart.com/en/unicode/category/Cc


> Maybe, but you'll do so in an understandable way that you can catch in testing. You won't suddenly start fetching different wrong data when someone builds your program with a newer version of the compiler, which is a very real risk with C. To say nothing of the risk of arbitrary code execution if your program is operating on attacker-supplied data.

What guarantees it? Literally nothing. You can catch errors in testing in C as well. Yeah, in C you get "different" data on a different version of the compiler, but you get garbage data in all versions of the compiler and Valgrind flags that in testing.

And, of course, you can get arbitrary code execution if your program is operating on data coming from multiple users of different privilege levels in Rust if you use vectors like that.

Sure, Rust fixes a lot of easy to commit bugs in C and C++, absolutely. But there are no absolutes.


> What guarantees it? Literally nothing.

The behaviour of looking up index x in array y in Rust is well-defined, and consistent between compiler versions. Maybe you use the wrong index and get the wrong customer's data or something, but you'll still get the data at that index (or a panic if the index is invalid).
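
Concretely (my own sketch): both the panicking and non-panicking forms have the same defined behaviour on every compiler version.

    fn main() {
        let v = vec![10, 20, 30];
        // In-bounds: always the value at that index.
        assert_eq!(v[1], 20);
        // Out-of-bounds via `get`: a None, never unrelated memory.
        assert_eq!(v.get(3), None);
        // Out-of-bounds via indexing would deterministically panic:
        // let _ = v[3]; // panics: index out of bounds
    }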

> You can catch errors in testing in C as well. Yeah, in C you get "different" data on a different version of the compiler, but you get garbage data in all versions of the compiler and Valgrind flags that in testing.

Not always. It's very common for code to be broken according to the standard but do the right thing in some compilers, and then in a different compiler it does something completely bizarre. The code might not do the same thing in testing as it does in release. And while Valgrind does a lot of good for the minority of C programmers who use it, it's far from 100% reliable.

> you can get arbitrary code execution if your program is operating on data coming from multiple users of different privilege levels in Rust if you use vectors like that.

You might in some cases, but it's a lot trickier. You can get the program to operate on different parts of its data than it was intended to, but to go from there to running arbitrary code will still be a significant leap and require an exploitation technique specific to that particular program. Whereas the techniques for going from common classes of C vulnerabilities (e.g. buffer overflow or use-after-free) to arbitrary code execution are practically textbook at this point.


Funny thing is that you can get undefined behavior and segfaults using only "safe Rust", and the Rust compiler has subtle bugs that allow you to disable important checks (like type checking), which can leave your code completely broken.

But thanks to some crazy propaganda, Rust devs believe that any Rust code is safe and sound no matter what.

https://github.com/Speykious/cve-rs/issues/49


Research points to a quadratic relationship between automated proof size and code size: https://trustworthy.systems/publications/nictaabstracts/Mati....

Specifically, the relationship is between the _specification_ and the proof, and it was done for proofs written in Isabelle and not Lean.

The good news is that more and more automation is possible for proofs, so the effort to produce each proof line will likely go down over time. Still, the largest full program we've fully verified is much less than 100,000 LOC. seL4 (verified operating system) is around 10,000 lines IIRC.


A small note, but seL4 is more of a kernel than a full OS - though in the embedded world for which it is targeted, the difference is pretty fuzzy.


My belief is that the core parts of a lot of programs where you want formal proofs are tricky enough that you need hand-holding, but the higher up the stack you go, the simpler your proofs get, since you did the hard stuff in the core.

Though it really does feel like we're still scratching the surface of proof writing practices. A lot of proofs I've seen seem to rely only on very low level building blocks, but stronger practitioners more immediately grab tools that make stuff simpler.

I would say, though, that it feels likely that your proofs are always going to be within an order of magnitude of your code's length, because in theory the longer your code is, the more decision points there are to bring up in your proof as well. Though automatic proof search might play out well for you on simpler proofs.


> My belief is that the core parts of a lot of programs where you want formal proofs are tricky enough that you need hand-holding, but the higher up the stack you go, the simpler your proofs get, since you did the hard stuff in the core.

I don't think this is true at all. For many kinds of programs that it would be good to have formal verification for, all of the details are very important. For example, I'd love to know that the PET scan code was formally verified, and that it's impossible to, say, show a different radiation dose on the screen than the dose configured in the core. I very much doubt that it's easy to write a proof that the GUI controls are displaying the correct characters from a font, though.

Or, it would be good to know that the business flows in the company ERP are formally verified to actually implement the intended business processes, but even the formal specification for the business processes in a regular large company would likely take longer to produce than it takes for those same business processes or external laws to change.


Long-term this would be done using LLMs. It would also solve LLMs' code quality issues - they could simply prove that the code works right.


> simply prove that the code works right

Combining LLMs + formal methods/model checkers is a good idea, but it's far from simple, because rolling the dice on some subsymbolic stochastic transpiler from your target programming language towards a modeling/proving language is pretty suspect. So suspect, in fact, that you'd probably want to prove some stuff about that process itself to have any confidence. And this is a whole emerging discipline, actually - see for example https://sail.doc.ic.ac.uk/software/


Maybe very long term. I turn off code assistants when doing Lean proofs because the success rate for just suggestions is close to zero.



