
In the entire history of the standard library, we have never once seen a single report of anyone attempting to recover from poison.


I've recovered from poisoned state in `impl Drop` in quite a few places.

In my case it's usually waiting for the GPU to finish some asynchronous work that's been spun up by CPU threads that may have panicked while holding the lock. This is necessary to avoid freeing resources that the GPU may still be using.

I usually prefix this with `if !std::thread::panicking() {}`, so I don't end up waiting (possibly forever) if I'm already cleaning up after a panic.
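
A minimal sketch of that guard, with the GPU wait replaced by a flag so the effect can be observed. `Resources`, `waited`, and the two helper functions are made-up names for illustration, not anything from the commenter's code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a container whose teardown must wait on external work.
struct Resources {
    waited: Arc<AtomicBool>,
}

impl Drop for Resources {
    fn drop(&mut self) {
        // Skip the (possibly unbounded) wait if this thread is already unwinding.
        if !thread::panicking() {
            // Placeholder for "wait for the GPU": only record that we would wait.
            self.waited.store(true, Ordering::SeqCst);
        }
    }
}

/// Drops a `Resources` on the normal path and reports whether the wait ran.
fn drop_normally() -> bool {
    let waited = Arc::new(AtomicBool::new(false));
    drop(Resources { waited: Arc::clone(&waited) });
    waited.load(Ordering::SeqCst)
}

/// Drops a `Resources` while its thread is unwinding and reports whether the wait ran.
fn drop_while_panicking() -> bool {
    let waited = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&waited);
    let _ = thread::spawn(move || {
        let _res = Resources { waited: flag };
        panic!("simulated failure");
    })
    .join();
    waited.load(Ordering::SeqCst)
}
```

With the default unwinding panic strategy, the first helper performs the wait and the second skips it.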


Thank you for mentioning this; I'd be really interested in hearing more about this, and seeing some examples.


Hi, I don't have public examples to share but I can give an explanation of a simple scenario.

I have a container of resources, e.g. textures. When the GPU wants to use them, the CPU leases them until a future point in time denoted by a value (u64) of a GPU timeline semaphore. The handle and value of the semaphore are added to a list guarded by a mutex. Then GPU work is kicked off, and the GPU will increment the semaphore to that value when done.

In the Drop implementation of the container, we need to wait until all semaphores reach their respective values before freeing resources, and do so even if some thread panicked while holding the lock guarding the list. This is where I use .unwrap_or_else to get the list out of the poison error.
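
Roughly, that could look like the sketch below. It is not the actual code: `Container`, `Lease`, `pending`, and `waited_on` are hypothetical names, and the semaphore wait is replaced by recording which leases would have been waited on.

```rust
use std::sync::{Arc, Mutex, PoisonError};
use std::thread;

/// Hypothetical lease record: (semaphore handle, timeline value to wait for).
type Lease = (u32, u64);

struct Container {
    pending: Arc<Mutex<Vec<Lease>>>,
    // Records the leases "waited on" so the sketch has an observable result.
    waited_on: Arc<Mutex<Vec<Lease>>>,
}

impl Drop for Container {
    fn drop(&mut self) {
        // Take the list even if some thread panicked while holding the lock.
        let mut pending = self
            .pending
            .lock()
            .unwrap_or_else(PoisonError::into_inner);
        for lease in pending.drain(..) {
            // Placeholder for waiting on the timeline semaphore.
            self.waited_on
                .lock()
                .unwrap_or_else(PoisonError::into_inner)
                .push(lease);
        }
    }
}

/// Poisons the lease list from a worker thread, then drops the container.
fn drop_after_poison() -> Vec<Lease> {
    let pending: Arc<Mutex<Vec<Lease>>> = Arc::new(Mutex::new(vec![(1, 10)]));
    let waited_on: Arc<Mutex<Vec<Lease>>> = Arc::new(Mutex::new(Vec::new()));

    // A worker adds a lease, then panics while still holding the lock.
    let worker_pending = Arc::clone(&pending);
    let _ = thread::spawn(move || {
        let mut list = worker_pending.lock().unwrap();
        list.push((2, 20));
        panic!("simulated failure while holding the lock");
    })
    .join();
    assert!(pending.is_poisoned());

    drop(Container { pending, waited_on: Arc::clone(&waited_on) });
    let result = waited_on.lock().unwrap().clone();
    result
}
```

`PoisonError::into_inner` hands back the guard that the error wraps, so the list is still reachable after the panic, including the lease pushed just before it.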

It's not infeasible to try to catch any errors and propagate them when the lock is grabbed. But this is mostly for OOM and asserts that are not expected to fire. The ergonomics would be worse if the "lease" function were fallible.

This said, I would not object to poisoning being made optional.


Oh, I don't think recovery from poison is why poisoning is good. The reason poisoning is good is that at the moment you've acquired a lock on a mutex, you should be able to assume that the invariants guarded by the mutex are upheld (and panic if not).


Mutex doesn't promise to uphold any more invariants than `&mut T` does. If the state can be corrupted by a panic while holding `&mut T`, I don't think there's any good reason to expect that obtaining it through `MutexGuard` should make any difference.

Panic propagation is typically handled much better at thread `join()` boundaries.
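
For illustration, a small sketch of handling a worker's panic at the `join()` boundary. `run_and_report` is a made-up helper; it assumes the panic payload is a `&str` or `String`, which is what `panic!` with a message produces.

```rust
use std::thread;

/// Runs `work` on a new thread and reports the outcome at the `join()` boundary.
fn run_and_report<F>(work: F) -> Result<u32, String>
where
    F: FnOnce() -> u32 + Send + 'static,
{
    match thread::spawn(work).join() {
        Ok(value) => Ok(value),
        // `join()` returns the panic payload as `Box<dyn Any + Send>`.
        Err(payload) => {
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic payload".to_string());
            Err(message)
        }
    }
}
```

A caller that would rather keep unwinding than convert to an error could pass the payload to `std::panic::resume_unwind` instead.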


A panic in single-threaded, non-parallel code will either terminate the program or be recovered cleanly, so the potential for side effects to be silently observed in a way that breaks invariants is unique to Mutex<>. This is the reason for mutex poisoning.


I fail to see that there is any material difference. Whether you catch-unwind within a single thread or in a separate thread such that the panic can be resumed on join makes zero difference.

Heck, you can have Drop impls observing the state while unwinding.

A true panic-safe data structure requires serious thought, and mutex poisoning does nothing here - it is neither necessary nor sufficient.
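
To make the Drop point concrete, here is a single-threaded sketch with no mutex at all. `Totals`, `Observer`, and the "cached sum" invariant are invented for the example; the panic is simulated between the two steps of an update.

```rust
use std::cell::{Cell, RefCell};
use std::panic::{self, AssertUnwindSafe};

/// Invariant (by convention in this sketch): `cached_sum` equals the sum of `items`.
struct Totals {
    items: Vec<u32>,
    cached_sum: u32,
}

/// On drop, records whether the invariant held at that moment.
struct Observer<'a> {
    totals: &'a RefCell<Totals>,
    seen: &'a Cell<Option<bool>>,
}

impl Drop for Observer<'_> {
    fn drop(&mut self) {
        let totals = self.totals.borrow();
        let consistent = totals.cached_sum == totals.items.iter().sum::<u32>();
        self.seen.set(Some(consistent));
    }
}

/// Returns what the observer saw while the stack was unwinding.
fn observe_during_unwind() -> Option<bool> {
    let totals = RefCell::new(Totals { items: vec![1, 2], cached_sum: 3 });
    let seen = Cell::new(None);

    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        let _observer = Observer { totals: &totals, seen: &seen };
        // Step 1 of a two-step update; the mutable borrow ends with this statement.
        totals.borrow_mut().items.push(4);
        // Step 2 (updating `cached_sum`) is never reached.
        panic!("simulated failure between the two steps");
    }));

    seen.get()
}
```

The observer runs during unwinding and sees the half-updated state, and nothing is poisoned because no `Mutex` is involved.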


This is a false dichotomy. Not every technique needs to work in all cases in order to be useful.

This seems analogous to arguing that because seat belts don't save the lives of all people involved in car crashes, and they're kind of annoying, then they shouldn't be factory-standard.


This is a case of a feature that is actively harmful for the things it tries to prevent, because it increases the risk in practice of panics "spreading" throughout a system, even after the programmer thought she had finished handling them, and because it gives a false impression of what kind of guarantee you actually have.


This is exactly the problem. Poison is enough to be painful but not enough to fully solve the problem.

> Heck, you can have Drop impls observing the state while unwinding.

Yeah, this is really painful and regularly forgotten. And one reason it'd be nice to not have unwinding.


I understand what you mean, but what you're saying has not been true for me in practice. Mutexes absolutely are used to uphold invariants, in a way that &mut T is used much less often.

There's something to be said here about what I've sometimes called the cancellation blast radius. The issues with cancellation happen when the data corruption/invariant violation is externally visible (if the corrupt data is torn down, who cares.) Mutexes make data corruption externally visible very often.


In projects I've worked on, this just hasn't been the case. Mutexes, especially in Rust, can grant you a `&mut T` when what you have is `&Mutex<T>`, and that's it - failing to uphold invariants in the API surface of `T` is a bug whether or not it lives inside a mutex.

Lots of data structures need to care about panic-safety. Inserting a node in a tree must leave the tree in a valid state if allocating memory for the new node fails, for example. All of that is completely orthogonal to whether or not the data structure is also observable from multiple threads behind a mutex, and I would argue especially in the case of mutex, whose purpose it is to make an object usable from multiple threads as-if they had ownership.


Acknowledging that panic safety is a real issue with data structures that mutex poisoning does not solve, I don't think we're going to agree on anything else here, unfortunately. We probably have entirely different experiences writing software -- mutex poisoning is very valuable in higher-level code.


That’s not surprising to me, but it’s not much of an argument for changing the default to be less safe. Most people want poisoning to propagate fatal errors and avoid reading corrupted data, not to recover from panics.

Edit: isn’t that an argument not to change the default? If people were recovering from poison a lot and that was painful, that’s one thing. But if people aren’t doing that, why is this a problem?


Because right now everyone writes `.lock().unwrap()` everywhere without really thinking about it, and it just makes Mutex more painful to work with.


If the issue is that everyone has to write an extra unwrap, then a good step would be to make lock panic automatically in the 2027 edition, and add a lock_or_poison method for the current behavior. But I think removing poisoning altogether from the default mutex, such that it silently unlocks on panic, would be very bad. The silent-unlock behavior is terrible with async cancellations and terrible with panics.
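
As a rough sketch of what that split could look like, written as a wrapper around today's `std::sync::Mutex`. `PanickingMutex` is hypothetical, and `lock_or_poison` is only the name proposed above; neither exists in the standard library.

```rust
use std::sync::{Arc, LockResult, Mutex, MutexGuard};
use std::thread;

/// Hypothetical wrapper; not a standard library type.
struct PanickingMutex<T>(Mutex<T>);

impl<T> PanickingMutex<T> {
    fn new(value: T) -> Self {
        PanickingMutex(Mutex::new(value))
    }

    /// Proposed default: panic if an earlier holder panicked.
    fn lock(&self) -> MutexGuard<'_, T> {
        self.0.lock().expect("mutex poisoned by an earlier panic")
    }

    /// Current behavior: hand the poison state to the caller.
    fn lock_or_poison(&self) -> LockResult<MutexGuard<'_, T>> {
        self.0.lock()
    }
}

/// Returns (explicit lock reports poison, implicit lock panics).
fn poison_is_still_reported() -> (bool, bool) {
    let shared = Arc::new(PanickingMutex::new(0u32));

    // Poison the mutex by panicking while holding the guard.
    let worker = Arc::clone(&shared);
    let _ = thread::spawn(move || {
        let _guard = worker.lock();
        panic!("simulated failure while holding the lock");
    })
    .join();

    let explicit_is_err = shared.lock_or_poison().is_err();

    let other = Arc::clone(&shared);
    let implicit_panics = thread::spawn(move || {
        let _guard = other.lock();
    })
    .join()
    .is_err();

    (explicit_is_err, implicit_panics)
}
```

Poisoning stays on in both methods; the only difference is whether the caller has to write the `unwrap()`.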


You seem to keep making the implicit assumption that because people are using `unwrap()`, they must not care about the poisoning behavior. I really don't understand where this assumption is coming from. I explicitly want to propagate panics from contexts that hold locks to contexts that take locks. The way to write that is `lock().unwrap()`. I get that some people might write `lock().unwrap()` not because they care about propagating panics, but because they don't care either way and it's easy. But why are you assuming that that's most people?
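
A small sketch of that propagation with the current `std::sync::Mutex`; the function name is made up for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns true if a later `lock().unwrap()` panics after an earlier holder panicked.
fn second_locker_panics() -> bool {
    let shared = Arc::new(Mutex::new(0u32));

    // First context: panics while holding the lock, which poisons the mutex.
    let first = Arc::clone(&shared);
    let _ = thread::spawn(move || {
        let mut guard = first.lock().unwrap();
        *guard += 1;
        panic!("simulated failure while holding the lock");
    })
    .join();

    // Second context: `lock()` returns `Err(PoisonError)`, so `unwrap()` panics too.
    let second = Arc::clone(&shared);
    thread::spawn(move || {
        let guard = second.lock().unwrap();
        *guard
    })
    .join()
    .is_err()
}
```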


https://news.ycombinator.com/item?id=46051602

I'm suggesting that the balance of pain to benefit is not working out enough to inflict it on everyone by default. I'm not suggesting it has no value, just not enough to be worth it.


I hear that, but it feels kind of empty because I haven't seen much discussion of that cost/benefit analysis (both of poisoning itself and of the change to the default behavior, which has its own costs and benefits).

I take it as uncontroversial that an important function of Mutexes is to ensure that invariants about data are maintained when the data is modified and that very bad things can happen when a program's data invariants are violated at runtime and the program doesn't notice. Maybe folks disagree about whether a program should always panic when invariants are violated at runtime (though there's certainly plenty of precedent in Rust itself for doing this, like with array bounds checking). Probably the bigger question mark is that panicking with a Mutex held doesn't necessarily mean an invariant is violated. But it does mean that the mechanism for ensuring the invariant has itself failed. I can see different choices about what to do here. For myself, the event itself is so rare and the impact of getting an invariant wrong so high that I absolutely do want to panic -- the false positive rate is just too small to matter.


Is that not because there is not much to do, and therefore people use .unwrap() — because crashing is actually quite sane?

Correctness trumps ergonomics, and the default should definitely be poisoning/panicking unless handled. There could definitely be an optional poison-eating mutex, but I argue the current Mutex does the right thing.
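
For comparison, an opt-in poison-eating mutex could be sketched as a thin wrapper like this. `NonPoisoningMutex` is a hypothetical name, not an existing type.

```rust
use std::sync::{Arc, Mutex, MutexGuard, PoisonError};
use std::thread;

/// Hypothetical opt-in wrapper that ignores poisoning.
struct NonPoisoningMutex<T>(Mutex<T>);

impl<T> NonPoisoningMutex<T> {
    fn new(value: T) -> Self {
        NonPoisoningMutex(Mutex::new(value))
    }

    /// Always returns the guard, whether or not an earlier holder panicked.
    fn lock(&self) -> MutexGuard<'_, T> {
        self.0.lock().unwrap_or_else(PoisonError::into_inner)
    }
}

/// Reads the value after a holder wrote to it and then panicked.
fn value_after_panic() -> u32 {
    let shared = Arc::new(NonPoisoningMutex::new(0u32));

    let worker = Arc::clone(&shared);
    let _ = thread::spawn(move || {
        let mut guard = worker.lock();
        *guard = 1;
        panic!("simulated failure while holding the lock");
    })
    .join();

    // No error and no panic here: the write made before the panic is simply visible.
    let value = *shared.lock();
    value
}
```

Whether silently seeing that write is acceptable is exactly the disagreement in this thread; the wrapper only makes the choice explicit at the type level.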



