> Unfortunately "it's complicated" to implement well, especially when you try to...

convolvatron · on May 31, 2022

You still have to migrate the kernel state of the underlying resources, so containers don't really buy you very much. As you say, in checkpoint/restart you start with the big stuff like TCP sockets, but after a while you get tired of tracking down little flecks of state.
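To get a feel for how many "little flecks" there are, here is a minimal, Linux-only sketch that inventories only the kernel objects a process exposes through `/proc/<pid>/fd` and `/proc/<pid>/maps`. The function names and category labels are hypothetical, chosen for illustration. It deliberately ignores much of what a real checkpointer must also capture (namespaces, pending signals, timers, and so on), which is exactly the long tail described above.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse kind of kernel object.

    Hypothetical categories for illustration only.
    """
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        # e.g. "anon_inode:[eventfd]", "anon_inode:[eventpoll]", "anon_inode:inotify"
        return target[len("anon_inode:"):].strip("[]")
    if target.startswith("/dev/"):
        return "device"
    return "file"


def kernel_state_inventory(pid: int) -> Counter:
    """Count the fd-backed kernel objects and memory mappings of a process.

    Assumes a Linux /proc filesystem and permission to read the target's entries.
    """
    inventory = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        inventory[classify_fd_target(target)] += 1
    with open(f"/proc/{pid}/maps") as maps:
        inventory["memory_mapping"] = sum(1 for _ in maps)
    return inventory
```

Running `kernel_state_inventory(os.getpid())` on an ordinary Python process already turns up dozens of mappings plus a handful of terminals, pipes, or files. Each of those needs its own save/restore logic in a migrator.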

Distributed shared memory, though, does deserve a new look now that baseline network speeds are 100x the maximum available when it was first investigated. Unfortunately, temporal consistency is still going to be a major factor: some workloads will run great with some heuristics, and some won't. You'll almost certainly need to migrate threads along with pages to keep them co-located without exhausting per-node resources.
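As one illustration of the kind of heuristic meant here, the following sketch greedily places each thread on the node that already holds most of the pages it touches, falling back to the next-best node when a node runs out of thread slots. This is an assumed, hypothetical policy (the names `place_threads`, `access_counts`, and `capacity` are invented for the example), not a description of any real DSM system, and it ignores the temporal-consistency problem entirely.

```python
def place_threads(access_counts, capacity):
    """Greedy thread placement for a hypothetical DSM cluster.

    access_counts: {thread_id: {node_id: accesses to pages homed on that node}}
    capacity:      {node_id: maximum number of threads that node may run}
    Returns {thread_id: node_id}.
    """
    load = {node: 0 for node in capacity}
    placement = {}
    # Place the most memory-intensive threads first so they get first pick.
    order = sorted(access_counts, key=lambda t: -sum(access_counts[t].values()))
    for thread in order:
        counts = access_counts[thread]
        # Rank nodes by how much of this thread's traffic they would keep local;
        # break ties by node id so the result is deterministic.
        ranked = sorted(capacity, key=lambda n: (-counts.get(n, 0), n))
        for node in ranked:
            if load[node] < capacity[node]:
                placement[thread] = node
                load[node] += 1
                break
        else:
            raise RuntimeError("not enough thread slots across all nodes")
    return placement


placement = place_threads(
    {"t1": {"a": 90, "b": 10}, "t2": {"a": 80, "b": 5}, "t3": {"b": 50}},
    {"a": 1, "b": 2},
)
print(placement)  # {'t1': 'a', 't2': 'b', 't3': 'b'}
```

Note how `t2` ends up remote from most of its pages because node `a` is full. That is the per-node resource tension described above, and a real system would then have to decide whether to migrate `t2`'s pages instead.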

VWWHFSfQ · on May 31, 2022

> you still have to migrate the kernel state of underlying resources

DragonFly BSD has been experimenting with this concept for several years now. At the moment it's mostly useful only for snapshotting the kernel for debugging purposes, but there's no reason it couldn't be extended to transport the state of the computer to another computer and resume execution!

https://www.dragonflybsd.org/docs/handbook/vkernel/

emmelaich · on June 1, 2022

CRIU (https://criu.org/) indicates it already supports migration, at least at the container level.
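For a single process, the basic CRIU migration flow is to dump on the source host, copy the image directory to the target, and restore there. The sketch below only builds the command lines and does not run them. Actually running them needs `criu` installed, usually root, and a matching filesystem view on both hosts. The flags used (`-t`, `-D`, `--shell-job`, `--tcp-established`, `--leave-running`) are standard CRIU options. The helper names are hypothetical.

```python
import shlex


def criu_dump_argv(pid, images_dir, *, shell_job=True,
                   tcp_established=True, leave_running=False):
    """Build a `criu dump` command that checkpoints `pid` into `images_dir`."""
    argv = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    if shell_job:
        argv.append("--shell-job")        # process is attached to a terminal/session
    if tcp_established:
        argv.append("--tcp-established")  # also checkpoint established TCP connections
    if leave_running:
        argv.append("--leave-running")    # keep the original running after the dump
    return argv


def criu_restore_argv(images_dir, *, shell_job=True, tcp_established=True):
    """Build the matching `criu restore` command for the target host."""
    argv = ["criu", "restore", "-D", images_dir]
    if shell_job:
        argv.append("--shell-job")
    if tcp_established:
        argv.append("--tcp-established")  # must match the option used at dump time
    return argv


print(shlex.join(criu_dump_argv(1234, "/tmp/ckpt")))
print(shlex.join(criu_restore_argv("/tmp/ckpt")))
```

Between the two steps, the image directory has to be copied to the target host (for example with `rsync`). For restored TCP connections to keep working, the connection's IP address also has to move with the process, which is part of the "kernel state of underlying resources" problem raised upthread.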