
> Unfortunately "it's complicated" to implement well, especially when you try to tele-spawn and manage resources beyond compute cycles (network connections, files, file handles, ...)

Aren't all of these resources namespaced/containerized in modern Linux? This should make it feasible to checkpoint and restore them on the same machine (via, e.g., CRIU), and true location-independence is not that much harder. One of the hardest parts (not even implemented in Plan 9, AFAICT) is distributed shared memory (sharing a single virtual address space across cluster nodes), but even that, AIUI, has some research-level implementations.



you still have to migrate the kernel state of underlying resources, so containers don't really buy you very much. as you say, in checkpoint/restart you start with the big stuff like tcp sockets, but after a while you get tired of tracking down all the little flecks of state.

distributed shared memory does deserve a new look, though, now that baseline network speeds are 100x what the max was when it was first investigated. unfortunately, temporal consistency is still going to be a major factor. some workloads will run great with a few heuristics, and some won't. you'll almost certainly need to migrate threads along with pages to keep them running co-located without exhausting per-node resources.
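
to make the "migrate threads along with pages" idea concrete, here is a toy placement heuristic. it is only a sketch under stated assumptions, not how any real DSM system does it: `choose_node`, the page-ownership map, and the per-node load/capacity counters are all made-up names for illustration. the thread goes to the node that owns most of the pages it touches, unless that node has no room, in which case it falls back to the next best node or stays where it is.

```python
from collections import Counter


def choose_node(page_owner, touched_pages, node_load, node_capacity, current_node):
    """Pick a node for a thread (hypothetical heuristic, for illustration only).

    page_owner:    dict page id -> node currently holding that page
    touched_pages: pages the thread accessed recently
    node_load:     dict node -> number of threads already running there
    node_capacity: dict node -> maximum threads the node should run
    current_node:  node the thread is running on now
    """
    # Count how many of the thread's pages live on each node.
    counts = Counter(page_owner[p] for p in touched_pages if p in page_owner)

    # Try nodes from "holds most of our pages" to "holds fewest";
    # ties are broken by node name so the result is deterministic.
    for node, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Staying put costs no extra slot; moving needs a free slot.
        if node == current_node or node_load.get(node, 0) < node_capacity.get(node, 0):
            return node

    # No candidate node has room: keep the thread where it is.
    return current_node
```

a real system would also have to decide when to move the pages instead of the thread, and how to keep the two decisions from thrashing against each other, which is exactly where the workload-dependent heuristics come in.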


> you still have to migrate the kernel state of underlying resources

DragonFly BSD has been experimenting with this concept for several years now. At the moment it's mostly useful only for snapshotting the kernel for debugging purposes. But there's no reason why it couldn't be extended to transport the state of one computer to another and resume execution there!

https://www.dragonflybsd.org/docs/handbook/vkernel/


https://criu.org/ indicates it already supports migration, at least at the container level.
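
For a rough idea of what the checkpoint and restore halves look like from the command line, here is a small sketch that only builds the two `criu` invocations and does not run anything. The helper name `criu_migration_commands` is made up for illustration; the flags used (`-t`, `-D`, `--shell-job`, `--leave-running`) are ordinary CRIU options, but copying the image directory to the destination host, and anything a real workload needs for open TCP connections or external resources, is deliberately left out.

```python
def criu_migration_commands(pid, images_dir, leave_running=False):
    """Build argv lists for a CRIU dump and the matching restore.

    Hypothetical helper for illustration: it returns the commands instead
    of executing them. The image directory must be made available on the
    destination host (for example, copied over) before the restore runs.
    """
    # Checkpoint the process tree rooted at `pid` into `images_dir`.
    # --shell-job is needed when the process was started from a shell.
    dump = ["criu", "dump", "-t", str(pid), "-D", images_dir, "--shell-job"]
    if leave_running:
        # Keep the original process alive after the dump.
        dump.append("--leave-running")

    # Recreate the process tree from the images in `images_dir`.
    restore = ["criu", "restore", "-D", images_dir, "--shell-job"]
    return dump, restore
```

Calling `criu_migration_commands(1234, "/tmp/ckpt")` gives one list to run on the source machine and one to run on the destination once the images are there.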



