Here's an interesting comment I read on Slashdot a while back that summarizes the paper:
This is really a little operating system, with 44 system calls. Those system calls are the same on Linux, MacOS (IA-32 version) and Windows. That could make this very useful - the same executable can run on all major platforms.
Note that you can't use existing executables. Code has to be recompiled for this environment. Among other things, the "ret" instruction has to be replaced with a different, safer sequence. Also, there's no access to the GPU, so games in the browser will be very limited. As a demo, they ported Quake, but the rendering is entirely on the main CPU. If they wanted to support graphics cross-platform, they could put in OpenGL support.
Executable code is pre-scanned by the loader, sort of like VMware. Unlike VMware, the hard cases are simply disallowed, rather than being interpreted. Most of the things that are disallowed you wouldn't want to do anyway except in an exploit.
This sandbox system makes heavy use of some protection machinery in IA-32 that's unused by existing operating systems. IA-32 has some elaborate segmentation hardware which allows constraining access at a fine-grained level. I once looked into using that hardware for an interprocess communication system with mutual mistrust, trying to figure out a way to lower the cost of secure IPC. There's a seldom-used "call gate" mechanism in IA-32 that almost, but not quite, does the right thing in doing segment switches at a call across a protection boundary. The Google people got cross-boundary calls to work with a "trampoline code" system that works more like a system call, transferring from untrusted to trusted code. This is more like classic "rings of protection" from Multics.
Note that this won't work for 64-bit code. When AMD came up with their extension to IA-32 to 64 bits, they decided to leave out all the classic x86 segmentation machinery because nobody was using it. (I got that info from the architecture designer when he spoke at Stanford.) 64-bit mode is flat address space only.
I'm not sure I'd call X86 segmentation "elaborate", at least in the context of X86 programming (sure, it's very elaborate compared to MIPS).
I don't think I've heard the term "call gate" used with the definite article before, as if there were just one of them... but I'm an X86 autodidact and that could be my mistake. My understanding is that a call gate is anything that vectors a program from one context to another. In most X86 operating systems, there are two or three basic call gates that will get you from userland to kernel, including the INT instruction (the interrupt handler will check your program state and dispatch the right system call) and the SYSCALL instruction (which does the same thing without the interrupt overhead).
NaCl disallows both of these instructions, along with the FAR CALL opcode that would let you jump between segments and the segment override prefix that does the same (note this was the epic fail Dowd found in the contest).
The trampoline mechanism that NaCl uses is not at all dissimilar from how Win32 and BSD libc issue system calls; the library exports a stub interface and hides the mechanics of actually issuing a system call.
Note: not trying to be pedantic here. Just love geeking out on this stuff.
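For anyone who wants to see the "pre-scan and refuse the hard cases" idea in miniature, here is a toy sketch. It only models the instructions this thread names as disallowed (INT, SYSCALL, far calls, bare RET, and segment override prefixes). The `(prefix, mnemonic)` tuple format, the mnemonic spellings, and the `validate` function are all made up for illustration; the real validator works on raw machine code and checks more than this.

```python
# Toy pre-scan over already-decoded instructions (hypothetical format).
# Each instruction is a (prefix, mnemonic) tuple; prefix may be None.
DISALLOWED = {"int", "syscall", "lcall", "ret"}  # names taken from this thread
SEGMENT_PREFIXES = {"cs", "ds", "es", "fs", "gs", "ss"}


def validate(instructions):
    """Return a list of (index, reason) for every rejected instruction."""
    problems = []
    for index, (prefix, mnemonic) in enumerate(instructions):
        if mnemonic in DISALLOWED:
            problems.append((index, f"disallowed instruction: {mnemonic}"))
        if prefix in SEGMENT_PREFIXES:
            problems.append((index, f"segment override prefix: {prefix}"))
    return problems


def is_loadable(instructions):
    """The loader refuses the whole module if anything is rejected."""
    return not validate(instructions)
```

The point of the design is that nothing is interpreted or patched at load time: a module with a single rejected instruction is simply not loaded.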
The x86 instruction set has a mechanism called "call gates" for system calls. Basically, the OS puts the entry point of the system call handler into a segment descriptor with the call gate bits set. The unprivileged user program then performs a far call to an address consisting of a segment selector for that descriptor and an offset which does not matter. Execution resumes at the system call handler, with a privilege level as encoded in the call gate descriptor.
That way, you could have thousands of system call entry points and avoid the overhead of an int instruction and the syscall-number dispatch. I believe OS/2 used that mechanism extensively (and all the other elaborate segmentation stuff).
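To make the descriptor part concrete, here is a rough sketch of how a 32-bit call gate descriptor is packed, going by the layout in the Intel manuals as I remember it (type 0xC, handler selector and offset split across the two dwords). The function names and the example selector and offset values are hypothetical; this only builds the 8 bytes and does not install anything in a descriptor table.

```python
import struct

CALL_GATE_32 = 0xC  # descriptor type field for a 32-bit call gate


def encode_call_gate(selector, offset, dpl=3, param_count=0, present=True):
    """Pack an 8-byte call gate descriptor (little-endian, two dwords)."""
    low = (offset & 0xFFFF) | ((selector & 0xFFFF) << 16)
    high = (
        (param_count & 0x1F)      # dwords copied to the new stack
        | (CALL_GATE_32 << 8)     # type; the S bit (bit 12) stays 0
        | ((dpl & 0x3) << 13)     # lowest privilege allowed to call through
        | (int(present) << 15)    # present bit
        | (offset & 0xFFFF0000)   # upper half of the handler offset
    )
    return struct.pack("<II", low, high)


def decode_call_gate(raw):
    """Unpack the fields written by encode_call_gate."""
    low, high = struct.unpack("<II", raw)
    return {
        "selector": low >> 16,
        "offset": (low & 0xFFFF) | (high & 0xFFFF0000),
        "param_count": high & 0x1F,
        "type": (high >> 8) & 0xF,
        "dpl": (high >> 13) & 0x3,
        "present": bool(high & 0x8000),
    }
```

Note that the handler's entry point lives in the descriptor itself, which is why the offset in the caller's far call does not matter.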
Presumably the restriction on RET applies to things like calls using function pointers and therefore, by extension, vtables? I could see how they might maintain return addresses safely out-of-band somewhere, but will every function pointer call have to go through vetting? Will we have to go back to the days when virtual functions were evil?
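On the function pointer question: as far as I can tell from the paper, return addresses are not kept out-of-band. Every indirect transfer (returns, function pointers, vtable calls) is replaced by a short sequence that masks the target so it lands on an aligned boundary before jumping, so the cost is an extra AND per indirect call rather than a lookup. Below is a small model of that masking. The 32-byte alignment is my reading of the paper, and the function names are invented for the example.

```python
BUNDLE_SIZE = 32  # assumption: 32-byte aligned targets, per my reading of the paper
ADDRESS_MASK = 0xFFFFFFFF & ~(BUNDLE_SIZE - 1)  # 0xFFFFFFE0


def masked_target(address):
    """Clear the low bits so the target sits on a bundle boundary."""
    return address & ADDRESS_MASK


def sandboxed_return(stack):
    """Model of 'pop reg; and reg, mask; jmp reg' in place of a bare ret."""
    return masked_target(stack.pop())


def sandboxed_indirect_call(function_pointer):
    """Same masking applied to a function pointer or vtable entry."""
    return masked_target(function_pointer)
```

A pointer that is already aligned passes through unchanged, so well-behaved code never notices; a corrupted pointer can still go somewhere wrong, but only to the start of a bundle the validator has already checked.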
http://nativeclient.googlecode.com/svn/trunk/nacl/googleclie...
http://tech.slashdot.org/comments.pl?sid=1056231&cid=260...