ICPP – Run C++ anywhere like a script (github.com/vpand)
83 points by davikr on Aug 8, 2024 | hide | past | favorite | 53 comments


Why interpret at all? Back in the early-to-mid '90s I started embedding C++ compilers into the game engines I wrote, where the "game scripting language" was just #define macros hiding the C++ syntax, so the game level developers who worked in this script could be basically anyone who could code. Their "script" would compile to a DLL that was hot-loaded. What they were doing in their scripts would compile in under 5 seconds, and they were good to go. If they ran into problems, one of the game engine developers would just run their "script" in the IDE debugger.

Borrowed this idea from Nothing Real, the developers of Shake, the video/film compositing system.


Totally valid take. The answer is: it depends.

The advantage of a lot of scripting tech is some form of REPL, which is really just a super-fast code-compile-run loop. In your example, "why?" boils down to how useful/painful those five seconds-per-change are. Maybe that adds up and slows the coder down, or maybe it's no big deal. It all kind of depends on the workflow and how fast you need to be able to iterate on code changes. Moving to a scripted interpreter would eliminate that wait period at the cost of runtime performance, which might be a valuable business tradeoff.

FWIW, that "script" solution sounds awesome for the time. I'll add that five seconds to build a hot-loaded DLL in the '90s is really, really good performance for that solution, regardless of its role as a scripting alternative. Today, that would probably be mere milliseconds to compile, indistinguishable from an embedded Lua or JS solution.


> Why interpret at all?

a few of the points that come to mind:

- education

- iteration on writing simple functionality

- loading and trying out several APIs to see what's possible (I use it frequently with Elixir / Erlang for example)

It makes life easier for newcomers to wrap their head around something and produce a good solution rather than a "working" one.


But the "interpretation" is kind of in name only. When you run your program there's still going to be a compilation step; it's just that the interpreter merges it with the run step, and does so every time you run the program. I'm with the GP: I don't understand the advantage of this approach over traditional AOT compilation (actually this isn't even JIT, it's just deferred AOT).



I don't suppose a modern OS would let you do that today; sounds like a security nightmare.


Yes, any modern OS lets any process load into its memory space binaries from anywhere the user has permissions, even if those are binaries it generated just now. It can be a security problem if the binaries are generated from untrusted sources (e.g. you download some, say, Haskell, compile it and run it fully automatically).


Windows has a private heap system where you can disable code execution from the pages allocated by `HeapCreate`, if you don't set that flag:

https://learn.microsoft.com/en-us/windows/win32/api/heapapi/...

> HEAP_CREATE_ENABLE_EXECUTE

> 0x00040000

> All memory blocks that are allocated from this heap allow code execution, if the hardware enforces data execution prevention. Use this flag heap in applications that run code from the heap. If HEAP_CREATE_ENABLE_EXECUTE is not specified and an application attempts to run code from a protected page, the application receives an exception with the status code STATUS_ACCESS_VIOLATION.

I think POSIX has equivalent memory protection calls, but no equivalent to HeapCreate.


But you can still call VirtualAlloc(), VirtualProtect(), and LoadLibrary(), so this isn't really a security mechanism, but more of a safety mechanism.

I don't think Windows provides a mechanism to disable creating any further executable pages, although I've seen Chrome do it by hooking those functions (and I know it because I've had to bypass it :)).


I wouldn't expect Windows to prevent creating further executable pages; there are legitimate use cases for creating dynamically allocated executable memory. It just means that whatever foreign data you load into those pages can't execute, which is a security mechanism (for example, game save data can be loaded into these heaps so that you can load all game state without the save file potentially running foreign code).


There are legitimate uses, but the point would be that the process could ask the system to lock it down with whatever executable code is already present. This could be used to prevent already running code from tampering with the process' behavior by loading new code, or to thwart code injection.

>It just means that whatever foreign data you load into those pages can't execute, which is a security mechanism (for example, game save data can be loaded into these heaps so that you can load all game state but without the save file potentially running foreign code)

But malloc() and all the other standard memory allocation functions already return pointers into non-executable pages anyway. Perhaps those functions call into this one internally, but using this over whatever your language offers by default provides no additional protection.


The benefit of HeapAlloc and HeapCreate is that they make serialization of data simpler: because you can define the bounds of the heap, you can easily save pointer offsets. There are other flags available as well. It's a Win32 feature, so you would only use it if Windows is your target platform (and then write other bindings for other platforms).


I still don't get it. Why can't you do the same with a regular buffer?


App Sandboxing and VBS enclaves go in that direction.


All of this calls VirtualAlloc behind the scenes, and you can do that yourself as well for manual page allocation. Each page can have options set with VirtualProtect to allow or disallow execution of code within it.


iOS won't let you do that. Or at least Apple won't let you do that on iOS.


What's actually happening is that the SDK doesn't expose the system calls necessary to do it, but I can guarantee that if you can get a native binary to run on the device, you can have it do whatever you want. If that wasn't the case, the few apps that do support JITting wouldn't work.


This is just JIT'ing. It's commonly used in Python and Lua.


It's just loading a dynamic library from an arbitrary file, as traditionally done for third party software addons (e.g. Photoshop plugins since the early 1990s and VST instruments and effects since the late 1990s).


I wish; JIT and Python still isn't something to brag about.


Python 3.13 has an experimental JIT compiler https://peps.python.org/pep-0744/


As I said, not something to brag about.


Folks who like this kind of thing should definitely check out CERN's ROOT framework. I've been using its C++ interpreter in a Jupyter notebook environment to learn C++. It's probably also quite a bit more mature than this project. https://root.cern/


This is great! Thanks for the pointer.

Any other great open source tools like this that you can share?


Well, one thing you can use alongside this project is a small library called cpp-dump, which lets you pretty print variables. https://github.com/philip82148/cpp-dump

It's just a normal library you can use with any compiled project, but it works nicely with Root C++ for built-in and std types.

Great if (say) you're working through implementing vector or linear algebra and want a nice way to display your multi-dimensional arrays and vectors.

Just copy the project folder somewhere (most conveniently wherever you invoke the interpreter from), add `#include "cpp-dump/dump.hpp"`, and then call `cpp_dump(myVariable)` to print your variables.

You can see how it looks in this example where I was mucking about with permutations of vectors: https://i.imgur.com/yRpY5Bj.png


Along the lines of scripting is interactive programming. I'm working on a native Clojure dialect on LLVM with C++ interop, called jank. It can JIT compile C++ code, can be embedded into any C++-compatible application, and is a full Clojure dialect which doesn't hide any of its C++ runtime. So you can do inline C++, compile C++ sources alongside your jank, and require them like a normal Clojure namespace. Worth a look if you're using C++ but you're craving something more interactive. https://jank-lang.org/


I wonder if I can use this to learn a large C++ codebase like Chromium. One of the issues I had trying to learn Chromium was that in order to play and experiment with its classes/functions I needed to spend several minutes linking my little test code against its static libraries just to see if my understanding was correct. That's too long for such experiments, so I gave up.


Last time I checked out the Chromium code base it was about 300-400 megs of uncompressed .cpp files. Let's also not forget that you needed to run a code generator script that produced another 200 megs of DOM files, or interface files. At that point I gave up, went to sleep, and never touched it again.


I really hope Ladybird is able to stay relatively small and approachable, it would be wonderful to have as a truly customizable open-source browser that's not a massive codebase that takes forever to compile and develop.


I do think there are some geniuses working on the Chromium code base, and I would imagine there are really good reasons for doing it their way. I would also imagine Ladybird will face the same problems over time, and come up with similar solutions as the Chromium team.

All I know is most of the large scale C/C++ code bases eventually become these monolithic giant code bases that require some really specialised software tools to compile and link.


Not only C and C++; I have seen the same with most Fortune 500 projects I have been involved in.


I agree on all fronts. This parallels the last time I looked at, and gave up on, building a backend for LLVM. And that was after giving up on doing the same for GCC. Those codebases are _impenetrable_.

It's clear as mud how one would hook a jumbo codebase into the REPL. If it's possible, that would be a game changer.


I added LLVM JIT support to https://ossia.io a few years ago, it's not too bad, but a big issue is that the JIT does not support all the necessary features used by the frontend in terms of relocations, etc. So it happens relatively often that C++ code will compile to LLVM IR without issue, but then fail at the JIT step because some relocation is not supported by the JIT engine yet.

Most of the code is here : https://github.com/ossia/score/tree/master/src/plugins/score... with the actual LLVM API interoperation contained there : https://github.com/ossia/score/tree/master/src/plugins/score...

It's been used for fun projects, for instance for this paper about data sonification : https://www.researchgate.net/profile/Maxime_Poret/publicatio...


How feasible would it be for something like gdb to be able to use a C++ interpreter (whether icpp, or even a souped up `constexpr` interpreter from the compiler) to help with "optimized out" functions?

gdb also doesn't handle overloaded functions well, e.g. `x[i]`.


GDB does have hooks for interpreters to be executed within it, but I haven't managed to make this work. https://sourceware.org/gdb/current/onlinedocs/gdb.html/JIT-I....


It does, though? I just compiled a small program that creates a vector, and GDB is perfectly happy accessing it using this syntax. It will even print std::strings correctly if you cast them to const char* by hand. (Linux x86-64, GDB 14.2.)


I've defined a few pretty printers, but `operator[]` doesn't work for my user-defined types. Knowing it works for vectors, I'll try and experiment to see if there's something that'll make it work.

  (gdb) p unrolls_[0]
  Could not find operator[].
  (gdb) p unrolls_[(long)0]
  Could not find operator[].
  (gdb) p unrolls_.data_.mem[0]
  $2 = {
`unrolls_[i]` works within C++. This `operator[]` method isn't even templated (although the container type is); the index is hard-coded to be of type `ptrdiff_t`, which is `long` on my platform.

I'm on Linux, gdb 15.1.


> This `operator[]` method isn't even templated (although the container type is)

That might be it. If that operator isn’t actually ever emitted out of line, then GDB will (naturally) have nothing to call. If it helps, with the following program

  template<typename T>
  struct Foo {
      int operator[](long i) { return i * 3; }
  };
  
  Foo<bool> bar;
  template int Foo<bool>::operator[](long); // [*]
  
  int main(void) {
      Foo<int> foo;
      __asm__("int3");
      return foo[19];
  }
compiled at -g -O0 I can both `p foo[19]` and `p bar[19]`, but if I comment out the explicit instantiation marked [*], the latter no longer works. At -g -O2, the former does not work because `foo` no longer actually exists, but the latter does, provided the instantiation is left in.


Can confirm, this works for me in my actual examples, thanks!


> It will even print std::string’s correctly if you cast them to const char* by hand

What does that mean? I think `print str.c_str()` has worked for me in GDB before, but sounds like you did something different.


I was observing that `p (const char *)str` also worked in my experiment, but I’m far from a C++ expert and upon double-checking this seems to have been more of an accident than intended behaviour, because there is no operator const_pointer in basic_string that I can find. Definitely use `p str.c_str()`.


If your std::string was using a short string optimization, that would explain the “accident”.

Some implementations even put char[0] at the first byte in the optimized form.


That explanation doesn't work IMO, unless `str` is a std::string pointer, which is contrary to the syntax GP suggested with `str.c_str()`.

It doesn't seem possible in actual C++ that the cast from non-pointer to pointer would work at all (even if a small string happens to be inlined at offset 0.) Like GP, I looked for a conversion operator, and I don't think it's there. Maybe it is a feature of the gdb parser.


Good point, but if it’s a long string, 2/3 of the most common implementations would make the first word the c_str()-equivalent pointer:

https://devblogs.microsoft.com/oldnewthing/20240510-00/?p=10...


So it's actually printing *(const char **)&s?


The first pointer-sized chunk of the string structure is a pointer to the C-string representation. So the cast works as written.


Well, no, because (const char *)str is nonsense, if str is an std::string.


Not to the debugger. If the first 8 bytes of the object referenced by str is a char* the debugger is perfectly capable of using it that way.


this "optimized out" thing is bullshit as hell


This is really cool...

https://github.com/vpand/icpp-qt

Ok now things are getting interesting. I think this could be used to add easily shareable/hackable plugins to existing C++ projects.


I've done it by embedding libclang into an executable. You still have to be really careful to keep ABI compatibility between the host and the JITed plugin, if you want to send and receive complex C++ objects. Most likely you'll need to set up a simple C ABI and reconstruct the objects on either side of the interface. The last thing you want is to send std::string across a DLL boundary.


Related is cargo script for Rust: https://doc.rust-lang.org/cargo/reference/unstable.html#scri... (nightly only)


You can do it out of the box with Rust, no need for any tools, because you can strategically mix shell and Rust in the same code: https://neosmart.net/blog/self-compiling-rust-code/



