Why interpret at all? Back in the early-to-mid '90s I started embedding C++ compilers into the game engines I wrote, where the "game scripting language" was just #define macros hiding the C++ syntax, so the game level developers who worked in this script could be basically anyone who could code. Their "script" would compile to a DLL that was hot-loaded. Whatever they were doing in their scripts would compile in under 5 seconds, and they were good to go. If they ran into problems, one of the game engine developers would just run their "script" in the IDE debugger.
Borrowed this idea from Nothing Real, the developers of Shake, the video/film compositing system.
The advantage of a lot of scripting tech is some form of REPL, which is really just a super-fast code-compile-run loop. In your example, "why?" boils down to how useful/painful those five seconds per change are. Maybe that adds up and slows the coder down, or maybe it's no big deal. It all depends on the workflow and how fast you need to be able to iterate on code changes. Moving to a scripted interpreter would eliminate that wait period at the cost of runtime performance, which might be a valuable business tradeoff.
FWIW, that "script" solution sounds awesome for the time. I'll add that five seconds to build a hot-loaded DLL in the '90s is really, really good performance for that solution, regardless of its role as a scripting alternative. Today, that would probably be mere milliseconds to compile - impossible to distinguish from an embedded Lua or JS solution.
But the "interpretation" is in name only. When you run your program there's still going to be a compilation step; it's just that the interpreter merges it with the run step, and it does so every time you run the program. I'm with the GP, I don't understand the advantage of this approach over traditional AOT compilation (actually this isn't even JIT, it's just deferred AOT).
Yes, any modern OS lets any process load into its memory space binaries from anywhere the user has permissions, even if those are binaries it generated just now. It can be a security problem if the binaries are generated from untrusted sources (e.g. you download some, say, Haskell, compile it and run it fully automatically).
> All memory blocks that are allocated from this heap allow code execution, if the hardware enforces data execution prevention. Use this flag heap in applications that run code from the heap. If HEAP_CREATE_ENABLE_EXECUTE is not specified and an application attempts to run code from a protected page, the application receives an exception with the status code STATUS_ACCESS_VIOLATION.
I think POSIX has equivalent memory protection calls (mmap/mprotect), but no equivalent to HeapCreate.
But you can still call VirtualAlloc(), VirtualProtect(), and LoadLibrary(), so this isn't really a security mechanism, but more of a safety mechanism.
I don't think Windows provides a mechanism to disable creating any further executable pages, although I've seen Chrome do it by hooking those functions (and I know it because I've had to bypass it :)).
I wouldn't expect Windows to prevent creating further executable pages; there are legitimate use cases for dynamically allocated executable memory. It just means that whatever foreign data you load into non-executable pages can't execute, which is a security mechanism (for example, game save data can be loaded into these heaps so that you can load all game state without the save file potentially running foreign code).
There are legitimate uses, but the point would be that the process could ask the system to lock it down with whatever executable code is already present. This could be used to prevent already running code from tampering with the process' behavior by loading new code, or to thwart code injection.
>It just means that whatever foreign data you load into those pages can't execute, which is a security mechanism (for example, game save data can be loaded into these heaps so that you can load all game state but without the save file potentially running foreign code)
But malloc() and all the other standard memory allocation functions already return pointers into non-executable pages anyway. Perhaps those functions call into this one internally, but using this over whatever your language offers by default provides no additional protection.
The benefit of HeapAlloc and HeapCreate is that they make serialization of data simpler: because you can define the bounds of the heap, you can easily save pointer offsets. There are other flags available as well. It's a Win32 feature, so you would only use it if Windows is your target platform (and then write other bindings for other platforms).
All of this calls VirtualAlloc behind the scenes, and you can do that yourself as well for manual page allocation. Each page can have options set with VirtualProtect to allow or disallow execution of code within the pages as well.
What's actually happening is that the SDK doesn't expose the system calls necessary to do it, but I can guarantee that if you can get a native binary to run on the device, you can have it do whatever you want. If that wasn't the case, the few apps that do support JITting wouldn't work.
It's just loading a dynamic library from an arbitrary file, as traditionally done for third party software addons (e.g. Photoshop plugins since the early 1990s and VST instruments and effects since the late 1990s).
Folks who like this kind of thing should definitely check out CERN's ROOT framework. I've been using its C++ interpreter in a Jupyter notebook environment to learn C++. It's probably also quite a bit more mature than this project. https://root.cern/
Well, one thing you can use alongside this project is a small library called cpp-dump, which lets you pretty print variables. https://github.com/philip82148/cpp-dump
It's just a normal library you can use with any compiled project, but it works nicely with Root C++ for built-in and std types.
Great if (say) you're working through implementing vector or linear algebra and want a nice way to display your multi-dimensional arrays and vectors.
Just copy the project folder somewhere (most conveniently wherever you invoke the interpreter from), do `#include "cpp-dump/dump.hpp"` and then `cpp_dump(myVariable)` to print your variables.
You can see how it looks in this example where I was mucking about with permutations of vectors: https://i.imgur.com/yRpY5Bj.png
Along the lines of scripting is interactive programming. I'm working on a native Clojure dialect on LLVM with C++ interop, called jank. It can JIT compile C++ code, can be embedded into any C++-compatible application, and is a full Clojure dialect which doesn't hide any of its C++ runtime. So you can do inline C++, compile C++ sources alongside your jank, and require them like a normal Clojure namespace. Worth a look if you're using C++ but craving something more interactive. https://jank-lang.org/
I wonder if I can use this to learn a large C++ codebase like Chromium. One of the issues I had trying to learn Chromium was that in order to play and experiment with their classes/functions I needed to spend several minutes linking my little test code with their static libraries just to see if my understanding was correct. That's too long a time for such experiments, so I gave up.
Last time I checked out the Chromium code base it was about 300-400 megs of uncompressed cpp files. Let's also not forget that you needed to run a code generator script that produced another 200 megs of DOM files, or interface files. At that point I gave up, went to sleep, and never touched it again.
I really hope Ladybird is able to stay relatively small and approachable, it would be wonderful to have as a truly customizable open-source browser that's not a massive codebase that takes forever to compile and develop.
I do think there are some geniuses working on the Chromium code base, and I would imagine there are really good reasons for doing it their way. I would also imagine that Ladybird will, over time, face the same problems and come up with similar solutions as the Chromium team.
All I know is most of the large scale C/C++ code bases eventually become these monolithic giant code bases that require some really specialised software tools to compile and link.
I agree on all fronts. This parallels the last time I looked at, and gave up building a backend for LLVM. And that was after giving up doing the same for GCC. Those codebases are _impenetrable_.
It's clear as mud how one would hook a jumbo codebase into the REPL. If it's possible, that would be a game changer.
I added LLVM JIT support to https://ossia.io a few years ago, it's not too bad, but a big issue is that the JIT does not support all the necessary features used by the frontend in terms of relocations, etc. So it happens relatively often that C++ code will compile to LLVM IR without issue, but then fail at the JIT step because some relocation is not supported by the JIT engine yet.
How feasible would it be for something like gdb to be able to use a C++ interpreter (whether icpp, or even a souped up `constexpr` interpreter from the compiler) to help with "optimized out" functions?
gdb also doesn't handle overloaded functions well, e.g. `x[i]`.
It does though? Just compiled a small program that creates a vector, and GDB is perfectly happy accessing it using this syntax. It will even print std::string’s correctly if you cast them to const char* by hand. (Linux x86-64, GDB 14.2.)
I've defined a few pretty printers, but `operator[]` doesn't work for my user-defined types.
Knowing it works for vectors, I'll try and experiment to see if there's something that'll make it work.
(gdb) p unrolls_[0]
Could not find operator[].
(gdb) p unrolls_[(long)0]
Could not find operator[].
(gdb) p unrolls_.data_.mem[0]
$2 = {
`unrolls_[i]` works within C++. This `operator[]` method isn't even templated (although the container type is); the index is hard-coded to be of type `ptrdiff_t`, which is `long` on my platform.
> This `operator[]` method isn't even templated (although the container type is)
That might be it. If that operator isn’t actually ever emitted out of line, then GDB will (naturally) have nothing to call. If it helps, with the following program
template<typename T>
struct Foo {
    int operator[](long i) { return i * 3; }
};

Foo<bool> bar;
template int Foo<bool>::operator[](long); // [*]

int main(void) {
    Foo<int> foo;
    __asm__("int3");
    return foo[19];
}
compiled at -g -O0 I can both `p foo[19]` and `p bar[19]`, but if I comment out the explicit instantiation marked [*], the latter no longer works. At -g -O2, the former does not work because `foo` no longer actually exists, but the latter does, provided the instantiation is left in.
I was observing that `p (const char *)str` also worked in my experiment, but I’m far from a C++ expert and upon double-checking this seems to have been more of an accident than intended behaviour, because there is no operator const_pointer in basic_string that I can find. Definitely use `p str.c_str()`.
That explanation doesn't work IMO, unless `str` is a std::string pointer, which is contrary to the syntax GP suggested with `str.c_str()`.
It doesn't seem possible in actual C++ that the cast from non-pointer to pointer would work at all (even if a small string happens to be inlined at offset 0.) Like GP, I looked for a conversion operator, and I don't think it's there. Maybe it is a feature of the gdb parser.
I've done it by embedding libclang into an executable. You still have to be really careful to keep ABI compatibility between the host and the JITed plugin, if you want to send and receive complex C++ objects. Most likely you'll need to set up a simple C ABI and reconstruct the objects on either side of the interface. The last thing you want is to send std::string across a DLL boundary.