I've long dreamt of a system that compiles to native code, but stores a compressed SSA form (similar to SafeTSA or LLVM bitcode) in the binary for efficient runtime re-optimization based on profiling, somewhat similar to current Android Runtimes. One could then have several levels of debugging symbols, one that gives names to local variables represented by CFG nodes, and another that adds a compressed diff between some standardized decompiler output and the original source.
You could then decompile to some alternative syntax, but you'd lose any idiosyncratic formatting represented by the compressed diff.
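To make the "compressed diff" layer concrete, here is a minimal Python sketch, not a real toolchain. It assumes Python 3.9+ and uses `ast.unparse` as a stand-in for the "standardized decompiler output". The names `canonical`, `make_patch`, and `restore` are made up for illustration. The patch is a compressed edit script that rebuilds the original text, comments and odd whitespace included, from the canonical text.

```python
import ast
import difflib
import zlib


def canonical(source: str) -> str:
    """Stand-in for standardized decompiler output: parse, then unparse."""
    return ast.unparse(ast.parse(source)) + "\n"


def make_patch(source: str) -> bytes:
    """Build a compressed edit script that turns canonical(source) into source."""
    canon = canonical(source).splitlines(keepends=True)
    orig = source.splitlines(keepends=True)
    ops = []
    matcher = difflib.SequenceMatcher(a=canon, b=orig, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Lines shared with the canonical text are stored as a range only.
            ops.append(("copy", i1, i2))
        else:
            # Anything else stores the original text verbatim (empty for deletions).
            ops.append(("put", "".join(orig[j1:j2])))
    return zlib.compress(repr(ops).encode("utf-8"))


def restore(canon_text: str, patch: bytes) -> str:
    """Apply the edit script to the canonical text to recover the original."""
    canon = canon_text.splitlines(keepends=True)
    ops = ast.literal_eval(zlib.decompress(patch).decode("utf-8"))
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(canon[op[1]:op[2]])
        else:
            out.append(op[1])
    return "".join(out)


source = "x = 1   \n\n\n\ndef f( a,b ):\n    return a+b  # sum\n"
patch = make_patch(source)
recovered = restore(canonical(source), patch)
```

In a real system the canonical text would be regenerated from the stored IR rather than from the source, and dropping the patch would leave you with only the canonical rendering, which is the loss described above.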
I'm not aware of any way to get byte-for-byte identical source out of a Java class file or .NET assembly.
Last I checked, Java AoT compilation precluded runtime re-optimization, though I presume they've fixed that by now.
Last I checked, they both used stack-based bytecode, which typically takes longer to JIT and results in slower native code than a compressed SSA / control flow graph (see the SafeTSA papers).
Though SSA is deferred to JIT/ILC instead. In either case you get access to all the actual low-level bits when you need them. No other portable target lets you do that.
If you add an extra blank line, or add a trailing space to a line, can you really get that back out from a .NET assembly? Normal disassembly at a minimum results in a loss of formatting.
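This is not .NET, but the same kind of loss is easy to see with a Python analogy (a sketch assuming Python 3.9+ for `ast.unparse`): a round trip through the syntax tree preserves the program but drops trailing spaces, extra blank lines, and comments.

```python
import ast

# Source with trailing spaces, extra blank lines, odd spacing, and a comment.
original = "total = 1 + 2   \n\n\n\nprint( total )  # show it\n"

# Round trip: text -> syntax tree -> text.
round_tripped = ast.unparse(ast.parse(original))

# The trees match (ast.dump omits line/column attributes by default)...
same_program = ast.dump(ast.parse(original)) == ast.dump(ast.parse(round_tripped))
# ...but the text does not.
same_text = round_tripped == original
```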
Last time I looked, [Unison code] -> [entry in the AST DB] was a one-way process. Adding a function means writing it (with whatever style you like) and seeing if what you wrote constitutes a new function or an existing one. You can't fluff db entries back up into human-friendly code.
I don't see why it couldn't be done, though; I think it just hasn't been a priority. Heck, you could have 100 different users collaborating in 100 different "languages", and so long as they serialized to the same AST and back, none of them would ever have to see the atrocious syntax which the other users prefer. Their editors and browsers could just render everything according to their users' preferences.
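A toy sketch of that idea in Python, nothing to do with how Unison actually stores code: one tree, two renderings, and a parser that maps one of the renderings back to the same tree. All the names here (`Num`, `Var`, `BinOp`, `render_infix`, `render_sexpr`, `parse_sexpr`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def render_infix(e: Expr) -> str:
    """Render for a user who prefers fully parenthesized infix."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    return f"({render_infix(e.left)} {e.op} {render_infix(e.right)})"


def render_sexpr(e: Expr) -> str:
    """Render for a user who prefers Lisp-style prefix notation."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    return f"({e.op} {render_sexpr(e.left)} {render_sexpr(e.right)})"


def parse_sexpr(text: str) -> Expr:
    """Parse the prefix rendering back into the shared tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos: int):
        tok = tokens[pos]
        if tok == "(":
            op = tokens[pos + 1]
            left, pos = read(pos + 2)
            right, pos = read(pos)
            return BinOp(op, left, right), pos + 1  # skip the closing ")"
        if tok.lstrip("-").isdigit():
            return Num(int(tok)), pos + 1
        return Var(tok), pos + 1

    expr, _ = read(0)
    return expr


tree = BinOp("+", Var("x"), BinOp("*", Num(2), Var("y")))
infix_view = render_infix(tree)   # "(x + (2 * y))"
sexpr_view = render_sexpr(tree)   # "(+ x (* 2 y))"
round_trip = parse_sexpr(sexpr_view)
```

Each user edits in their own rendering, and only the tree is stored or shared.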
If they were truly represented by the same AST, I can't believe the differences between these "languages" would be anything more than swapping symbols out.
True. Perhaps "styles" or "syntaxes" would be better. It's all that stuff we spend time building consensus around, but that doesn't affect program behavior. If our tools respected our style preferences on a user-by-user basis, we could drop the consensus requirement altogether.
https://news.ycombinator.com/item?id=40882133