You're two library functions away from having it easy: Copy from JavaScript to W...

singularity2001 · on Oct 26, 2024

From the new official WASM proposal:

https://github.com/WebAssembly/js-string-builtins/blob/main/...

"the overhead of importing glue code is prohibitive for primitives such as String, ArrayBuffer, RegExp, Map, and BigInt where the desired overhead of operations is a tight sequence of inline instructions, not an indirect function call"

I guess the more elegant and universal stringref proposal is DEAD now!?

https://github.com/WebAssembly/stringref/blob/main/proposals...

I don't really mind, as it keeps the wasm bytecode cleaner.

davexunit · on Oct 26, 2024

Quote from https://wingolog.org/archives/2023/10/19/requiem-for-a-strin...

    We don’t yet have consensus on this proposal in the Wasm standardization group, and we may never reach there, although I think it’s still possible. As I understand them, the objections are two-fold:

    1. WebAssembly is an instruction set, like AArch64 or x86. Strings are too high-level, and should be built on top, for example with (array i8).

    2. The requirement to support fast WTF-16 code unit access will mean that we are effectively standardizing JavaScript strings.

I really like stringref and hope the detractors can be convinced of its usefulness. Dealing with strings is not fun right now.
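For readers unfamiliar with the "WTF-16 code unit access" objection quoted above, here is a minimal Python sketch of what a code-unit view of a string looks like. Python strings are not JavaScript strings, so this only imitates the behaviour using Python's codecs; the helper names `utf16_code_units` and `is_strict_utf8_encodable` are made up for illustration.

```python
# Illustrative only: imitate the 16-bit code-unit view that JavaScript exposes,
# using Python's "utf-16-le" codec. "surrogatepass" lets lone surrogates
# through, which is the extra leniency usually meant by "WTF-16".

def utf16_code_units(s: str) -> list:
    """Return the 16-bit code units of s, allowing lone surrogates."""
    raw = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_strict_utf8_encodable(s: str) -> bool:
    """True if s can be encoded as strict UTF-8 (i.e. contains no lone surrogates)."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


emoji = "\U0001F600"   # one code point outside the Basic Multilingual Plane
lone = "\ud83d"        # a high surrogate with no low surrogate after it

print([hex(u) for u in utf16_code_units("Hi")])    # one code unit per character
print([hex(u) for u in utf16_code_units(emoji)])   # a surrogate pair: two code units
print([hex(u) for u in utf16_code_units(lone)])    # representable as a code unit...
print(is_strict_utf8_encodable(lone))              # ...but not as strict UTF-8
```

The point of the sketch is that a string defined as a sequence of 16-bit code units can hold values that a strict UTF-8 representation rejects, so promising fast code-unit access ties the representation closely to what JavaScript engines already do.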

robocat · on Oct 26, 2024

> Dealing with strings is not fun right now.

And dealing with strings isn't fun in many other languages or runtimes or OSes.

e.g.1. C# "Strings in .NET are stored using UTF-16 encoding. UTF-8 is the standard for Web protocols and other important libraries. Beginning in C# 11, you can add the u8 suffix to a string literal to specify UTF-8 encoding. UTF-8 literals are stored as ReadOnlySpan<byte> objects" - https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...

e.g.2. Erlang/BEAM/Elixir: "The Erlang string type is implemented as a single-linked-list of unicode code points. That is, if we write “Hello” in the language, this is represented as [$H, $e, $l, $l, $o]. The overhead of this representation is massive. Each Cons-cell use 8 bytes for the code point and 8 bytes for the pointer to the next value. This means that the 5-byte ASCII-representation of “Hello” is 5*16 = 80 bytes in the Erlang representation." - https://medium.com/@jlouis666/erlang-string-handling-7588daa...
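The size figures in the two quotes above can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic: the 16 bytes per cons cell is taken from the quoted article (8 bytes for the code point plus 8 for the next pointer on a 64-bit system), and the function names are hypothetical.

```python
def utf8_size(s: str) -> int:
    """Bytes needed to store s as UTF-8."""
    return len(s.encode("utf-8"))


def utf16_size(s: str) -> int:
    """Bytes needed to store s as UTF-16 without a byte-order mark."""
    return len(s.encode("utf-16-le"))


def erlang_list_size(s: str, bytes_per_cons_cell: int = 16) -> int:
    """Bytes for a linked list with one cons cell per code point.

    The default of 16 bytes per cell is the figure from the quoted article,
    not something measured here.
    """
    return len(s) * bytes_per_cons_cell


word = "Hello"
print(utf8_size(word))         # 5
print(utf16_size(word))        # 10
print(erlang_list_size(word))  # 80, matching the 5*16 in the quote
```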

debugnik · on Oct 26, 2024

> The Erlang string type is implemented as a single-linked-list

This refers just to Erlang's string() type, not BEAM strings in general; it's just a bad default. If you're not using binaries, you're doing it wrong, and that's exactly why Elixir's strings are UTF-8 binaries.

davexunit · on Oct 26, 2024

Okay? Is this an argument in favor of doing nothing?

xscott · on Oct 26, 2024

Thank you for the links. To the extent I understood it from a quick reading, it all looks like stuff you could get with the existing import/export mechanisms. I would choose (modified) UTF-8, but I understand why UTF-16 is always going to be around.

I agree about keeping wasm bytecode cleaner. The core plus SIMD stuff is such a great generalization of the ARM and x86 CPUs we mostly use. The idea of gunking it all up with DOM-related stuff is distasteful.
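As a rough picture of the copy-based approach that works with today's import/export mechanisms, the sketch below encodes a host string as UTF-8, copies it into a flat byte buffer, and hands back a (pointer, length) pair. It is a toy simulation in Python, not real WebAssembly: `LinearMemory`, its bump allocator, and the two copy helpers are all hypothetical stand-ins.

```python
class LinearMemory:
    """Toy stand-in for a Wasm linear memory: a flat byte array plus a bump allocator."""

    def __init__(self, size: int = 65536):
        self.buf = bytearray(size)
        self.next_free = 0

    def alloc(self, n: int) -> int:
        """Reserve n bytes and return the offset ("pointer") of the reservation."""
        if self.next_free + n > len(self.buf):
            raise MemoryError("toy linear memory exhausted")
        ptr = self.next_free
        self.next_free += n
        return ptr


def copy_string_in(mem: LinearMemory, s: str) -> tuple:
    """Encode s as UTF-8, copy it into mem, and return (pointer, byte length)."""
    data = s.encode("utf-8")
    ptr = mem.alloc(len(data))
    mem.buf[ptr:ptr + len(data)] = data
    return ptr, len(data)


def copy_string_out(mem: LinearMemory, ptr: int, length: int) -> str:
    """Decode length bytes starting at ptr back into a host string."""
    return bytes(mem.buf[ptr:ptr + length]).decode("utf-8")


mem = LinearMemory()
ptr, n = copy_string_in(mem, "héllo")
print(ptr, n)                        # 0 6 ("é" takes two bytes in UTF-8)
print(copy_string_out(mem, ptr, n))  # héllo
```

Every crossing of the boundary costs an encode and a copy in this scheme, which is the overhead the builtins proposal quoted at the top of the thread is trying to avoid.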