On a semi-related note, there was recently a discussion[1] on the F3 file format...

lifthrasiir · 2025-10-07T07:36:26 1759822586

As someone seriously trying to develop a compressed archive format with WebAssembly, sandboxing is actually easy and that's indeed why WebAssembly was chosen. The real problem is determinism, which WebAssembly does technically support but actual implementations may vary significantly. And even when WebAssembly can be made fully deterministic, function calls made to those WebAssembly modules may still be undeterministic! I tried very hard to avoid such pitfalls in my design, and it is entirely reasonable to avoid WebAssembly due to these issues.

bangaladore · 2025-10-07T17:04:01 1759856641

I'm confused why determinism is a problem here? You write an algorithm that should produce the same output for a given input. How does WASM make that not deterministic?

lifthrasiir · 2025-10-07T22:49:43 1759877383

Assume that I have 120 MB of data to process. Since this is quite large, implementations may want to process them in chunks (say, 50 MB). Now those implementations would call the WebAssembly module multiple times with different arguments, and input sizes would depend on the chunk size. Even though each call is deterministic, if you vary arguments non-deterministically then you lose any benefit of determinism: any bug in the WebAssembly module will corrupt data.

bangaladore · 2025-10-08T01:02:15 1759885335

But that is the case in any language and runtime? There is nothing unique about WASM here.

lifthrasiir · 2025-10-08T01:36:56 1759887416

Yes and that's exactly my point. It is not enough to make the execution deterministic.

Thinking about that, you may have been confused why I said it's reasonable to avoid WebAssembly for that. I meant that a full Turing-complete execution might not be necessary if that makes it easier to ensure the correctness; OpenZL graphs are not even close to a Turing-complete language for example.

TiredOfLife · 2025-10-07T07:23:52 1759821832

And no mention of zpaq that has had emedable decompressors feature for 15 years

blank_state · 2025-10-07T15:02:12 1759849332

you did not read the white paper then

snapplebobapple · 2025-10-07T04:12:22 1759810342

Isnt that a huge vector for viruses if exevutable code is included in the compressed archive?

themerone · 2025-10-07T04:15:53 1759810553

Wasm can be sandboxed. Its a safe as visiting a website with javascript.

orangeboats · 2025-10-07T06:08:19 1759817299

Can't the decompressor still produce a malicious uncompressed file?

tlb · 2025-10-07T06:20:06 1759818006

Any decompressor can produce a malicious file. Just feed a malicious file to the compressor.

orangeboats · 2025-10-07T06:30:14 1759818614

Yes, but currently the decompressors we use (so things like zstd, zlib, 7z) come from a mostly-verifiable source -- either you downloaded it straight from the official site, or you got it from your distro repo.

However, we are talking about an arbitrary decompressor here. The decompressor WASM is sandboxed from the outside world and it can't wreak havoc on your system, true, but nothing stops it from producing a malicious uncompressed file from a known good compressed file.

mort96 · 2025-10-07T07:13:24 1759821204

The format-specific decompressor is part of the compressed file. Nothing here crosses a security boundary. Either the compressed file is trustworthy and therefore decompresses into a trustworthy file, or the compressed file is not trustworthy and therefor decompresses into a non-trustworthy file.

If the compressed file is malicious, it doesn't matter whether it's malicious because it originated from a malicious uncompressed file, or is malicious because it originated from a benign uncompressed file and the transformation into a compressed file introduces the malicious parts due to the bundled custom decompressor.

yorwba · 2025-10-07T06:39:45 1759819185

If the decompressor is included in the compressed file and it's malicious, the file can hardly be called known good.

tecleandor · 2025-10-07T07:17:19 1759821439

But also I guess the logic of the decompressor could output different files in different occasions, for example, if it detects a victim, making it difficult to verify.

viraptor · 2025-10-07T07:52:57 1759823577

If it can "detect a victim", then the sandbox is faulty. The decompressor shouldn't see any system details. Only the input and output streams.

jo-m · 2025-10-07T07:56:15 1759823775

So, not very safe.

snapplebobapple · 2025-10-07T15:16:49 1759850209

I think this is the first time a genuine technical question of mine rather than a social view has been downvoted here. Thats sad.