I like this, but unfortunately it doesn't solve one annoying problem: lexical scope doesn't work, and it fails in unexpected ways.
If you reference something lexically, your code fails at runtime. Want to use an import? You have to use import() inside the closure you pass to spawn(). TypeScript doesn't know this. Your language server doesn't know this. Access a variable that shadows a built-in global? Now you're silently accessing the built-in global instead.
The only way this could even be addressed is with a full-on parser, and even then you can't guarantee things will work.
I think the only "fix" is for JS to introduce a new syntax to have a function that can't access lexical scope, returning a value that either extends a subclass of Function or has a cheeky symbol set on it. At least then, it'll fail at compile time.
There is a simple solution to this problem, but it's not very popular: do the same thing Workers do, require using a separate file. All the tooling works out of the box, you have no issues with lexical scoping, etc. The only downside is it's (currently) clunky to work with, but that can be fixed with better interfaces.
I've been using a functionally identical implementation of this since I wrote it in my startup's codebase a decade ago. It's really handy, but definitely not without edge case issues. I've occasionally had to put in workarounds for false positive TypeScript/lint errors or a tool in the bundling pipeline trying to be too clever and breaking the output.
Overall it's great, and I'm glad to see a generic implementation of it which will hopefully become a thriving open source project, but ultimately it's a kludge. What's really needed is for JS to introduce a native standardized version of this construct which TypeScript and the rest of the ecosystem have to play nice with.
A linter rule provided by the library could be helpful here. I know it's just a workaround, but it's probably easier than a solution that does compile-time checks.
This should be the expected behavior when multithreading. It is the expected behavior when executing a child process, such as node’s child_process.fork.
Fork and normal worker threads always enter a script, so it's clear there's no shared lexical scope. This spawn method executes a function, but that function can't interact with the scope outside it.
While I agree with GP that this should be the expected behavior, your comment raises what I think is a large problem/wild-goose-chase in ‘modern’ language designs implementing concurrency.
The push from language designers (this applies across the high/low level spectrum and at all ranges of success for languages) to make concurrent code ‘look just like’ linearly read, synchronous, single-threaded code is pervasive and seems to avoid large pushback by users of the language. The complaints that should be made against this syntax design become complaints that code doesn’t do what developers think it should.
My position is that concurrent (and parallel) code IS NOT sequential code and languages should embrace those differences. The move to or design of async/await is often explicitly argued for from this position. But the semantic differences in concurrent code IMO should not be obscured or obfuscated by seeking to conform that code to sequential code’s syntax.
I’d love a way to be able to specify that sort of thing. I wrote a little server-side JSX rendering layer, and event handlers were serialized to strings, and so they had similar restrictions.
> Serialization Protocol: The library uses a custom "Envelope" protocol (PayloadType.RAW vs PayloadType.LIB). This allows complex objects like Mutex handles to be serialized, sent to a worker, and rehydrated into a functional object connected to the same SharedArrayBuffer on the other side.
It's kinda "well, yes, you can't share objects, but you can share memory. So make objects that are just thin wrappers around shared memory"
I'd be interested to see a comparison with https://piscinajs.dev/ - does this achieve more efficient data passing for example?
Lack of easy shared memory has always felt like a problem to me in this space, as often the computation I want to off-load requires (or returns) a lot of data.
This looks great. If it works as well as the readme suggests, this’ll let me reach for Bun in some of the scenarios where I currently reach for Go. Typescript has become my favorite language, but the lack of efficient multithreading is sometimes a deal breaker.
I added a bit more information about Bun compatibility:
> While Bun is supported and Bun does support the `using` keyword, its runtime automatically creates a polyfill for it whenever Function.toString() is called. This transpiled code relies on specific internal globals made available in the context where the function is serialized. Because the worker runs in a different isolated context where these globals are not registered, code with `using` will fail to execute.
Interesting. Are you talking about the latency to spawn new workers, or getting data from the main thread to the worker? To give you an idea, this library uses a lazily initialized thread pool (thread-per-core by default), where tasks are shared between workers (like the Tokio library in Rust). This means workers only need to be initialized once, and passing data via structured clone is usually very fast and optimized in most engines. Better yet is to use ArrayBuffer or SharedArrayBuffer, which can be transferred or shared between threads without any serialization overhead.
(I suspect, to paraphrase Greenspun's tenth rule, any sufficiently complicated app using Web Workers contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of this library...)
It was a design decision to make the syntax feel as familiar to Rust as possible. But I do agree that it's a bit verbose and that it won't hurt to add a .dispose() method to the objects themselves.
From an overall system point of view, this is the current pinnacle of footgun design.
The OS does thread management and scheduling, facilitates IPC, locking, etc. All of this is one big largely-solved problem (at least for the kind of things most people are doing in JavaScript today). But because of history, we now have a very popular language and runtimes trying to replicate all these features, reinventing wheels, and adding layers of inefficiency to the overall execution.
I don't disagree with you about the additional inefficiency that is very likely to accumulate as JS adds more and more 'features' (via the language, frameworks, or libraries). But as a genuine question: isn't this reimplementation (or any comparable library for multithreading) required by JavaScript's sandboxing model? I would be suspicious of intent if browsers were allowed to spawn any number of threads executing non-trusted scripts at the level typically seen from more native application code.
Allowing access to native threading doesn’t imply that the API provided by the language is unrestricted. There is a (very wide) middle zone to land in.
Documentation here is exceptionally well written for a JS project. That said, move() doing different things depending on the type of data you pass to it feels like a foot-gun, and also: how is it blocking access to arrays you pass to it?
One such example: https://github.com/developit/workerize-loader