It seems like the main thing that bun does to stay ahead is cache the manifest responses. PNPM, for example, resolves all package versions when installing (without a lockfile), which is slower. The registry does have a 300-second cache time, so no fault there, but it means your benchmark is on the fully cached path, which you'd only hit when installing something for the first time. Subsequent installs would use the lockfile, and both bun and PNPM seem fast* in that case.
If I install a simple nextjs app, then remove node_modules, the lockfile, and the ~/.bun/install/cache/*.npm files (i.e. keep the contents, remove the manifests) and then install, bun takes ~3-4s. PNPM is consistently faster for me at ~2-3s.
I'm not familiar with bun's internals so I may be doing something wrong.
One piece of feedback, having the lockfile be binary is a HUGE turn off for me. Impossible to diff. Is there another format?
* I will mention that even in the best case scenario with PNPM (i.e. lockfile and node_modules) it still takes 400ms to start up, which, yes, is quite slow. So every action APART from the initial install is much MUCH faster with bun. I still feel 400ms is good enough for a package manager which is invoked sporadically. Compare that to esbuild which is something you invoke constantly, and having that be fast is such a godsend.
> It seems like the main thing that bun does to stay ahead is cache the manifest responses. PNPM, for example, resolves all package versions when installing (without a lockfile), which is slower.
This isn't the main optimization. The main optimization is the system calls used to copy/link files. To see the difference, compare `bun install --backend=copyfile` with `bun install --backend=hardlink` (hardlink should be the default). The other big optimization is the binary formats for both the lockfile and the manifest. npm clients waste a lot of time parsing JSON.
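To make the difference concrete, here is a minimal Rust sketch (illustrative only; Bun itself is written in Zig) of what the two backends boil down to per file:

```rust
// Hard-linking vs. copying a file out of the package cache.
// A hard link adds a directory entry pointing at the existing inode
// (one metadata-only syscall), while a copy has to move every byte.
use std::fs;
use std::io;
use std::path::Path;

fn install_file(cached: &Path, dest: &Path, use_hardlink: bool) -> io::Result<()> {
    if use_hardlink {
        // No file data is read or written; near-instant regardless of size.
        fs::hard_link(cached, dest)?;
    } else {
        // Reads and rewrites the whole file (std may use copy_file_range
        // on Linux, but the work is still proportional to file size in general).
        fs::copy(cached, dest)?;
    }
    Ok(())
}
```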
The more minor optimizations have to do with reducing memory usage. The binary lockfile format interns strings (which are very repetitive). However, many of these strings are tiny, so it's actually more expensive to store a hash and a length separately from the string itself. Instead, Bun stores the string as 8 bytes, and one bit says whether the entire string is contained inside those 8 bytes or whether it's an offset into the lockfile's string buffer (since 64-bit pointers don't use the full address space and bun currently only targets 64-bit CPUs, this works).
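To illustrate the idea (the layout here is hypothetical, not Bun's actual format), a small Rust sketch of such a tagged 8-byte string handle:

```rust
// Hypothetical sketch of an 8-byte string handle: the top bit tags whether
// the string's bytes live inline in the handle or at an offset into a
// shared string buffer. Assumes ASCII, NUL-free strings (true of package names).
const INLINE_TAG: u64 = 1 << 63;

#[derive(Clone, Copy)]
struct StrRef(u64);

impl StrRef {
    fn intern(s: &str, buffer: &mut String) -> StrRef {
        if s.len() <= 7 && s.is_ascii() {
            // Pack up to 7 bytes directly into the handle; no buffer access needed.
            let mut bits: u64 = 0;
            for (i, b) in s.bytes().enumerate() {
                bits |= (b as u64) << (i * 8);
            }
            StrRef(bits | INLINE_TAG)
        } else {
            // Otherwise append to the shared buffer and store (offset, length).
            let offset = buffer.len() as u64;
            buffer.push_str(s);
            StrRef((offset << 32) | s.len() as u64)
        }
    }

    fn resolve(self, buffer: &str) -> String {
        if self.0 & INLINE_TAG != 0 {
            // Unpack inline bytes, stopping at the first zero byte.
            (0..7)
                .map(|i| ((self.0 >> (i * 8)) & 0xff) as u8)
                .take_while(|&b| b != 0)
                .map(char::from)
                .collect()
        } else {
            let (offset, len) = ((self.0 >> 32) as usize, (self.0 & 0xffff_ffff) as usize);
            buffer[offset..offset + len].to_string()
        }
    }
}
```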
yarn also caches the manifest responses.
> If I install a simple nextjs app, then remove node_modules, the lockfile, and the ~/.bun/install/cache/*.npm files (i.e. keep the contents, remove the manifests) and then install, bun takes ~3-4s. PNPM is consistently faster for me at ~2-3s.
This sounds like a concurrency bug with scheduling tasks from the main thread to the HTTP thread. I would love someone to help review the code for the thread pool & async io.
> One piece of feedback, having the lockfile be binary is a HUGE turn off for me. Impossible to diff. Is there another format?
If you do `bun install -y`, it will output as a yarn v1 lockfile.
Of course, I can't say for sure that he looked into the fastest possible ways to parse JSON here, but my intuition is that if he didn't, it's because he had an educated guess that it'd still be slower.
You don't need to go straight to simdjson et al.; something like Rust's serde, which deserializes to typed structs with data like strings borrowed from the input, can be very fast.
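For example, a minimal sketch assuming the serde (with derive) and serde_json crates, with made-up, lockfile-shaped field names; the `&str` fields borrow straight from the input instead of allocating:

```rust
use serde::Deserialize;

// Zero-copy deserialization: the &str fields borrow from the input text.
// (Borrowing only works when a JSON string contains no escape sequences;
// Cow<str> would handle both cases.)
#[derive(Deserialize)]
struct LockEntry<'a> {
    name: &'a str,
    version: &'a str,
    resolved: &'a str,
}

fn main() {
    let input = r#"{"name":"react","version":"18.2.0","resolved":"https://registry.npmjs.org/react/-/react-18.2.0.tgz"}"#;
    let entry: LockEntry = serde_json::from_str(input).unwrap();
    println!("{} @ {} ({})", entry.name, entry.version, entry.resolved);
}
```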
Nobody is arguing that JSON is as performant as binary formats. What the others are saying is that the amount of JSON in your average lock file should be small enough that parsing it is negligible.
If you were dealing with a multi-gigabyte lock file then it would be a different matter, but frankly I agree with their point that parsing a lock file which is only a few KB shouldn't be a differentiator (and if it is, then the JSON parser is the issue, and fixing that should be the priority rather than changing to a binary format).
Moreover, the earlier comment about lock files needing to be human readable is correct. Being able to read, diff and edit them is absolutely a feature worth preserving even if it costs you a fraction of a second in execution time.
> I agree with their point that parsing a lock file which is only a few KB
You mean a few MB? NPM projects typically have thousands of dependencies. A 10 MB lock file wouldn't be atypical, and parse time for a 10 MB JSON file can absolutely be significant, especially if you have to do it multiple times.
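As a rough sanity check (a sketch assuming serde_json; absolute numbers will vary by machine), you can generate ~10 MB of lockfile-shaped JSON and time a parse:

```rust
use std::time::Instant;

fn main() {
    // Build a JSON array of 50,000 small dependency records (~10 MB of text).
    let entry = r#"{"name":"some-package","version":"1.2.3","resolved":"https://registry.npmjs.org/some-package/-/some-package-1.2.3.tgz","integrity":"sha512-abcdefghijklmnopqrstuvwxyz0123456789"}"#;
    let body = vec![entry; 50_000].join(",");
    let json = format!("[{}]", body);
    println!("input size: {:.1} MB", json.len() as f64 / 1e6);

    // Parse into a generic tree, roughly what a naive npm client does.
    let start = Instant::now();
    let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
    println!(
        "parsed {} entries in {:?}",
        parsed.as_array().unwrap().len(),
        start.elapsed()
    );
}
```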
> Being able to read, diff and edit them is absolutely a feature worth preserving even if it costs you a fraction of a second in execution time.
You can read and edit a SQLite file way easier than a huge JSON file.
GitHub (disclosure: where I work) does respect some directives in a repo’s .gitattributes file. For example, you can use them to override language detection or mark files as generated or vendored to change diff presentation. You can also improve the diff hunk headers we generate by default by specifying e.g. `*.rb diff=ruby` (although come to think of it I don’t know why that’s necessary since we already know the filetype — I’ll look into it)
In principle there's no reason we couldn't extend our existing rich diff support, used for diffing things like images, to enhance the presentation of lockfile diffs. There's not a huge benefit for text-based lock files, but for binary ones (if such a scheme were to take off) it would be a lot more useful.
Is there any way to use `.gitattributes` to specify that a file is _not_ generated? I work on a repo with a build/ directory containing build scripts, which is unfortunately excluded by default from GitHub's file search and quick-file selection (T).
Does this really work for jump-to-file? (We're not talking about language statistics or suppressing diffs on PRs, which is mostly what the linguist README talks about.)
> File finder results exclude some directories like build, log, tmp, and vendor. To search for files within these directories, use the filename code search qualifier.
(The inability to quick-jump to files in the /build/ folder with `T` has been driving me crazy for YEARS!)
Correct me if I'm wrong, but checking those two files, I don't see `/build` matching anything there. So to me this exclusion of `/build` from search results seems to be controlled by some other piece of software at GitHub :/
I checked and you're right: The endpoint that returns the file list has a hardcoded set of excludes and pays no attention to `.gitattributes`.
I think it's reasonable to respect the linguist overrides here, so I'll open a PR to remove entries from the exclude list if the repo has a `-linguist-generated` or `-linguist-vendored` gitattribute for that directory [1]. So in your case you can add
    build/** -linguist-generated
to `.gitattributes` and once my PR lands files under `build` should be findable in file-finder.
Thanks for pointing this out! Feel free to DM me on twitter (@cbrasic) if you have more questions.
On Linux, not yet. I don't have a machine that supports reflinks right now and I am hesitant to push code for this without manually testing it works. That being said, it does use copy_file_range if --backend=copyfile, which can use reflinks.
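For reference, a sketch of what calling that syscall directly looks like from Rust via the libc crate (Linux-only, and again illustrative; Bun's implementation is Zig):

```rust
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

// copy_file_range(2) lets the kernel copy (or, on Btrfs/XFS, reflink)
// data between files without bouncing it through userspace.
fn copy_range(src: &File, dst: &File, len: usize) -> io::Result<usize> {
    // Null offset pointers mean "use and advance each fd's own file offset".
    let n = unsafe {
        libc::copy_file_range(
            src.as_raw_fd(),
            std::ptr::null_mut(),
            dst.as_raw_fd(),
            std::ptr::null_mut(),
            len,
            0,
        )
    };
    if n < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(n as usize)
    }
}
```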
Still don't understand why we even need all these inodes. The repo is centrally accessible (and should be read-only, btw). Resolving that shouldn't be a problem. It's been more than a decade and npm is still a mess.
I'm ultra excited about Bun finally being open sourced. Congrats on the amazing progress here, Jarred!
Since JSC is actually compilable to Wasm [1] and Zig supports WASI compilation, I wonder how easy it would be to get it running fully in WAPM with WASI. Any thoughts on how feasible that would be?
Congratulations on the release! You are doing impressive work with bun. I find the built-in SQLite particularly exciting, and I can't wait to move all my projects to bun. Selfishly speaking (my 2012 MBP doesn't support AVX2 instructions), I hope that now that the project is public, and you're going to get a lot of issue reports about install failures, you'll find some time to get back to issue #67. Thank you, and keep up the excellent work.
Yeah, the install functions of npm/yarn/pnpm are all incredibly slow, and they seem to get slower super-linearly with the number of dependencies. I have one project where it can take minutes (on my 2015 MacBook; admittedly it's quicker on my brand-new machine) just to add one dependency and re-resolve the lock file. If that can be solved by a reliable tool, I'd definitely switch!
This is one of, if not the, most insane things in web dev at the moment. Git can diff thousands of files between two commits in less time than it takes to render the result on screen, but somehow it can take actual minutes for npm to figure out where to place a dependency in a simple tree. God, why?
> it can take actual minutes for npm to figure out where to place a dependency in a simple tree. God, why?
npm is famous for a lot of things, but none of them is "because it's well engineered".
To this day, npm still runs the `preinstall` script after dependencies have actually been downloaded to your disk. It modifies a `yarn.lock` file if you have one on disk when running `npm install`. With lots of things like these, it's hardly surprising that installs are slow.
I don't know exactly since when, but I was recently caught off guard after issuing `npm i` by mistake in a yarn project. It modified `yarn.lock`, changing some, if not all, of the registry URLs from the Yarn package registry to the npm registry.
People building language tooling often use the language itself, even if it is not very suitable for the task at hand.
This happens because the tooling often requires domain knowledge they already have: people who set out to write tooling for a language tend to be experienced in that language.
> Yarn v2 is backwards compatible though. You just need to use the node_modules "linker" (not the default one) and it's ready to go.
Last I checked, not quite. Yarn 2+ patches some dependencies to support PnP, even if you don’t use PnP. I discovered this while trying out both Yarn 2 and a pre-release version of TypeScript which failed to install—because the patch for it wasn’t available for that pre-release. I would have thought using node_modules would bypass that patching logic, but no such luck.
I have just discovered yarn2 / yarn3. The main advantage over npm / pnpm seems to be the Zero-Install philosophy [1] and the Plug'n'Play architecture [2]. Do you have any feedback about these features?
By the way, the yarn2 / yarn3 project is hosted on a distinct repository [3].
The benchmark source code links on the homepage are "Not found".
Also a few questions:
What do you attribute the performance advantage to? How much of it is JavaScriptCore instead of V8, versus optimized glue/binding implementations in the runtime? If the latter, what are you doing to improve performance?
Similarly for the npm client: how much is just that bun is the only tool written in a compiled, GC-free language, versus special optimization work?
How does Zig's special support for custom allocators factor in?
One of the things I'm excited about is bun install.
On Linux, it installs dependencies for a simple Next.js app about 20x faster than any other npm client available today.