It seems like the main thing that bun does to stay ahead is cache the manifest responses. PNPM, for example, resolves all package versions when installing (without a lockfile), which is slower. The registry does have a 300-second cache time, so no fault there, but it means your benchmark is on the fully cached path, which you'd only hit when installing something for the first time. Subsequent installs would use the lockfile, and both bun and PNPM seem fast* in that case.
If I install a simple nextjs app, then remove node_modules, the lockfile, and the ~/.bun/install/cache/*.npm files (i.e. keep the contents, remove the manifests) and then install, bun takes ~3-4s. PNPM is consistently faster for me at ~2-3s.
I'm not familiar with bun's internals so I may be doing something wrong.
One piece of feedback, having the lockfile be binary is a HUGE turn off for me. Impossible to diff. Is there another format?
* I will mention that even in the best case scenario with PNPM (i.e. lockfile and node_modules) it still takes 400ms to start up, which, yes, is quite slow. So every action APART from the initial install is much MUCH faster with bun. I still feel 400ms is good enough for a package manager which is invoked sporadically. Compare that to esbuild which is something you invoke constantly, and having that be fast is such a godsend.
> It seems like the main thing that bun does to stay ahead is cache the manifest responses. PNPM, for example, resolves all package versions when installing (without a lockfile), which is slower.
This isn't the main optimization. The main optimization is the system calls used to copy/link files. To see the difference, compare `bun install --backend=copyfile` with `bun install --backend=hardlink` (hardlink should be the default). The other big optimization is the binary formats for both the lockfile and the manifest. npm clients waste a lot of time parsing JSON.
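To make the difference concrete, here is a minimal Rust sketch (illustrative only; Bun itself is written in Zig) of what the two backends boil down to per file:

```rust
// Hard-linking vs. copying a file out of the package cache.
// A hard link adds a directory entry pointing at the existing inode
// (one metadata-only syscall), while a copy has to move every byte.
use std::fs;
use std::io;
use std::path::Path;

fn install_file(cached: &Path, dest: &Path, use_hardlink: bool) -> io::Result<()> {
    if use_hardlink {
        // No file data is read or written; near-instant regardless of size.
        fs::hard_link(cached, dest)?;
    } else {
        // Reads and rewrites the whole file (std may use copy_file_range
        // on Linux, but the work is still proportional to file size in general).
        fs::copy(cached, dest)?;
    }
    Ok(())
}
```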
The more minor optimizations have to do with reducing memory usage. The binary lockfile format interns strings (which are very repetitive). However, many of these strings are tiny, so it's actually more expensive to store a hash and a length separately from the string itself. Instead, Bun stores the string as 8 bytes, and one bit says whether the entire string is contained inside those 8 bytes or whether it's an offset into the lockfile's string buffer (since 64-bit pointers don't use the full address space and bun currently only targets 64-bit CPUs, this works).
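To illustrate the idea (the layout here is hypothetical, not Bun's actual format), a small Rust sketch of such a tagged 8-byte string handle:

```rust
// Hypothetical sketch of an 8-byte string handle: the top bit tags whether
// the string's bytes live inline in the handle or at an offset into a
// shared string buffer. Assumes ASCII, NUL-free strings (true of package names).
const INLINE_TAG: u64 = 1 << 63;

#[derive(Clone, Copy)]
struct StrRef(u64);

impl StrRef {
    fn intern(s: &str, buffer: &mut String) -> StrRef {
        if s.len() <= 7 && s.is_ascii() {
            // Pack up to 7 bytes directly into the handle; no buffer access needed.
            let mut bits: u64 = 0;
            for (i, b) in s.bytes().enumerate() {
                bits |= (b as u64) << (i * 8);
            }
            StrRef(bits | INLINE_TAG)
        } else {
            // Otherwise append to the shared buffer and store (offset, length).
            let offset = buffer.len() as u64;
            buffer.push_str(s);
            StrRef((offset << 32) | s.len() as u64)
        }
    }

    fn resolve(self, buffer: &str) -> String {
        if self.0 & INLINE_TAG != 0 {
            // Unpack inline bytes, stopping at the first zero byte.
            (0..7)
                .map(|i| ((self.0 >> (i * 8)) & 0xff) as u8)
                .take_while(|&b| b != 0)
                .map(char::from)
                .collect()
        } else {
            let (offset, len) = ((self.0 >> 32) as usize, (self.0 & 0xffff_ffff) as usize);
            buffer[offset..offset + len].to_string()
        }
    }
}
```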
yarn also caches the manifest responses.
> If I install a simple nextjs app, then remove node_modules, the lockfile, and the ~/.bun/install/cache/*.npm files (i.e. keep the contents, remove the manifests) and then install, bun takes ~3-4s. PNPM is consistently faster for me at ~2-3s.
This sounds like a concurrency bug with scheduling tasks from the main thread to the HTTP thread. I would love someone to help review the code for the thread pool & async io.
> One piece of feedback, having the lockfile be binary is a HUGE turn off for me. Impossible to diff. Is there another format?
If you do `bun install -y`, it will output as a yarn v1 lockfile.
Of course, I can't say for sure that he looked into the fastest possible ways to parse JSON here, but my intuition is that if he didn't, it's because he had an educated guess that it'd still be slower.
You don't need to go straight to simdjson et al.; something like Rust's serde, which deserializes to typed structs with data like strings borrowed from the input, can be very fast.
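For example, a minimal sketch assuming the serde (with derive) and serde_json crates, with made-up, lockfile-shaped field names; the `&str` fields borrow straight from the input instead of allocating:

```rust
use serde::Deserialize;

// Zero-copy deserialization: the &str fields borrow from the input text.
// (Borrowing only works when a JSON string contains no escape sequences;
// Cow<str> would handle both cases.)
#[derive(Deserialize)]
struct LockEntry<'a> {
    name: &'a str,
    version: &'a str,
    resolved: &'a str,
}

fn main() {
    let input = r#"{"name":"react","version":"18.2.0","resolved":"https://registry.npmjs.org/react/-/react-18.2.0.tgz"}"#;
    let entry: LockEntry = serde_json::from_str(input).unwrap();
    println!("{} @ {} ({})", entry.name, entry.version, entry.resolved);
}
```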
Nobody is arguing that JSON is as performant as binary formats. What the others are saying is that the amount of JSON in your average lock file should be small enough that parsing it is negligible.
If you were dealing with a multi-gigabyte lock file then it would be a different matter, but frankly I agree with their point that parsing a lock file which is only a few KB shouldn't be a differentiator (and if it is, then the JSON parser is the issue, and fixing that should be the priority rather than changing to a binary format).
Moreover, the earlier comment about lock files needing to be human readable is correct. Being able to read, diff and edit them is absolutely a feature worth preserving even if it costs you a fraction of a second in execution time.
> I agree with their point that parsing a lock file which is only a few KB
You mean a few MB? NPM projects typically have thousands of dependencies. A 10 MB lock file wouldn't be atypical, and parse time for a 10 MB JSON file can absolutely be significant, especially if you have to do it multiple times.
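As a rough sanity check (a sketch assuming serde_json; absolute numbers will vary by machine), you can generate ~10 MB of lockfile-shaped JSON and time a parse:

```rust
use std::time::Instant;

fn main() {
    // Build a JSON array of 50,000 small dependency records (~10 MB of text).
    let entry = r#"{"name":"some-package","version":"1.2.3","resolved":"https://registry.npmjs.org/some-package/-/some-package-1.2.3.tgz","integrity":"sha512-abcdefghijklmnopqrstuvwxyz0123456789"}"#;
    let body = vec![entry; 50_000].join(",");
    let json = format!("[{}]", body);
    println!("input size: {:.1} MB", json.len() as f64 / 1e6);

    // Parse into a generic tree, roughly what a naive npm client does.
    let start = Instant::now();
    let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
    println!(
        "parsed {} entries in {:?}",
        parsed.as_array().unwrap().len(),
        start.elapsed()
    );
}
```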
> Being able to read, diff and edit them is absolutely a feature worth preserving even if it costs you a fraction of a second in execution time.
You can read and edit a SQLite file way easier than a huge JSON file.
GitHub (disclosure: where I work) does respect some directives in a repo’s .gitattributes file. For example, you can use them to override language detection or mark files as generated or vendored to change diff presentation. You can also improve the diff hunk headers we generate by default by specifying e.g. `*.rb diff=ruby` (although come to think of it I don’t know why that’s necessary since we already know the filetype — I’ll look into it)
In principle there's no reason we couldn't extend our existing rich diff support, used for diffing things like images, to enhance the presentation of lockfile diffs. There's not a huge benefit for text-based lock files, but for binary ones (if such a scheme were to take off) it would be a lot more useful.
Is there any way to use `.gitattributes` to specify that a file is _not_ generated? I work on a repo with a build/ directory containing build scripts, which is unfortunately excluded by default from GitHub's file search and quick-file selection (T).
Does this really work for jump-to-file? (We're not talking about language statistics or suppressing diffs on PRs, which is mostly what the linguist README talks about.)
> File finder results exclude some directories like build, log, tmp, and vendor. To search for files within these directories, use the filename code search qualifier.
(The inability to quick-jump to files in the /build/ folder with `T` has been driving me crazy for YEARS!)
Correct me if I'm wrong, but checking those two files, I don't see `/build` matching anything there. So to me this exclusion of `/build` from search results seems to be controlled by some other piece of software at GitHub :/
I checked and you're right: The endpoint that returns the file list has a hardcoded set of excludes and pays no attention to `.gitattributes`.
I think it's reasonable to respect the linguist overrides here, so I'll open a PR to remove entries from the exclude list if the repo has a `-linguist-generated` or `-linguist-vendored` gitattribute for that directory [1]. So in your case you can add
    build/** -linguist-generated
to `.gitattributes` and once my PR lands files under `build` should be findable in file-finder.
Thanks for pointing this out! Feel free to DM me on twitter (@cbrasic) if you have more questions.
On Linux, not yet. I don't have a machine that supports reflinks right now and I am hesitant to push code for this without manually testing it works. That being said, it does use copy_file_range if --backend=copyfile, which can use reflinks.
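For reference, a sketch of what calling that syscall directly looks like from Rust via the libc crate (Linux-only, and again illustrative; Bun's implementation is Zig):

```rust
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

// copy_file_range(2) lets the kernel copy (or, on Btrfs/XFS, reflink)
// data between files without bouncing it through userspace.
fn copy_range(src: &File, dst: &File, len: usize) -> io::Result<usize> {
    // Null offset pointers mean "use and advance each fd's own file offset".
    let n = unsafe {
        libc::copy_file_range(
            src.as_raw_fd(),
            std::ptr::null_mut(),
            dst.as_raw_fd(),
            std::ptr::null_mut(),
            len,
            0,
        )
    };
    if n < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(n as usize)
    }
}
```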
Still don't understand why we even need all these inodes. The repo is centrally accessible (and should be read-only, btw). Resolving that shouldn't be a problem. It's been more than a decade and npm is still a mess.
I'm ultra excited about Bun finally being open sourced. Congrats on the amazing progress here, Jarred!
Since JSC is actually compilable to Wasm [1] and Zig supports WASI compilation, I wonder how easy it would be to get it running fully in WAPM with WASI. Any thoughts on how feasible that would be?
Congratulations on the release! You are doing impressive work with bun. I find the built-in SQLite particularly exciting, and I can't wait to move all my projects to bun. Selfishly speaking (my 2012 MBP doesn't support AVX2 instructions), I hope that now that the project is public, and you're going to get a lot of issue reports about install failures, you'll find some time to get back to issue #67. Thank you, and keep up the excellent work.
Yeah, the install functions of npm/yarn/pnpm are all incredibly slow, and they seem to get slower super-linearly with the number of dependencies. I have one project where it can take minutes (on my 2015 MacBook; admittedly it's quicker on my brand-new machine) just to add one dependency and re-resolve the lock file. If that can be solved by a reliable tool, I'd definitely switch!
This is one of, if not the, most insane things in web dev at the moment. Git can diff thousands of files between two commits in less time than it takes to render the result on screen, but somehow it can take actual minutes for npm to figure out where to place a dependency in a simple tree. God, why?
> it can take actual minutes for npm to figure out where to place a dependency in a simple tree. God, why?
npm is famous for a lot of things, but none of them is "because it's well engineered".
To this day, npm still runs the `preinstall` script after dependencies have actually been downloaded to your disk. It modifies a `yarn.lock` file if you have one on disk when running `npm install`. With lots of things like these, it's hardly surprising that installs are slow.
I don't know exactly since when, but I was recently caught off guard after issuing `npm i` by mistake in a yarn project. It modified `yarn.lock`, changing some, if not all, of the registry URLs from the Yarn package registry to the npm registry.
People building language tooling often use the language itself, even if it is not very suitable for the task at hand.
This happens because the tooling often requires domain knowledge they already have: people who set out to write tooling for a language tend to be experienced in that language.
> Yarn v2 is backwards compatible though. You just need to use the node_modules "linker" (not the default one) and it's ready to go.
Last I checked, not quite. Yarn 2+ patches some dependencies to support PnP, even if you don’t use PnP. I discovered this while trying out both Yarn 2 and a pre-release version of TypeScript which failed to install—because the patch for it wasn’t available for that pre-release. I would have thought using node_modules would bypass that patching logic, but no such luck.
I have just discovered yarn2 / yarn3. The main advantage over npm / pnpm seems to be the Zero-Install philosophy [1] and the Plug'n'Play architecture [2]. Do you have any feedback about these features?
By the way, the yarn2 / yarn3 project is hosted on a distinct repository [3].
The benchmark source code links on the homepage are "Not found".
Also a few questions:
What do you attribute the performance advantage to? How much of it is JavaScriptCore instead of V8, versus optimized glue/binding implementations in the runtime? If the latter, what are you doing to improve performance?
Similarly for the npm client: how much is just that bun is the only tool written in a compiled, GC-free language, versus special optimization work?
How does Zig's special support for custom allocators factor in?
One of the things I'm excited about is bun install.
On Linux, it installs dependencies for a simple Next.js app about 20x faster than any other npm client available today.