
Even if getHeaders() has security/performance concerns, the better solution in this case is to make it an alias for the newer headers.get(). Keeping the old API is a small hassle for a handful of developers, but breaking existing code puts a much bigger burden on many more users.
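
In C terms, a hedged sketch of what such aliasing looks like (getHeaders()/headers.get() are the actual JS names; the C names and the headers_t type below are made up for illustration):

    typedef struct headers headers_t;  /* opaque handle; hypothetical */

    /* the new API */
    const char *headers_get(headers_t *h, const char *name);

    /* the old API kept as a one-line wrapper; GCC/Clang warn on use
       without breaking anyone's build */
    __attribute__((deprecated("use headers_get() instead")))
    static inline const char *get_headers(headers_t *h, const char *name)
    {
        return headers_get(h, name);
    }

The deprecation message nudges developers toward the new call while old code keeps compiling and running.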

Yeah, why not just alias the old API calls to the new one if implementation details changed?

This is a smart implementation of Robin Hood hashing that I was not aware of. In my understanding, a standard implementation keeps the probe length of each entry; this one avoids that thanks to its extra constraints. I don't quite understand the following strategy, though:

> To meet property (3) [if the key 0 is present, its value is not 0] ... "array index plus one" can be stored rather than "array index".

If the hash code can take any value in [0, 2^32), how do you define a special value for empty buckets? The more common solution is to have a special key, not a special hash code, for empty slots, which is easier to achieve. In addition, as the author points out, supporting generic keys requires storing 32-bit hash values. With the extra 4 bytes per bucket, it is not clear that this implementation is better than plain linear probing (my favorite). The fastest hash table implementations, such as those in Boost and Abseil, don't use Robin Hood hashing.
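
To make the alternative concrete, here is a minimal linear-probing sketch in C that reserves a special key (not a special hash code) for empty buckets. All names are invented for illustration; hash32() is a generic 32-bit integer mixer:

    #include <stdint.h>

    #define EMPTY_KEY UINT32_MAX  /* sacrifice one key value as the empty marker;
                                     all buckets are set to it at initialization */

    typedef struct { uint32_t key, val; } bucket_t;          /* 8 bytes */
    typedef struct { bucket_t *b; uint32_t mask; } table_t;  /* mask = size - 1; size is a power of 2 */

    static uint32_t hash32(uint32_t x)  /* any decent 32-bit mixer works here */
    {
        x ^= x >> 16; x *= 0x7feb352dU;
        x ^= x >> 15; x *= 0x846ca68bU;
        return x ^ x >> 16;
    }

    /* find the bucket holding `key`, or the first empty bucket on its chain;
       assumes the table always keeps at least one empty bucket */
    static bucket_t *probe(const table_t *t, uint32_t key)
    {
        uint32_t i = hash32(key) & t->mask;
        while (t->b[i].key != key && t->b[i].key != EMPTY_KEY)
            i = (i + 1) & t->mask;  /* linear probing: scan forward */
        return &t->b[i];
    }

No per-bucket hash or probe length is stored, so an integer bucket stays at 8 bytes; the cost is that one key value becomes unusable.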


> But you only need about 5% of the concepts in that comment to be productive in Rust.

A similar argument against C++ applies here: another programmer may be using 10% (or a different 5%) of the concepts, and you will have to learn that fraction when working with them. The same can happen when you read the source code of some random project. C programmers seldom have this problem. Complexity matters.


There's also the problem of the people who are either too clever for their own good, or not nearly as clever as they think they are. Either group can produce horribly convoluted code for relatively simple tasks, and it's irritating as hell every time I run into it. That's not unique to Rust, of course, but the more tools you give them, the bigger the mess they make.

And this is amplified by LLMs.

Before LLMs, only the author had a firm grasp of how their convoluted solution works.

Now sometimes not even the author knows wtf is going on among thousands of added lines of code.


A similar problem applies to Go, just inverted. Take iteration. The vast majority of use cases for iterating over containers are map, filter, reduce. Go doesn't have these functions. That's very simple! All Go developers are aligned here: just use a for loop. There's no room for "10% of the concepts" corners; there's just that one corner.

But, for loops get tedious. So people will make helper functions. Generic ones today, non-generic in the past. The result is that you have a zoo of iteration-related helper functions all throughout. You'll need to learn those when onboarding to a new code base as well. Go's readability makes this easier, but by definition everything's entirely non-standard.


> The result is that you have a zoo of iteration-related helper functions all throughout. You'll need to learn those when onboarding to a new code base as well.

This is overblown, imo. Now that generics exist, you just define Map(), Filter(), and Reduce() in your internal util package. So yes, a new dev needs to find the util package. But they need to do that anyway.

What’s more, these particular functions don’t spread into the type signatures of other functions. That means a new dev only has to go looking for them when they themselves want to use those functions.

Sure, maybe it’s not entirely ideal. But the tone and content of your comment make it sound a zillion times worse than it is.


This was a big complaint against Forth. Every Forth site used a slightly different “standard” library.

No, this is overly simplistic. The features in the quoted comment are largely things that nobody other than stdlib developers need to understand. There is no bespoke subset-dialect of Rust where people are tossing around the `fundamental` attribute--it is strictly an obscure detail that not even an expert Rust programmer would be expected to have even heard of.

The main issue with "fundamental" is that it's currently unstable, but a stable version of it could definitely be useful for lessening the "orphan rule" constraints on implementing traits. Probably would want a different name such as #[deorphan] though.

FASTA was invented in the late 1980s. At that time, Unix tools often limited line length. Even in the early 2000s, some Unix tools (on AIX, as I remember) still had this limit.


> Another thing is human readable is typically synonymous with unindexed

Indexing is not directly related to binary vs text. Many text formats in bioinformatics are indexed, and many binary formats are not, simply because they were not designed with indexing in mind.

> a human-eye check is tedious/impossible because you have to scroll through gigabytes to find what you want.

Yes, indexing is better, but even without indexing you can use command-line tools to extract the portion you want to look at and then pipe it to "more" or "less".


> human-readable files are ridiculously inefficient on every axis you can think of (space, parsing, searching, processing, etc.).

In bioinformatics, most large text files are gzip'd. Decompression is a few times slower than proper file parsing in C/C++/Rust. Some pure Python parsers can be "ridiculously inefficient", but that is not the fault of human readability. Binary files are compressed with existing libraries, and compressed binary files are not noticeably faster to parse than compressed text files. Binary formats can indeed be smaller, but space-efficient formats take years to develop and tend to have more compatibility issues. You can't skip the text-format phase.
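
For a sense of scale, reading a gzip'd text file line by line needs little more than zlib. A sketch (the file name is hypothetical; gzopen() also reads plain uncompressed files transparently):

    #include <stdio.h>
    #include <zlib.h>   /* link with -lz */

    int main(void)
    {
        char buf[0x10000];
        gzFile fp = gzopen("reads.fq.gz", "r");  /* hypothetical input */
        if (fp == 0) return 1;
        while (gzgets(fp, buf, (int)sizeof(buf)) != 0) {
            /* parse one line here; gzgets() keeps the trailing '\n' */
        }
        gzclose(fp);
        return 0;
    }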

> And at that scale, "readable" has no value, since it would take you longer to read the file than 10 lifetimes.

You can't read the whole file by eye, but you can (and often should) eyeball small sections of a huge file. For that, you need a human-readable file format. A problem with this field, IMHO, is that not many people are literally looking at the data by eye.


One of the problems is that a lot of bioinformatics formats nowadays have to hold so much data that most text editors stop working properly. For example, FASTA splits DNA data into lines of 50-80 characters for readability. But in FASTQ, where the '@' and '+' record markers collide with the quality-score characters, DNA and quality data are, as far as I know, always put on one line each. Trying to find a location in a 10k-character line gets very awkward. And I'm sure some people can eyeball Phred scores from ASCII, but I think they are a minority, even among researchers.
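
For instance, a made-up four-line FASTQ record; note that the quality string may legally start with '@' or contain '+', which is exactly why it can't be safely wrapped:

    @read1 an example record
    ACGTACGTACGTACGT
    +
    @I+I=?ABCDEFGHI!

If quality strings were wrapped like FASTA sequences, a continuation line beginning with '@' or '+' would be indistinguishable from the start of the next record or from the separator line.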

Similarly, NEXUS files are also human-readable, but it'd be tough to discern the shape of an inlined 200-node Newick tree.

When I asked people who do actual bioinformatics (well, genomics) about their annoyances with bioinformatics software, one complaint was having to do a bunch of busywork on files in between pipeline steps (compressing/uncompressing, indexing).

I think there's a place in bioinformatics for a unified binary format that can take care of compression, indexing, and metadata. But with that list of requirements it'd have to be binary. Data analysis moved from CSVs and Excel files to Parquet, and I think there's a similar transition waiting to happen here.


My hypothesis is that bioinformatics favors text files, because open source tools usually start as research code.

That means two things. First, the initial developers are rarely software engineers, and they have limited experience developing software. They use text files because they are not familiar with the alternatives.

Second, the tools are usually intended to solve research problems. The developers rarely have a good idea of what the tools will eventually end up doing and what data the files need to store. Text-based formats are a convenient choice, as it's easy to extend and change them. By the time anyone understands the problem well enough to write a useful specification, the existing file format may already be popular, and it's difficult to convince people to switch to a new format.


Yes, most bioinformatics tools are the result of research projects.

However, the most common bioinformatics file formats have actually been devised by excellent software engineers (e.g. SAM/BAM, VCF, BED).

I think it is just very convenient to have text-based formats as you don't need any special libraries to read/modify the files and can reach for basic Unix text-processing tools instead. Such modifications are often needed in a research context.

Also, space-efficient file formats (e.g. CRAM) are often within reach once disk space becomes a pressing issue. Now you only need to convince the team to use them. :)


Totally. A good chunk of the formats are just TSV files with some metadata in the header. Setting aside the drawbacks, this approach is both straightforward and flexible.

I think we're seeing some change in that regard, though: VCF got BCF, and SAM got BAM.


For linked lists and binary trees, intrusive data structures are better.
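
A minimal C sketch of the idea (kernel-style; the job struct is invented for illustration): the links live inside the user's struct, so each element needs no separate node allocation and no void* indirection:

    #include <stddef.h>

    struct list_head { struct list_head *prev, *next; };

    struct job {                /* user payload */
        int id;
        struct list_head link;  /* the list node is embedded in the payload */
    };

    /* recover the enclosing struct from a pointer to its embedded node */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* insert node n after position pos in a circularly-initialized list */
    static void list_insert_after(struct list_head *pos, struct list_head *n)
    {
        n->prev = pos; n->next = pos->next;
        pos->next->prev = n; pos->next = n;
    }

During traversal, container_of(p, struct job, link) turns a list pointer back into the payload.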

> Well, except the first one, template macros, where I can’t really find any pro, only cons.

For toy examples, the first approach (expanding a huge macro) has mostly cons. But it is more flexible when you want to instantiate different parts of the header. The second approach can work but will be clumsy in this case because the whole header is treated as one unit.
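
A toy version of the first approach (one big macro expanding a type-specialized implementation; all names invented here):

    #include <stdlib.h>

    #define VEC_DEFINE(name, T) \
        typedef struct { size_t n, m; T *a; } vec_##name##_t; \
        static void vec_##name##_push(vec_##name##_t *v, T x) { \
            if (v->n == v->m) { \
                v->m = v->m? v->m << 1 : 16; \
                v->a = (T *)realloc(v->a, v->m * sizeof(T)); \
            } \
            v->a[v->n++] = x; \
        }

    VEC_DEFINE(i32, int)     /* instantiate for int */
    VEC_DEFINE(f64, double)  /* instantiate for double */

The second approach would instead #define the element type and #include the same header once per instantiation, which is easier to debug but, as said, treats the whole header as one unit.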


The title sounds like this is a standalone allocator, but the implementation seems to handle thread-local storage only. The hard work is done elsewhere.


The hardware may be ten million times faster, but the software...


I was hit by a similar thing. Rust once caused regression failures in 5000+ packages due to an incompatibility with older versions of the "time" crate [1]. It was considered okay. At that point, I don't care what they say about semver.

[1]: https://github.com/rust-lang/rust/issues/127343#issuecomment...


The comment you linked to explicitly shows that a maintainer does not consider this "okay" at all. T-libs-api made a mistake, the community got enraged, and T-libs-api hasn't made such a mistake since. The fact that it happened sucks, but you can't argue that they didn't admit the failure.


"a maintainer"

The way you word that makes it sound like "the maintainers" and "T-libs-api" do not consider this "okay". Reading just above the linked comment, however, gives a very different impression of the situation:

> We discussed this regression in today's @rust-lang/libs-api team meeting, and agree there's nothing to change on Rust's end. Those repos that have an old version of time in a lockfile will need to update that.


You're reading an artifact of a point in time, before the change hit stable and the rest of the project found out about it. T-libs-api misunderstood the impact because in the past there had been situations that looked similar and were unproblematic to go ahead with, but weren't actually similar. There were follow-up conversations, both in public and private, where the consensus reached was that this was not ok.


What I'm hearing is that the nature of the issue was recognized (that this was a breaking change), but the magnitude of the change and the scale of the impact of that break were underestimated.

TBH that does not inspire confidence. I would expect that something claiming or aspiring to exhibit good engineering design would, as a matter of principle, avoid any breaking change of any magnitude in updates that are not intended to include breaking changes.


Thanks for clarifying. I took a look as well, and the very first reply confirms your opinion and that of the GP's parent. Plenty of downvotes, and the comments that follow criticize the maintainers: "I am not sure how @rust-lang/libs-api can look at 5400 regressions and say "eh, that's fine"."

Not sure why people are trying to cover this up.


It's not covering it up. The people that commented, including the one you quoted, are part of the project.


You are sincere, and I believe this is not a cover-up but more of a misunderstanding. Think of it this way: many people coming to that GitHub thread don't know who the core Rust devs are, but they can clearly see that the second commenter is involved. That comment denied this being a major issue and concluded that the decision was made as a team. To the public, and perhaps some kernel devs, this may be interpreted as the official attitude.


The change itself was very reasonable. They only missed the mark on how that change was introduced. They should have delayed it until the next Rust edition, or at least held it back a few releases to give users of the one affected package time to update.

The change was useful, fixing an inconsistency in a commonly used type. The downside was that it broke code in 1 package out of 100,000, and only broke a bit of useless code that was accidentally left in and didn't do anything. One package just needed to delete 6 characters.

Once the new version of Rust was released, they couldn't revert it without risk of breaking new code that may have started relying on the new behavior, so it was reasonable to stick with the one known problem than potentially introduce a bunch of new ones.


But that is not how backwards compatibility works. You do not break user space. And user space is pretty much out of your control! As a provider of a dependency you do not get to play such games with your users. At least not, when those users care about reliability.


That was a mistake and a breakdown in process that wasn't identified early enough to mitigate the problem. The situation does not represent the self-imposed expectations on acceptable breakage, just that we failed to live up to them; by the time it became clear that the change was problematic, it was too late to revert course, because reverting would itself have been a breaking change.

Yes: adding a trait to an existing type can cause inference failures. The Into trait fallback, where calling a.into() gives you back `a`, is particularly prone to it, and I've been working on a lint for it.


TBH that's a level of quality control that probably informs the Linux kernel devs' view of Rust reliability; it's a consideration when evaluating the risk of including that language.


Are you sure you want to start comparing the quality control of C and Rust packaging or reliability?


Your comment misunderstands the entire point and risk assessment of what's being talked about.

It's about the overall stability and "contract" of the tooling/platform, not what the tooling can control under it. A great example was already given: it took clang 10 years to be "accepted."

It has nothing to do with the language or its overall characteristics, it's about stability.


I trust the quality control of the Linux kernel devs a lot more than the semantics of a language.


Kernel devs more than almost everyone else are well aware that even the existing C toolchains are imperfect.


Maintaining backward compatibility is hard; I am sympathetic. Nonetheless, if the Rust dev team thinks this is a big deal, then clarify it in the release notes, write a blog post, and make a commitment that regressions at this level won't happen again. So far, there has been little official response to this event. The top comment in the thread I pointed to basically treats this as nothing. It is probably too late to do anything for this specific issue, but in the future it would be good to explain and highlight even minor compatibility issues through the official channel. This would give people more confidence.


> Nonetheless, if the Rust dev team thinks this is a big deal, then clarify it in the release notes, write a blog post, and make a commitment that regressions at this level won't happen again. So far, there has been little official response to this event.

There was an effort to write such a blog post. I pushed for it. Due to personal reasons (between being offline for a month and then quitting my job) I didn't have the bandwidth to follow up on it. It's on my plate.

> The top comment in the thread I pointed to basically treats this as nothing.

I'm in that thread. There are tons of comments by members of the project in that thread making your case.

> It is probably too late to do anything for this specific issue, but it would be good to explain and highlight even minor compatibility issues through the official channel.

I've been working on a lint to preclude this specific kind of issue from ever happening again (by removing .into() calls that resolve to their receiver's type). I customized the diagnostic to tell people exactly what the solution is. Both of these things should have been in place before stabilization, at the very least. That was a fuck up.

> This would give people more confidence.

Agreed.


Thanks for the clarification. This has given me more confidence in Rust's future.


It’s hard for me to tell if you’re describing a breakdown in the process for evolving the language or the process for evolving the primary implementation.

Bugs happen, CI/CD pipelines are imperfect, we could always use more lint rules …

But there’s value in keeping the abstract language definition independent of any particular implementation.


> At that point, I don't care what they say about semver.

Semver, or any compatibility scheme, really, is going to have to obey this:

> it is important that this API be clear and precise

—SemVer

Any detectable change being considered breaking is just Hyrum's Law.

(I don't want to speak to this particular instance. It may well be that "I don't feel that this is adequately documented or well-known that Drop isn't considered part of the API" is valid, or arguments that it should be, etc.)

