It's a cool video, but it's formatted almost like a tutorial, with folks in the comments appearing excited to actually try it on their own bodies with lab equipment they have access to. It's pretty irresponsible, given that I don't think he fully conveyed the risks of doing this to your own body.
It's not just risky; it's also hard to know whether it really "worked", for many reasons.
This is why we run double-blind, randomized controlled trials: to be convinced that the treatment actually "worked".
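Toy illustration, with completely made-up effect sizes: even when there is no real effect, a single uncontrolled self-experiment can easily show a convincing-looking "improvement".

```python
import random

random.seed(1)

def observed_improvement(true_effect):
    # Made-up numbers on a 0-100 "symptom improvement" scale.
    placebo    = random.gauss(20, 10)   # placebo response / expectation
    regression = random.gauss(10, 10)   # regression to the mean, lifestyle changes
    noise      = random.gauss(0, 15)    # measurement noise, self-report bias
    return true_effect + placebo + regression + noise

# A single self-experimenter with NO real effect can still report a big "improvement"...
print("n=1, no real effect:", round(observed_improvement(true_effect=0), 1))

# ...which is why you need many subjects, a control arm, and blinding
# before you can attribute the change to the treatment itself.
treated  = [observed_improvement(true_effect=15) for _ in range(100)]
controls = [observed_improvement(true_effect=0)  for _ in range(100)]
print("mean treated:", round(sum(treated) / len(treated), 1))
print("mean control:", round(sum(controls) / len(controls), 1))
```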
How certain are you of this, really? I'd take that bet with you.
You're saying that we won't achieve AGI within ~80 years, i.e. by roughly 2100, a span equivalent to the time since the end of WW2.
To quote Shane Legg from 2009:
"It looks like we’re heading towards 10^20 FLOPS before 2030, even if things slow down a bit from 2020 onwards. That’s just plain nuts. Let me try to explain just how nuts: 10^20 is about the number of neurons in all human brains combined. It is also about the estimated number of grains of sand on all the beaches in the world. That’s a truly insane number of calculations in 1 second."
Are humans really so incompetent that we can't replicate what nature produced through evolutionary optimization, given more compute than exists in EVERY human brain combined?
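For what it's worth, the quote's comparison roughly checks out; a quick back-of-envelope using the commonly cited ~86 billion neurons per brain (and today's population, not 2009's):

```python
# Rough check of the quote's comparison, with commonly cited figures.
neurons_per_brain = 8.6e10   # ~86 billion neurons per human brain
world_population  = 8.0e9    # ~8 billion people (today, not 2009)

total_neurons = neurons_per_brain * world_population
print(f"neurons in all human brains combined: ~{total_neurons:.1e}")  # ~6.9e+20

projected_flops = 1e20       # the 10^20 FLOPS figure from the quote
print(f"10^20 FLOPS / total neurons: {projected_flops / total_neurons:.2f}")  # same order of magnitude
```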
Throw away the H-1B and introduce streamlined high-skill immigration to the US. The top 1% of talent from all over the world should be able to move here in under two weeks.
The first country that cracks this will have streets paved with gold.
There are a few options out there with noticeably better compression, with the downside of being less widely compatible with tools. zstd also has the benefit of being very fast (depending on your settings, of course).
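Quick sketch of the kind of comparison I mean. This needs the third-party zstandard package (pip install zstandard), and "reads.fastq" is a placeholder path:

```python
# Rough size comparison of gzip vs zstd on the same file.
import gzip
import zstandard

data = open("reads.fastq", "rb").read()

gz = gzip.compress(data, compresslevel=9)
z  = zstandard.ZstdCompressor(level=19).compress(data)

print(f"original: {len(data):>12,} bytes")
print(f"gzip -9 : {len(gz):>12,} bytes")
print(f"zstd -19: {len(z):>12,} bytes")
```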
CRAM compresses unmapped FASTQ pretty well, and can do even better with reference-based compression. If your institution is okay with it, you can see additional savings by quantizing quality scores (modern Illumina sequencers already do this for you). If you're aligning your data anyway, retaining just the compressed CRAM file with unmapped reads included is probably your best bet.
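If you're in Python-land, reference-based CRAM conversion with pysam looks roughly like this (file names are placeholders; ref.fa must be the FASTA the reads were aligned to). The samtools view -C -T ref.fa one-liner does the same thing from the command line:

```python
# Sketch: reference-based BAM -> CRAM conversion with pysam (pip install pysam).
import pysam

with pysam.AlignmentFile("sample.bam", "rb") as bam:
    with pysam.AlignmentFile("sample.cram", "wc", template=bam,
                             reference_filename="ref.fa") as cram:
        for read in bam:      # unmapped reads get written out too
            cram.write(read)
```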
There are also other FASTA/FASTQ-specific tools like fqzcomp or MZPAQ. Last I checked, both could roughly halve the size of our fastq.gz files.
The fact that these formats are unable to represent degenerate bases (Ns in particular, but also the remaining IUPAC codes) in my experience renders them unusable for many, if not most, use cases, including the storage of FASTQ data.
The question of how to represent things not specified in the original format is a tough one.
At the loosest end, a format can leave lots of space for new symbols, and you can just use those to represent something new. But then not everyone agrees on what a new symbol means, and worse, multiple groups can use the same symbol to mean different things.
On the other end of the spectrum, you can be strict about the format, and not leave space for new symbols. Then to represent new things you need a new standard, and people to agree on it.
It's mostly a question of how easily code can be updated and agreed upon, and how strict you can require your tooling to be with respect to formats.
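To make the strict end concrete, here's a toy example: a 2-bit-per-base encoding has exactly four code points, so there's simply nowhere to put 'N' or the other IUPAC codes without changing the format itself.

```python
# Toy example of the "strict" extreme: 2 bits per base covers exactly A/C/G/T,
# so there is no spare code point for 'N' or any other IUPAC symbol without
# changing the format (escape codes, a wider encoding, a new spec...).
TWO_BIT = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def encode_2bit(seq):
    packed = 0
    for base in seq:
        packed = (packed << 2) | TWO_BIT[base]   # KeyError on anything but ACGT
    return packed

print(encode_2bit("ACGT"))        # 27 -- fits fine
try:
    print(encode_2bit("ACGNT"))
except KeyError as err:
    print("no code point for ambiguous base:", err)
```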
The original FASTA/Pearson format and fasta/tfasta tools have supported 'N' for ambiguous nucleotides since at least 1996 [1], and the FASTQ format has to my knowledge always supported 'N' bases (i.e. since around 2000). IUPAC codes themselves date back to 1970 [2]. You can probably get away with not supporting the full range of IUPAC nucleotide codes, but not supporting 'N' makes your tool unable to represent what is probably the majority of available FASTA/FASTQ data.
There has to be a name for the fallacy where people in our profession imagine that everything in technology follows Moore's law -- even when it doesn't.
We're subject to a kind of survivorship bias: the technologies we use daily are exactly the ones that did come down in cost and make it. Plenty did not; we just forget about them.
He isn't saying they won't ever come down; he's saying they won't be coming down any time soon, due to structural factors he discusses in the article.
Computers do get more powerful, but the price for a decent system has been about the same for a long time. $1500 got me a good system 10 years ago, but I'd still be paying $1500 for a good system today.
The first mobile phone, the Motorola DynaTAC 8000X, was launched in 1984 for $3,995 (more than $12k in 2025 dollars). So we should expect a 12x cost reduction in LLMs over 40 years.
The IBM 3380 Model K, introduced in 1987, had 7.5 GB of storage and cost about $160,000 to $170,000, or roughly $455,000 adjusted to 2025 US dollars; that's about $60,666/GB. A Solidigm D5-P5336 drive that stores 128 TB costs about $16,500 in 2025 US dollars, or about $0.129/GB. That's a ~470,279x price reduction in slightly less than 40 years. So what is likely to happen to LLM pricing? No one knows, and neither your example nor mine means anything.
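Redoing the arithmetic in both comparisons with the numbers quoted above (the ~$1k figure for a current phone is my assumption; it's what the quoted 12x implies):

```python
# Storage: IBM 3380 Model K (1987) vs Solidigm D5-P5336 (2025), 2025 dollars.
ibm_per_gb      = 455_000 / 7.5          # ~$60,666/GB
solidigm_per_gb = 16_500 / (128 * 1000)  # ~$0.129/GB
print(f"storage: ~{ibm_per_gb / solidigm_per_gb:,.0f}x cheaper")   # ~470,000x

# Phones: DynaTAC 8000X (~$12k in 2025 dollars) vs a ~$1k flagship today
# (the $1k is an assumption, not a figure from the comments above).
print(f"phones:  ~{12_000 / 1_000:.0f}x cheaper")
```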
E.g. the amount of backlash The Thought Emporium got when he genetically engineered himself to remove his lactose intolerance: https://www.youtube.com/watch?v=J3FcbFqSoQY
Risky, but it's his body!