More

az09mugen · 2026-02-19T23:00:30 1771542030

I tried Emacs a bit after using Sublime Text for a while. I'm still using Sublime Text to this day because muscle memory, but the experience got me a deeper understanding of the capabilities of Sublime. While Emacs is profoundly hackable it feels a little bit "rough" on the edges. Sublime feels less hackable but more "clean".

I did not get IDEmacs ( https://codeberg.org/IDEmacs/IDEmacs ) to work but it basically it's an editor I would use.

For now fresh ( https://github.com/sinelaw/fresh/tree/master ) seems to be very promising.

Anyway I traded very happily the command palette Ctrl-Shift-P in Sublime for M-x and few other cool things.

Emacs will always have all my respect because of the concepts it introduced.

az09mugen · 2026-02-17T20:54:20 1771361660

Yes there are python files. "View all files" will show them.

az09mugen · 2026-02-15T07:57:56 1771142276

Will try that micro got which seems interesting. But what a strange way to show 243 lines of python in 5 columns : https://karpathy.ai/microgpt.html

vmykyt · 2026-02-16T20:04:42 1771272282

Seems like this is his way to say "my code fits into one screen"

az09mugen · 2026-02-12T21:25:17 1770931517

This is the perfect illustration of Goodhart's Law : https://en.wikipedia.org/wiki/Goodhart%27s_law

az09mugen · 2026-02-11T23:44:03 1770853443

I do not agree on the "lossless" adjective. And even if it is lossless, for sure it is not deterministic.

For example I would not want a zip of an encyclopedia that uncompresses to unverified, approximate and sometimes even wrong text. According to this site : https://www.wikiwand.com/en/articles/Size%20of%20Wikipedia a compressed Wikipedia without medias, just text is ~24GB. What's the medium size of an LLM, 10 GB ? 50 GB ? 100 GB ? Even if it's less, it's not an accurate and deterministic way to compress text.

Yeah, pretty easy to calculate...

nagaiaida · 2026-02-12T01:08:49 1770858529

(to be clear this is not me arguing for any particular merits of llm-based compression, but) you appear to have conflated one particular nondeterministic llm-based compression scheme that you imagined with all possible such schemes, many of which would easily fit any reasonable definitions of lossless and deterministic by losslessly doing deterministic things using the probability distributions output by an llm at each step along the input sequence to be compressed.

notpushkin · 2026-02-12T02:16:18 1770862578

With a temperature of zero, LLM output will always be the same. Then it becomes a matter of getting it to output the exact replica of the input: if we can do that, it will always produce it, and the fact it can also be used as a bullshit machine becomes irrelevant.

With the usual interface it’s probably inefficient: giving just a prompt alone might not produce the output we need, or it might be larger than the thing we’re trying to compress. However, if we also steer the decisions along the way, we can probably give a small prompt that gets the LLM going, and tweak its decision process to get the tokens we want. We can then store those changes alongside the prompt. (This is a very hand-wavy concept, I know.)

duskwuff · 2026-02-12T03:11:28 1770865888

There's an easier and more effective way of doing that - instead of trying to give the model an extrinsic prompt which makes it respond with your text, you use the text as input and, for each token, encode the rank of the actual token within the set of tokens that the model could have produced at that point. (Or an escape code for tokens which were completely unexpected.) If you're feeling really crafty, you can even use arithmetic coding based on the probabilities of each token, so that encoding high-probability tokens uses fewer bits.

From what I understand, this is essentially how ts_zip (linked elsewhere) works.

program_whiz · 2026-02-12T10:56:03 1770893763

The models are differentiable, they are trained with backprop. You can easily just run it in reverse to get the input that produces near certainty of producing the output. For a given sequence length, you can create a new optimzation that takes the input sequence, passes to model (frozen) and runs steps over the input sequence to reduce the "loss" which is the desired output. This will give you the optimal sequence of that length to maximize the probability of seeing the output sequence. Of course, if you're doing this to chatGPT or another API-only model, you have no choice but to hunt around.

Of course the optimal sequence to produce the output will be a series of word vectors (of multi-hundreds of dimensions). You could match each to its closest word in any language (or make this a constraint during solving), or just use the vectors themselves as the compressed data value.

Ultimately, NNets of various kinds are used for compression in various contexts. There are some examples where guassian-splatting-like 3d scenes are created by comrpessing all the data into the weights of a nnet via a process similar to what I described to create a fully explorable 3d color scene that can be rendered from any angle.

microtonal · 2026-02-12T10:49:29 1770893369

A bit of nitpicking, a temperature of zero does not really exist (it would lead to division by zero in softmax). It's sampling (and non-deterministic compute kernels) that makes token prediction non-deterministic. You could simply fix it (assuming deterministic kernels) by using greedy decoding (argmax with a stable sort in the case of ties).

As temperatures approach zero, the probability of the most likely token approaches one (assuming no ties). So my guess is that LLM inference providers started using temperature=0 to disable sampling because people would try to approximate greedy decoding by using teensy temperatures.

D-Machine · 2026-02-12T06:10:47 1770876647

> With a temperature of zero, LLM output will always be the same

Ignoring GPU indeterminism, if you are running a local LLM and control batching, yes.

If you are computing via API / on the cloud, and so being batched with other computations, then no (https://thinkingmachines.ai/blog/defeating-nondeterminism-in...).

But, yes, there is a lot of potential from semantic compression via AI models here, if we just make the efforts.

az09mugen · 2026-02-11T21:32:07 1770845527

This resonates so much with Wirth's law : https://en.wikipedia.org/wiki/Wirth%27s_law

az09mugen · 2026-02-10T07:36:12 1770708972

Same, even on the web I was not able to find a car photo.

az09mugen · 2026-02-09T22:14:47 1770675287

Same here. Context switching is a real flow-killer.

az09mugen · 2026-02-08T09:27:38 1770542858

Thanks for that video of Primeagen.

az09mugen · 2026-02-07T13:06:18 1770469578

3 : https://en.wikipedia.org/wiki/Newton's_laws_of_motion

hulitu · 2026-02-10T16:30:50 1770741050

This is language witchcraft. Some "laws" described there are "principles". That's how i learned them in school and that's why i was surprised to see "Newton's third law". I think i am getting old.