Yeah, burying this on page 8 is a bit suspect imo (the eval datasets are listed on page 3, so if you were familiar with them you would have a hint then).
Distilling into a student that predicts "anchor layers" and then acts as a backbone for classification is perfectly cool on its own; no need to stretch the title/abstract so much.
agreed re: title/abstract stretching. good work stands on its own without needing hype. "we found a nifty way to distill llama-70b using a much smaller student transformer model; the key is using intermediate activation layers in a compressed representation" would be about as effective at selling it while being more immediately approachable IMO
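fwiw, here's roughly how i picture the anchor-layer distillation, as a toy sketch. every name, dimension, layer index, and the MSE loss below is my guess from this thread, not the paper's actual recipe:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # all of these numbers are illustrative guesses, not the paper's setup
    ANCHOR_LAYERS = (20, 40, 60)                     # teacher layers to match
    D_TEACHER, D_STUDENT, D_CODE = 8192, 1024, 256   # D_CODE = compressed repr

    class AnchorStudent(nn.Module):
        """Small student that predicts a compressed code per anchor layer."""
        def __init__(self, vocab_size=32000):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, D_STUDENT)
            layer = nn.TransformerEncoderLayer(D_STUDENT, nhead=8,
                                               batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=4)
            # one prediction head per anchor layer
            self.heads = nn.ModuleDict(
                {str(l): nn.Linear(D_STUDENT, D_CODE) for l in ANCHOR_LAYERS})

        def forward(self, tokens):
            h = self.backbone(self.embed(tokens))    # (B, T, D_STUDENT)
            return h, {l: self.heads[str(l)](h) for l in ANCHOR_LAYERS}

    # frozen projections that compress each teacher activation down to D_CODE
    compress = nn.ModuleDict(
        {str(l): nn.Linear(D_TEACHER, D_CODE, bias=False)
         for l in ANCHOR_LAYERS}).requires_grad_(False)

    def distill_loss(student, tokens, teacher_acts):
        # teacher_acts: {layer: (B, T, D_TEACHER)}, captured from the frozen
        # teacher with forward hooks (not shown)
        _, preds = student(tokens)
        return sum(F.mse_loss(preds[l], compress[str(l)](teacher_acts[l]))
                   for l in ANCHOR_LAYERS)

after distillation you'd presumably hang a classifier off pooled backbone features (e.g. nn.Linear(D_STUDENT, n_classes) on h.mean(dim=1)), which is where the classification results would come from.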
That limitation is already accounted for in how the title is meant to be read. The 224× compression result is specifically about the structure of intermediate activations on classification tasks. The paper makes that explicit in multiple places, including the Limitations section, where generation is identified as an entirely separate challenge.
The title reflects the strongest verified result in the domain the method currently supports, not a universal claim across all modalities. In other words, the compression result is real, but it shouldn't be interpreted as applying to generative decoding... yet.
I really wanted Pixel Buds to fit this use case, but have found the experience incredibly crap. "Hey Google, let's chat live" is like some mad lottery.
…we found an off-the-shelf keyboard that could work, but we couldn't get it because it was 999 euros. So: let's make 7 iterations of our own keyboard with our Formlabs 3D printer, create silicone molds for each key, print legends with our UV printer, and we're done. Glad he did though, looks awesome!
Set nproc_per_node=1 instead of 8 (or run the training script directly instead of using torchrun) and set device_batch_size=4 instead of 32. You may be able to use a device_batch_size of 8 on a 5090, but it didn't work on my 4090. However, it's way slower than expected; the gap implies one H100 is ~250x faster than a 4090, which can't be right, so I'm not sure it's training correctly. I'll let it run overnight and see if the outputs make any sense; maybe the metrics aren't accurate in this config.
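Concretely, something like this (train.py is a placeholder for whatever the repo's actual training script is, and whether device_batch_size is a CLI flag or a config value depends on the repo):

    # single GPU: shrink torchrun to one process...
    torchrun --standalone --nproc_per_node=1 train.py --device_batch_size=4
    # ...or skip torchrun and launch the script directly
    python train.py --device_batch_size=4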
> Generation tasks. Method applies to classification only. Preliminary decoder experiments show perplexity increases.