In a modal-logic sense, Chinese is inherently more □-oriented, while the language used in the US leans on ◇ more. So in Chinese, ¬(□¬A) can be used to express the possibility concept ◇A.
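(For reference, that is just the standard duality between the two modal operators; a minimal LaTeX rendering of both directions:)

```latex
% "Possibly A" is definable as "not necessarily not A", and vice versa.
\[
  \Diamond A \;\equiv\; \lnot \Box \lnot A
  \qquad\qquad
  \Box A \;\equiv\; \lnot \Diamond \lnot A
\]
```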
OP here. That's kind of the idea behind listing the number of attempts alongside failures/successes. It's a loose metric for how "compliant" a model is - i.e. how much work you have to put in to get a nominally successful result.
At least most functional language tutorials are explicit about being based on abstract machines, unlike C, whose abstract machine is a spherical cow that people are often not aware of. https://queue.acm.org/detail.cfm?id=3212479
Oh, it is the selected output - yes, that's what I meant; I was a bit confused. So in the initial design, when you first tried it, did you already pass both to the next layer? Or was that something you found out later performs better?
Even in the earliest stages of the DDN concept, we had already decided to pass features down to the next layer.
I never even ran an ablation that disabled the stem features; I assume the network would still train without them, but since the previous layer has already computed the features, it would be wasteful not to reuse them. Retaining the stem features also lets DDN adopt the more efficient single-shot-generator architecture.
Another deeper reason is that, unlike diffusion models, DDN does not need the Markov-chain property between adjacent layers.
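(Not the actual DDN code - just a minimal PyTorch-style sketch of what "passing the selected output together with the stem features to the next layer" could look like. The module names, channel counts, and the `forward` signature here are all hypothetical, and how the selected candidate is chosen is omitted.)

```python
import torch
import torch.nn as nn

class ToyDDNLayer(nn.Module):
    """Hypothetical layer that consumes both the previous layer's
    selected sample and its intermediate ("stem") features."""

    def __init__(self, feat_ch=64, img_ch=3, k=8):
        super().__init__()
        # Fuse the selected output (image-like) with the reused stem features.
        self.fuse = nn.Conv2d(feat_ch + img_ch, feat_ch, 3, padding=1)
        self.body = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
            nn.ReLU(),
        )
        # Produce k candidate outputs; one of them is selected downstream.
        self.heads = nn.Conv2d(feat_ch, k * img_ch, 1)
        self.k, self.img_ch = k, img_ch

    def forward(self, selected_prev, stem_prev):
        x = self.fuse(torch.cat([selected_prev, stem_prev], dim=1))
        stem = self.body(x)  # stem features, reused by the next layer
        b, _, h, w = stem.shape
        candidates = self.heads(stem).view(b, self.k, self.img_ch, h, w)
        return candidates, stem  # next layer receives (selected candidate, stem)
```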
It may be a Claude-specific thing. I tried asking Claude to do various machine learning tasks, like implementing gradient boosting, without specifying the language, thinking it would use Python since it is the most common option and has utilities like Numpy that make things much easier. But Claude mostly chose JavaScript and somehow managed to do it in JS.
Maybe it is a cultural difference, but I feel that the "supermarket workers, gas station attendants" (in an Asian country) that I know should be quite capable of most ARC tasks.
There is a meaningful distinction. They only use backprop one layer at a time, requiring additional space proportional to that layer. Full backprop requires additional space proportional to the whole network.
It's also a bit interesting as an experimental result, since the core idea doesn't require backprop. Since it's just an implementation detail, you could theoretically swap in other layer types or solvers.
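(Rough illustration only, not the paper's code: a sketch of layer-local training in PyTorch, where `.detach()` cuts the autograd graph between layers so backprop only ever stores activations for one layer at a time. The layer structure and the local loss here are made up.)

```python
import torch
import torch.nn as nn

# Hypothetical stack of layers, each with its own local loss and optimizer.
layers = nn.ModuleList(
    [nn.Sequential(nn.Linear(128, 128), nn.ReLU()) for _ in range(6)]
)
opts = [torch.optim.Adam(layer.parameters(), lr=1e-3) for layer in layers]

def local_training_step(x, target):
    """Train each layer with backprop confined to that layer.

    The .detach() at the end of each iteration means the autograd graph
    never spans more than one layer, so the extra memory for backprop is
    proportional to a single layer, not the whole network.
    """
    for layer, opt in zip(layers, opts):
        h = layer(x)
        # Made-up local objective; a real method would define its own.
        loss = (h - target).pow(2).mean()
        opt.zero_grad()
        loss.backward()   # gradients flow only through `layer`
        opt.step()
        x = h.detach()    # cut the graph before the next layer
    return x

# Example usage with random data:
out = local_training_step(torch.randn(32, 128), torch.randn(32, 128))
```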