cttet's comments | Hacker News

In a modal logic sense, Chinese is inherently more □-oriented, while the language used in the US leans more on ◇; so in Chinese, ¬(□¬A) can be used to express the possibility concept ◇A.
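For reference, this is the standard duality between the two operators in normal modal logics (written as LaTeX):

    \Diamond A \;\equiv\; \lnot \Box \lnot A
    \qquad
    \Box A \;\equiv\; \lnot \Diamond \lnot A

i.e. "possibly A" is literally "not necessarily not A".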


Yep, all users are logged out automatically...


A "worst image" instead of best image competition may be easy to implement and quite indicative of which one has less frustration experience.


OP here. That's kind of the idea of listing the number of attempts alongside failures/successes. It's a loose metric for how "compliant" a model is - e.g. how much work you have to put in to get a nominally successful result.


I rarely think in words; when I do, it is something like 20x slower. It is more robust, but in that case I would rather use pen and paper.


At least most functional language tutorials claim to be based on abstract machines, unlike the C language, which is a spherical cow that people are often not aware of. https://queue.acm.org/detail.cfm?id=3212479


It seems to pass both a feature and a discrete number into the next layer. Which one did you think of first, or were both there by design?


I understand that by "discrete number" you mean the selected output of each layer.

Both the "feature" and the "selected output" are designed to be passed to the next layer.


Oh, it is the selected output; yes, that is what I meant, I was just a bit confused. So in the initial design, when you first tried it, did you pass both to the next layer? Or was that something you found out later performs better?


Even in the earliest stages of the DDN concept, we had already decided to pass features down to the next layer.

I never even ran an ablation that disabled the stem features; I assume the network would still train without them, but since the previous layer has already computed the features, it would be wasteful not to reuse them. Retaining the stem features also lets DDN adopt the more efficient single-shot-generator architecture.

Another deeper reason is that, unlike diffusion models, DDN does not need the Markov-chain property between adjacent layers.
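Roughly, the data flow looks something like this (a simplified PyTorch-style sketch, not the real DDN code; the candidate count, shapes, and selection rule are only illustrative):

    import torch
    import torch.nn as nn

    class DDNLayerSketch(nn.Module):
        """Sketch: one layer that outputs (stem features, selected candidate)."""
        def __init__(self, channels: int = 64, n_candidates: int = 8):
            super().__init__()
            self.stem = nn.Conv2d(channels, channels, 3, padding=1)   # shared stem features
            self.heads = nn.Conv2d(channels, n_candidates * 3, 1)     # K candidate RGB outputs
            self.k = n_candidates

        def forward(self, feat, target):
            feat = torch.relu(self.stem(feat))                        # stem features, reused downstream
            b, _, h, w = feat.shape
            cands = self.heads(feat).view(b, self.k, 3, h, w)         # K discrete candidates
            # training-time selection: pick the candidate closest to the target
            err = ((cands - target.unsqueeze(1)) ** 2).flatten(2).mean(-1)
            idx = err.argmin(dim=1)
            selected = cands[torch.arange(b), idx]
            # BOTH the stem features and the selected output go to the next layer
            return feat, selected, idx

The next layer would then take feat together with selected (e.g. concatenated along channels) as its input.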


Thanks! Really like your intuition!


It may be a Claude-specific thing. I tried asking Claude to do various machine learning tasks, like implementing gradient boosting, without specifying the language, thinking it would use Python since it is the most common option and has utilities like NumPy that make it much easier. But Claude mostly chose JavaScript and somehow managed to do it in JS.
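For example, the kind of NumPy-flavoured Python I was expecting (a rough sketch of gradient boosting with decision stumps and squared loss, not what Claude produced):

    import numpy as np

    def fit_stump(X, r):
        # best (feature, threshold, left value, right value) for residuals r, by SSE
        best, best_err = None, np.inf
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1]:
                left = X[:, j] <= t
                lv, rv = r[left].mean(), r[~left].mean()
                err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
                if err < best_err:
                    best_err, best = err, (j, t, lv, rv)
        return best

    def gradient_boost(X, y, n_rounds=50, lr=0.1):
        pred = np.full(len(y), y.mean())
        stumps = []
        for _ in range(n_rounds):
            r = y - pred                                  # negative gradient of squared loss
            stump = fit_stump(X, r)
            if stump is None:                             # no useful split left
                break
            j, t, lv, rv = stump
            pred += lr * np.where(X[:, j] <= t, lv, rv)   # shrunken update
            stumps.append((j, t, lv, rv))
        return y.mean(), stumps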


Maybe it is a cultural difference, but I feel that the "supermarket workers, gas station attendants" (in an Asian country) that I know would be quite capable of most ARC tasks.


The point is not that a high score -> AGI; their idea is more that a low score -> we don't have AGI yet.


In all their experiments, backprop is used for most of their parameters though...


There is a meaningful distinction. They only use backprop one layer at a time, requiring additional space proportional to that layer. Full backprop requires additional space proportional to the whole network.

It's also a bit interesting as an experimental result, since the core idea didn't require backprop. Being an implementation detail, you could theoretically swap in other layer types or solvers.
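For example, a minimal PyTorch-style sketch of layer-at-a-time training; the local classifier heads and losses are my own illustrative stand-in, not necessarily what the paper uses. The .detach() is what keeps the autograd graph, and hence the extra memory, confined to a single layer:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    layers = nn.ModuleList([nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(4)])
    heads  = nn.ModuleList([nn.Linear(64, 10) for _ in range(4)])   # illustrative local objectives
    opts   = [torch.optim.SGD(list(l.parameters()) + list(h.parameters()), lr=1e-2)
              for l, h in zip(layers, heads)]

    def train_step(x, y):
        # x: (batch, 64) float features, y: (batch,) class indices
        h = x
        for layer, head, opt in zip(layers, heads, opts):
            h = layer(h.detach())        # cut the graph: no gradients flow to earlier layers
            loss = F.cross_entropy(head(h), y)
            opt.zero_grad()
            loss.backward()              # graph (and activation memory) covers only this layer + head
            opt.step()
        return h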

