
Out of all things I'm actually surprised they went straight to custom silicon, but gotta respect that decision. It's likely the only way to compete with Tesla right now.

Thank you for saying this. It's almost like others are saying we should stop trying things because they are hard and challenging.

I wish we could dream a bit bigger rather than coming up with reasons something will fail.


> What high quality data sources are not already tapped? Synthetic data? Video?

> Where does the next 1000x flops come from? Even with Moore's law dead, we can easily build 1,000x more computers. And for arguments about lack of power - we have sun.


A Dyson sphere brain?


I don't think we need that for 1,000x. We can build more solar and nuclear, and there is still room for at least a 10x improvement in chip efficiency. We are far, far away from maxing out our compute capability as a civilization, long before we need to start shooting satellites into the sun.


I see LLMs in a similar way - a new UI paradigm that "clicks the right buttons" when you know what you need, but don't know exact names of the buttons to click.

And from my experience there are lots and lots of jobs that are just "clicking the right buttons".


Say we discover a new architecture breakthrough like Yann LeCun's proposed JEPA. Won't scaling laws apply to it anyway?

Suppose training is so efficient that you can train state of the art AGI on a few GPUs. If it's better than current LLMs, there will be more demand/inference, which will require more GPUs and we are back at the same "add more gpus".

I find it hard to believe that we, as humanity, will hit the wall of "we don't need more compute", no matter what the algorithms are.


  > Won't scaling laws apply to it anyway?
Yes, of course. Scaling Laws will always apply, but that's not really the point[0].

The fight was never "Scale is all you need" (SIAYN) vs "scale is irrelevant"; it was "SIAYN" vs "scaling is not enough" (SINE). I'm not aware of any halfway serious researcher who did not think scaling was going to result in massive improvements. Being a researcher from the SINE camp myself...

Here's the thing:

The SIAYN camp argued that the transformer architecture was essentially good enough. They didn't think scale was literally all you needed, but that the rest would be minor tweaks, and that increasing model size and data size would get us there. That those were the major hurdles. In this sense they argued that we should move our efforts away from research and into engineering. That AGI was now essentially a money problem rather than a research problem. They pointed to Sutton's Bitter Lesson narrowly, concentrating on his point about compute.

The SINE (or SINAYN) camp wasn't sold. We read the Bitter Lesson differently: yes, compute is a key element of modern success, but just as important was the rise of flexible algorithms. In the past we couldn't work with such algorithms for lack of computational power, but the real power was in the algorithms. We're definitely a more diverse camp too, with varying arguments. Many of us look at animals and see that we can do so much more with so much less[2]. Clearly, even if SIAYN were sufficient, it does not appear to be efficient. Regardless, we all agree that there are still subtle nuances in intelligence that need working out.

The characteristics of the scaling "laws" matter, but they aren't enough. In the end what matters is generalization, and for that we don't really have measures. Unfortunately, with the SIAYN camp also came benchmark maximization. It was a good strategy in the beginning, as it helped give us direction. But we are now at the hard problem that the SINE camp predicted. How do you make a model a good music generator when you have no definition of "good music"? Even in a very narrow sense we don't have a halfway decent mathematical definition of any aesthetics. We argued "we should be trying to figure this out so we don't hit a wall" and they argued "it'll emerge with scale".

So now the cards have been dealt. Who has the winning hand? More importantly, which camp will we fund? And will we fund the SIAYN people that converted to SINE or will we fund those who have been SINE when times were tough?

[0] They've been power laws and I expect them to continue to be power laws[1]. But the parameters of those laws do still matter, right? (See the sketch after these footnotes.)

[1] https://www.youtube.com/watch?v=HBluLfX2F_k

[2] A mouse has on the order of 100M neurons (and 10^12 synapses). Not to mention how little power they operate on! These guys can still outperform LLMs on certain tasks, despite the LLMs having like 4 orders of magnitude more parameters and far more data!
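To make [0] concrete, here is a minimal sketch of the usual fitted form, assuming the Chinchilla-style parameterization (the shape is standard; the emphasis is mine):

    L(N, D) ≈ E + A / N^α + B / D^β

A new architecture (JEPA or otherwise) would very likely still trace a curve of this shape. What it can change are the constants: smaller A and B, steeper α and β, or a lower irreducible loss E. Same law, different parameters, and the parameters are where the wins live.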


I agree scaling alone is not enough, and the transformer itself is proof of that: it was an iteration on the attention mechanism plus a few other changes.

But no matter what the next big thing is, I'm sure it would immediately fill all available compute to maximize its potential. It's not like intelligence has a ceiling beyond which you don't need more intelligence.


Was "scale is all you need" actually a real thing said by a real person? Even the most pro scale people like Altman seem to be saying research and algorithms are a thing too. I guess as you say a more important thing is where the money goes. I think Altman's been overdoing it a bit on scaling spend.


Yes, they even made t-shirts.

  > Even the most pro scale people like Altman seem to be saying research and algorithms are a thing too.
I think you missed the nuance in my explanation of both sides. Yes, they believed algorithmic development mattered, but only in a small way: tuning, not even considering exploring architectures other than the transformer.

And Altman did say that AGI is a scaling problem, which is why he was asking for $7T. But he was clearly a liar, given this quote from last year; there's no way he really believed this in late 2024.

  > Altman claimed that AGI could be achieved in 2025 during an interview for Y Combinator, declaring that it is now simply an engineering problem. He said things were moving faster than expected and that the path to AGI was "basically clear."[0]
I'm with Chollet on this one: our obsession with LLMs has held us back. Not that we didn't learn a lot from them, but our hyper-fixation closed our minds to other possibilities. The ML field (and CS in general) gets hyper-fixated on certain things and I just don't get that. Look at diffusion models: there was basically a 5 year gap between the first U-Net-based model and DDPM, all because we were obsessed with GANs at the time. We jump on a hype train and shun anyone who doesn't want to get on. This is not a healthy ecosystem and it hinders growth.

Just because we end up with success doesn't mean the path to get there was reasonable nor does it mean it was efficient.

[0] https://www.tomsguide.com/ai/chatgpt/sam-altman-claims-agi-i...


Fair enough, although that Altman quote doesn't match what he actually said in the interview. He said:

>...first time ever where I felt like we actually know what to do like I think from here to building an AGI will still take a huge amount of work there are some known unknowns but I think we basically know what to go what to go do and it'll take a while it'll be hard but that's tremendously exciting... https://youtu.be/xXCBz_8hM9w?t=2330

and at the end there was "what are you excited for in 2025?" and Altman says "AGI", but that doesn't specify whether that means it arriving or just working on it.

I don't think "a huge amount of work" and "known unknowns" are the same as "we just need to scale".


Does anyone know or have a guess at the size of these latest thinking models and what hardware they use to run inference? As in, how much memory and what quantization they use, and whether it's "theoretically" possible to run one on something like a Mac Studio M3 Ultra with 512GB RAM. Just curious from a theoretical perspective.


Rough ballpark estimate:

- Amazon Bedrock serves Claude Opus 4.5 at 57.37 tokens per second: https://openrouter.ai/anthropic/claude-opus-4.5

- Amazon Bedrock serves gpt-oss-120b at 1748 tokens per second: https://openrouter.ai/openai/gpt-oss-120b

- gpt-oss-120b has 5.1B active parameters at approximately 4 bits per parameter: https://huggingface.co/openai/gpt-oss-120b

To generate one token, all active parameters must pass from memory to the processor (disregarding tricks like speculative decoding).

Multiplying 1748 tokens per second by the 5.1B parameters and 4 bits per parameter gives us a memory bandwidth of 4457 GB/sec (probably more, since small models are more difficult to optimize).

If we divide the memory bandwidth by the 57.37 tokens per second for Claude Opus 4.5, we get about 80 GB of active parameters.

With speculative decoding, the numbers might change by maybe a factor of two or so. One could test this by measuring whether it is faster to generate predictable text.

Of course, this does not tell us anything about the number of total parameters. The ratio of total parameters to active parameters can vary wildly from around 10 to over 30:

    120 : 5.1 for gpt-oss-120b
    30 : 3 for Qwen3-30B-A3B
    1000 : 32 for Kimi K2
    671 : 37 for DeepSeek V3
Even with the lower bound of 10, you'd have about 800 GB of total parameters, which does not fit into the 512 GB RAM of the M3 Ultra (you could chain multiple, at the cost of buying multiple).
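If anyone wants to fiddle with the assumptions, here's the same back-of-envelope as a Python sketch (the tokens-per-second figures are the OpenRouter numbers above; the rest are the assumptions just stated):

    # Calibrate effective memory bandwidth from a model with known active params.
    gpt_oss_tps     = 1748      # Bedrock, gpt-oss-120b (OpenRouter figure above)
    gpt_oss_active  = 5.1e9     # active parameters
    bytes_per_param = 4 / 8     # assuming ~4-bit weights

    bandwidth = gpt_oss_tps * gpt_oss_active * bytes_per_param   # bytes/sec
    print(f"effective bandwidth: ~{bandwidth / 1e9:.0f} GB/s")   # ~4457 GB/s

    # Apply that bandwidth to Opus 4.5's observed speed.
    opus_tps = 57.37
    opus_active_bytes = bandwidth / opus_tps
    print(f"Opus 4.5 active weights: ~{opus_active_bytes / 1e9:.0f} GB")  # ~78 GB

    # Lower-bound the total size with the smallest total:active ratio above (~10).
    print(f"total weights: at least ~{10 * opus_active_bytes / 1e9:.0f} GB")  # ~780 GB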

But you can fit a 3 bit quantization of Kimi K2 Thinking, which is also a great model. HuggingFace has a nice table of quantization vs required memory https://huggingface.co/unsloth/Kimi-K2-Thinking-GGUF


I love logical posts like this. There are other factors like MXFP4 in gpt-oss, MLA in DeepSeek, etc.

>Amazon Bedrock serves Claude Opus 4.5 at 57.37

I checked the other Opus-4 models on bedrock:

Opus 4 - 18.56 tps
Opus 4.1 - 19.34 tps

So it looks like they changed the active parameter count with Opus 4.5.


Good observation!

57.37 tps / 19.34 tps ≈ 3

This explains why Opus 4.1 is 3 times the price of Opus 4.5.


Thanks! That's a great way to analyze it by comparing to open source models. Though I wonder if they use the same hardware for gpt-oss-120b and Claude Opus.


That all depends on what you consider to be reasonably running it. Huge RAM isn't required to run them; that just makes them faster. I imagine technically all you'd need is a few hundred megabytes for the framework and housekeeping, but you'd have to wait for some/most/all of the model to be read off the disk for each token it processes.

None of the closed providers talk about size, but for a reference point of the scale: Kimi K2 Thinking can spar in the big leagues with GPT-5 and such…if you compare benchmarks that use words and phrasing with very little in common with how people actually interact with them…and at FP16 you'll need 2.9TB of memory @ 256,000 context. It seems it was recently retrained at INT4 (not just quantized, apparently) and now:

“ The smallest deployment unit for Kimi-K2-Thinking INT4 weights with 256k seqlen on mainstream H200 platform is a cluster with 8 GPUs with Tensor Parallel (TP). (https://huggingface.co/moonshotai/Kimi-K2-Thinking) “

-or-

“ 62× RTX 4090 (24GB) or 16× H100 (80GB) or 13× M3 Max (128GB) “

So ~1.1TB. Of course it can be quantized down to as dumb as you can stand, even within ~250GB (https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-l...).

But again, that’s for speed. You can run them more-or-less straight off the disk, but (~1TB / SSD_read_speed + computation_time_per_chunk_in_RAM) = a few minutes per ~word or punctuation.


    > (~1TB / SSD_read_speed + computation_time_per_chunk_in_RAM) = a few minutes per ~word or punctuation.
 
You have to divide SSD read speed by the size of the active parameters (~16GB at 4 bit quantization) instead of the entire model size. If you are lucky, you might get around one token per second with speculative decoding, but I agree with the general point that it will be very slow.
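As a quick sketch of what that works out to (assuming a ~7 GB/s NVMe SSD and Kimi K2 Thinking's 32B active parameters at INT4; both numbers are assumptions for illustration):

    ssd_read_gbps   = 7.0    # assumed fast NVMe sequential read, GB/s
    active_params   = 32e9   # Kimi K2 Thinking active parameters
    bytes_per_param = 0.5    # INT4

    active_gb = active_params * bytes_per_param / 1e9   # ~16 GB streamed per token
    print(f"~{ssd_read_gbps / active_gb:.2f} tokens/sec")  # ~0.44 before tricks

So roughly half a token per second before tricks, and maybe double that with speculative decoding, consistent with "around one token per second if you are lucky".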


Yeah, thanks for calling that out. I kind of panicked when I reached that part of the explanation and was stuck on whether or not I should go into dense models vs MoE. The question was about "big stuff like that", which most certainly means MoE, and I even chose an MoE as an example; but then there are giant dense models like Llama, though that's not what was asked, although it wasn't not asked, because "also big league stuff"…anyway, I basically thought "you're welcome" and "no problem", then said "you're problem".


Originally posted this in another thread, but very curious what others think.

Can I ask my partner to buy a product on Amazon?

Can I ask my personal assistant to buy a product on Amazon?

Can I hire a contractor to buy products on Amazon?

Can I communicate with a contractor via API to direct them what products to buy?

What if there is no human on the other end and it's an LLM?

Same issue with LinkedIn. I know execs who have assistants running their socials. Is this legal?

Like, where do we draw the line? In the future, would the only way to shop on Amazon be with approved VR goggles that scan your retina to verify you are a human?


> where do we draw the line?

Perplexity has shown itself to be a bad actor [1][2][3], and possibly incompetent, too [4].

We need to draw a line, eventually. But it’s far from urgent. And I don’t think Perplexity should be the one deciding.

[1] https://blog.cloudflare.com/perplexity-is-using-stealth-unde...

[2] https://www.reuters.com/legal/litigation/perplexity-ai-loses...

[3] https://arstechnica.com/tech-policy/2025/10/reddit-sues-to-b...

[4] https://brave.com/blog/comet-prompt-injection/


The law has nothing to do with it. Amazon is a private company and can make rules about who can or can't place orders on its website. When you create an account you agree to their ToS.


Interesting. Amazon ToS actually has a section about agents - https://www.amazon.com/gp/help/customer/display.html?nodeId=...

And they even provide a definition of what an Agent is:

"Agent” means any software or service that takes autonomous or semi-autonomous action on behalf of, or at the instruction of, any person or entity.

Though to me it raises even more questions. What counts as software that takes "autonomous" action on my behalf? Is curl "autonomous"?


"autonomous or semi-autonomous" is the key phrase. If you manually invoke a curl command then no, it isn't an agent. If you write code that itself determines when and how to invoke that command then it is.


Am I not manually instructing the agent to buy a certain product?

What if I set up a cron job to buy a certain product every month - is that not autonomous? What if it first queries my live toilet paper stock to make the decision?
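Concretely, here's what such a cron job might look like as a Python sketch (the endpoint, threshold, and order helper are all hypothetical):

    import urllib.request

    STOCK_URL = "https://example.home/api/tp-stock"  # hypothetical home sensor

    def rolls_remaining() -> int:
        # Assumes the endpoint returns a bare integer count.
        with urllib.request.urlopen(STOCK_URL) as resp:
            return int(resp.read())

    def place_order(item_id: str) -> None:
        # Stand-in for whatever actually places the order.
        print(f"ordering {item_id}")

    # Scheduled monthly from cron: the *schedule* is mine,
    # but the *decision* to buy is made by the if-statement.
    if rolls_remaining() < 2:
        place_order("B00EXAMPLE")  # hypothetical item id

Under Amazon's definition, the if-statement seems to be the part that makes this "semi-autonomous", even though every line of it is my instruction.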


Exactly. It's software - `curl` or an LLM. It's a function that accepts input and produces output. One is much more sophisticated than the other, but it's made of the same machine instructions; there is no magic.

What's the criterion that makes one function "autonomous" and the other one "manual"? I feel it really boils down to this.


> Is curl "autonomous"?

Only when you supply -L


I don't think GP meant "legal" in the literal sense. Regardless, the post's meaning is still the same if you replace "Is this legal?" with "Does this conform to Amazon's ToS?", so please read it charitably and avoid being pedantic about this sort of thing.


Legal matters are all about pedantry.


Can I use Amazon Mechanical Turk to place orders for myself on Amazon?


No.

That is why you have personal credentials to log in to Amazon. If you want delegation capabilities, you can open an Amazon business account.


My wife and I share the same Amazon account. Should I open a business account to do grocery shopping?


Amazon supports family sharing for the same home address - 2 adults and 4 children, I think, can share the same Prime.

My wife and I used to share one account, but then I wanted to buy her a gift that had to be a surprise - so I had to create a new account and add it as part of the family to the original one…. Then the kids grew up and wanted to make small orders themselves, and I didn't want them to see our order history…


I know it's there. But we _prefer_ to have a single account to simplify tracking and picking up packages. I'm curious if, from their point of view (or their ToS), I'm even allowed to share my credentials with anyone else.

1Password shared vaults are there for a reason - people share credentials all the time, business or personal.


She should not. You should create an Amazon family and enroll her as a member.

This way Amazon can keep track of your separate buyer profiles.


Would Amazon be ok with me opening a business account, creating credentials for a Perplexity assistant, and having it buy products?

Based on this article, I'd think not?


Even if it was not AI it would not be allowed. You are effectively creating dummy accounts with bots.

Even the SEC would be against it, as it would inflate the user base of Amazon.


Agentic browsers raise a lot of questions that were hanging there even before LLMs.

Can I ask my partner to buy a product on Amazon?

Can I ask my personal assistant to buy a product on Amazon?

Can I hire a contractor to buy products on Amazon?

Can I communicate with a contractor via API to direct them what products to buy?

What if there is no human on the other end and it's an LLM?

Same issue with LinkedIn. I know execs who have assistants running their socials. Is this legal?


Perplexity should use this argument.


You've been spamming this in a few threads.

A private business can 100% refuse service to you. Examples with regards to "delegation":

- If you come in using a form of non-cash payment that doesn't belong to you.

- If you're purchasing a car, and are filling out paperwork under someone else's name. FYI, you can buy cars on Amazon.com.

- If you attempt to pick-up a pre-order or an item earmarked for someone else.

...

Of course some businesses are more or less restrictive based on fraud risk, yada yada, but you get the idea. You're not being oppressed. Go shop elsewhere.


Apologies, I didn't mean to spam - I hadn't seen the other thread that picked up more votes, and was really curious about where the line is.

I completely understand that private businesses have a right to refuse service without cause. But as others pointed out, the question is to what degree "delegation" is acceptable if I'm acting in good faith.

I'm guessing the answer is "to a degree it doesn't impact our business".


I don't think any of your examples are analogous to the questions/point the GP was trying to make. Your examples seem to be centered around someone trying to trick or defraud a retailer; GP's is about simple, straightforward delegation.

But yes, agreed, businesses have the right to refuse service to anyone (outside of illegal discrimination).

We should fight it, though, when those refusals are backed by anti-consumer practices. It's pretty clear that Amazon doesn't like agent-mediated purchases because it allows the customer to bypass Amazon's ability to put sponsored products in front of you, and try to get you to buy related and add-on products along with what you actually want.

Sure, it is their right to do that, but as consumers I think we shouldn't be complacent and just take what the big shopping overlords feed us. Consolidation (and races to the bottom such as this) is making it harder and harder to find competing retailers and products when we want to vote with our wallets as to what kinds of shopping experiences are acceptable.

And the bottom line is that if Amazon realizes that they're losing sales because people want to use AI agents to buy things, and they're banning those agents, they'll change their tune. But that only works so long as there are alternatives with better practices, and, well... there aren't many.


Retail stores literally have to pay rent to a landlord. How's that not a rent-seeking business?


That's not what rent-seeking is. Rent-seeking is charging for something that was free before, generally by seeking legal enforcement. If you think the concept of renting things out is bad, you've missed the point somewhat.


I once landed a GA airplane at a very busy Class C airport that had closed its tower at 4pm due to staff shortage but was still operational (so effectively uncontrolled). Since then I have tremendous respect for aviation's resilience to any single point of failure. I imagine having the entire JFK operation on CTAF isn't an option, though.


This is Alaska 37, wings up turning base over Coney Island...

