Hacker News | darkbatman's comments

From the paper, the memory needed per layer seems to be higher than in the transformer architecture. Pretty sure that would blow up GPU VRAM at scale.


It's so useful to use the Cerebras API for other tasks too, not just coding with Qwen Coder, but even simpler things like, say, analysis with gpt-oss-120b or Llama.

Just plug it into a normal chat interface like Jan or Cherry Studio and it's incredibly fast.
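For a sense of what "plugging it in" involves: Cerebras exposes an OpenAI-compatible chat-completions endpoint, so anything that can set a base URL and a bearer token can talk to it. A minimal stdlib-only sketch; the base URL, model name, and key are assumptions, check the Cerebras docs for current values:

```python
import json
from urllib.request import Request

CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1"  # assumed OpenAI-compatible endpoint

def build_chat_request(model: str, prompt: str, api_key: str) -> Request:
    """Build (but don't send) an OpenAI-style chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(
        f"{CEREBRAS_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# urllib.request.urlopen(build_chat_request("gpt-oss-120b", "Summarize these logs.", key))
# would send it; chat UIs like Jan or Cherry Studio do the same thing under the hood.
```

Because it is the same wire format as OpenAI's API, chat clients only need the base URL and key swapped, no code changes.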


We kinda use https://github.com/googleapis/genai-toolbox for databases; looking forward to seeing whether Klavis provides a more general solution.

Ideally, when we're writing agents we need MCP to support auth and custom headers, because when deploying for SaaS we need to pass client params around to isolate client connections.
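To illustrate the isolation we mean: a hypothetical sketch where each tenant's connection to a remote MCP server carries its own headers. The `TenantSession` class and the `X-Tenant-Id` header name are made up for illustration, not part of any MCP SDK:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TenantSession:
    """Hypothetical per-tenant connection config for a remote MCP server."""
    tenant_id: str
    api_key: str
    extra: dict = field(default_factory=dict)

    def headers(self) -> dict:
        # Every request carries the tenant's identity, so the server can
        # isolate connections and state per client instead of sharing a pool.
        return {
            "Authorization": f"Bearer {self.api_key}",
            "X-Tenant-Id": self.tenant_id,  # made-up header name
            **self.extra,
        }

acme = TenantSession("acme", "key-acme")
globex = TenantSession("globex", "key-globex")
# Distinct header sets per tenant -> the server can keep their connections apart.
```

The point is just that the MCP client library has to let you inject these headers on every request; without that hook, all tenants look identical to the server.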

We do token optimisation and other smart stuff to save on token costs. Looking forward to trying this as well if it solves similar problems.


Thank you! Yes, we do provide auth and support other remote MCP servers via our API: https://docs.klavis.ai/api-reference/strata/create. It does support custom headers. Feel free to give us a try or come talk to us!


True, I've been using Cline/RooCode for almost a year and it always made sure to read things from the memory bank, which I really liked. Claude has gone downhill since mid-August for me; it often doesn't follow instructions from claude.md or forgets things midway.


Mostly agree with the article, though what happens in a few years when today's juniors eventually become seniors?

Personally, I'm seeing a trend where juniors rely so much on AI that they can't explain what they wrote, whether in an interview, a coding assignment, or a PR. It's a black box to them.

I believe that's when we'll see the bigger impact, or maybe by then it's already a solved problem.


With https://agentcommunicationprotocol.dev (ACP) already existing, the shared name seems confusing now, even though the two do differ.


IBM announced its Agent Communication Protocol (ACP) in March 2025 but is now abandoning the ACP name and merging ACP efforts into Google's Agent2Agent (A2A) protocol at the Linux Foundation. The ACP team is winding down as the industry backs A2A for open, community-driven AI agent interoperability under Linux Foundation governance. The move aims to unify protocols and avoid fragmentation in AI agent standards.

https://lfaidata.foundation/communityblog/2025/08/29/acp-joi...


That seems odd. Even with an A2A protocol, don’t you still need to standardize a client “surface” or “API” or whatever, so agents can describe IDE actions they want to trigger in the expected terms over that protocol?

Or is A2A like USB, where it acts as both a registry of, and “standardized standardization process” for, suites of concrete message types for each use-case?

Like, yeah, when a "client" drives an "agent", that's no different than what any generic "agent" would be doing to drive an "agent"; an IDE or what-have-you can just act as the "parent agent" in that context.

But when an "agent" is driving a "client", that's all about the "agent" understanding that the "client" isn't just some generic token-driven inference process, but an actual bundle of algorithms that does certain concrete things, and has to be spoken to a certain way to get it to do those concrete things.

I had assumed that IBM's older ACP was in large part concerned with formalizing that side of interoperation. Am I wrong?


Would be nice to see some benchmarks.

Also, in my experience you need more compute to get significant results. Fine-tuning mostly works only when the base model is already very close to what you're trying to achieve, and even then you may not be too happy with the results.

Context length also becomes an issue when trying to fit onto a GPU with less RAM.
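To make the RAM point concrete, a back-of-the-envelope KV-cache estimate for a decoder-only model. The layer count, head count, and head dimension below are illustrative, not tied to any particular checkpoint:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elt: int = 2) -> int:
    """Memory for the KV cache alone, assuming fp16 (2 bytes/element)."""
    # 2x for keys and values, stored per token, per layer, per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elt

# A 7B-class shape (32 layers, 32 KV heads, head_dim 128) at 32k context:
gib = kv_cache_bytes(32, 32, 128, 32_768) / 2**30  # 16 GiB for the cache alone
```

That 16 GiB is on top of the weights and optimizer state, which is why long contexts are the first thing to go on smaller GPUs (and why grouped-query attention, with fewer KV heads, helps so much).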


Are there any load test results available? We'd like to use this at Zenskar, but at high scale we really need it to hold up.

System merges and final outputs are definitely unpredictable, so nice project.


Great to hear that you're considering it for Zenskar. We don't have a publicly available load test, but in internal checks it handled 15k requests per second (locally, in Docker on an M2 MacBook Pro). What load are you expecting? Happy to connect.


Is it similar to RethinkDB? I remember using RethinkDB for a similar use case, live queries running directly on the database layer, back in 2016.


Not the Temporal you're thinking of; it's the Date replacement in JS.

