Hacker News | darkbatman's comments

From the paper, the memory needed per layer seems to be higher than in the transformer architecture. Pretty sure that would blow up GPU VRAM at scale.


It's so useful to use the Cerebras API for other tasks too, not just coding with Qwen Coder, but even simpler things like, say, analysis with gpt-oss-120b or Llama.

Just plug it into a normal chat interface like Jan or Cherry Studio and it's incredibly fast.
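For a sense of what "plugging it in" involves: Cerebras exposes an OpenAI-compatible chat-completions endpoint, so anything that can set a base URL and a bearer token can talk to it. A minimal stdlib-only sketch; the base URL, model name, and key are assumptions, check the Cerebras docs for current values:

```python
import json
from urllib.request import Request

CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1"  # assumed OpenAI-compatible endpoint

def build_chat_request(model: str, prompt: str, api_key: str) -> Request:
    """Build (but don't send) an OpenAI-style chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(
        f"{CEREBRAS_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# urllib.request.urlopen(build_chat_request("gpt-oss-120b", "Summarize these logs.", key))
# would send it; chat UIs like Jan or Cherry Studio do the same thing under the hood.
```

Because it is the same wire format as OpenAI's API, chat clients only need the base URL and key swapped, no code changes.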


We kinda use https://github.com/googleapis/genai-toolbox for databases; looking forward to seeing whether Klavis provides a more general solution.

Ideally, when we're writing agents we need MCP to support auth and custom headers, because when deploying for SaaS we need to pass client params around to isolate client connections.
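To illustrate the isolation we mean: a hypothetical sketch where each tenant's connection to a remote MCP server carries its own headers. The `TenantSession` class and the `X-Tenant-Id` header name are made up for illustration, not part of any MCP SDK:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TenantSession:
    """Hypothetical per-tenant connection config for a remote MCP server."""
    tenant_id: str
    api_key: str
    extra: dict = field(default_factory=dict)

    def headers(self) -> dict:
        # Every request carries the tenant's identity, so the server can
        # isolate connections and state per client instead of sharing a pool.
        return {
            "Authorization": f"Bearer {self.api_key}",
            "X-Tenant-Id": self.tenant_id,  # made-up header name
            **self.extra,
        }

acme = TenantSession("acme", "key-acme")
globex = TenantSession("globex", "key-globex")
# Distinct header sets per tenant -> the server can keep their connections apart.
```

The point is just that the MCP client library has to let you inject these headers on every request; without that hook, all tenants look identical to the server.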

We do token optimisation and other smart stuff to save on token costs. Looking forward to trying this as well if it solves similar problems.


Thank you! Yes, we do provide auth and support other remote MCP servers via our API: https://docs.klavis.ai/api-reference/strata/create. It does support custom headers. Feel free to give us a try or come talk to us!


True, I've been using Cline/RooCode for almost a year and it always made sure to read things from the memory bank, which I really liked. Claude has gone downhill since mid-August for me; it often doesn't follow instructions from claude.md or forgets things midway.


Mostly agree with the article, though what happens in a few years when today's juniors eventually become seniors?

Personally, I'm seeing a trend where juniors rely so much on AI that they can't explain what they wrote, whether in an interview, a coding assignment, or a PR. It's a black box to them.

I believe that's when we'll see the bigger impact, or maybe by then it's already a solved problem.


With https://agentcommunicationprotocol.dev (ACP) already existing, the shared name seems confusing now, even though the two do differ.


IBM announced its Agent Communication Protocol (ACP) in March 2025 but is now abandoning the ACP name and merging ACP efforts into Google's Agent2Agent (A2A) protocol at the Linux Foundation. The ACP team is winding down as the industry backs A2A for open, community-driven AI agent interoperability under Linux Foundation governance. The move aims to unify protocols and avoid fragmentation in AI agent standards.

https://lfaidata.foundation/communityblog/2025/08/29/acp-joi...


That seems odd. Even with an A2A protocol, don’t you still need to standardize a client “surface” or “API” or whatever, so agents can describe IDE actions they want to trigger in the expected terms over that protocol?

Or is A2A like USB, where it acts as both a registry of, and “standardized standardization process” for, suites of concrete message types for each use-case?

Like, yeah, when a "client" drives an "agent", that's no different than what any generic "agent" would be doing to drive an "agent"; an IDE or what-have-you can just act as the "parent agent" in that context.

But when an "agent" is driving a "client", that's all about the "agent" understanding that the "client" isn't just some generic token-driven inference process, but an actual bundle of algorithms that does certain concrete things, and has to be spoken to a certain way to get it to do those concrete things.

I had assumed that IBM's older ACP was in large part concerned with formalizing that side of interoperation. Am I wrong?


Would be nice to see some benchmarks.

Also, in my experience you need more compute to get significant results. Fine-tuning mostly works only when the base model is already very close to what you're trying to achieve, and even then you may not be too happy with the results.

Context length also becomes an issue when trying to fit onto a GPU with less RAM.
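To make the RAM point concrete, a back-of-the-envelope KV-cache estimate for a decoder-only model. The layer count, head count, and head dimension below are illustrative, not tied to any particular checkpoint:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elt: int = 2) -> int:
    """Memory for the KV cache alone, assuming fp16 (2 bytes/element)."""
    # 2x for keys and values, stored per token, per layer, per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elt

# A 7B-class shape (32 layers, 32 KV heads, head_dim 128) at 32k context:
gib = kv_cache_bytes(32, 32, 128, 32_768) / 2**30  # 16 GiB for the cache alone
```

That 16 GiB is on top of the weights and optimizer state, which is why long contexts are the first thing to go on smaller GPUs (and why grouped-query attention, with fewer KV heads, helps so much).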


Are there any load test results available? We'd like to use this at Zenskar, but at high scale we really need it to hold up.

System merges and final outputs are definitely unpredictable, so nice project.


Great to hear that you're considering it for Zenskar. We don't have a publicly available load test, but in internal checks it handled 15k requests per second (locally, in Docker on an M2 MacBook Pro). What load are you expecting? Happy to connect.


Is it similar to RethinkDB? I remember using RethinkDB for a similar use case, live queries running directly on the database layer, back in 2016.


Not the Temporal you're thinking of; it's the Date replacement in JS.

