Hacker News | cr4zy's comments

For compression and long-running agents, may I suggest https://memtree.dev. We offer a simple API that compresses messages asynchronously for instant responses and a small context, leading to much higher quality generations. We're about to release a dashboard that will show you what each compressed request looked like, the token distribution between system, memory, and tool messages, memory retrievals, etc. Is this the type of thing that you're looking for?


Something like this needs to be open-sourced. You're going to have a hell of a time trying to get enough trust from people to run all of their prompts through your servers.


For code it's actually quite good so far IME. Not quite as good as Gemini 2.5 Pro but much faster. I've integrated it into polychat.co if you want to try it out and compare with other models. I usually ask 2 to 5 models the same question there to reduce the model overload anxiety.


I discuss how the automation wave is already starting, with white-collar job openings at a 12-year low in the U.S. I also talk about how we cannot simply count on taxing AI to support automated workers, as countries that don't tax will outcompete those that do. We therefore need international cooperation, based on MAIM, from the original Superintelligence Strategy paper.

I also discuss how bioweapons cannot be avoided by restricting open-weight models, as originally suggested in Dan's paper. Rather, we need to heavily invest in bioweapon defense, and in particular use AI for wastewater monitoring and accelerating metagenomics (untangling mixed DNA).


Open WebUI does offer an API, but I have it disabled for PolyChat.


One tradeoff of bringing your own API keys is that as you add more model providers, you get more billing accounts to deal with. Chorus also doesn't have an incentive to use your tokens efficiently. We save 67% on Anthropic token costs using Claude's prompt caching. We also use cheaper "task models" for conversation titles, tagging, and parts of the RAG pipeline, which all drastically cuts token costs.
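For context, Anthropic's prompt caching works by tagging large, stable prefixes (like the system prompt) with `cache_control` so repeated requests reuse them at a discount. A minimal sketch of building such a request body (the `build_request` helper is hypothetical; the field names follow the Messages API):

```python
def build_request(system_prompt, messages, model="claude-3-5-sonnet-latest"):
    """Build a Messages API payload that marks the stable system
    prompt as cacheable, so repeated turns reuse that prefix."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [{
            "type": "text",
            "text": system_prompt,
            # Tag the prefix for caching; subsequent calls with the
            # same prefix read it from the cache at reduced cost.
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": messages,
    }
```

The savings come from keeping the cached prefix byte-identical across requests, which is why it pays to put per-turn content in `messages` rather than the system prompt.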

For local models I highly recommend https://github.com/crizCraig/open-webui

They do the side-by-side thing that Chorus does, and you can serve it anywhere, including your phone.


So it looks like some folks are getting errors with the non-streaming models, i.e. the o1 models. I think their long-running connections with zero packets may cause some networks to drop the requests. Will look into a heartbeat/keepalive on those.


I've added a "Thinking..." which sends server-sent events to keep the connection open. Would love to hear if the o1 models now work for anyone they were broken for.
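The trick is that SSE allows comment lines (starting with `:`) that clients ignore, so you can keep the connection warm while a slow, non-streaming model call finishes. A minimal sketch (the helper name and framing are illustrative, not PolyChat's actual code):

```python
import queue
import threading
import time

def sse_with_heartbeat(fetch_result, interval=10.0):
    """Yield SSE frames, emitting an ignorable comment line every
    `interval` seconds until the slow model call completes."""
    result_q = queue.Queue()
    threading.Thread(
        target=lambda: result_q.put(fetch_result()), daemon=True
    ).start()
    while True:
        try:
            result = result_q.get(timeout=interval)
            yield f"data: {result}\n\n"   # final payload frame
            return
        except queue.Empty:
            yield ": thinking\n\n"        # SSE comment; keeps the connection alive
```

Intermediate proxies and NATs see regular traffic, so they stop timing out the idle connection.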


It's not unlimited free, unfortunately. After some free use, we offer monthly usage-tier plans. But there are no hard rate limits like other providers have, since you can move up to the next usage tier.


Cool! I should say most of PolyChat is open source at https://github.com/open-webui - just the combo models and payment are closed source right now. Open to arguments on making PolyChat fully open source as well!


Sounds like you made most of these changes upstream? What about the background chats and the chat tree overview, are either of them in Open WebUI, or are they also custom to PolyChat? I run OWUI locally and am interested in those features for selfish reasons. If for some reason your multi-model idea doesn’t pan out, I’d love to see it merged upstream, too. Thanks for your contributions!

(Another annoying thing about OWUI is getting logged out every time the image upgrades… is that something else you’ve looked at?)


Background chats are new in v5 of Open WebUI, so you can use them too. The overview has been there for a while as well, but it's kind of hidden in the hamburger menu.

The upgrade/logout issue you're facing is likely due to not setting WEBUI_SECRET_KEY outside of your Docker container. Without it, a new key gets generated by start.sh on each upgrade, which makes all previous cookies unreadable since the new key won't decrypt them.
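A sketch of the fix, assuming the standard Docker setup (generate the key once and reuse it, so session cookies survive image upgrades):

```shell
# Generate a persistent secret once and pass it into every container run,
# so session cookies stay decryptable across image upgrades.
WEBUI_SECRET_KEY="$(openssl rand -hex 32)"

docker run -d -p 3000:8080 \
  -e WEBUI_SECRET_KEY="$WEBUI_SECRET_KEY" \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

In a docker-compose setup the equivalent is an `environment:` entry or an `.env` file that persists outside the container.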


Thanks! I definitely recommend launching quickly and iterating.

edit: I should add, I'm making only $30/mo in subs, but I just launched so we'll see!


You’re right. Thanks for sharing info openly, appreciated.


Wow, this is awesome! Thanks for building. I didn't realize there was a protocol for streaming while rendering, though I noticed sumo.ai doing something similar for audio. Gemini with grounding is new to me also, very nice!


thanks! Streaming was actually pretty hard to get working, but it goes roughly like this as a streaming pipeline:

- The LLM is prompted to generate an explainer video as a sequence of small Manim scene segments with corresponding voiceovers

- The LLM streams its response token-by-token as Server-Sent Events

- Whenever a complete Manim segment is finished, it's sent to Modal to start rendering

- The rendered partial video files from Manim are streamed as they are generated via HLS
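The "detect a complete segment mid-stream" step above can be sketched like this (the segment delimiter and function names are hypothetical; the real service hands segments off to Modal):

```python
import re

# Assumed delimiter the LLM is prompted to wrap each scene in.
SEGMENT_RE = re.compile(r"<segment>(.*?)</segment>", re.DOTALL)

def stream_segments(token_stream):
    """Accumulate streamed LLM tokens and yield each complete
    Manim segment as soon as its closing tag arrives, so rendering
    can start before the full response is done."""
    buf = ""
    for token in token_stream:
        buf += token
        while (m := SEGMENT_RE.search(buf)):
            yield m.group(1).strip()   # hand off to the renderer here
            buf = buf[m.end():]
```

Because tokens can split a tag across chunks, the buffer is only consumed up to the last complete match; partial tags simply wait for more tokens.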

