Built a browser-based AI OS where a master agent creates and evolves specialized sub-agents defined in markdown, executes Python via WebAssembly, and learns from past executions through persistent memory.

Key features:

- Agent reuse & evolution (80% match rule)

- Python runtime in browser (Pyodide: numpy, scipy, matplotlib)

- Memory system that improves over time

- Virtual file system (localStorage)

- Completely client-side

Example: Ask for "FFT signal analysis" → system checks memory → finds/evolves SignalProcessorAgent → generates Python → executes in browser → saves results → records experience → next time runs in seconds.
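
To make that flow concrete, this is roughly the kind of Python a SignalProcessorAgent might generate and run in Pyodide (a sketch, not the actual generated output):

  # Sketch of code a SignalProcessorAgent might generate for "FFT signal analysis".
  # numpy and matplotlib are available in Pyodide, so this runs fully in the browser.
  import numpy as np
  import matplotlib.pyplot as plt

  fs = 1000                                # sample rate in Hz
  t = np.arange(0, 1, 1 / fs)              # one second of samples
  signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

  spectrum = np.fft.rfft(signal)           # FFT of the real-valued signal
  freqs = np.fft.rfftfreq(len(signal), 1 / fs)

  plt.plot(freqs, np.abs(spectrum))
  plt.xlabel("Frequency (Hz)")
  plt.ylabel("Magnitude")
  plt.title("FFT signal analysis")
  plt.show()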

Try it: https://github.com/EvolvingAgentsLabs/llmos

Started as a weekend project exploring self-improving AI systems. Core features working, some rough edges.

Feedback welcome, especially on the agent evolution approach and memory structure.


I love Claude Code, but there are a few capabilities I kept wishing it had. So I built an experimental fork/extension to explore what those might look like.

  Three main additions I wanted:

  1. Persistent Domain Memory
  Claude Code starts fresh each session. I wanted an environment that remembers domain-specific patterns. LLMos adds a three-volume system
  (System/Team/User) where successful workflows automatically become reusable skills. Work on quantum chemistry for a week, and the system
  learns molecular Hamiltonians, ansatz selection heuristics, convergence criteria—domain fluency that compounds over time.

  2. Self-Improving Sub-Agents
  Claude Code has great tool use, but I wanted agents that could observe and improve themselves. LLMos agents literally rewrite their own
  code based on what works. Example: A circuit optimizer starts basic, but after 50+ sessions, it's learned adaptive gradient descent, smart
  initialization, and error mitigation strategies—all from watching successful runs.

  3. Client-Side Code Execution
  Claude Code writes files but doesn't run them directly. I added Pyodide for browser-based Python execution with live preview. Edit code →
  auto-run → see matplotlib plots/quantum circuits in <1 second. No deployment, just pure flow state for scientific computing.

  Current focus: Quantum computing (VQE, QAOA, quantum chemistry) because it's a perfect test bed—rapidly evolving field, requires deep
  domain expertise, complex workflows, high-value automation.

  The "evolving OS" concept: Instead of a static tool, what if your development environment learned your field, extracted patterns into
  reusable skills, and improved its agents based on what actually works in practice?
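
  To make "extracted patterns into reusable skills" concrete, here's a minimal sketch of the loop I have in mind; the file paths and function names are hypothetical, not LLMos's actual internals:

    # Hypothetical sketch: successful runs get recorded, and anything that keeps
    # working gets promoted into a reusable markdown "skill" in the User volume.
    import json
    from pathlib import Path

    RUNS = Path("volumes/user/runs.jsonl")        # assumed log location
    SKILLS = Path("volumes/user/skills")          # assumed skills volume

    def record_run(task: str, code: str, success: bool) -> None:
        RUNS.parent.mkdir(parents=True, exist_ok=True)
        with RUNS.open("a") as f:
            f.write(json.dumps({"task": task, "code": code, "success": success}) + "\n")

    def promote_stable_patterns(min_successes: int = 3) -> None:
        # Count successes per task; promote anything that worked repeatedly.
        if not RUNS.exists():
            return
        counts: dict[str, int] = {}
        latest_code: dict[str, str] = {}
        for line in RUNS.read_text().splitlines():
            run = json.loads(line)
            if run["success"]:
                counts[run["task"]] = counts.get(run["task"], 0) + 1
                latest_code[run["task"]] = run["code"]
        SKILLS.mkdir(parents=True, exist_ok=True)
        for task, n in counts.items():
            if n >= min_successes:
                skill = SKILLS / (task.replace(" ", "_") + ".md")
                skill.write_text(f"# Skill: {task}\n\nLast known-good code:\n\n{latest_code[task]}\n")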

  Technical: Next.js + Pyodide + Qiskit + OpenRouter. All volumes are Git repos (preserving Claude Code's file-first philosophy). Code
  execution is 100% client-side.

  GitHub: https://github.com/agustinazwiener/evolving-agents-labs/tree/main/llmunix

  Obviously this is rough/experimental—missing lots of polish, limited to Python, quantum-focused. But I'm curious:

  - Would persistent domain memory be useful in Claude Code itself?
  - Are self-modifying agents too weird, or genuinely helpful?
  - Is browser-based execution worth the complexity for scientific/research workflows?

  Feedback welcome, especially from Claude Code users or anyone working in specialized technical domains.


Came across an early open-source project aiming to fix a big gap in current LLMs: statelessness. Every conversation resets to zero.

LLM-OS tries to give AI systems persistent, evolving memory by treating everything as a memory artifact:

Crystallized tools: repeated patterns auto-convert into executable Python tools (deterministic memory).

Markdown agents: editable behavioral memory.

Execution traces: procedural memory the system can replay/learn from.

Promotion layers: memory flows from user → team → organization via background “crons.”

The idea is that organizations accumulate AI knowledge automatically, and new members inherit it.
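
As a concrete example of the "crystallized tools" idea, this is roughly what one such tool could look like once a repeated pattern has been frozen into deterministic Python; the function and the trace filenames are illustrative, not taken from the repo:

  # Hypothetical example of a "crystallized" tool: a pattern the LLM solved
  # repeatedly, frozen into plain deterministic Python so no model call is needed.

  # Metadata tying the tool back to the execution traces it was distilled from
  # (field names are illustrative, not the project's actual schema).
  CRYSTALLIZED_FROM = ["trace_0042.md", "trace_0057.md", "trace_0061.md"]

  def summarize_standup(notes: list[str]) -> str:
      """Turn raw standup notes into the bullet format the team kept asking for."""
      seen = set()
      bullets = []
      for n in (line.strip() for line in notes):
          if n and n.lower() not in seen:
              seen.add(n.lower())
              bullets.append(f"- {n}")
      return "Standup summary:\n" + "\n".join(bullets)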

Repo: https://github.com/EvolvingAgentsLabs/llmos

Article: https://www.linkedin.com/pulse/what-your-ai-remembered-every...

Curious whether HN thinks persistent AI memory is workable.


When Linus posted Linux 0.01 in 1991, he wrote: "I'm doing a (free) operating system (just a hobby, won't be big and professional)." It wasn't complete. It wasn't polished. But the core ideas were there.

  I've been thinking about what an "operating system" for LLMs would look like. Not an agent framework – an actual OS with
  memory hierarchies, execution modes, and something I'm calling a "Sentience Layer."

  LLM OS v3.4.0 is my attempt. It's incomplete and probably over-ambitious, but the architecture is interesting:

  Four-Layer Stack:
  - Sentience Layer – Persistent internal state (valence variables: safety, curiosity, energy, confidence) that influences
  behavior. The system develops "moods" based on task outcomes.
  - Learning Layer – Five execution modes (CRYSTALLIZED → FOLLOWER → MIXED → LEARNER → ORCHESTRATOR) based on semantic trace
  matching
  - Execution Layer – Programmatic Tool Calling for 90%+ token savings on repeated patterns
  - Self-Modification Layer – System writes its own agents (Markdown) and crystallizes patterns into Python

  What makes it different:
  - Agents are Markdown files the LLM can edit (hot-reloadable, no restart)
  - Traces store full tool calls for zero-context replay
  - Repeated patterns become pure Python (truly $0 cost)
  - Internal state persists across sessions and influences mode selection
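
  A rough sketch of how persistent valence state plus trace matching can drive mode selection; the thresholds and file locations here are illustrative, not the actual v3.4.0 code:

    # Hypothetical sketch of mode selection: persistent "valence" state plus a
    # semantic-similarity score against stored traces picks an execution mode.
    import json
    from pathlib import Path

    STATE = Path("state/valence.json")   # assumed location for persisted state

    def load_valence() -> dict:
        if STATE.exists():
            return json.loads(STATE.read_text())
        return {"safety": 0.8, "curiosity": 0.5, "energy": 0.7, "confidence": 0.5}

    def select_mode(trace_similarity: float, valence: dict) -> str:
        # High similarity to a stored trace -> cheap replay; low similarity plus
        # high curiosity/energy -> full orchestration and learning.
        if trace_similarity > 0.95:
            return "CRYSTALLIZED"
        if trace_similarity > 0.85:
            return "FOLLOWER"
        if trace_similarity > 0.60:
            return "MIXED"
        if valence["curiosity"] > 0.6 and valence["energy"] > 0.5:
            return "LEARNER"
        return "ORCHESTRATOR"

    def update_valence(valence: dict, task_succeeded: bool) -> None:
        delta = 0.05 if task_succeeded else -0.1
        valence["confidence"] = min(1.0, max(0.0, valence["confidence"] + delta))
        STATE.parent.mkdir(parents=True, exist_ok=True)
        STATE.write_text(json.dumps(valence))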

  Working examples:
  - Quantum computing IDE backend (Qiskit Studio)
  - Educational platform for kids (Q-Kids Studio)
  - Robot control with safety hooks (RoboOS)

  Is it production-ready? No. Will it work as envisioned? I'm figuring that out. But the ideas feel right, and building it is
  genuinely fun.

  GitHub: https://github.com/EvolvingAgentsLabs/llm-os

  Looking for feedback on the architecture, collaboration on making it actually work, and honest criticism. What's missing?
  What's overengineered? What would you want from an LLM OS?


I'm working on LLM OS, an experimental project that explores treating the LLM as a CPU and Python as the kernel. The goal is to provide OS-level services—like memory hierarchy, scheduler hooks, and security controls—to agentic workflows using the Claude Agent SDK.

Right now, this is mostly a collection of architectural ideas and prototypes rather than a polished framework. I’ve included several complex examples in the repo to explore the potential of this approach:

- Qiskit Studio Backend: Re-imagining a microservices architecture as a unified OS process for quantum computing tasks.

- Q-Kids Studio: Exploring how an OS layer can manage safety, adaptive difficulty, and state in an educational app.

- RoboOS: Testing how kernel-level security hooks can enforce physical safety constraints on a robot arm.

These examples play with concepts like execution caching (Learner/Follower modes) and multi-agent orchestration, but the project is very much in the early stages and not yet production-ready.
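
For flavor, here's a framework-agnostic sketch of the kind of safety hook RoboOS is testing; it does not use the Claude Agent SDK's real hook API, and the tool name and joint limits are made up:

  # Hypothetical kernel-style hook: every tool call the agent proposes passes
  # through a policy check before it is allowed to touch the robot arm.
  JOINT_LIMITS_DEG = {"shoulder": (-90, 90), "elbow": (0, 135), "wrist": (-180, 180)}
  MAX_SPEED_DEG_S = 30

  class SafetyViolation(Exception):
      pass

  def pre_tool_call_hook(tool_name: str, args: dict) -> None:
      """Raise before execution if a proposed move violates physical constraints."""
      if tool_name != "move_joint":
          return
      joint, angle, speed = args["joint"], args["angle"], args.get("speed", 10)
      lo, hi = JOINT_LIMITS_DEG[joint]
      if not lo <= angle <= hi:
          raise SafetyViolation(f"{joint} target {angle} outside [{lo}, {hi}]")
      if speed > MAX_SPEED_DEG_S:
          raise SafetyViolation(f"speed {speed} deg/s exceeds {MAX_SPEED_DEG_S}")

  def execute_tool(tool_name: str, args: dict) -> str:
      pre_tool_call_hook(tool_name, args)      # veto unsafe calls before dispatch
      # ... dispatch to the real tool here ...
      return f"{tool_name} executed with {args}"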

I’m sharing this early because I believe the "LLM as OS" analogy has a lot of potential. I'm looking for contributors and feedback to help turn these concepts into a functional reality.

Repo: https://github.com/EvolvingAgentsLabs/llm-os


Most agent frameworks struggle with long-term, consolidated memory. They either have a limited context window or use simple RAG, but there's no real process for experience to become institutional knowledge.

Inspired by the recent Google Research paper "Nested Learning: The Illusion of Deep Learning Architectures", we've implemented a practical version of its "Continuum Memory System" (CMS) in our open-source agent framework, LLMunix.

https://research.google/blog/introducing-nested-learning-a-n...

The idea is to create a memory hierarchy with different update frequencies, analogous to brain waves, where memories "cool down" and become more stable over time.

Our implementation is entirely file-based and uses Markdown with YAML frontmatter (no databases):

High-Frequency Memory (Gamma): Raw agent interaction logs and workspace state from every execution. Highly volatile, short retention. (/projects/{ProjectName}/memory/short_term/)

Mid-Frequency Memory (Beta): Successful, deterministic workflows distilled into execution_trace.md files. These are created by a consolidation agent when a novel task is solved effectively. Much more stable. (/projects/{ProjectName}/memory/long_term/)

Low-Frequency Memory (Alpha): Core patterns that have been proven reliable across many contexts and projects. Stored in system-wide logs and libraries. (/system/memory_log.md)

Ultra-Low-Frequency Memory (Delta): Foundational knowledge that forms the system's identity. (/system/SmartLibrary.md)

A new ContinuumMemoryAgent orchestrates this process, automatically analyzing high-frequency memories and deciding what gets promoted to a more stable, lower-frequency tier.
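
To make the file-based mechanics concrete, here's a minimal sketch of the kind of promotion pass the ContinuumMemoryAgent performs; the frontmatter fields, project name, and threshold are illustrative rather than the exact memory_schema.md:

  # Illustrative promotion pass: short-term memories whose frontmatter marks them
  # as repeatedly successful get copied into the more stable long-term tier.
  import shutil
  from pathlib import Path
  import yaml  # PyYAML

  SHORT = Path("projects/Demo/memory/short_term")
  LONG = Path("projects/Demo/memory/long_term")

  def read_frontmatter(md_file: Path) -> dict:
      text = md_file.read_text()
      if text.startswith("---"):
          _, fm, _ = text.split("---", 2)
          return yaml.safe_load(fm) or {}
      return {}

  def promote(min_successes: int = 3) -> None:
      LONG.mkdir(parents=True, exist_ok=True)
      for md_file in SHORT.glob("*.md"):
          meta = read_frontmatter(md_file)
          if meta.get("outcome") == "success" and meta.get("times_reused", 0) >= min_successes:
              shutil.copy(md_file, LONG / md_file.name)   # "cool down" to a slower tier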

This enables:

Continual Learning: The system gets better and more efficient at tasks without retraining, as successful patterns are identified and hardened into reusable traces.

No Catastrophic Forgetting: Proven, stable knowledge in low-frequency tiers isn't overwritten by new, transient experiences.

Full Explainability: The entire learning process is human-readable and version-controllable in Git, since it's all just Markdown files.

The idea was originally sparked by a discussion with Ismael Faro about how to build systems that truly learn from doing.

We'd love to get your feedback on this architectural approach to agent memory and learning.

GitHub Repo: https://github.com/EvolvingAgentsLabs/llmunix

Key files for this new architecture:

- The orchestrator agent: system/agents/ContinuumMemoryAgent.md

- The memory schema: system/infrastructure/memory_schema.md

- The overall system design: CLAUDE.md (which now includes the CMS theory)

What are your thoughts on this approach to agent memory and learning?


Curious what you think.

  We made LLMunix - an experimental system where you define AI agents in markdown once, then a local model executes them. No API calls after setup.

  The strange part: it also generates mobile apps. Some are tiny, some bundle local LLMs for offline reasoning. They run completely on-device.

  Everything is defined as pure markdown specs. The "OS" boots when an LLM runtime reads the files and interprets them.
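
  Roughly, "booting" an agent is just this (a sketch with an injected llm_call callable so it works with any local runtime; the helper names are mine, not LLMunix's):

    # Sketch: "booting" a markdown-defined agent just means reading its spec and
    # handing it to whatever local LLM runtime you have, as the system prompt.
    from pathlib import Path
    from typing import Callable

    def boot_agent(spec_path: str, task: str, llm_call: Callable[[str, str], str]) -> str:
        """llm_call(system_prompt, user_prompt) -> completion, supplied by the runtime."""
        spec = Path(spec_path).read_text()          # the markdown agent definition
        return llm_call(spec, task)

    # Usage with any local model wrapper you already have:
    #   result = boot_agent("agents/Summarizer.md", "Summarize today's notes", my_local_llm)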

  Still figuring out where this breaks. Edge models are less accurate. Apps with local AI are 600MB+. Probably lots of edge cases we haven't hit.

  But the idea is interesting: what if workflows could learn and improve locally? What if apps reasoned on your device instead of the cloud?

  Try it if you're curious. Break it if you can. Genuinely want to know what we're missing.
  What would you build with fully offline AI?


A year ago, if you told me I could:

• Describe a workflow once to Claude

• Have a 2GB local model execute it daily with actual reasoning

• Generate production mobile apps with on-device AI

• All for zero marginal cost

...I would've said "maybe in 5 years."

We built it. It's called LLMunix.

What if you could describe any mobile app - "a personal trainer that adapts", "a study assistant that quizzes me" - and get a working prototype with on-device AI in minutes, not months?

What if every workflow you do more than once becomes an agent that improves each time?

What if AI ran locally, privately, adapting to you - not in the cloud adapting to everyone?


I've been thinking about Wabi.ai's vision and Claude Imagine's approach: "software that doesn't exist until you need it."

What if instead of downloading 50 different apps, you just described what you wanted and an AI generated a personalized interface on the fly?

I built a proof-of-concept using LLMunix (pure markdown agent framework):

• UI-MD format: Markdown-based UI definitions (like HTML, but for LLMs)

• Memory-first architecture: Every UI is personalized to your context

• One shell app: Renders any UI-MD in real-time

• No compilation: Generate and display in seconds

Example: "Create a morning briefing app"

→ System queries your preferences (location: SF, interests: tech)

→ Fetches weather, calendar, news in parallel

→ Generates personalized markdown UI

→ Mobile shell renders it instantly
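
Here's a minimal sketch of how the FastAPI backend could serve that flow; the endpoint path, helper stubs, and UI-MD layout are hypothetical, and the real spec lives in the repo:

  # Hypothetical FastAPI endpoint: gather the user's context, then return a
  # UI-MD document for the shell app to render. Data sources are stubbed out.
  from fastapi import FastAPI
  from fastapi.responses import PlainTextResponse

  app = FastAPI()

  def get_preferences(user_id: str) -> dict:
      return {"location": "SF", "interests": ["tech"]}      # stub for the memory agent

  def get_weather(location: str) -> str:
      return "62°F, foggy morning"                          # stub for a real API call

  @app.get("/ui/morning-briefing", response_class=PlainTextResponse)
  def morning_briefing(user_id: str) -> str:
      prefs = get_preferences(user_id)
      weather = get_weather(prefs["location"])
      # The shell app renders this markdown directly; no compilation step.
      return "\n".join([
          "# Good morning",
          f"## Weather in {prefs['location']}",
          f"- {weather}",
          "## Top of your calendar",
          "- 9:30 standup",
      ])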

The POC includes:

- 5 specialized agents (memory, UI generation, weather, calendar, news)

- FastAPI backend with RESTful endpoints

- Complete UI-MD specification

What's interesting:

1. Everything is markdown (agents, tools, UI definitions)

2. No app downloads needed after the initial shell

3. Fully personalized from day one

4. Apps "learn" from your usage patterns

5. Share/remix apps as markdown files

What's missing:

- The actual mobile shell

- Real API integrations (weather, news, calendar)

- Multi-user backend infrastructure

- Real-world testing at scale

I'm sharing this to:

1. Test if this approach is fundamentally sound

2. Invite discussion on the architecture

3. Find collaborators interested in building the missing pieces

4. Explore if this could disrupt traditional app distribution

Key questions I'd love to discuss:

• Is markdown the right format for LLM-generated UIs?

• How do we handle complex interactions (forms, animations)?

• What about offline functionality?

• Privacy implications of centralized personalization?

• Business model: Who pays for compute?

• Could this work for web, not just mobile?

The code is open source, fully documented, and ready to run: https://github.com/EvolvingAgentsLabs/llmunix/tree/feature/n...

Quick start:

https://github.com/EvolvingAgentsLabs/llmunix/blob/feature/n...

I'm particularly interested in hearing from:

- Mobile developers

- Anyone who's thought about personal software

- People building LLM agents

- UX researchers interested in adaptive interfaces

- Anyone skeptical of this approach (challenge my assumptions!)

Thoughts?

Is this the future or am I missing something fundamental?


I wanted to share a project I've been refining, called llmunix-starter. I've always been fascinated by the idea of AI systems that can adapt and build what they need, rather than relying on a fixed set of pre-built tools. This is my attempt at exploring that.

The template is basically an "empty factory." When you give it a complex goal through Claude Code on the web (which is great for this because it can run for hours), it doesn't look for existing agents. Instead, it writes the markdown definitions for a new, custom team of specialists on the fly.
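
The core of that "empty factory" step is small. A hedged sketch (the prompt, output directory, and ask_llm helper are assumptions, not the template's actual internals):

  # Hypothetical sketch: ask the model which specialists a goal needs, then write
  # each one's markdown definition so the next run can load them as agents.
  from pathlib import Path
  from typing import Callable

  AGENTS_DIR = Path("workspace/agents")   # assumed output location

  def spawn_team(goal: str, ask_llm: Callable[[str], str]) -> list[Path]:
      roles = ask_llm(
          f"List 3 specialist agent roles needed for this goal, one per line: {goal}"
      ).splitlines()
      AGENTS_DIR.mkdir(parents=True, exist_ok=True)
      written = []
      for role in (r.strip() for r in roles if r.strip()):
          definition = ask_llm(
              f"Write a markdown agent definition for a '{role}' working on: {goal}"
          )
          path = AGENTS_DIR / f"{role.replace(' ', '')}.md"
          path.write_text(definition)
          written.append(path)
      return written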

For example, we tested it on a university bioengineering problem and it created a VisionaryAgent, a MathematicianAgent, and a QuantumEngineerAgent from scratch. The cool part was that when we gave it a totally different problem (geological surveying), it queried its "memory" of the first project and adapted the successful patterns, reusing about 90% of the core logic.

I think it's particularly useful for those weird, messy problems where a generic agent just wouldn't have the context—like refactoring a legacy codebase or exploring a niche scientific field.

Thanks for taking a look!!

