We would've preferred to build this as a browser extension too.
But we strongly believe that to build a good agent co-pilot we need a bunch of changes at the Chromium C++ code level. For example, Chromium has an accessibility tree for every website, but doesn't expose it as an API to Chrome extensions. Having access to the accessibility tree would greatly improve agent execution.
We are also building a bunch of changes in C++ for agents to interact with websites -- functions like click, elements with indexes. You can inject JS to do this, but it is 20-40x slower.
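To make the "elements with indexes" idea concrete, here is a rough sketch of how an accessibility-style tree could be flattened into a numbered list that an agent refers to when it wants to click something. This is only an illustration under stated assumptions: the `AXNode` shape, the `INTERACTIVE_ROLES` list, and the `indexInteractive` and `describe` helpers are all hypothetical, and none of this is Chromium's or BrowserOS's actual code.

```typescript
// Hypothetical node shape for illustration; not Chromium's internal AX node type.
interface AXNode {
  role: string;
  name: string;
  children?: AXNode[];
}

interface IndexedElement {
  index: number;
  role: string;
  name: string;
}

// Roles an agent would typically want to act on (an assumed, incomplete list).
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

// Walk the tree depth-first and give each interactive node a sequential index.
function indexInteractive(root: AXNode): IndexedElement[] {
  const out: IndexedElement[] = [];
  const visit = (node: AXNode): void => {
    if (INTERACTIVE_ROLES.has(node.role)) {
      out.push({ index: out.length, role: node.role, name: node.name });
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return out;
}

// Render the compact listing that could be handed to the model.
function describe(elements: IndexedElement[]): string {
  return elements.map((e) => `[${e.index}] ${e.role} "${e.name}"`).join("\n");
}

// A made-up login page, expressed as a tiny tree.
const page: AXNode = {
  role: "RootWebArea",
  name: "Login",
  children: [
    { role: "heading", name: "Sign in" },
    { role: "textbox", name: "Email" },
    { role: "textbox", name: "Password" },
    { role: "button", name: "Submit" },
  ],
};

const elements = indexInteractive(page);
console.log(describe(elements));
```

The agent can then say "click element 2" instead of producing a CSS selector, which keeps the prompt short and the action unambiguous.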
How is that accessibility tree different from the “accessibility snapshot” that you can get from Playwright for example?
I was tackling a similar problem a few weeks ago and I found that Playwright MCP was the most usable solution in my case. It doesn't use an extension; it debugs the browser tabs (I guess using the DevTools Protocol), but I agree the experience was suboptimal.
We don't mind upstreaming. But I don't think Google Chrome/Chromium wants to expose it as an API to Chrome extensions; otherwise they would've done this a long time ago.
From Google's perspective, extensions are meant to be lightweight applications with restricted access.
I'm not really interested in AI agents for my web browser, but it would be pretty cool to see a fork of Chromium available that, aside from being de-googled, relaxes all the "restricted access" to make it more fun to modify and customize the way you guys are. Just a thought: there may be more of a market for the framework than the product :)
See Sciter. A very cool, super lightweight alternative to Electron, but unfortunately it seems like a single-developer project and I could never get any of the examples to run.
I always wonder what sort of JS engine such projects use, since at the end of the day, imo, it is all just a dance between the JS engine, HTML and CSS. HTML and CSS feel like a mostly solved problem, but the hard part is the JS engine.
Sciter uses QuickJS, and I just checked: it's about 35-36x slower than V8 with JIT.
Also, another interesting rabbit hole: I found Duktape in the QuickJS benchmarks, and I saw https://blogcpp.org/ listed as one of the Duktape projects, but I can't even find the project on GitHub. We really need a better way of preserving open source stuff, I guess.
Good callout wrt being slower than JIT. Of course, for certain applications it's not a showstopper, i.e., if you're not using JavaScript for your MVC but doing more of a progressive-enhancement thing.
CSS2 is closer to trivial, but CSS3 is practically a 3D game engine with all of its matrix transforms, transitions, animations, variables - not to mention all the different layout schemes (Sciter blogged about introducing display:flex and display:grid two months ago)
The most interesting part of Sciter to me is that data persistence goes way beyond localStorage (string key: string value) or filesystem API, instead it's DyBase [0][1] behind the scenes, which looks to be a very intriguing style of storing trees of data in the host language's datatype (including whatever classes you define) without mucking about with the leaky abstractions of an ORM.
I'm not GP, but I agree that if your goal is to empower the end user and protect him from corporate overlords, then Firefox is a more logical choice to fork from.
I wonder if someone on the Chromium team will upstream all these BrowserOS changes, or go "Not Invented Here" and re-implement it all for Gemini / Google Assistant.
Isn't the Firefox code notoriously hard to fork and work with? I'm sure that nearly all of these Chrome forks would prefer to fork Firefox, but there's a reason they don't.