theredsix · 2025-01-27T03:59:24

If you need to test in production, have a relative or friend buy the product/subscription and then make them whole offline. Also, do not reverse or refund the transaction.

edoceo · 2025-01-30T17:30:19

Been doing it like this since at least 2000. Never been a problem.

theredsix · on Nov 5, 2024

Awesome project, starred! Here are some other projects for agentic browser interactions:

* Cerebellum (Typescript): https://github.com/theredsix/cerebellum

* Skyvern: https://github.com/Skyvern-AI/skyvern

Disclaimer: I am the author of Cerebellum

gregpr07 · on Nov 5, 2024

Thanks man, starred yours too, it's super cool to see all these projects getting spun up!

I see Cerebellum is vision only. Did you try adding HTML + screenshot? I think that improves the performance like crazy and you don't have to use Claude only.

Just saw Skyvern today on previous Show HNs haha :)

theredsix · on Nov 5, 2024

I had an older version that used simplified HTML, and it got to decent performance with GPT-4o and Gemini, but at the cost of 10x token usage. You are right: identifying the interactable elements and pulling their values into a prompt structure that explicitly lists the allowed next actions can boost performance, especially if done with a constrained grammar such as structured outputs or guidance-llm. However, I saw that Claude had similar levels of performance with pure vision, and I felt that vision + more training would beat a specialized DOM algorithm due to "the bitter lesson".
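To make the DOM-based approach concrete, here is a minimal sketch of a prompt structure that lists interactable elements and restricts the next action to them. It is not Cerebellum's or any other project's actual code: the `InteractableElement` shape, the `buildActionPrompt` and `isAllowedAction` helpers, and the reply format are all hypothetical names chosen for illustration.

```typescript
// Hypothetical shape for one element the agent is allowed to act on.
interface InteractableElement {
  id: number;      // index the model refers back to
  tag: string;     // e.g. "input", "button"
  label: string;   // visible text or accessible name
  value?: string;  // current value, if the element has one
}

// Render the elements as a compact numbered list and state the allowed reply format.
function buildActionPrompt(goal: string, elements: InteractableElement[]): string {
  const lines = elements.map((el) => {
    const value = el.value !== undefined ? ` value="${el.value}"` : "";
    return `[${el.id}] <${el.tag}> "${el.label}"${value}`;
  });
  return [
    `Goal: ${goal}`,
    "Interactable elements:",
    ...lines,
    'Reply with JSON: {"action": "click" | "type", "id": <number>, "text"?: <string>}',
  ].join("\n");
}

// Reject any model reply that names an unknown action or an element outside the list.
function isAllowedAction(
  reply: { action: string; id: number },
  elements: InteractableElement[]
): boolean {
  const allowedActions = ["click", "type"];
  const knownAction = allowedActions.indexOf(reply.action) !== -1;
  const knownElement = elements.some((el) => el.id === reply.id);
  return knownAction && knownElement;
}
```

A structured-output or grammar-constrained decoder would enforce the reply format at generation time; the `isAllowedAction` check above is only a post-hoc fallback for when that is not available.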

BTW I really like your handling of browser tabs, I think it's really clever.

gregpr07 · on Nov 5, 2024

Fair, also Claude probably only gets better on this since they kinda want people to use Computer use. We are gonna try to do best of both worlds.

Thanks man, Magnus came up with it this morning haha!

dbacar · on Nov 6, 2024

I starred both of you

theredsix · on Oct 31, 2024

The next step will be adding functionality to convert and save a BrowserStep[] into a portable file format, plus additional conversion functions to turn those files into .jsonl that can be fed into the transformers library etc. For the PII piece, there are no current plans to introduce anonymization features, but we're open to suggestions.
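As a rough idea of what that conversion could look like, here is a sketch that serializes recorded steps to JSONL (one JSON object per line). The `RecordedStep` type is a hypothetical stand-in, since the real BrowserStep fields may differ, and `stepsToJsonl` and the output field names are assumptions, not a planned format.

```typescript
// Hypothetical stand-in for a recorded step; the real BrowserStep may look different.
interface RecordedStep {
  screenshotPath: string; // where the screenshot for this step was saved
  reasoning: string;      // the model's stated reasoning
  action: string;         // the action that was taken, as text
}

// Emit one JSON object per step, separated by newlines (the JSONL convention).
// JSON.stringify escapes embedded newlines, so each record stays on a single line.
function stepsToJsonl(goal: string, steps: RecordedStep[]): string {
  return steps
    .map((step, index) =>
      JSON.stringify({
        goal,
        step: index,
        image: step.screenshotPath,
        reasoning: step.reasoning,
        action: step.action,
      })
    )
    .join("\n");
}
```

Screenshots are referenced by path here rather than inlined, which keeps the file small; whether a training pipeline wants paths or embedded image data is left open.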

theredsix · on Oct 31, 2024

Not at the moment, since you need a local model with strong segmentation capabilities (x, y) and none exist ATM. We hope to train one in the future, and one of Cerebellum's roadmap items is to create the ability to save your sessions as a training dataset.

Jayakumark · on Nov 1, 2024

Any idea how Sonnet does this? Is the image annotated with bounding boxes on text boxes etc., along with their coordinates, before it is sent to Sonnet, and does it respond with a box name or coordinates? Or is SAM2 used to segment everything before sending to Sonnet?

theredsix · on Nov 1, 2024

They don't discuss this at all on their blog other than "Training Claude to count pixels accurately was critical." My speculation on how they accomplished it is either explicit tokenizer support with spatial encoding, similar to how single-digit tokenization improves math abilities, or extensive pretraining like Molmo.

digdugdirk · on Oct 31, 2024

Do you not think it could work with a shim layer that handled the browser interaction via code and selenium?

theredsix · on Nov 1, 2024

Selenium works on webdriver v4 and the screenshot is transferred as an image by the webdriver protocol. Perhaps modifying the DOM before triggering the screenshot and then reverting the changes can work. PRs are welcome!
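A minimal sketch of that idea: apply a DOM change, take the screenshot, and always revert afterwards. The `ScreenshotDriver` interface is a hypothetical narrow slice modeled on selenium-webdriver's `executeScript` and `takeScreenshot` methods, and `screenshotWithOverlay` plus the two example scripts are illustrative assumptions, not existing Cerebellum code.

```typescript
// Hypothetical minimal driver surface, modeled on selenium-webdriver's WebDriver.
interface ScreenshotDriver {
  executeScript(script: string): Promise<unknown>;
  takeScreenshot(): Promise<string>; // base64-encoded PNG
}

// Example scripts (assumptions): outline interactable elements, then restore them.
const APPLY_OUTLINES = `
  document.querySelectorAll('a, button, input').forEach((el) => {
    el.dataset.prevOutline = el.style.outline;
    el.style.outline = '2px solid red';
  });`;
const REVERT_OUTLINES = `
  document.querySelectorAll('[data-prev-outline]').forEach((el) => {
    el.style.outline = el.dataset.prevOutline;
    delete el.dataset.prevOutline;
  });`;

// Modify the DOM, capture the screenshot, and revert even if the capture fails.
async function screenshotWithOverlay(
  driver: ScreenshotDriver,
  applyScript: string = APPLY_OUTLINES,
  revertScript: string = REVERT_OUTLINES
): Promise<string> {
  await driver.executeScript(applyScript);
  try {
    return await driver.takeScreenshot();
  } finally {
    await driver.executeScript(revertScript);
  }
}
```

The `finally` block is the important part: the page the user sees should end up unchanged regardless of whether the screenshot call succeeds.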

theredsix · on Oct 31, 2024

OP here, happy to answer any questions you may have!

philonoist · on Nov 1, 2024

What do you think about this tool changing the landscape of software testing?

I think you could change the roles of SDETs and other quality assurance jobs dominated by Selenium and Playwright. I mean, think about it. It would halve the number of testers needed to do the same work.

theredsix · on Nov 1, 2024

I think if you added additional function calls to detect visual bugs or breaking flows, tools such as this could automate much of QA in addition to detecting non-intuitive UI design patterns.

david_shi · on Nov 2, 2024

Any plans for a python version?

theredsix · on Nov 2, 2024

It's on the roadmap! A few other priorities are higher at the moment, but we'll be excited to see a PR for it in the meantime.

theredsix · on Nov 4, 2024

Update: We had a contributor start a Python port, stay tuned!

hugs · on Oct 31, 2024

Thanks for using Selenium!

theredsix · on Oct 30, 2024

Hi OP here, happy to answer any question!

theredsix · on Oct 29, 2024

You get the partner something shiny too!

theredsix · on Aug 19, 2024

what's your website?

madjam002 · on Aug 19, 2024

https://propertyengine.co.uk

swatcoder · on Aug 19, 2024

It's plausible that it's being naively flagged as something like phishing/misrepresentation of the ".com" site using that name.

madjam002 · on Aug 19, 2024

Hmm possibly, good idea. That's a different company based in SA in a different space, but maybe you're right.

blitzar · on Aug 19, 2024

Buy uk<sitename>.com or <sitename>uk.com and redirect.

(and probably buy the .co.uk as well for the full set)

theredsix · on Aug 9, 2024

You should evaluate the FOSS software's features AS IS and ask if you're okay with the current feature set if all future features are behind an "enterprise" tier. If you are, and the hosting of the current version is manageable, then the product is good for both sides. I've often found that running the numbers for paying the vendor for cloud vs amortizing DevOps costs comes out in favor of the cloud version. I see this as a win-win for both the customer and the company.

theredsix · on July 17, 2024

The author's tone is condescending, angry and entitled. If everyday interactions with him followed the same tone, I would argue that he is the exact type of person behavioral interviews are meant to screen out (technically competent but a nightmare to work with).

whoknowsidont · on July 18, 2024

Tone policing is the sign of someone who has nothing else to offer except their indignation.

dennis_jeeves2 · on July 20, 2024

Upvoted your comment. So many comments here are trying to psychoanalyze a smart guy (in my view). A lot of what he has written is sarcasm - which is totally lost on the autistic nerd.

simoncion · on July 18, 2024

If this guy has reasonable technical chops, he seems like someone who would be great to work with.

It's always, always good to have people in your group who are willing to call a steaming shitpile a steaming shitpile. It's also always good to have people in your group who can fairly rapidly turn a steaming shitpile into something that's fit for purpose and reasonably maintainable.

ryandrake · on July 18, 2024

> It's always, always good to have people in your group who are willing to call a steaming shitpile a steaming shitpile.

A lot of companies really don't want that. They'd rather have someone who will say, in Bill Lumbergh's voice, "Yeaaaaaaa, I think we ought to maybe workshop that a bit, okay?"

Saying "This code is no good and I'm not going to continue the review" will not pass muster at any modern American office where everyone is expected to wear a positivity-mask.

advael · on July 18, 2024

This comment's tone is presumptuous, judgmental, and reactionary. Based on the small sample of textual prose in this completely unrelated context, I have to assume that the author's whole interpersonal vibe and decision-making process is not a culture fit for our organization. We were impressed by your background but will be pursuing other applicants.

luzojeda · on July 17, 2024

I didn't feel he was angry at all. Just a person with a huge passion for his craft who dislikes how the IT work environment has changed. Completely reasonable to me.