Hacker News

Ok and then? Those models were not trained for this purpose.

It's like the last hype over using generative AI for trading.

You might use generative AI for sentiment analysis, summarization and data pre-processing. But classic forecasting models will outperform it if you feed them the right metrics.



These are all multi-modal models, right? And the vision capabilities are particularly touted in Gemini.

https://ai.google.dev/gemini-api/docs/image-understanding


It is relevant because they are trained for the purpose of browser use and completing tasks on websites. Being able to bypass captchas is important for using many websites.

It would be nice to see comparisons to some special-purpose CAPTCHA solvers though.


And more broadly, if an agent is supposed to do everything a human can on the web, its ability to solve a captcha is likely a decent litmus test.



