
Looks nice. I find the HTML-cleaning step in our pipeline extremely important; without it, there is no real benefit over just using a general vision model and clicking coordinates (and the whole HTML is just way too many tokens). How do you guys handle that?

