
Awesome! I write LLM-powered scrapers and stuff all the time, and one of the biggest pain points is that HTML is full of so much crap that isn't meaningful and overwhelms the context. And being a data science guy, idk how to solve this.


Awesome, that's the same reason I use it. It's basically a balance between the full HTML and the markdown-type scrapers that are better for just text. Do you mind if I reach out to you once I set up the GitHub?


You're very welcome to! Please do. You can reach out to notpricedinyet@gmail.com



