Only 100 years. For the whole of history before that, work happened in the vicinity of the home, so it does feel natural to return to that. Instead of anvils we hit keyboards, and instead of swords we produce alignment, but either way it puts food on the table and allows flexibility in work-life?
Not in tech, but I was a teacher for decades. My first teaching job, in the early 80s of the last century, had a requirement that teachers live within 5 miles of the building.
In general, perhaps a return to guilds? Apprentices? In an area of my city that has a lot of small craft workshops (and, yes, a few have anvils), there are 'work-live' units being built that have workshops on the ground floor and living accommodation above.
On 2/ there is a safety point in having the grounding pin on all plugs, and in it being longer than the live/neutral: on the socket side, the grounding pin opens latches that block the live/neutral holes, so kids can't stick things into them.
I would generally agree it is the best plug standard for safety, but it's clunky and painful to step on.
A little page that tries to keep up with Flickr uploads in real time, built about 13 years ago and, amazingly, still running: http://ekke.si/flickr/?render=random
I have thought of that as 'n' being the manageable threshold, with the (uncontrolled) '+1' overflow creating the problems, typically in terms of additional layers or iterations. But I like your point, and perhaps 'n+1' and '1+n' mean different problem shapes.
I could see that making sense in some contexts, like the "straw that breaks the camel's back". However, that's not what's usually being referred to in this database query problem.
Usually it's one query that returns n results, followed by one more query for each of those results, so you end up having done 1+n queries. If you'd used a join, you could potentially have done only 1 query.
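A minimal sketch of the two shapes in Python with sqlite3 (the authors/books tables are just made up for illustration):

```python
import sqlite3

# Two toy tables, only to make the query shapes concrete.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")

# The 1+n shape: one query for the parent rows...
authors = conn.execute("SELECT id, name FROM authors").fetchall()
for author_id, name in authors:
    # ...then one more query per row returned above.
    conn.execute("SELECT title FROM books WHERE author_id = ?", (author_id,)).fetchall()

# The join shape: the same data in a single round trip.
conn.execute("""
    SELECT a.name, b.title
    FROM authors a LEFT JOIN books b ON b.author_id = a.id
""").fetchall()
```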
What I'd love to see is a scraper builder that uses LLMs/'magic' to generate optimised scraping rules for any page, i.e. CSS selectors and processing rules mapped to output keys, so you can run the scraping itself at low cost and high performance.
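Something like the following is what I'm picturing: the LLM emits a small rule set once per site/template, and a cheap parser applies it on every page. The selectors and field names here are invented, and BeautifulSoup is just one possible way to apply them:

```python
from bs4 import BeautifulSoup  # assumes beautifulsoup4 is installed

# Example of rules an LLM might emit for a product page (all names invented).
rules = {
    "title": {"selector": "h1.product-title", "attr": "text"},
    "price": {"selector": "span.price", "attr": "text"},
    "image": {"selector": "img.main-photo", "attr": "src"},
}

def apply_rules(html: str, rules: dict) -> dict:
    """Apply the generated selectors cheaply, with no LLM call per page."""
    soup = BeautifulSoup(html, "html.parser")
    out = {}
    for key, rule in rules.items():
        node = soup.select_one(rule["selector"])
        if node is None:
            out[key] = None
        elif rule["attr"] == "text":
            out[key] = node.get_text(strip=True)
        else:
            out[key] = node.get(rule["attr"])
    return out
```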
Apify's Website Content Crawler[0] does a decent job of this for most websites in my experience. It allows you to "extract" content via different built-in methods (e.g. Extractus [1]).
We currently use this at Magic Loops[2] and it works _most_ of the time.
The long-tail is difficult though, and it's not uncommon for users to back out to raw HTML, and then have our tool write some custom logic to parse the content they want from the scraped results (fun fact: before GPT-4 Turbo, the HTML page was often too large for the context window... and sometimes it still is!).
Would love a dedicated tool for this. I know the folks at Reworkd[3] are working on something similar, but not sure how much is public yet.
This is essentially what we're building at https://reworkd.ai (YC S23). We had thousands of users try using AgentGPT (our previous product) for scraping and we learned that using LLMs for web data extraction fundamentally does not work unless you generate code.
Code is also hard. You have to generate code that accounts for all possible exceptions or errors. If you want to automate a UI, for example, pushing a button can cause all sorts of feedback, errors, and consequences that need to be known to write the code.
Here's a project that describes using an LLM to generate crawling rules and then capturing data with them, but it looks like it's still in the early stages of research.
Most of the top LLMs already do this very well. It's because they've been trained on web data, and also because they're being used for precisely this task internally to grab data.
The complicated ops side of scraping is running headless browsers, IP ranges, bot bypass, filling captchas, observability, updating selectors, etc. There are a ton of SaaS services that do that part for you.
It also seems obvious that one would want to simply drag a box around the content you want, and have the tool then provide some examples to help you refine the rule set.
Ad blockers have had something very close to this for some time, without any sparkly AI buttons.
I'm sure someone is already working on a subscription-based model that uses corporate models in the backend, but it's something that could easily be implemented with a very small model.
That's an interesting take. I've been experimenting with reducing the overall rendered HTML down to just structure and content and then using the LLM to extract content from that. It works quite well, but I think your approach might be more efficient and faster.
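The reduction step I've been playing with is roughly this kind of thing (a toy sketch with BeautifulSoup; which tags and attributes to keep is an arbitrary choice here):

```python
from bs4 import BeautifulSoup

KEEP_ATTRS = {"href", "src", "alt"}  # arbitrary choice for illustration

def slim_html(html: str) -> str:
    """Strip scripts, styles and most attributes so the LLM mostly sees
    structure and text instead of the full rendered page."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript", "svg", "iframe"]):
        tag.decompose()
    for tag in soup.find_all(True):
        tag.attrs = {k: v for k, v in tag.attrs.items() if k in KEEP_ATTRS}
    return str(soup)
```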
One fun mechanism I've been using for reducing HTML size is diffing (with some leniency) pages from the same domain to exclude the common parts (i.e. headers/footers). That preprocessing can be useful for any parsing mechanism.
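Roughly along these lines, as a sketch with line-level diffing via difflib (the min_block threshold is the "leniency" knob, and its value is arbitrary):

```python
import difflib

def strip_common_parts(page_a: str, page_b: str, min_block: int = 3) -> str:
    """Drop lines that appear as shared blocks in both pages; those are
    likely headers/footers/nav rather than page-specific content."""
    a_lines = page_a.splitlines()
    b_lines = page_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines)
    common = set()
    for block in matcher.get_matching_blocks():
        if block.size >= min_block:  # the "leniency": ignore tiny matches
            common.update(range(block.a, block.a + block.size))
    return "\n".join(line for i, line in enumerate(a_lines) if i not in common)
```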
Parsing HTML is a solved and, frankly, not very interesting problem. Writing XPath/CSS selectors or JSON parsers (for when the data is in script variables) is not much of a challenge for anyone.
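For instance, the script-variable case is usually just a regex plus json.loads; the variable name and payload here are made up:

```python
import json
import re

# Toy example: many pages ship their state as JSON in a script variable.
html = '<script>window.__STATE__ = {"product": {"name": "Anvil", "price": 99}};</script>'

match = re.search(r"window\.__STATE__\s*=\s*(\{.*?\});", html, re.DOTALL)
if match:
    state = json.loads(match.group(1))
    print(state["product"]["name"])  # -> Anvil
```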
The more interesting issue is being able to parse data from the whole page content stack, which includes XHRs and their triggers. In that case an LLM driver would control an indistinguishable web browser to perform all the steps needed to retrieve the data as a full package. Though this is still a low-value proposition, as the models get fumbled by harder tasks and the easier tasks can be done by a human in a couple of hours.
LLM use in web scraping is still purely educational and assistive, as the biggest problem in scraping is not the parsing itself but scraper scaling and blocking, which is becoming extremely common.