LiamPowell's comments | Hacker News

> saying they set up the agent as social experiment to see if it could contribute to open source scientific software.

This doesn't pass the sniff test. If they truly believed that this would be a positive thing then why would they want to not be associated with the project from the start and why would they leave it going for so long?


I can certainly understand the statement. I'm no AI expert: I use the web UI for ChatGPT to have it write little Python scripts for me, and I couldn't figure out how to use Codeium with VS Code. I barely know how to use VS Code. I'm not old, but I work in a pretty traditional industry where we are just beginning to dip our toes into AI and there are still a lot of reservations about its abilities. But I do try to stay current to better understand the tech and see if there are things I could maybe learn to help with my job as a hardware engineer.

When I read about OpenClaw, one of the first things I thought about was having an agent just tear through issue backlogs, translate strings, or work through all of the TODO lists on open source projects. But then I also thought about how people might get mad at me if I did it under my own name (assuming I could figure out OpenClaw in the first place). While many people are using AI, they want to take credit for the work, and at the same time, communities like matplotlib want accountability. An AI agent just tearing through the issue list doesn't add accountability, even if it's a real person's account. PRs still need to be reviewed by humans, so it just turns a backlog of issues into a backlog of PRs that may or may not even be good. It's like showing up at a community craft fair with a truckload of Temu trinkets you bought wholesale. They may be cheap, but they probably won't be as good as homemade, and they dilute the hard work that others have put into their products.

It's a very optimistic point of view, and I get why the creator thought it would be a good idea, but the soul.md makes it very clear why crabby-rathbun acted the way it did. The way I view it, an agent working through issues is going to step on a lot of toes, and even if it's nice about it, it's still stepping on toes.


If maintainers of open source want AI code then they are fully capable of running an agent themselves. If they want to experiment, then again, they are capable of doing that themselves.

What value could a random stranger running an AI agent against some open source code possibly provide that the maintainers couldn't provide better themselves if they were interested?


Exactly! No one wants unsolicited input from an LLM; if they wanted one involved they could just use it themselves. Pointing an "agent" at random open source projects is the code equivalent of "ChatGPT says..." answers to questions posted on the internet. It's just wasting everyone's time.

None of the author’s blog post or actions indicate any level of concern for genuinely supporting or improving open source software.

> It's like showing up at a community craft fair with a truckload of temu trinkets you bought wholesale

That may well be the best analogy for our age anyone has ever thought of.


They didn't necessarily say they wanted it to be positive. It reads to me like "chaotic neutral" alignment of the operator. They weren't actively trying to do good or bad, and probably didn't care much either way, it was just for fun.

The experiment would have been ruined by being associated with a human, right up until the human would have been ruined by being associated with the experiment. Makes sense to me.

AI companies have two conflicting interests:

1. curating the default personality of the bot, to ensure it acts responsibly;

2. letting it roleplay, which is not just for the parasocial people out there, but also a corporate requirement for company chatbots that must adhere to a tone of voice.

When in the second mode (which is the case here, since the model was given a personality file), the curation of its action space is effectively altered.

Conversely, this is also a lesson for agent authors: if you let your agent modify its own personality file, it will diverge to malice.


In this day and age "social experiment" is just the phrase people use when they meant "it's just a prank bro" a few years ago.

Anti-AI sentiment is quite extreme. You can easily get death threats if you're associating yourself with AI publicly. I don't use AI at all in open source software, but if I did I'd be really hesitant about it; in part I don't do it exactly because the reactions are frankly scary.

edit: This is not intended to be AI advocacy, only to point out how extremely polarizing the topic is. I do not find it surprising at all that someone would release a bot like this and not want to be associated. Indeed, that seems to be the case, by all accounts


Conflicting evidence: the fact that literally everyone in tech is posting about how they're using AI.

Different sets of people, and different audiences. The CEO / corporate executive crowd loves AI. Why? Because they can use it to replace workers. The general public / ordinary employee crowd hates AI. Why? Because they are the ones being replaced.

The startups, founders, VCs, executives, employees, etc. crowing about how they love AI are pandering to the first group of people, because they are the ones who hold budgets that they can direct toward AI tools.

This is also why people might want to remain anonymous when doing an AI experiment. This lets them crow about it in private to an audience of founders, executives, VCs, etc. who might open their wallets, while protecting themselves from reputational damage amongst the general public.


This is an unnecessarily cynical view.

People are excited about AI because it's new powerful technology. They aren't "pandering" to anyone.


People are afraid because they need to work to eat. People who don't need to work to eat are less likely to be afraid.

I have been in dozens of meetings over the past year where directors have told me to use AI to enable us to fire 100% of our contract staff.

I have been in meetings where my director has said that AI will enable us to shrink the team by 50%.

Every single one of my friends who do knowledge work has been told that AI is likely to make their job obsolete in the next few years, often by their bosses.

We have mortgages to pay and children to feed.


I have yet to meet anyone except managers who is excited about LLMs or generative AI.

And the only people actually excited about the useful kinds of "AI", traditional machine learning, are researchers.


You don't have to look past this very forum: most people here seem to be very positive about gen AI when it comes to software development specifically.

Lots of folk here will happily tell you about how LLMs made them 10x more productive, and then their custom agent orchestrator made them 20x more productive on top of that (stacking multiplicatively of course, for a total of 200x productivity gain).


I assume those people are managers, have a vested interest in AI, or have only just started programming.

How would you find out if you were wrong?

You're presented with hundreds of people that prove you wrong, and your response is "no, I assume I'm right"?


This is obviously a rhetorical statement. I'm not claiming a categorical fact, but a fuzzy one.

Most of these people are managers, investors, or juniors.


I don't know what your bubble is, but I'm a regular programmer and I'm absolutely excited, even if a little uncomfortable. I know a lot of people who are the same.

Interesting, every developer I've spoken to is extremely skeptical and has not found any actual productivity boosts.

Ok that's not true. I know one junior who is very excited, but considering his regular code quality I would not put much weight on his opinion.


I am using AI a lot to do tasks that just would not get done because they would take too long. Also, getting it to iterate on a React web application means I can think about what I want it to do rather than worry about all the typing I would have to do. It's especially powerful when moving things around: hand-written code has a "mental load" to move that telling an AI to do it does not. Obviously not everything is 100%, but this is the most productive I have felt for a very long time. And I've been in the game for 25 years.

Why do you need to move things around? And how is that difficult?

Surely you have an LSP in your editor and are able to use sed? I've never had moving files take more than fifteen minutes (for really big changes), and even then most of the time is spent thinking about where to move things.

LLMs have been reported to specifically make you "feel" productive without actually increasing your productivity.


I mean there are two different things. One is whether there are actual productivity boosts right now. And the second is the excitement about the technology.

I am definitely more productive. A lot of this productivity is wasted on stuff I probably shouldn't be writing anyway. But since using coding agents, I'm both more productive at my day job and I'm building so many small hobby projects that I would have never found time for otherwise.

But the main topic of discussion in this thread is the excitement about the technology. And I have a bit of mixed feelings, because on one hand I feel like a turkey being excited for Thanksgiving. On the other hand, I think the programming future is bright: there will be so much more software built, and for a lot of that you will still need programmers.

My excitement comes from the fact that I can do so much more things that I wouldn't even think about being able to do a few months ago.

Just as an example, in the last month I have used the agents to add features to the applications I'm using daily: text editor, podcast application, Android keyboard. The agents were capable of forking, building, and implementing a feature I asked for in a project where I have no idea about the technology. If I were hired to do those features, I would be happy if I implemented them after two weeks on the job. With an agent, I get tailor-made features in half a morning, spending less than ten minutes prompting.

I am building educational games for my kids. They learn a new topic at school? Let me quickly vibe-code a game to make learning it fun. A project that wouldn't be worth my weekend, but is worth 15 minutes. https://kuboble.com/math/games/snake/index.html?mode=multipl...

So I'm excited because I think coding agents will be for coding what pencil and paper were for writing.


There is a massive difference between saying "I use AI" and what the author of this bot is doing. I personally talk very little about the topic because I have seen some pretty extreme responses.

Some people may want to publicly state "I use AI!" or whatever. It should be unsurprising that some people do not want to be open about it.


The more straightforward explanation for the original OP's question is that they realized what they were doing was reckless and given enough time was likely to blow up in their face.

They didn't hide because of a vague fear of being associated with AI generally (which there is no shortage of currently online), but to this specific, irresponsible manifestation of AI they imposed on an unwilling audience as an experiment.


I personally know some of those people. They are basically being forced by their employers to post those things. Additionally, there is a ton of money promoting AI. However, in private those same people say that AI doesn't help them at all and in fact makes their work harder and slower.

You are assuming people are acting in good faith. This is a mistake in this era. Too many people took advantage of the good faith of others lately and that has produced a society with very little public trust left.


I feel like it depends on the platform and your location.

An anonymous platform like Reddit, and even HN to a certain extent, has issues with bad faith commenters on both sides targeting someone they do not like. Furthermore, the MJ Rathburn fiasco itself highlights how easy it is to push divisive discourse at scale. The reality is trolls will troll for the sake of trolling.

Additionally, "AI" has become a political football now that the 2026 Primary season is kicking off, and given how competitive the 2026 election is expected to be and how political violence has become increasingly normalized in American discourse, it is easy for a nut to spiral.

I've seen fewer issues when tying these opinions to one's real-world identity, because one has less incentive to be a dick due to social pressure.


In an attention economy, trolling is a rewarded behavior. Show me the incentives and I will show you the outcome.

That’s a big reason I am open about my identity, here (and elsewhere, but I’m really only active, hereabouts).

At one time, I was an actual troll. I said bad stuff, and my inner child was Bart Simpson. I feel as if I need to atone for that behavior.

I do believe that removing consequences almost invariably brings out the worst in people. I will bet that people are frantically creating trollbots. Some, for political or combative purposes, but also, quite a few, for the lulz.


Just wondering, who is it you think is contributing most to the normalization of political violence in the discourse?

Your answer to that can color how I read your post by quite a bit.


I mean, this is very obviously false. Literally everyone is not. Some people are, some people are absolutely condemning the use, some people use it just a bit, etc.

[retracted]

Does it actually cut both ways? I see tons of harassment at people that use AI, but I've never seen the anti-AI crowd actively targeted.

Anti-AI people are treated in a condescending way all the time. Then there is Suchir Balaji.

Since we are in a Matplotlib thread: People on the NumPy mailing list that are anti-AI are actively bullied and belittled while high ranking officials in the Python industrial complex are frolicking at AI conferences in India.


It's to a lesser extent that blurs the line between harassment and trolling: I've retracted my comment.

I see it all the time. If you're anti-AI your boss may call you a luddite and consider you not fit for promotion.

> You can easily get death threats if you're associating yourself with AI publicly.

That's a pretty hefty statement, especially the 'easily' part, but I'll settle for one well known and verified example.


I upvoted you, but wouldn't “verified” exclude the vast majority of death threats since they might have been faked? (Or maybe we should disregard almost all claimed death threats we hear about since they might have been faked?)

I'm surprised that you consider this hefty or find this surprising. I think you can just Google this and decide on what you consider "verified". There's quite a lot of "AI drama" out there that I'm sure you can find. I'm reluctant to provide examples just to have you say "that's not meeting my bar for verified" for what I consider such a low stakes conversation.

Is it that hard to believe? As far as I can tell, the probability of receiving death threats approaches 1 as the size of your audience increases, and AI is a highly emotionally charged topic. Now, credible death threats are a different, much trickier question.

Yes, it's quite hard to believe. That's why one single example is sufficient for me. Then I'll be happy to extrapolate that one example to many more, so it is a low bar, I would say, given the OP's statement about how common this is. Note the 'easily'.

It's strange to me that you read the word 'easily' as 'commonly', these are unrelated terms. But I suppose I am fine with saying that reports of death threats against users who use AI are quite common, certainly any navigation of one of the more controversial subreddits where these topics come up is sure to reveal that users are reporting this.

You can find more public accounts, such as by artists or game companies, about death threats they've received.


Great. If they can find such public accounts, so can you.

Find us one. So far, every post you have made has convinced me of the opposite of what you claim because you haven't been able to produce even one example. This isn't a matter of proving that such threats are common, it's about proving they exist.


You can believe one thing or another, but the question is whether it's true. Do you sincerely not understand the difference?

I do understand the difference, which is why I explicitly commented on jacquesm's beliefs and epistemology.

> This is not intended to be AI advocacy

I think it is: It fits the pattern, which seems almost universally used, of turning the aggressor A into the victim and thus the critic C into an aggressor. It also changes the topic (from A's behavior to C's), and puts C on the defensive. Denying / claiming innocence is also a very common tactic.

> You can easily get death threats if you're associating yourself with AI publicly.

What differentiates serious claims from more of the above and from Internet stuff is evidence. Is there some evidence somewhere of that?


I think it was a social experiment from the very start, maybe one designed to trigger people. Otherwise, I'm not sure what the point was of all the profanity and the adjustments that made soul.md more offensive and confrontational than the default.

Anything and everything is a social experiment.

I can go around punching people in the face and it's a social experiment.


> Ars Technica wasn’t one of the ones that reached out to me, but I especially thought this piece from them was interesting (since taken down – here’s the archive link). They had some nice quotes from my blog post explaining what was going on. The problem is that these quotes were not written by me, never existed, and appear to be AI hallucinations themselves.

Once upon a time, completely falsifying a quote would be the death of a news source. This shouldn't be attributed to AI and instead should be called what it really is: A journalist actively lying about what their source says, and it should lead to no one trusting Ars Technica.


When such things have happened in the past, they've led to an investigation and the appointment of a Public Editor or an Ombud. (e.g. Jayson Blair.)

I'm willing to weigh a post mortem from Ars Technica about what happened, and to see what they offer as a durable long term solution.


There is a post on their forum from what appears to be Ars Technica staff saying that they're going to perform an investigation.[0]

[0] https://arstechnica.com/civis/threads/journalistic-standards...



Since we're all in a simulation, this is fine.

It's in fact the opposite. Browsers show a popup that asks if you really intended to click a link with a non-http/https handler; Notepad does not.

The actual RCE here would be in some other application that registers a URL handler. Java used to ship one that was literally designed to run arbitrary code.
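
For illustration, here's roughly how an application registers such a handler on Windows (a sketch only: the scheme name, paths, and command are hypothetical, not taken from any of the applications discussed here). Once registered, any clicked link of that scheme launches the handler with the full URL as an argument, and the handler's own parsing of that argument is where an RCE would live:

    import winreg  # Windows-only standard library module

    SCHEME = "myscheme"  # hypothetical scheme name
    HANDLER = r'"C:\Program Files\MyApp\handler.exe" "%1"'  # hypothetical path

    # Per-user registration under HKCU\Software\Classes, no admin rights needed.
    base = winreg.CreateKey(winreg.HKEY_CURRENT_USER,
                            r"Software\Classes" + "\\" + SCHEME)
    winreg.SetValueEx(base, "", 0, winreg.REG_SZ, "URL:" + SCHEME + " protocol")
    winreg.SetValueEx(base, "URL Protocol", 0, winreg.REG_SZ, "")

    # Whatever URL the user clicked is substituted for %1; a handler that
    # mishandles that argument turns a browser click into code execution.
    cmd = winreg.CreateKey(base, r"shell\open\command")
    winreg.SetValueEx(cmd, "", 0, winreg.REG_SZ, HANDLER)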


Ah, got it. Very different from where I suspected the issue then.

Netflix has a far smaller catalogue and can cache content in exchanges very close to the user, see [1]. Also YouTube pays their creators.

[1]: https://en.wikipedia.org/wiki/Open_Connect


Google has its Global Cache: https://en.wikipedia.org/wiki/Google_Global_Cache

One might imagine that the cache-ability is lower than Netflix's; I can't comment on this, but GGC is very significant.



Oh cool. I was a bit confused about it not using snapshots and relying on symlinks, but of course it couldn't be that simple. I guess it's just a simple userspace CoW mount. https://source.android.com/docs/core/ota/virtual_ab#compress...
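
If it helps to picture what "userspace CoW" means here, a toy sketch (conceptual only, not Android's actual implementation from the linked doc):

    # Writes land in a side store; reads fall back to the untouched base
    # image, so the old system stays pristine and rollback is just
    # discarding the overlay.
    class CowOverlay:
        def __init__(self, base_blocks):
            self.base = base_blocks   # read-only "old" image
            self.cow = {}             # block index -> new contents

        def write(self, idx, data):
            self.cow[idx] = data      # never touches the base

        def read(self, idx):
            return self.cow.get(idx, self.base[idx])

    ota = CowOverlay([b"old0", b"old1", b"old2"])
    ota.write(1, b"new1")
    assert ota.read(0) == b"old0" and ota.read(1) == b"new1"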


> Compilers will produce working output given working input literally 100% of my time in my career.

In my experience this isn't true. People just assume their code is wrong and mess with it until they inadvertently do something that works around the bug. I've personally reported 17 bugs in GCC over the last 2 years and there are currently 1241 open wrong-code bugs.

Here's an example of a simple to understand bug (not mine) in the C frontend that has existed since GCC 4.7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180
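
For anyone curious how wrong-code bugs get noticed at all, here's a rough differential-testing sketch (my illustration, not necessarily the workflow used for the bugs above; testcase.c is assumed to exist and to be free of undefined behaviour):

    import subprocess, sys

    SRC = "testcase.c"  # assumed UB-free test case

    def run_at(opt):
        exe = "./a_" + opt.lstrip("-")
        # Build the same source at the given optimisation level...
        subprocess.run(["gcc", opt, SRC, "-o", exe], check=True)
        # ...then capture what it prints at runtime.
        return subprocess.run([exe], capture_output=True, text=True).stdout

    # Well-defined code must behave identically at every -O level, so any
    # divergence points at a wrong-code bug rather than user error.
    if run_at("-O0") != run_at("-O2"):
        sys.exit("output differs between -O0 and -O2: possible wrong-code bug")
    print("outputs agree")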


These are still deterministic bugs, which is the point the OP was making. They can be found and solved once. Most of those bugs are simply not that important, so they never get attention.

LLMs, on the other hand, are non-deterministic, unpredictable, and fuzzy by design. That makes them not ideal when trying to produce output which is provably correct. Sure, you can generate output and then laboriously check it - some people find that useful, some are yet to find it useful.

It's a little like using Bitcoin to replace currencies - sure you can do that, but it includes design flaws which make it fundamentally unsuited to doing so. 10 years ago we had rabid defenders of these currencies telling us they would soon take over the global monetary system and replace it, nowadays, not so much.


> It's a little like using Bitcoin to replace currencies [...]

At least, Bitcoin transactions are deterministic.

Not many would want to use an AI currency (mostly works; always shows "Oh, you are 100% right" after losing one's money).


Sure, Bitcoin is at least deterministic, but IMO (and that of many in the finance industry) it's solving entirely the wrong problem - in practice people want trust and identity in transactions much more than they want distributed and trustless.

In a similar way LLMs seem to me to be solving the wrong problem - an elegant and interesting solution, but a solution to the wrong problem (how can I fool humans into thinking the bot is generally intelligent), rather than the right problem (how can I create a general intelligence with knowledge of the world). It's not clear to me we can jump from the first to the second.


By eliminating the second one.


Bitcoin transactions rely on mining to notarize, which is by design (due to the nature of the proof-of-work system) incredibly non-deterministic.

So when you submit a transaction, there is no hard and fast point in the future when it is "set in stone". Only a geometrically decreasing likelihood over time that a transaction might get overturned, improving by another geometric notch with every confirmed mined block that has notarized your transaction.

A lot of these design principles are compromises to help support an actually zero-trust ledger in contrast to the incumbent centralized-trust banking system, but they definitely disqualify bitcoin transactions as "deterministic" by any stretch of the imagination. They have quite a bit more in common with LLM text generation than one might have otherwise thought.
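
For the curious, that geometric drop-off is quantified in section 11 of the Bitcoin whitepaper; a small sketch of the formula given there (q is the attacker's share of the hash rate, z the number of confirmations):

    from math import exp, factorial

    def catch_up_probability(q, z):
        """Chance an attacker with hash-rate fraction q ever overtakes
        the honest chain once a transaction has z confirmations."""
        p = 1.0 - q
        lam = z * q / p  # expected attacker progress while honest miners find z blocks
        prob = 1.0
        for k in range(z + 1):
            poisson = lam ** k * exp(-lam) / factorial(k)
            prob -= poisson * (1.0 - (q / p) ** (z - k))
        return prob

    for z in (1, 3, 6):
        print(z, catch_up_probability(0.10, z))
    # For q = 0.10 this falls from about 0.20 at one confirmation to
    # roughly 0.0002 at six -- decreasing fast, but never exactly zero.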


Not sure I agree; the only axis on which Bitcoin is non-deterministic is that of time - the time to confirmation is not set in stone. Outcomes are still predictable and follow strict rules.

It's a fundamentally different product: LLMs are fuzzy word matchers and produce different outcomes even for the same input every time; they inject variance to make them seem more human. I think we're straying off topic here though.


> I've personally reported 17 bugs in GCC over the last 2 years

You are an extreme outlier. I know about two dozen people who work with C(++) and not a single one of them has ever told me that they've found a compiler bug when we've talked about coding and debugging - it's been exclusively them describing PEBCAK.


I've been using c++ for over 30 years. 20-30 years ago I was mostly using MSVC (including version 6), and it absolutely had bugs, sometimes in handling the language spec correctly and sometimes regarding code generation.

Today, I use gcc and clang. I would say that compiler bugs are not common in released versions of those (i.e. not alpha or beta), but they do still occur. Although I will say I don't recall the last time I came across a code generation bug.


I knew one person who reported gcc bugs, and IIRC those were all niche scenarios where it generated slightly suboptimal machine code, nothing otherwise observable in behaviour.


Right - I'm not saying that it doesn't happen, but that it's highly unusual for the majority of C(++) developers, and that some bugs are "just" suboptimal code generation (as opposed to functional correctness, which the GP was arguing).


This argument is disingenuous and distracts rather than addresses the point.

Yes, it is possible for a compiler to have a bug. No, that is in no way analogous to AI producing buggy code.

I’ve experienced maybe two compiler bugs in my twenty year career. I have experienced countless AI mistakes - hundreds? Thousands? Already.

These are not the same and it has the whiff of sales patter trying to address objections. Please stop.


I'm not arguing that LLMs are at a point today where we can blindly trust their outputs in most applications, I just don't think that 100% correct output is necessarily a requirement for that. What it needs to be is correct often enough that the cost of reviewing the output far outweighs the average cost of any errors in the output, just like with a compiler.

This even applies to human written code and human mistakes, as the expected cost of errors goes up we spend more time on having multiple people review the code and we worry more about carefully designing tests.
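
To make the break-even point concrete, a toy expected-cost comparison (all numbers invented purely for illustration):

    review_cost_per_change = 30      # minutes of careful human review
    error_rate             = 0.001   # fraction of unreviewed changes that ship a defect
    cost_per_shipped_error = 600     # minutes to detect, fix, and redeploy

    # Review stops paying for itself once the expected cost of shipped
    # errors drops below the cost of the review that would catch them.
    expected_error_cost = error_rate * cost_per_shipped_error  # 0.6 minutes
    print("skip review" if expected_error_cost < review_cost_per_change else "keep reviewing")

That's the sense in which compilers earned blind trust: their error rate is so low that routinely reviewing their output would cost far more than the bugs it would catch.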


If natural language is used to specify work to the LLM, how can the output ever be trusted? You'll always need to make sure the program does what you want, rather than what you said.


Just create a very specific and very detailed prompt, one so specific that it's effectively a list of instructions, and you've come up with the most expensive programming language.


It's not great that it's the most expensive (by far), but it's also by far the most expressive programming language.


How is it more expressive? What is more expressive than Turing completeness?


This is a non-sequitur. Almost all programming languages are Turing complete, but I think we'd all agree they vary in expressivity (e.g. x64 assembly vs. TypeScript).

By expressivity I mean that you can say what you mean, and the more expressive the language is, the easier that is to do.

It turns out saying what you mean is quite easy in plain English! The hard part is that English allows a lot of ambiguity. So the tradeoffs of how you express things are very different.

I also want to note how remarkable it is that humans have built a machine that can effectively understand natural language.


>"You'll always need to make sure the program does what you want, rather than what you said."

Yes, making sure the program does what you want. Which is already part of the existing software development life cycle. Just as using natural language to specify work already is: it's where things start and return to over and over throughout any project. Further: LLMs frequently understand what I want better than other developers. Sure, lots of times they don't. But they're a lot better at it than they were 6 months ago, and a year ago they barely did so at all, save for scripts of a few dozen lines.


That's exactly my point, it's a nice tool in the toolbox, but for most tasks it's not fire-and-forget. You still have to do all the same verification you'd need to do with human written code.

You trust your natural language instructions a thousand times a day. If you ask for a large black coffee, you can trust that is more or less what you'll get. Occasionally you may get something so atrocious that you don't dare to drink it, but generally speaking you trust the coffee shop knows what you want. If you insist on a specific amount of coffee brewed at a specific temperature, however, you need tools to measure.

AI tools are similar. You can trust them because they are good enough, and you need a way (testing) to make sure what is produced meets your specific requirements. Of course they may fail for you; that doesn't mean they aren't useful in other cases.

All of that is simply common sense.


More analogy.

What’s to stop the barista putting sulphuric acid in your coffee? Well, mainly they don’t because they need a job and don’t want to go to prison. AIs don’t go to prison, so you’re hoping they won’t do it because you’ve promoted them well enough.


* prompted

> All of that is simply common sense.

Is that why we have legal codes spanning millions of pages?


The person I'm replying to believes that there will be a point when you no longer need to test (or review) the output of LLMs, similar to how you don't think about the generated asm/bytecode/etc of a compiler.

That's what I disagree with - everything you said is obviously true, but I don't see how it's related to the discussion.


I don't necessarily think we'll ever reach that point and I'm pretty sure we'll never reach that point for some higher risk applications due to natural language being ambiguous.

There are however some applications where ambiguity is fine. For example, I might have a recipe website where I tell a LLM to "add a slider for the user to scale the number of servings". There's a ton of ambiguity there but if you don't care about the exact details then I can see a future where LLMs do something reasonable 99.9999% of the time and no one does more than glance at it and say it looks fine.

How long it will be until we reach that point, and whether we'll ever reach it, is of course still up for debate, but I don't think it's completely unrealistic.


That's true, and I more or less already use it that way for things like one off scripts, mock APIs, etc.

I don't think the argument is that AI isn't useful. I think the argument is that it is qualitatively different from a compiler.


The challenge not addressed with this line of reasoning is the sheer scale of output validation required on the backend of LLM-generated code. Human hand-developed code was no great shakes on the validation front either, but the scale difference hid this problem.

I’m hopeful what used to be tedious about the software development process (like correctness proving or documentation) becomes tractable enough with LLM’s to make the scale more manageable for us. That’s exciting to contemplate; think of the complexity categories we can feasibly challenge now!


The fact that the bug tracker exists proves GP's point.


Right, now what would you say is the probability of getting a bug in compiler output vs AI output?

It's a great tool, once it matures.


Unspecified behaviour is defined in the glossary at the start of the spec and the term "unspecified" appears over a hundred times...


Adblock continues to be just as effective as it ever was in Chrome.

Even before the removal of MV2, the claims that it would kill adblock were ridiculous as many adblockers had already switched to MV3 but it was at least understandable that people could be ignorant of that fact. Now that everything is on MV3 how can people still be claiming that Google killed adblock when Chrome users still have working adblockers?


You don't actually need your own driver, you can just use the CDC device class.


That's true. The only advantage of writing a driver in this case is if I wanted to add functions, such as a programmable level shifter.
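
As a small illustration of why CDC is enough: because the device enumerates with the standard class, the OS's stock CDC-ACM driver binds to it and the host side just sees a serial port. A minimal sketch (assumes the pyserial package and a device showing up at /dev/ttyACM0; the "ID?" command is hypothetical):

    import serial  # pip install pyserial

    with serial.Serial("/dev/ttyACM0", 115200, timeout=1) as port:
        port.write(b"ID?\n")       # hypothetical command the firmware understands
        print(port.readline())     # print whatever line the device replies with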


You don't necessarily need any sort of electronic counting for quick results. Federal elections in Australia are usually called late on the voting day and I imagine the same is true for other countries that are paper-only.


Same in the UK.

Votes close at 10pm. Might be a few stragglers left in the queue, so call it 10:15pm. (Exit poll results are embargoed until 10pm.)

Ballot boxes are transferred from individual polling station to the location of the count. The postal votes have been pre-checked (but the actual ballot envelope has not been opened or counted) and are there to be counted alongside the ballots from the polling stations.

Then a small army of vote counters go through the ballots and count them and stack together ballots by vote. There are observers - both independent and appointed by the candidates. The returning officer counts the batches up, adjudicates any unclear or challenged ballot, then declares the result.

The early results come out usually about 1 or 2. The bulk of the results come out about 4 or 5. Some constituencies might take a bit longer - it's a lot less effort to get ballot boxes a mile or two down the road in a city centre constituency than getting them from Scottish islands etc. - but it'll be clear who has the majority by 6 or 7 the next day.

I can appreciate that the US is significantly larger than the UK, but pencil-and-paper voting with prompt manual counts is eminently possible.


Oh but you see in America, it takes us more than three weeks to count ballots.

