
jlcummings · 2025-08-15T09:01:55

Being effective with LLM agents requires not just the ability to code or to appreciate nuance in libraries or business rules, but also the ability and proclivity for pedantry. Dad-splain everything, always.

It also requires boundless contextual awareness: dig a rabbit hole, but beware that you are in your own hole. You can escape the hole, but you have to be purposefully aware of which guardrails and ladders you give the agent to evoke action.

The better and more explicit the guardrails you provide, the more likely the agent is to do what is expected and to honor the scope and context you establish. If you tell it to use silverware to eat, don't assume that means it will use it appropriately or idiomatically; it will try eating soup with a fork.

Lastly, don’t be afraid of commits and checkpoints, or of rejecting/rolling back proposed changes and restating or resetting the context. The agent might be the leading actor, but you are the director. When a scene doesn’t play out, try it again after clarifying or changing the camera perspective, the lighting, or the lines, or cut/replace the scene entirely.

cmsj · 2025-08-15T10:38:11

I find that level of pedantry and hand-holding extremely tedious, and I frequently find myself thinking: fuck it, I'll write it myself and get what I want the first time.

skydhash · 2025-08-15T11:32:10

This. That’s why every programmer strives for a good architecture and writes tests. When you have that, and all your bug fixes and feature requests come down to a small number of lines, that is pure bliss, even if it requires hours of reading and designing. Anything is better than dumping lots of lines.

dingi · 2025-08-17T11:36:15

Why would anyone bother at this point, though? Tedious hand-holding and extra effort for code reviews. Just write the damn thing yourself.

etherealG · 2025-08-20T09:28:01

Because once you figure out the correct way to handhold, you can automate it and the tediousness goes away.

It’s only tedious once per codebase or task; then you find the less tedious recipe and you’re done.

You can even get others to do the tedious part at their layer of abstraction so that you don’t have to anymore. Same as with compilers, CPU design, or any other part of the stack lower than the one you’re using.

jlcummings · 2025-06-19T21:18:58

Skopeo lets you work with remote registries and local images without a docker/podman/etc. daemon.

We use it to ‘clone’ images across deployment environments and across providers, outside of the build pipeline, as an ad hoc job.
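A minimal sketch of that kind of ad hoc copy job, in Python, that just builds (and optionally runs) a `skopeo copy` between two registries. The registry hostnames and image name are hypothetical, and it assumes both registries are already authenticated (for example via `skopeo login`). `docker://` is skopeo's transport prefix for images in a remote registry, and `--all` copies every image in a multi-arch manifest list instead of only the one matching the host.

```python
import shlex
import subprocess


def skopeo_copy_cmd(src_image: str, dest_image: str, all_arches: bool = True) -> list[str]:
    """Build argv for a registry-to-registry copy; no container daemon involved."""
    cmd = ["skopeo", "copy"]
    if all_arches:
        # Copy every image in a multi-arch manifest list, not just the host's platform.
        cmd.append("--all")
    # docker:// is skopeo's transport for images stored in a remote registry.
    cmd += [f"docker://{src_image}", f"docker://{dest_image}"]
    return cmd


def copy_image(src_image: str, dest_image: str, dry_run: bool = True) -> None:
    cmd = skopeo_copy_cmd(src_image, dest_image)
    if dry_run:
        print(shlex.join(cmd))  # only show what would run
        return
    subprocess.run(cmd, check=True)  # raises CalledProcessError if skopeo fails


if __name__ == "__main__":
    # Hypothetical registries: promote a staging image to a second provider.
    copy_image(
        "registry.staging.example.com/app:1.4.2",
        "registry.prod.example.net/app:1.4.2",
    )
```

Because skopeo talks to both registries directly, a job like this can run from any host that can reach them, which is what makes it handy outside the build pipeline.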

jlcummings · on Jan 17, 2025

It’s weird how over-cap billing became normalized and acceptable simply because chargeable metrics were not collected/resolved until after the fact. It seems like an obvious gap in the process, or a little bit shady.

jlcummings · on Jan 17, 2025

Exactly. Why the fixation on one strategy for handling this not-so-uncommon scenario? It is so common that handling it should be the de facto standard.

This isn’t a prepaid gas pump use case, but that could be one way to present it. We all want to fill as fast as possible, and if your fill spout can handle top rates, you get top fill rates until you close in on the hard limit. Then it trickles down to a metered drop and stops precisely where it needs to.

By accepting/requesting a hard cap, the provider can make clear that, in order to be precise, soft caps will go into effect earlier and induce progressive throttling where applicable. If the throttle doesn’t catch the final milliliter or two of gasoline before the pump shuts off, the provider can and should just let it go. It’s a loss, but comparatively a figurative drop in the bucket.
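A rough sketch of the pump idea, assuming usage is metered in discrete ticks and the soft cap sits at a fixed fraction of the hard cap. The function names, the 80% soft-cap fraction, the linear ramp-down, and the minimum trickle rate are all illustrative assumptions, not any provider's actual billing logic: full rate below the soft cap, a rate proportional to the remaining headroom between soft and hard cap, a small trickle floor at the very end, and any overshoot past the hard cap simply not billed.

```python
def allowed_rate(used: float, hard_cap: float, max_rate: float,
                 soft_cap_fraction: float = 0.8, min_rate: float = 0.01) -> float:
    """Permitted usage per tick, given how much of the hard cap is already used."""
    if used >= hard_cap:
        return 0.0  # hard cap reached: the pump shuts off
    soft_cap = soft_cap_fraction * hard_cap
    if used < soft_cap:
        return max_rate  # top fill rate until the soft cap
    # Progressive throttling: scale the rate by the remaining headroom, in (0, 1].
    headroom = (hard_cap - used) / (hard_cap - soft_cap)
    return max(min_rate, max_rate * headroom)  # never below a metered trickle


def simulate_fill(hard_cap: float, max_rate: float, max_ticks: int = 10_000) -> float:
    """Always consume as much as allowed each tick; return the total dispensed."""
    used = 0.0
    for _ in range(max_ticks):
        rate = allowed_rate(used, hard_cap, max_rate)
        if rate == 0.0:
            break
        used += rate  # metered after the fact, so the last tick can overshoot slightly
    return used


def billable(used: float, hard_cap: float) -> float:
    """The provider absorbs any overshoot past the hard cap instead of billing it."""
    return min(used, hard_cap)


if __name__ == "__main__":
    total = simulate_fill(hard_cap=100.0, max_rate=10.0)
    print(f"dispensed={total:.5f} billed={billable(total, 100.0):.5f}")
```

As long as `max_rate` per tick is no larger than the gap between soft and hard cap (10 vs. 20 here), each throttled step is smaller than the remaining headroom, so the only overshoot comes from the trickle floor and stays under `min_rate`: the final milliliter the provider lets go.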

The other obvious route is predictive, where prior usage guides the guardrails. Ordering two eggs is typical for a single meal. Ordering twelve is not. Ordering three or four is unusual for most, but if you are a regular diner, your habits will be observable.
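A sketch of the predictive route, assuming the provider keeps a per-customer history of usage per period (or eggs per meal). The function name, the `k = 3` standard-deviation threshold, the five-period minimum history, and the fallback `default_limit` for new customers are illustrative choices; a real system would likely want something more robust than a plain mean and standard deviation.

```python
from statistics import mean, stdev


def is_unusual(history: list[float], current: float, default_limit: float,
               k: float = 3.0, min_history: int = 5) -> bool:
    """Flag usage far above this customer's own observed habits.

    Until there is enough history, fall back to a population-wide default
    (e.g. "two eggs is typical for a single meal").
    """
    if len(history) < max(min_history, 2):  # stdev needs at least two points
        return current > default_limit
    threshold = mean(history) + k * stdev(history)
    return current > threshold


if __name__ == "__main__":
    newcomer: list[float] = []
    regular = [4, 4, 3, 4, 4, 4]  # a regular who usually orders four eggs
    print(is_unusual(newcomer, 3, default_limit=2))   # True: unusual for most
    print(is_unusual(regular, 4, default_limit=2))    # False: matches their habit
    print(is_unusual(regular, 12, default_limit=2))   # True: a dozen is not a meal
```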

Any of this is predicated on the provider wanting to do something. They seem to lack incentives at this point for making it easy. It is because of stories like the OP’s that I avoid well-known problematic providers like Firebase, who don’t respect and foster long-term relationships.

jlcummings · on Aug 29, 2021

Perhaps the No true Scotsman fallacy?

https://en.m.wikipedia.org/wiki/No_true_Scotsman

jlcummings · on March 29, 2021

“Better” wouldn’t necessarily be the right qualifier, but faster, typically more repeatable, and far more economical at scale/workload would certainly count as better from different vantage points.

With high novelty? Probably not until machine learning and compilers are deeply entangled.

slaymaker1907 · on March 29, 2021

That very much depends on the code. Program synthesis is an active area of research and the programs found would often be very difficult for a human to figure out. Of course a sufficiently determined human can always do whatever these programs do, but I do think it is unfair to give an unlimited amount of time to human optimizers.

jlcummings · on Nov 9, 2020

Likewise, whatever trivial or novel thing you borrow for free isn’t really free when you suddenly need to use it in ways the license doesn’t permit, or when doing so is just technically inconvenient. It is sort of like leasing vs. buying, but that’s not really a good analogy.

jlcummings · on Nov 2, 2020

So far I haven’t seen a comment point this out or suggest something similar, so let’s say that instead of trying to maintain an application-level list of email addresses used in a breach (or for other reasons), you rely on the exercising service (email). Having previously sent a verification email, it has a record of the destination, at least in a log, and may have placed it in a “verified member” list during registration, all more or less managed within the mail service.

jlcummings · on July 2, 2020

The B-52 was first rolled out for production use on 18 March 1954. The "...long-rifle of the air age..." -- Nathan Twining. It has not been manufactured since 1962. The ones that remain are still in active military service and continue to see upgrades and an evolving mission scope. 100 years of service is feasible.

Cost per flying hour of a modern B-52: $70,000 [1]

1. https://www.airforcemag.com/article/re-engining-the-b-52/

gsnedders · on July 2, 2020

They also spend comparatively little time in the air. Compared with passenger aircraft, whose return on investment strongly incentivises keeping them airborne as much as possible, military aircraft spend _a lot_ of time sitting around, either unused or in maintenance.

imtringued · on July 2, 2020

It's very easy to upgrade a strategic bomber. All strategic bombers are prone to interception, so the only way to stand out is with a bigger, more modern payload. You can see the same development with tank cannons: bigger calibers aren't necessary. The Rheinmetall 120mm cannon can be upgraded simply by firing modern munitions.

jlcummings · on July 1, 2020

The advertisers (?) are different, but I remember the same format more than 35 (!) years ago, just with different sponsors for different programming.