Hacker News | jlcummings's comments

Being effective with LLM agents requires not just the ability to code or to appreciate nuance with libraries or business rules, but also the ability and proclivity for pedantry. Dad-splain everything, always.

And to have boundless contextual awareness… dig a rabbit hole, but beware that you are in your own hole. At this point you can escape the hole but you have to be purposefully aware of what guardrails and ladders you give the agent to evoke action.

The better and more explicit the guardrails you provide, the more likely the agent is to do what is expected and honor the scope and context you establish. If you tell it to use silverware to eat, be assured that it won't take that to mean using it appropriately or idiomatically, and it will try eating soup with a fork.

Lastly, don’t be afraid of commits and checkpoints, or to reject/rollback proposed changes and restate or reset the context. The agent might be the leading actor, but you are the director. When a scene doesn’t play out, try it again after clarifying, or after changing the camera perspective, lighting, or lines, or cut/replace the scene entirely.


I find that level of pedantry and hand-holding to be extremely tedious, and I frequently find myself just thinking fuck it, I'll write it myself and get what I want the first time.


This. That’s why every programmer strives for a good architecture and writes tests. When you have that, and all your bug fixes and feature requests are only a small number of lines, that is pure bliss. Even if it requires hours of reading and designing. Anything is better than dumping a lot of lines.


Why would anyone bother at this point though? Tedious handholding and extra effort for code reviews. Just write the damn thing yourself.


Because once you figure out the correct way to handhold, you can automate it and the tediousness goes away.

It’s only tedious once per codebase or task, then you find the less tedious recipe and you’re done.

You can even get others to do the tedious part at their layer of abstraction so that you don’t have to anymore. Same as compilers, CPU design, or any other part of the stack lower than the one you’re using.


Skopeo lets you work with remote registries and local images without a docker/podman/etc daemon.

We use it to ‘clone’ images across deployment environments and across providers, outside of the build pipeline, as an ad hoc job.
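For anyone who hasn't used it, a minimal sketch of what such a clone job could look like, wrapped in Python. `skopeo copy` with `docker://` references and the `--all` flag are real Skopeo features; the helper name and the registry hosts below are made up for illustration, and authentication is left out entirely.

```python
import shlex


def skopeo_copy_argv(src_image: str, dest_image: str, all_arches: bool = False) -> list[str]:
    """Build the argv for a registry-to-registry `skopeo copy` (no daemon needed).

    src_image / dest_image are plain references such as "host/repo:tag";
    the docker:// transport prefix is added here.
    """
    argv = ["skopeo", "copy"]
    if all_arches:
        # --all copies every image in a manifest list, not just the one
        # matching the current OS/architecture.
        argv.append("--all")
    argv += [f"docker://{src_image}", f"docker://{dest_image}"]
    return argv


if __name__ == "__main__":
    # Hypothetical registries, purely for illustration.
    argv = skopeo_copy_argv(
        "registry.staging.example.com/app:1.4.2",
        "registry.prod.example.com/app:1.4.2",
        all_arches=True,
    )
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Building the argument list separately from running it keeps the job easy to dry-run before pointing it at a production registry.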


It’s weird how over-cap billing became normalized and accepted simply because chargeable metrics were not collected/resolved until after the fact. It seems like either an obvious gap in the process or a little bit shady.


Exactly. Why the fixation on one strategy for handling this not-so-uncommon scenario? It is so common that handling it should be de facto.

This isn’t a pre-paid gas pump use case, but that could be one way to present it. We all want to fill as fast as possible. And if your fill spout can handle top rates, you get top fill rates, until you close in on the hard limit. Then it trickles down to the metered drop. Then it stops precisely where it needs to.

By accepting/requesting a hard cap, the provider can make clear that, in order to be precise, soft caps will go into effect earlier and induce progressive throttling where applicable. If the throttle doesn’t catch the final milliliter or two of gasoline before the pump shuts off, the provider can and should just let it go. It’s a loss, but comparatively a figurative drop in the bucket.
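A rough sketch of the gas pump idea in Python: full rate up to a soft cap, then a ramp down to a trickle, then a stop at the hard cap. The function name, its parameters, and the choice of a linear ramp are all assumptions of mine for illustration, not any provider's actual billing or quota API.

```python
def allowed_rate(used, soft_cap, hard_cap, max_rate, trickle_rate):
    """Return the rate permitted once `used` units have been consumed.

    Below the soft cap the caller gets max_rate. Between the soft and hard
    caps the rate ramps linearly down toward trickle_rate. At or past the
    hard cap the rate is zero.
    """
    if not 0 <= soft_cap < hard_cap:
        raise ValueError("need 0 <= soft_cap < hard_cap")
    if used >= hard_cap:
        return 0.0  # hard stop: the pump shuts off
    if used <= soft_cap:
        return float(max_rate)  # plenty of headroom: top fill rate
    # Fraction of the throttling band still left (1.0 at soft cap, ~0 at hard cap).
    remaining = (hard_cap - used) / (hard_cap - soft_cap)
    return trickle_rate + (max_rate - trickle_rate) * remaining


if __name__ == "__main__":
    for used in (50, 90, 95, 99, 100):
        rate = allowed_rate(used, soft_cap=90, hard_cap=100, max_rate=10.0, trickle_rate=1.0)
        print(f"used={used:>3} -> allowed rate {rate}")
```

Whatever slips through between the last metering check and the shutoff is the "milliliter or two" the provider would have to absorb; the narrower the trickle, the smaller that overshoot.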

The other obvious route is predictive, where prior usage guides the guardrails. Ordering two eggs is typical for a single meal. Ordering twelve is not. Ordering three or four is unusual for most, but if you are a regular diner your habits will be observable.
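The egg example can be sketched the same way: compare a request against that customer's own history and flag it when it sits far above their baseline. The threshold rule here (mean plus `k` standard deviations, with a minimum spread) is only one simple choice I picked for illustration, and the names and default values are assumptions.

```python
from statistics import mean, pstdev


def is_unusual(history, requested, k=3.0, min_spread=1.0):
    """Flag `requested` if it is far above this customer's usual usage.

    history: past per-period usage values for the same customer.
    k: how many "spreads" above the mean still count as normal.
    min_spread: floor on the spread, so a perfectly steady history
                does not make every tiny increase look anomalous.
    """
    if not history:
        return False  # assumption: with no baseline, do not block anything
    baseline = mean(history)
    spread = max(pstdev(history), min_spread)
    return requested > baseline + k * spread


if __name__ == "__main__":
    typical_diner = [2, 2, 2, 2]
    regular_big_eater = [4, 4, 3, 4]
    print(is_unusual(typical_diner, 2))       # usual order
    print(is_unusual(typical_diner, 12))      # a dozen eggs for one meal
    print(is_unusual(regular_big_eater, 4))   # normal for this customer
```

A flag like this would more sensibly trigger a confirmation or a soft throttle than an outright block, since the baseline is only a guess about intent.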

All of this is predicated on the provider wanting to do something. They seem to lack incentives at this point for making it easy. It is because of stories like OP's that I avoid well-known problematic providers like Firebase, who don’t respect and foster long-term relationships.


Perhaps the No true Scotsman fallacy?

https://en.m.wikipedia.org/wiki/No_true_Scotsman


Better wouldn’t necessarily be the right qualifier, but faster, typically more repeatable, and greatly more economical with scale/workload would certainly fit as better from different vantages.

With high novelty? Probably not until machine learning and compilers are deeply entangled.


That very much depends on the code. Program synthesis is an active area of research and the programs found would often be very difficult for a human to figure out. Of course a sufficiently determined human can always do whatever these programs do, but I do think it is unfair to give an unlimited amount of time to human optimizers.


Likewise, the trivial or novel that you borrow for free isn’t really free when you need to use it suddenly in ways that the license doesn’t permit or it is just technically inconvenient. It is sort of like leasing vs buying, but not really a good analogy.


So far I haven’t seen a comment point this out or suggest something similar, so: instead of trying to maintain an application-level list of email addresses that were used in a breach (or for other reasons), rely on the service being exercised (email), which, having previously sent a verification email, has a record of the destination at least in a log, and may have placed it in a “verified member” list during registration, all more or less managed within the mail service.


The B52 was first rolled out for production use on 18 March 1954. The "...long-rifle of the air age..." -- Nathan Twining. It has not been manufactured since 1962. The ones that remain are still in active military service and continue to see upgrades and evolving mission scope. 100 years of service is feasible.

Cost per flying hour of a modern B52: $70,000 [1]

1. https://www.airforcemag.com/article/re-engining-the-b-52/


They also spend comparatively little time in the air. Compared with passenger aircraft where their return-on-investment strongly incentivises them being airborne as much as possible, military aircraft spend _a lot_ of time sitting around, either unused or in maintenance.


It's very easy to upgrade a strategic bomber. All strategic bombers are prone to interception so the only way to stand out is by having a bigger and modern payload. You can see the same development with tank cannons. Bigger calibers aren't necessary. The Rheinmetall 120mm cannon can be upgraded by simply shooting modern munition.


The advertisers (?) are different, but more than 35 (!) years ago, I remember the same format, just different sponsors for different programming.

