Agreed. I've never seen a concrete answer with an outcome that can be explained ...

jdross · 2026-01-06T04:40:36 1767674436

I work in insurance - regulated, human capital heavy, etc.

Three examples for you:

- our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 mins per policy. It is more accurate and consistent than our humans

- our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.

- our certificates agent will note when a certificate of insurance is requested over email and automatically handle the necessary checks and follow up options or resolution. Will likely save us around $500k this year.

We also now increasingly share prototypes as a way to discuss ideas, because the cost to vibe code something illustrative is very low, and it’s often much higher fidelity to have the conversation with something visual than with a written document.

hattmall · 2026-01-06T04:57:38 1767675458

Thanks for that. It's a really interesting data point. My takeaway, which I'd already felt and which I think anyone dealing with insurance would share, is that the industry is wildly outdated. I guess that leaves a lot of low-hanging fruit where AI could be useful. Other than the email drafting, it really seems like all of that should have been handled by just normal software decades ago.

mjevans · 2026-01-06T05:33:16 1767677596

A big win for 'normal software' here is to have authentication as a multi-party/agent approval process. Have the client of the insurance company request the automated delivery of certified documents to some other company's email.
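A minimal sketch of what such a multi-party approval gate might look like. Everything here is hypothetical (the `DeliveryRequest` class, the party names, the `deliver` function); it only illustrates the idea that nothing is sent until every required party has signed off.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryRequest:
    """Hypothetical request to send a certified document to a third party."""
    document_id: str
    recipient_email: str
    required_approvers: frozenset
    approvals: set = field(default_factory=set)

    def approve(self, party: str) -> None:
        # Only parties named on the request may approve it.
        if party not in self.required_approvers:
            raise PermissionError(f"{party} is not an approver for this request")
        self.approvals.add(party)

    def is_authorized(self) -> bool:
        # Authorized only once every required party has approved.
        return self.approvals == set(self.required_approvers)


def deliver(request: DeliveryRequest) -> str:
    """Refuse delivery until all approvals are present (no email is actually sent)."""
    if not request.is_authorized():
        missing = sorted(request.required_approvers - request.approvals)
        raise PermissionError(f"missing approvals: {missing}")
    return f"sent {request.document_id} to {request.recipient_email}"


request = DeliveryRequest(
    document_id="coi-123",                      # made-up identifier
    recipient_email="vendor@example.com",       # made-up address
    required_approvers=frozenset({"policyholder", "insurer"}),
)
request.approve("policyholder")
request.approve("insurer")
result = deliver(request)
```

The point of the design is that the client initiates the request and the insurer's side only confirms it, so no single party (or agent) can release a document alone.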

ThrowawayTestr · 2026-01-06T06:31:46 1767681106

>our policy agent extracts all coverage limits and policy details into a data ontology

Aren't you worried about the agent missing or hallucinating policy details?

tonyedgecombe · 2026-01-06T07:02:54 1767682974

Management has decreed that won't happen so it won't.

senko · 2026-01-06T07:33:10 1767684790

What an uncharitable and nasty comment for something they clearly addressed in theirs:

> It is more accurate and consistent than our humans.

So, errors can clearly happen, but they happen less often than they used to.

> It will draft a reply or an email

"draft" clearly implies a human will double-check.

ptx · 2026-01-06T11:28:26 1767698906

> "draft" clearly implies a human will double-check.

The wording does imply this, but since the whole point was to free the human from reading all the details and relevant context about the case, how would this double-checking actually happen in reality?

senko · 2026-01-06T15:39:15 1767713955

> the whole point was to free the human from reading all the details and relevant context about the case

That's your assumption.

My read of that comment is that it's much easier to verify and approve (or modify) the message than it is to write it from scratch. The second sentence does confirm a person then modifies it in half the cases, so there is some manual work remaining.

It doesn't need to be all or nothing.

phantasmish · 2026-01-06T14:09:02 1767708542

The “double checking” is a step to make sure there’s someone low-level to blame. Everyone knows the “double-checking” in most of these systems will be cursory at best, for most double-checkers. It’s a miserable job to do much of, and with AI, it’s a lot of what a person would be doing. It’ll be half-assed. People will go batshit crazy otherwise.

On the off chance it’s not for that reason, productivity requirements will be increased until you must half-ass it.

pwagland · 2026-01-08T10:32:16 1767868336

The real question is how do you enforce that the human is reviewing and double-checking?

When the AI gets "good enough", and the review becomes largely rubber stamping, and 50% is pretty close to that, then you run the risk that a good percentage of the reviews are approved without real checks.

This is why nuclear operators and security scanning operators have regular "awareness checks". Is something like this also being done, and if so what is the failure rate of these checks?
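For illustration, here is a rough sketch of how such awareness checks could be measured for draft review. It assumes you can plant drafts with a known error into the review queue and record which drafts a reviewer approved; the `Draft` class, the function names, and the 10% seed rate are all made up for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Draft:
    draft_id: str
    seeded_error: bool = False  # True for a planted "awareness check" draft


def build_review_queue(real_drafts, seeded_drafts, seed_rate, rng):
    """Mix planted known-bad drafts into the real queue at roughly seed_rate."""
    queue = list(real_drafts)
    n_seeded = min(len(seeded_drafts), round(seed_rate * len(real_drafts)))
    queue.extend(rng.sample(seeded_drafts, n_seeded))
    rng.shuffle(queue)
    return queue


def awareness_miss_rate(queue, approved_ids):
    """Fraction of planted drafts that were approved, i.e. the error was not caught."""
    seeded = [d for d in queue if d.seeded_error]
    if not seeded:
        return None  # no checks in this queue, nothing to measure
    missed = sum(1 for d in seeded if d.draft_id in approved_ids)
    return missed / len(seeded)


rng = random.Random(0)
real = [Draft(f"real-{i}") for i in range(20)]
planted = [Draft(f"seed-{i}", seeded_error=True) for i in range(5)]
queue = build_review_queue(real, planted, seed_rate=0.1, rng=rng)

# A reviewer who rubber-stamps everything misses every planted error.
rubber_stamp = {d.draft_id for d in queue}
# A careful reviewer approves only the drafts without a planted error.
careful = {d.draft_id for d in queue if not d.seeded_error}

rubber_stamp_rate = awareness_miss_rate(queue, rubber_stamp)
careful_rate = awareness_miss_rate(queue, careful)
```

In a real system the planted drafts would have to be intercepted before sending, and the miss rate per reviewer is the "failure rate" being asked about.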

JTbane · 2026-01-06T14:49:56 1767710996

I think it's a good comment, given that the best agents seem to hallucinate something like 10% of the time on a simple task and more than 70% of the time on complex ones.

tonyedgecombe · 2026-01-06T11:02:37 1767697357

>So, errors can clearly happen, but they happen less often than they used to.

If you take the comment at face value. I'm sorry but I've been around this industry long enough to be sceptical of self-serving statements like these.

>"draft" clearly implies a human will double-check.

I'm even more sceptical of that working in practice.

jryb · 2026-01-08T12:59:52 1767877192

Years ago I worked at an insurance company where the whole job was doing this - essentially reading through long PDFs with mostly unrelated information and extracting 3-4 numbers of interest. It paid terribly and few people who worked there cared about doing a good job. I’m sure mistakes were constantly being made.

potamic · 2026-01-06T05:41:49 1767678109

> our policy agent extracts all coverage limits and policy details into a data ontology.

Are they using some software for this or was this built in-house?

throaway45425 · 2026-01-06T12:20:59 1767702059

I think we are at the stage of the "AI Bubble" that is equivalent to saying it is 1997 and 18% of U.S. households have internet access, so obviously the internet is not working out, or 90%+ of households would have internet access if it was going to be as big of a deal as some claim.

I work at a place that is doing nothing like this and it seems obvious to me we are going to get put out of business in the long run. This is just adding a power law on top of a power law. Winner take all. What I currently do will be done by software engineers and agents in 10 years or less. Gemini is already much smarter than I am. I am going to end up at a factory or Walmart if I can get in.

The "AI bubble" is a mass delusion of people in denial of this reality. There is no bubble. The market has just priced all this forward as it should. There is a domino effect of automation that hasn't happened yet because your company still has to interface with stupid companies like mine that are betting on the hand loom. Just have to wait for us to bleed out and then most people will never get hired for white collar work again.

It amuses me when someone asks who is going to want the factory jobs in the US if we reshore production. Me and all the other very average people who get displaced out of white collar work and don't want to be homeless, that's who.

"More valuable" work is just 2026 managerial class speak for "placeholder until the agent can take over the task".

stefan_ · 2026-01-06T10:15:52 1767694552

That sounds a lot like "LLMs are finally powerful enough technology to overcome our paper/PDF-based business". Solving problems that frankly had no business existing in 2020.

heyitsguay · 2026-01-06T05:59:56 1767679196

Thanks for this answer! I appreciate the clarity, I can see the economic impact for your company. Very cool.

linkjuice4all · 2026-01-06T06:32:13 1767681133

Here's some anecdata from the B2B SaaS company I work at

- Product team is generating some code with LLMs but everything has to go through human review and developers are expected to "know" what they committed - so it hasn't been a major time saver but we can spin up quicker and explore more edge cases before getting into the real work

- Marketing team is using LLMs to generate initial outlines and drafts - but even low stakes/quick turnaround content (like LinkedIn posts and paid ads) still needs to be reviewed for accuracy, brand voice, etc. Projects get started quicker but still go through various human reviews before customers/the public see them

- Similarly the Sales team can generate outreach messaging slightly faster but they still have to review for accuracy, targeting, personalization, etc. Meeting/call summaries are pretty much 'magic' and accurate-enough when you need to analyze any transcripts. You can still fall back on the actual recording for clarification.

- We're able to spin up demos much faster with 'synthetic' content/sites/visuals that are good-enough for a sales call but would never hold up in production

---

All that being said - the value seems to be speeding up discovery of actual work, but someone still needs to actually do the work. We have customers, we built a brand, we're subject to SLAs and other regulatory frameworks so we can't just let some automated workflow do whatever it wants without a ton of guardrails. We're seeing similar feedback from our customers in regard to the LLM features (RAG) that we've added to the product if that helps.

procaryote · 2026-01-06T08:20:29 1767687629

This makes a lot of sense and is consistent with the lens that LLMs are essentially better autocomplete

moron4hire · 2026-01-06T04:27:39 1767673659

Lately, it seems like all the blogs have shifted away from talking about productivity and are now talking about how much they "enjoy" working with LLMs.

If firing up old coal plants and skyrocketing RAM prices and $5000 consumer GPUs and violating millions of developers' copyrights and occasionally coaxing someone into killing themselves is the cost of Brian From Middle Management getting to Enjoy Programming Again instead of having to blame his kids for not having any time on the weekends, I guess we have no choice but to oblige him his little treat.

wiseowise · 2026-01-06T10:02:30 1767693750

It’s the honeymoon period with crack all over again. Everyone feels great until their teeth start falling out.