I work in insurance - regulated, human capital heavy, etc.
Three examples for you:
- our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 mins per policy. It is more accurate and consistent than our humans
- our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.
- our certificates agent will note when a certificate of insurance is requested over email and automatically handle the necessary checks and follow up options or resolution. Will likely save us around $500k this year.
We also now increasingly share prototypes as a way to discuss ideas. Because the cost to vibe code something illustrative is very low, an it’s often much higher fidelity to have the conversation with something visual than a written document
Thanks for that. It's a really interesting data point. My takeaway, which I've already felt and I feel like anyone dealing with insurance would anyway, is that the industry is wildly outdated. Which I guess offers a lot of low hanging fruit where AI could be useful. Other than the email drafting, it really seems like all of that should have been handled by just normal software decades ago.
A big win for 'normal software' here is to have authentication as a multi-party/agent approval process. Have the client of the insurance company request the automated delivery of certified documents to some other company's email.
> "draft" clearly implies a human will will double-check.
The wording does imply this, but since the whole point was to free the human from reading all the details and relevant context about the case, how would this double-checking actually happen in reality?
> the whole point was to free the human from reading all the details and relevant context about the case
That's your assumption.
My read of that comment is that it's much easier to verify and approve (or modify) the message than it is to write it from scratch. The second sentence does confirm a person then modifies it in half the cases, so there is some manual work remaining.
The “double checking” is a step to make sure there’s someone low-level to blame. Everyone knows the “double-checking” in most of these systems will be cursory at best, for most double-checkers. It’s a miserable job to do much of, and with AI, it’s a lot of what a person would be doing. It’ll be half-assed. People will go batshit crazy otherwise.
On the off chance it’s not for that reason, productivity requirements will be increased until you must half-ass it.
The real question is how do you enforce that the human is reviewing and double-checking?
When the AI gets "good enough", and the review becomes largely rubber stamping, and 50% is pretty close to that, then you run the risk that a good percentage of the reviews are approved without real checks.
This is why nuclear operators and security scanning operators have regular "awareness checks". Is something like this also being done, and if so what is the failure rate of these checks?
Years ago I worked at an insurance company where the whole job was doing this - essentially reading through long PDFs with mostly unrelated information and extracting 3-4 numbers of interest. It paid terrible and few people who worked there cared about doing a good job. I’m sure mistakes were constantly being made.
I think we are the stage of the "AI Bubble" that is equivalent to saying it is 1997, 18% of U.S. households have internet access. Obviously, the internet is not working out or 90%+ of households would have internet access if it was going to be as big of deal as some claim.
I work at a place that is doing nothing like this and it seems obvious to me we are going to get put out of business in the long run. This is just adding a power law on top of a power law. Winner winner take all. What I currently do will be done by software engineers and agents in 10 years or less. Gemini is already much smarter than I am. I am going to end up at a factory or Walmart if I can get in.
The "AI bubble" is a mass delusion of people in denial of this reality. There is no bubble. The market has just priced all this forward as it should. There is a domino effect of automation that hasn't happened yet because your company still has to interface with stupid companies like mine that are betting on the hand loom. Just have to wait for us to bleed out and then most people will never get hired for white collar work again.
It amuses me when someone says who is going to want the factory jobs in the US if we reshore production? Me and all the other very average people who get displaced out of white collar work and don't want to be homeless is who.
"More valuable" work is just 2026 managerial class speak for "place holder until the agent can take over the task".
That sounds a lot like "LLMs are finally powerful enough technology to overcome our paper/PDF-based business". Solving problems that frankly had no business existing in 2020.
Here's some anecdata from the B2B SaaS company I work at
- Product team is generating some code with LLMs but everything has to go through human review and developers are expected to "know" what they committed - so it hasn't been a major time saver but we can spin up quicker and explore more edge cases before getting into the real work
- Marketing team is using LLMs to generate initial outlines and drafts - but even low stakes/quick turn around content (like LinkedIn posts and paid ads) still need to be reviewed for accuracy, brand voice, etc. Projects get started quicker but still go through various human review before customers/the public sees it
- Similarly the Sales team can generate outreach messaging slightly faster but they still have to review for accuracy, targeting, personalization, etc. Meeting/call summaries are pretty much 'magic' and accurate-enough when you need to analyze any transcripts. You can still fall back on the actual recording for clarification.
- We're able to spin up demos much faster with 'synthetic' content/sites/visuals that are good-enough for a sales call but would never hold up in production
---
All that being said - the value seems to be speeding up discovery of actual work, but someone still needs to actually do the work. We have customers, we built a brand, we're subject to SLAs and other regulatory frameworks so we can't just let some automated workflow do whatever it wants without a ton of guardrails. We're seeing similar feedback from our customers in regard to the LLM features (RAG) that we've added to the product if that helps.
Lately, it seems like all the blogs have shifted away from talking about productivity and are now talking about how much they "enjoy" working with LLMs.
If firing up old coal plants and skyrocketing RAM prices and $5000 consumer GPUs and violating millions of developers' copyrights and occasionally coaxing someone into killing themselves is the cost of Brian From Middle Management getting to Enjoy Programming Again instead of having to blame his kids for not having any time on the weekends, I guess we have no choice but to oblige him his little treat.