I have such a deep need for something that's just a step above semantic search. These non-frontier models running blazingly fast can solve that.
So many problems simply don't require a full LLM, but need more than traditional software. Training a novel model isn't really a compelling argument at most tech startups right now, so you need to find an LLM-native way to do things.
I agree. I also don't really have an ambient assistant problem. My phone is always nearby and Siri picks up wake words well (or I just hold the power button).
My problem is Siri doesn't do any of this stuff well. I'd really love to just get it out of the way so someone can build it better.
Some of the more magical moments we’ve had with Juno are automatic shopping list creation (saying “oh no, we’re out of milk and eggs” out loud becomes a shopping list, without having to remember to tell Siri) and event tracking around the kids (“Don’t forget next Thursday is early pickup”). A nice freebie is moving the wake word to the end: “What’s the weather today, Juno?” feels much more natural than a prefixed wake word.
Honestly this has been my main issue with tech privacy for years.
I love smart gadgets. I really wanted to go all in and automate my life, and the whole 'personal data' thing seemed like a really fair trade-off for what was promised.
Only, they took all the data and never really delivered the convenience.
I spent about 10 years trying to figure out why WearOS needed to collect all my data, all the time, even when I wasn't wearing a watch, and yet when it crashed every few weeks, there was no way to restore anything from a backup. Had to start again from scratch every time (or ADB). What's the point in collecting all that data when I couldn't usefully access any of it?
Same thing with Google Home, more or less. I wasn't a big fan of the terms and conditions, but hey, it's super convenient just being able to announce 'ok Google I need to get out of bed soon' and have it turn on the lights, play music etc.
Only, some mornings it wouldn't do that. Wouldn't even remember that I'd set an alarm. And alarms kinda need to be reliable: if they work 19 times out of 20, that's not actually good enough to rely on. With dumb alarm clocks, or phones, you can be pretty sure the alarm will go off.
So, not much point using Google for morning routines and alarms. So, not much point giving it full access to everything I say any time.
I would give it all my data if it could reliably remember to play preset alarms, or give a basic backup and restore option. Hell, I'd probably give Google access to all my photos if the UI wasn't so ugly.
I still don't really understand big tech's reasoning here.
If data is the new gold and everyone was dying for more ways to track us all and harvest our data - why not just build a decent product?
If phone batteries lasted for days, people would spend more time on their phones, isn't that what the tech companies want?
If competent people worked on making Gmail efficient, light, user-friendly, and not crawling with bugs, more people would use it, generating more data.
It's like the oligarchs trying to take over the world will do literally anything, anything to win, other than paying people to develop decent, reliable products.
When I use Claude daily (both professionally and personally with a Max subscription), there are things that it does differently between 4.5 and 4.6. It's hard to point to any single conversation, but in aggregate I'm finding that certain tasks don't go as smoothly as they used to. In my view, Opus 4.6 is a lot better at long-running conversations (which has value), but does worse with critical details within smaller conversations.
A few things I've noticed:
* 4.6 doesn't look at certain files that it used to
* 4.6 tends to jump into writing code before it's fully understood the problem (annoying but promptable)
* 4.6 is less likely to do research, write to artifacts, or make external tool calls unless you specifically ask it to
* 4.6 is much more likely to ask annoying (blocking) questions that it can reasonably figure out on its own
* 4.6 is much more likely to miss a critical detail in a planning document after being explicitly told to plan for that detail
* 4.6 needs to more proactively write its memories to file within a conversation to avoid going off track
* 4.6 is a lot worse at demonstrating critical details. I'm so tired of it explaining something conceptually without thinking through how it would implement the details.
Just hit a situation where 4.6 is driving me crazy.
I'm working through a refactor and I explicitly told it to use a block (as in Ruby Blocks) and it completely overlooked that. Totally missed it as something I asked it to do.
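For anyone unfamiliar with the term, here is a minimal sketch of the kind of refactor "use a block" usually means in Ruby: instead of a method hard-coding its per-item behavior, it yields to a caller-supplied block. The method and data names here are illustrative, not from the commenter's actual codebase.

```ruby
# Before: the per-order behavior (a tax rate) is baked into the method.
def totals_with_tax(orders)
  orders.map { |o| o[:total] * 1.1 }
end

# After: the method yields each order and lets the caller decide
# what to compute, which is what "use a block" asks for.
def process_orders(orders)
  orders.map { |o| yield(o) }
end

orders = [{ total: 10 }, { total: 20 }]
with_tax = process_orders(orders) { |o| o[:total] * 1.1 }
```

The block-based version keeps the iteration logic in one place while the caller owns the business rule, which is the usual motivation for this refactor.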
It's easy to output LLM junk, but I and my colleagues are doing a lot of incredible work that simply isn't possible without LLMs involved. I'm not talking about a 10-turn chat to whip out some junk. I'm talking deep research and thinking with Opus to develop ideas. Chats where you've pressure tested every angle, backed it up with data pulled in from a dozen different places, and have intentionally guided it towards an outcome. Opus can take these wildly complex ideas and distill them down into tangible, organized artifacts. It can tune all of that writing to your audience, so they read it in terms they're familiar with.
Reading it isn't the most fun, but let's face it - most professional reading isn't the most fun. You're probably skimming most of the content anyways.
Our customers don't care how we communicate internally. They don't care if we waste a bunch of our time rewriting perfectly suitable AI content. They care that we move quickly on solving their problems - AI lets us do that.
> Reading it isn't the most fun, but let's face it - most professional reading isn't the most fun. You're probably skimming most of the content anyways.
I find it difficult to skim AI writing. It's persuasive even when there's minimal data. It'll infer or connect things that flow nicely, but simply don't make sense.
I don't really understand this retort. I assume most of us work in a professional environment where it's difficult, if not impossible, to share our work.
We've been discussing these types of anecdotes with code patterns, management practices, communication styles, pretty much anything professionally for years. Why are the LLM conversations held to this standard?
Well, because I've worked in different places, and with different organizations, and can see for myself how different approaches to professional conduct manifest in the finished product, or the flexibility of the team, effectiveness of communication, etc.
Especially with things like code and writing, I assess the artifacts: software and prose. These stories of incredible facility of LLMs with code and writing are never accompanied by artifacts that back up the claims. The ones that I can assess don't meet the bar that is being claimed. So everyone who has it working well is keeping it to themselves, and only those with bad-to-mediocre output are publishing them, I am meant to believe? I can't rule it out entirely of course, but I am frustrated at the ongoing demands that I maintain credulity.
FWIW I have sat out many other professional organization and software development trends because I wanted to wait and assess for myself their benefits, which then failed to materialize. That is why I hold LLMs to this standard, I hold all tools to this standard: be useful or be dismissed.
It's really interesting that I've only seen a few actual pieces of large-scale LLM output by people boasting about it, and most of them (e.g. the trash fire of a "web browser" by Anthropic) are bad.
Every now and then, I feel like I simply cannot tap the correct keys. Things I do from muscle memory are jumping to the next letter over. This isn't just a temporary problem. It lasts for days/weeks.
Like, give me semantic search that can detect the difference between SSL and TLS without needing to put a full LLM in the loop.
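The SSL-vs-TLS example is exactly where pure embedding search struggles: the two terms are near-synonyms in vector space, so dense retrieval alone can't tell them apart. One common fix, sketched below with toy stdlib-only data (the documents, vectors, and boost weight are all illustrative, not real model output), is hybrid scoring: dense similarity plus an exact-match boost for protocol identifiers.

```python
from math import sqrt

# Toy "embeddings": in a real system these would come from a small
# encoder model. Values are contrived so both docs sit almost
# equidistant from the query, mimicking the SSL/TLS conflation.
docs = {
    "SSLv3 is broken (POODLE); disable it everywhere.": [0.90, 0.10, 0.40],
    "TLS 1.3 removes the legacy cipher suites.": [0.88, 0.12, 0.42],
}
query_text = "How do I turn off old TLS versions?"
query_vec = [0.89, 0.11, 0.41]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def hybrid_score(doc_text, doc_vec, q_text, q_vec, boost=0.2):
    """Dense similarity plus a lexical boost for protocol names.

    Embeddings treat "SSL" and "TLS" as near-synonyms, so an exact-term
    match on the identifiers acts as a guardrail the vectors lack.
    """
    terms = lambda s: {t.upper().rstrip(".,;?") for t in s.split()}
    shared = terms(q_text) & terms(doc_text) & {"SSL", "SSLV3", "TLS"}
    return cosine(q_vec, doc_vec) + boost * len(shared)

ranked = sorted(
    docs,
    key=lambda d: hybrid_score(d, docs[d], query_text, query_vec),
    reverse=True,
)
# ranked[0] is the TLS document, even though both cosine scores
# differ by less than 0.001.
```

The same idea shows up in production retrieval stacks as BM25-plus-dense hybrid ranking or as a cheap reranker pass; no full LLM required in the loop.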