Hacker News | andhuman's comments

One of the more annoying pieces of software that does this is Copilot in Office 365 on the web. Every time (!) I open it, it shows a popup on how to add files to the context. That by itself would be annoying, but it also steals focus! So you'd be typing something and suddenly you're not typing anymore, because M$ decided it's time for a popup. I finally learned to just wait for the popup and then dismiss it with Esc. Ugh!

If you log in to the Exchange Online admin center you first have to complete a short "on-rails shooter" video game. They constantly shuffle shit around and want to give you a tour of it via popups.

I have the admin accounts for multiple companies, so I have to play the game repeatedly.


MS has never learned to not interrupt the user. This has been a dark pattern for them since their very first window manager.

I built this recently. I used NVIDIA Parakeet for STT, openWakeWord for wake word detection, Mistral's Ministral 14B as the LLM, and Pocket TTS for TTS. Fits snugly in my 16 GB of VRAM. Pocket is small and fast and has good-enough voice cloning. I first used the Chatterbox Turbo model, which performed better and even supported some simple paralinguistic words like (chuckle) that made it more fun, but it was just a bit too big for my rig.

OP asked:

> Is anyone doing true end-to-end speech models locally (streaming audio out), or is the SOTA still “streaming ASR + LLM + streaming TTS” glued together?

Your setup is the latter, not the former.


It’s uncannily good. I prefer it to Pocket, but then again Pocket is much smaller and built for realtime streaming.

Ah right, I guess I meant instant, which I assume Qwen can't do.

Gave it four of my vibe questions around general knowledge and it didn’t do great. Maybe that’s expected with a model as small as this one. Once llama.cpp support is out I’ll take it for a spin.

I’ve tried the voice cloning and it works great. I added a 9 s clip and it captured the speaker pretty well.

But don’t make the mistake I did and use an HF token that doesn’t have access to read from repos! The error message said that I had to request access to the repo, but I had already done that, so I couldn’t figure out what was wrong. Turns out my HF token only had access to inference.


I recently bought the LG with the 4th-generation OLED panel, and for me it works for long coding sessions (I use it for work). They changed the pixel arrangement this generation specifically to improve text legibility.


Interesting experiment. I would hazard a guess that Google is on top when it comes to these sorts of things (spatial ability), then OpenAI, with Anthropic last. I would like to see the same experiment using Google’s Live view or whatever it’s called in their Gemini app.


The new LG panels are bright enough. I think they’re called 4th generation WOLED.


330 nits in SDR is good relative to other OLED monitors, and good enough for most indoor environments, but not good enough for mine. My windows are too big and not tinted; there’s just too much ambient light for anything below 500 nits.


This is big. The first really big open weights model that understands images.


How is this different from Llama 3.2 "vision capabilities"?

https://www.llama.com/docs/how-to-guides/vision-capabilities...


Guessing the GP commenter considers Apache more "open" than Meta's license. Which, to be fair, isn't terrible, but also not quite as clean as straight Apache.


Llama's license explicitly disallows its usage in the EU.

If that doesn't even meet the threshold for "terrible", then what does?


Why does it disallow usage in the EU?


You'd have to ask EU's regulators why they wanted Meta to disallow it.

Much like you'd have to ask UK lawmakers why they wanted UK citizens to be unable to keep their own Apple iCloud backups secure.


I bet it’s DNS.


“Starting at approximately 16:00 UTC, we began experiencing DNS issues resulting in availability degradation of some services. Customers may experience issues accessing the Azure Portal. We have taken action that is expected to address the portal access issues here shortly. We are actively investigating the underlying issue and additional mitigation actions. More information will be provided within 60 minutes or sooner.

This message was last updated at 16:35 UTC on 29 October 2025”


That was my bet too, then I looked at ISC and noticed there were PoCs released for critical BIND9 vulns yesterday ... might be related?

