kabirgoel's comments

My team has been using it in prod for about a year now. There were some minor bugs in the runtime's implementation of buffers in 1.22 (?), but that was about the only issue we ran into.

The nice things:

1. It's fast.

2. The standard library is great. (This may be less of an advantage over Deno.)

3. There's a ton of momentum behind it.

4. It's closer to Node.js than Deno is, at least last I tried. There were a bunch of little Node <> Deno papercuts. For example, Deno wanted .ts extensions on all imports.

5. I don't have to think about JSR.

The warts:

1. The package manager has some issues that make it hard for us to use. I've forgotten why now, but this in particular bit us in the ass: https://github.com/oven-sh/bun/issues/6608. We use PNPM and are very happy with it, even if it's not as fast as Bun's package manager.

Overall, Deno felt to me like they were building a parallel ecosystem that I don't have a ton of conviction in, while Bun feels focused on meeting me where I am.


This is great. Poking into the source, I find it interesting that the author implemented a custom turn detection strategy, instead of using Silero VAD (which is standard in the voice agents space). I’m very curious why they did it this way and what benefits they observed.

For folks that are curious about the state of the voice agents space, Daily (the WebRTC company) has a great guide [1], as well as an open-source framework that allows you to build AI voice chat similar to OP's with lots of utilities [2].

Disclosure: I work at Cartesia, which serves a lot of these voice agent use cases, and Daily is a friend.

[1]: https://voiceaiandvoiceagents.com

[2]: https://docs.pipecat.ai/getting-started/overview


It's in fact using Silero via RealtimeSTT. RealtimeSTT detects when silence starts. Then a binary sentence classification model is run on the real-time transcription text; it infers blazingly fast (~10 ms) and returns a probability between 0 and 1 indicating whether the current spoken sentence is considered "complete". The turn detection component uses this information to calculate how long to wait through silence before deciding the turn is over.
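A minimal sketch of that waiting-time logic (the function names, thresholds, and the linear mapping are my own illustrative assumptions, not the author's actual code):

```python
# Hypothetical sketch of the described strategy: map the sentence-completion
# probability from the classifier to an adaptive silence timeout. All names
# and constants here are assumptions for illustration.

def silence_wait_time(completion_prob: float,
                      min_wait: float = 0.3,
                      max_wait: float = 2.0) -> float:
    """High completion probability -> short wait; low -> long wait."""
    return max_wait - completion_prob * (max_wait - min_wait)

def turn_is_over(completion_prob: float, silence_elapsed: float) -> bool:
    """The turn ends once silence has outlasted the adaptive waiting time."""
    return silence_elapsed >= silence_wait_time(completion_prob)
```

The point is that a confidently "complete" sentence lets the agent respond almost immediately, while an ambiguous one buys the speaker more time.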


This is the exact strategy I'm using for the real-time voice agent I'm building. Livekit also published a custom turn detection model that works really well based on the video they released, which was cool to see.

Code: https://github.com/livekit/agents/tree/main/livekit-plugins/...

Blog: https://blog.livekit.io/using-a-transformer-to-improve-end-o...


Correct. You can also share your location with friends. A lot of friend groups (at least my age) use Find My as a kind of social network.


Does it have any battery impact? I've never tried these always-on location tracking things partly due to (unfounded?) concerns about battery use.


It's not always on in that way. It will report your location when requested, and optionally just before shutting down.


How does that work with your friends? Always-on access or just occasionally?


> Always on access or just occasionally?

You have quite a few granular choices.

> You can share your current location once, temporarily share your location while you're on the way to an expected destination, or share your ongoing Live Location... for an hour, until the end of the day, or indefinitely.

> In Messages, you can use Check In to share your location... Your location is shared only if there's an unexpected delay during your trip or activity and you're unresponsive.

https://support.apple.com/en-us/105104


Always on. You can see where your friends are both in Find My and under their contact photo in your iMessage chats.


Personally, I do not find the idea comforting that someone (anyone) may know where I am at all times. I wouldn't even trust Apple with it.


This is actually one of the big differences between generations. It's not just the norm for young people to share locations; it's almost expected, with real social consequences for not doing so. Yes, it's probably a little weird to have someone's precise location 100% of the time, but since you're sharing it with me there's a good deal of trust implied (though this is not always the case as it has become more normalized).

However, if we stop sharing locations, that usually implies a divorce of the relationship. People will shut you out of their life if you stop sharing your location with them, no matter the reason. From that lens, the choice is simple: you've got to share your location, even if it's a bit icky from a privacy perspective, or you risk losing an entire cohort of friends.

I will admit, there is a strange level of intimacy to having done it. In a world increasingly dominated by the pixels on this 4x8 screen, it is a nice reminder that the text bubbles on my phone actually come from real people that I can show you on a map.

(Obviously you can find friends who don’t care for it and you can live a normal life and be just fine. I’m privacy conscious but I still share my location with a handful of friends for the above reasons.)


> People will shut you out of their life if you stop sharing your location with them

Is the implication of this that such people just don't interact with Android users? That seems like a significant self-imposed limitation. Or are Android phones just extremely unpopular in your area?


Yeah, I switched to an iPhone solely for the blue text bubbles. Among young women in my bubble, 98% have iPhones. I'd get sneers at bars from girls when my first text on their phone was green. People would complain openly about my phone ruining their group chats. While I preferred Android tech, switching to iPhone was a no-brainer because it removed a lot of friction in social settings.


It’s a bit sad that these days I can’t say if you are joking or not.



You can control who you share your location with and for how long. I think the options are: just once, for an hour, for the day, and forever.


It's a virtual leash for couples.


Blame the emotionally dysfunctional, not the tool. It's only a problem if it changes how you would live your life, or if you're pressured or coerced (in which case, say no).


Always on. It works as a great way to check in on close friends or have them check in on you (like when someone's going on a first date).


Guessing the shortcomings become starker if you’re spending lots of time in the codebase/building a company on top of it.


Yeah… Attempting to integrate MVCC and then doing vector search gave enough perspective to do this!


> building a company on top of it.

So be sure you proceed in such a way that never contributes any money or code back to the original project.


I work at Cartesia, which operates a TTS API similar to Play [1]. I'd venture that our TTS model, Sonic, is probably SoTA for on-device, but don't quote me on that. It's the same model that powers our API.

Sonic can be run on a MacBook Pro. Our API sounds better, of course, since that's running the model on GPUs without any special tricks like quantization. But subjectively the on-device version is good quality and real-time, and it possesses all the capabilities of the larger model, such as voice cloning.

Our co-founders did a demo of the on-device capabilities on the No Priors podcast [2], if you're interested in checking it out for yourself. (I will caveat that this sounds quite a bit worse than if you heard it in person today, since this was an early alpha + it's a recording of the output from a MacBook Pro speaker.)

[1] https://cartesia.ai/sonic

[2] https://youtu.be/neQbqOhp8w0?si=2n1i432r5fDG2tPO&t=1886


Is your model really open source or did you misunderstand the question?


(Not the author but I work in real-time voice.) WebSockets don't really translate to actual GPU load, since they spend a ton of time idling. So strictly speaking, you don't need a GPU per WebSocket assuming your GPU infra is sufficiently decoupled from your user-facing API code.

That said, a GPU per generation (for some operational definition of "generation") isn't uncommon, but there's a standard bag of tricks, like GPU partitioning and batching, that you can use to maximize throughput.


> that you can use to maximize throughput

While sometimes degrading the experience, a little or a lot, thanks to possible "noisy neighbors". Worth keeping in mind that most things are trade-offs somehow :) Mostly important for "real-time" rather than batched/async stuff, of course.


As someone who's attended events run by Daily/Kwindla, I can guarantee that you'll have fun and leave with your IP rights intact. :) (In fact, I don't even know that they're scouting for talent and ideas... the motivation for organizing these is usually to get people excited about what you're building and to create a community you can share things with.)


Thanks for the shoutout! We're very excited about how this space is evolving and are working on new features and perf improvements to support experiences like this.


Once you start following people, the noise mostly disappears. I think the initial feed is a way for users to bootstrap their follow list.


Zed has Copilot support. I've been using it, and it works pretty well.

