
My audio experiment was much less successful — I uploaded a 90-minute podcast episode and asked it to produce a labeled transcript. Gemini 3:

- Hallucinated at least three quotes (the ones I checked) that resembled nothing said by any of the hosts

- Produced timestamps that were almost entirely wrong. Language quoted from the end of the episode, for instance, was timestamped 35 minutes into the episode, rather than 85 minutes.

- Almost all of what it transcribed was heavily paraphrased and abridged, in most cases without any indication.

It's understandable that Gemini can't cope with such a long audio recording yet, but I would've hoped for a more graceful, less hallucinatory failure mode. Unfortunately, this aligns with my impression of past Gemini models: impressively smart, but failing in the most catastrophic ways.



I wonder if you could get around this with a slightly more sophisticated harness. I suspect you're running into context length issues.

Something like

1) Split the audio into multiple smaller tracks.

2) Perform a first-pass audio extraction.

3) Find unique speakers and other potentially helpful information (maybe just a short summary of where the conversation left off).

4) Seed the next stage with that information (yay multimodality) and generate the transcript for that chunk.

Obviously it would be ideal if a model could handle ultra-long-context conversations by default, but I'd be curious how much of the error is caused by a lack of general capability vs. simple context pollution.
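A minimal sketch of that chunk-and-carry-context idea, assuming pydub for the splitting; transcribe_chunk is a hypothetical helper standing in for whatever model/API you actually call:

    # Rough sketch: chunked transcription with a rolling summary carried between chunks.
    # transcribe_chunk() is hypothetical -- wrap your speech model or LLM call there.
    from pydub import AudioSegment

    CHUNK_MS = 10 * 60 * 1000  # 10-minute chunks

    def transcribe_chunk(path: str, carry_over: str) -> tuple[str, str]:
        """Hypothetical: send one audio chunk plus the previous summary to the model,
        return (labeled transcript, short summary of where the conversation left off)."""
        raise NotImplementedError

    def transcribe_episode(source: str) -> str:
        audio = AudioSegment.from_file(source)
        carry_over = ""            # speakers seen so far + where the conversation left off
        transcript_parts = []

        for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
            chunk_path = f"chunk_{i:03d}.mp3"
            audio[start:start + CHUNK_MS].export(chunk_path, format="mp3")

            text, carry_over = transcribe_chunk(chunk_path, carry_over)

            # Offset chunk-local timestamps so they line up with the full episode.
            offset_min = start // 60000
            transcript_parts.append(f"[chunk {i}, starts ~{offset_min} min]\n{text}")

        return "\n\n".join(transcript_parts)

Keeping the carry-over summary short keeps each call well inside the context window, which is the whole point: a bit of orchestration instead of relying on the model to hold 90 minutes of audio at once.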


The worst is when it fails to ingest simple PDF documents and then lies and gaslights in an attempt to cover it up. Why not just admit you can't read the file?


This is specifically why I don't use Gemini. The gaslighting is ridiculous.


Now try an actual speech model like ElevenLabs or Soniox, not something that wasn't made for it.



