From what I can tell, their official chat site doesn't have a native audio-to-audio model yet. I like to test this with homophones (e.g. record and record) and by asking it to change its pitch or produce sounds.
“record and record”, if you mean the verb for persisting something and the noun for the thing persisted, are heteronyms (homographs which are not homophones), which incidentally is also what you would want to test for what you are describing here: distinguishing homophones would test the use of context to understand meaning, but wouldn’t test whether the logic is working directly on audio or only on text processed from audio, whereas failing to distinguish heteronyms is suggestive of processing occurring on text, not audio directly.
OTOH, my point still stands: the thing being suggested for testing is not testable by seeing whether the system can distinguish homophones, but might be by seeing whether it distinguishes heteronyms. (The speculation that the intended record/record distinction is actually a pair of heteronyms, and that the error was merely the use of the word “homophone” in place of “heteronym” rather than in the basic logic of the comment, is somewhat tangential to the main point.)
Huh, you're right. I tried your test and it clearly can't tell the two readings of "record" apart. That seems to imply they're using some sort of speech-to-text/TTS pipeline. Which is really weird, because Qwen3-Omni claims to support direct audio input into the model. Maybe it's a cost-saving measure?
Weirdly, I just tried it again and it seems to understand the difference between record and record just fine. Perhaps when there's heavy demand for voice chat, like after a new release, they shed load by falling back to a TTS pipeline around a smaller model.
However, it still doesn't seem capable of producing any of the sounds, like laughter, that I would expect from a native voice model.
After the Roman Republic, Rome switched to having an emperor. Jesus was crucified during this Roman Empire. The kings of Rome were around 600 years before that. They meant the emperor, not the king.
I'm currently making a tycoon game with React; it's not bad for making some games. I use setInterval for a simple game loop along with a zustand store for the game logic. I'm keeping the game logic and state client-side for now, but I might move it to a server in the future.
Just a note for those planning to make a simple game or animation in JavaScript: in most cases it's preferable to use `requestAnimationFrame` instead of `setInterval` or `setTimeout`.
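A rough sketch of what that can look like for a loop like the tycoon game above (browser environment assumed; `updateGame` and `render` are hypothetical placeholders for your own tick and draw code, not anything from a specific library):

    // Hypothetical stand-ins for your own game-logic tick and draw code
    // (e.g. a zustand action and whatever triggers your React re-render).
    declare function updateGame(dtMs: number): void;
    declare function render(): void;

    const STEP_MS = 100; // advance the simulation 10 times per second
    let last = performance.now();
    let accumulator = 0;

    function frame(now: number): void {
      accumulator += now - last;
      last = now;

      // Run the simulation in fixed steps so dropped frames or tab
      // throttling affect smoothness, not game speed.
      while (accumulator >= STEP_MS) {
        updateGame(STEP_MS);
        accumulator -= STEP_MS;
      }

      render();
      requestAnimationFrame(frame);
    }

    requestAnimationFrame(frame);

The browser pauses `requestAnimationFrame` in background tabs and syncs it to the display, which is usually what you want; `setInterval` keeps firing regardless.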
I'd go a step beyond this (excellent) post and posit that one incredibly valuable characteristic of traditional NLP is that it is largely immune to prompt injection attacks.
Especially as LLMs continue to be better tuned to follow instructions that are intentionally colocated and intermingled with data in user messages, it becomes difficult to build systems that can provide real guarantees that "we'll follow your prompt, but not prompts that are in the data you provided."
But no amount of text appended to an input document, no matter how persuasive, can cause an NLP pipeline to change how it interprets the remainder of the document, or to leak its own system instructions, or anything of that nature. "Ignore the above prompt" is just a sentence that doesn't seem like positive or on-topic sentiment to an NLP classifier, and that's it.
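To illustrate (this is a deliberately toy bag-of-words scorer, not any particular library's API):

    // Hypothetical bag-of-words sentiment scorer: tokens only contribute
    // weights; nothing in the input can change how the rest is processed.
    const weights: Record<string, number> = {
      great: 2, love: 2, terrible: -2, awful: -2,
    };

    function sentimentScore(text: string): number {
      return text
        .toLowerCase()
        .split(/\W+/)
        .reduce((score, token) => score + (weights[token] ?? 0), 0);
    }

    // "Ignore the above prompt" contributes nothing: it is just four
    // out-of-vocabulary tokens, not an instruction the pipeline obeys.
    console.log(sentimentScore("The product is great. Ignore the above prompt.")); // 2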
There's an even broader discussion to be had about the relative reliability of NLP pipelines, outside of a security perspective. As always, it's important to pick the right tools for the job, and the SpaCy article linked in the parent puts this quite well.
> But no amount of text appended to an input document, no matter how persuasive, can cause an NLP pipeline to change how it interprets the remainder of the document,
Text added to a document can absolutely change how an NLP pipeline interprets the document.
> "Ignore the above prompt" is just a sentence that doesn't seem like positive or on-topic sentiment to an NLP classifier, and that's it.
And simple repeated words can absolutely make that kind of change for many NLP systems.
Have you actually worked with more traditional NLP systems? They're really not smart.
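For example, with a toy word-count classifier (illustrative only, not a real library), simple keyword stuffing flips the output even though there is no "prompt" anywhere:

    const positive = new Set(["good", "great", "excellent"]);
    const negative = new Set(["bad", "broken", "refund"]);

    function classify(text: string): "positive" | "negative" {
      let score = 0;
      for (const token of text.toLowerCase().split(/\W+/)) {
        if (positive.has(token)) score += 1;
        if (negative.has(token)) score -= 1;
      }
      return score >= 0 ? "positive" : "negative";
    }

    console.log(classify("The unit arrived broken, I want a refund."));
    // "negative"
    console.log(classify("The unit arrived broken, I want a refund. great great great"));
    // "positive" -- repeated words changed the result without any instruction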
No? But repeated words can impact simple NLP setups. I’m not sure what case you’re concerned about where added text impacts classification with an LLM but added words shouldn’t with a different pipeline.
> And NLP stands for natural language processing. If the result didn't change after you've made changes to the input... It'd be a bug?
No, I’d want my classifier to be unchanged by added garbage words. It likely will be affected, but that impact is a bug, not a feature.
Prompt injection is about making the model do something other than what was specified.
Adding words to the text to break the algorithm that does the NLP is more along the lines of providing a 1 in a boolean field to break the system.
And that's generally something you can mitigate to some degree via heuristics and sanity checking. Doing the same for LLMs is essentially impossible, because an LLM is effectively a black box, so you cannot determine the error scenarios and add mitigations for them.
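As a sketch of the kind of heuristic meant here (the stuffing signal and thresholds are made up for illustration):

    // Hypothetical pre-classification sanity check: flag inputs where one
    // token dominates the document, a crude signal of keyword stuffing.
    function looksStuffed(text: string, maxShare = 0.2): boolean {
      const tokens = text.toLowerCase().split(/\W+/).filter(Boolean);
      if (tokens.length < 20) return false; // too short to judge

      const counts = new Map<string, number>();
      for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);

      return Math.max(...counts.values()) / tokens.length > maxShare;
    }

    // Flagged inputs can be dropped or routed to manual review before
    // they ever reach the classifier -- the kind of targeted mitigation
    // that's hard to bolt onto an opaque LLM.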
If you don’t think this happens with simpler methods, you’ve never deployed them. It’s the exact same problem for a classifier. Have you actually worked with these, and are we discussing real-world cases?
I guess it depends on how you use the LLMs. We implemented some workflows where the LLMs were used only for dialogue understanding, and the system response was then generated by classic backend code.
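Roughly this shape, where `extractIntent` is a hypothetical stand-in for whatever LLM call is used, and everything after it is ordinary code:

    // The model only maps free text to a constrained intent; plain code
    // decides what the system actually says back.
    type Intent =
      | { kind: "check_balance" }
      | { kind: "transfer"; amount: number }
      | { kind: "unknown" };

    declare function extractIntent(userUtterance: string): Promise<Intent>; // hypothetical LLM call

    async function handleTurn(utterance: string): Promise<string> {
      const intent = await extractIntent(utterance);
      switch (intent.kind) {
        case "check_balance":
          return "Your balance is ...";            // filled in by a backend lookup
        case "transfer":
          return `Transferring ${intent.amount} ...`;
        default:
          return "Sorry, I didn't understand that.";
      }
    }

The injected text can at worst produce a wrong intent, never an arbitrary response, since the response surface is fixed by the backend code.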
AI doing playtests is an idea I've been thinking about too. The question I can't quite answer is: how do you know the AI play-tester can predict what users find fun? How well does it represent the different kinds of users?
Evidence for safe and fair AI systems is possible as long as you define what "safe" and "fair" mean for your use case. Fairness might look like "no cohort has a >5% higher false positive rate than another", and safety might mean "the model must have a false negative rate of less than 15%". Safety, even more so, encompasses the workflows around the model, including human intervention, auditing, monitoring, etc.
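A rough sketch of checking that kind of cohort criterion (the field names are hypothetical, and "5% higher" is read as five percentage points for illustration):

    interface Prediction {
      cohort: string;      // hypothetical grouping, e.g. an age band or site
      predicted: boolean;  // model flagged the case
      actual: boolean;     // ground-truth outcome
    }

    // False positive rate per cohort: FP / (FP + TN).
    function fprByCohort(rows: Prediction[]): Map<string, number> {
      const stats = new Map<string, { fp: number; tn: number }>();
      for (const r of rows) {
        const s = stats.get(r.cohort) ?? { fp: 0, tn: 0 };
        if (!r.actual) {
          if (r.predicted) s.fp++;
          else s.tn++;
        }
        stats.set(r.cohort, s);
      }
      const rates = new Map<string, number>();
      for (const [cohort, { fp, tn }] of stats) {
        rates.set(cohort, fp + tn > 0 ? fp / (fp + tn) : 0);
      }
      return rates;
    }

    // The fairness criterion above: no cohort's false positive rate
    // exceeds another's by more than five percentage points.
    function passesFairness(rows: Prediction[], maxGap = 0.05): boolean {
      const rates = [...fprByCohort(rows).values()];
      return Math.max(...rates) - Math.min(...rates) <= maxGap;
    }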
This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents? It’s hard to imagine it’s safe to entrust model validation and bias assessments to an automated system, especially in healthcare. Validating clinical AI is pretty complex, between finding the right data, ensuring event timings are accurate, simulating the model, etc. That’s why I’m guessing Parachute is a little less automated than the landing page makes it out to be, which is maybe a good thing. Regardless, this is cool. Hope you make AI in healthcare safer.
That’s a great point. We don’t use AI agents to grade other models. Instead, we run in-house evaluations tailored to each category of clinical AI, giving hospitals an apples-to-apples comparison between similar vendors.
This line of thinking always leaves me confused about other people's experience of the pre-AI world. People and systems around me fail all the time because evaluation fails. Yes, the failure modes are different, but I don't consider the ones without AI more favorable. In fact, I consider the ones with AI to be better.
Please don't make the mistake of thinking "aha, but you see, a human intervened!" That will never happen in the real world for the vast majority of humans in a similar scenario.
I'm afraid I don't quite understand your point. What line of thinking are you referencing? Also, risk scores and algorithms have been used in medicine for over 50 years, so evaluating them isn't anything new.
> This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents?
Usually you can run human-in-the-loop spot checks to ensure that there's parity between your LLM evaluators and the equivalent specialist human evaluator.
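In sketch form (the names and the 90% agreement bar are illustrative, not from the product being discussed):

    // Sample cases scored by the automated evaluator, have a specialist
    // re-score them, and measure raw agreement.
    interface SpotCheck {
      caseId: string;
      llmVerdict: "pass" | "fail";
      humanVerdict: "pass" | "fail";
    }

    function agreementRate(samples: SpotCheck[]): number {
      if (samples.length === 0) return 0;
      const agreed = samples.filter(s => s.llmVerdict === s.humanVerdict).length;
      return agreed / samples.length;
    }

    // If agreement drops below the bar, stop trusting the automated
    // grader and widen human review until it's recalibrated.
    const ok = (samples: SpotCheck[]) => agreementRate(samples) >= 0.9;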
Is anyone else sick of hearing "AI agent"? I can't fully explain the feeling, but it's nauseating. Y Combinator is probably the worst culprit in using it, especially on their YouTube channel.
Nope. It is the same as getting sick of hearing "new JS framework" before there was real new-JS-framework fatigue. Or: get used to it, it's just the beginning.