
It’s interesting to see a Chinese LLM like DeepSeek enter the global stage, particularly given the backdrop of concerns over data security with other Chinese-owned platforms, like TikTok. The key question here is: if DeepSeek becomes widely adopted, will we see a similar wave of scrutiny over data privacy?

With TikTok, concerns arose partly because of its reach and the vast amount of personal information it collects. An LLM like DeepSeek would arguably have even more potential to gather sensitive data, especially as these models can learn from and remember interaction patterns, potentially accessing or “training” on sensitive information users might input without thinking.

The challenge is that we’re not yet certain how much data DeepSeek would retain and where it would be stored. For countries already wary of data leaving their borders or being accessible to foreign governments, we could see restrictions or monitoring mechanisms placed on similar LLMs—especially if companies start using these models in environments where proprietary information is involved.

In short, if DeepSeek or similar Chinese LLMs gain traction, it’s quite likely they’ll face the same level of scrutiny (or more) that we’ve seen with apps like TikTok.



An open source LLM that is being used for inference can't "learn from or remember" interaction patterns. It can operate on what's in the context window, and that's it.

As long as the actual packaging is just the model, this is an invalid concern.

Now, of course, if you do inference on anyone else's infrastructure, there's always the concern that they may retain your inputs.
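To make the statelessness point concrete, here's a toy Python sketch (not real inference code for DeepSeek or any actual model — the "weights" and "forward pass" are stand-ins): inference is a pure function of frozen weights plus the context window, so nothing the user types can write back into the model.

```python
import copy

# Toy stand-in for an LLM: frozen weights plus a pure forward pass.
# Purely illustrative; real models have billions of parameters, but the
# property shown here is the same.
weights = [[0.1, 0.2], [0.3, 0.4]]      # the "model" you downloaded
frozen = copy.deepcopy(weights)         # snapshot to compare against

def generate(context, w):
    # A forward pass only *reads* the weights and the context window;
    # nothing here writes back into w.
    score = sum(w[i % len(w)][0] * len(tok) for i, tok in enumerate(context))
    return f"token_{score:.2f}"

out1 = generate(["my", "secret", "data"], weights)
out2 = generate(["my", "secret", "data"], weights)

assert weights == frozen    # inference did not modify the weights
assert out1 == out2         # same context in, same output out: no "memory"
```

The only place user data can persist is outside the model: in the logs of whoever runs the inference server.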


You can run the model yourself, but I wouldn't be surprised if a lot of people prefer the pay-as-you-go cloud offering over spinning up servers with 8 high-end GPUs. It's fair to caution that doing so might be handing your data over to China.


In the same way, using ChatGPT is handing your data over to America, and using Claude is handing your data over to Europe.


Claude is from the American company Anthropic; maybe you meant Mistral?


You can just spin up those servers on a Western provider.


It's usually wildly uneconomical to serve such large models yourself unless you have enough users to saturate your hardware. Thus most people will opt for hosted models, and most of the big ones will collect your data for future AI training in exchange for a discounted or free service.
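A back-of-envelope sketch of why saturation matters — every number below is a made-up assumption for illustration, not a real GPU rental price, API price, or throughput benchmark:

```python
# All figures are hypothetical, chosen only to show the shape of the math.
gpu_cost_per_hour = 2.0             # assumed rental price per high-end GPU, $/hr
num_gpus = 8
server_cost_per_hour = gpu_cost_per_hour * num_gpus         # $16/hr, busy or idle

saturated_tokens_per_sec = 5000     # assumed aggregate throughput when fully loaded
tokens_per_hour = saturated_tokens_per_sec * 3600           # 18M tokens/hr

api_price_per_mtok = 1.0            # assumed hosted price, $/1M tokens
api_cost_at_saturation = tokens_per_hour / 1e6 * api_price_per_mtok  # $18/hr

# You pay for the server whether or not it's busy, so self-hosting only
# beats the API above this utilization level:
break_even_utilization = server_cost_per_hour / api_cost_at_saturation
print(f"{break_even_utilization:.0%}")    # ~89% with these assumptions
```

With those (invented) numbers you'd need to keep the box nearly ninety percent busy around the clock just to break even, which is why the economics usually only work at scale.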


Is ChatGPT posting on HN spreading open model FUD!?

> especially as these models can learn from and remember interaction patterns

All joking aside, I'm pretty sure they can't. Sure the hosted service can collect input / output and do nefarious things with it, but the model itself is just a model.

Plus it's open source, you can run it yourself somewhere. For example, I run deepseek-coder-v2:16b with ollama + Continue for tab completion. It's decent quality and I get 70-100 tokens/s.
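For anyone wanting to try a similar setup, here's a rough sketch assuming ollama is already installed (the model tag matches the one above; double-check current tags with `ollama list` / the ollama library, since they change):

```shell
# One-time download; ollama then serves the model on localhost:11434 by default.
ollama pull deepseek-coder-v2:16b

# Quick smoke test from the CLI before wiring up an editor:
ollama run deepseek-coder-v2:16b "write a binary search in Python"
```

From there you point Continue's tab-autocomplete settings at the local ollama endpoint and model name; see Continue's docs for the exact config keys for your version.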


What hardware are you running this on? I’m interested in trying out local models for programming, and need some pointers on hardware


For most of the world this is a good argument for being cautious of using US-based AI services (and closed-models) as well.

As someone living in America's Hat, without any protections from PRISM-like programs, and who can't even reach DeepSeek without hopping through the US, it's probably less risky for me to use Chinese LLM services.



