Hacker News | raxxor's comments

That is semantics, and they are strongly comparable in their input and output. Distillation is different from fine-tuning.

Sure, you could say that only running the 600+b model is running "the real thing"...


You can run the quantized versions of DeepSeek locally with normal hardware just fine, even with very good performance. I have it running just now. With a decent consumer gaming GPU you can already get quite far.
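For reference, a minimal sketch of what "running it locally" can look like through Ollama's Python client. The `deepseek-r1:14b` tag, the package, and the payload shape are assumptions here, not a prescription; check `ollama list` for what you actually pulled:

```python
# Sketch: querying a locally quantized DeepSeek model through Ollama.
# Assumes the `ollama` Python package (pip install ollama) and a running
# Ollama daemon; the model tag "deepseek-r1:14b" is an assumption.

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the payload shape the Ollama chat API expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete answer instead of token chunks
    }

def ask(prompt: str) -> str:
    """Send a prompt to the local model (needs the daemon running)."""
    import ollama  # imported here so the helper above stays importable
    reply = ollama.chat(**build_chat_request("deepseek-r1:14b", prompt))
    return reply["message"]["content"]
```

The split into a pure payload builder and a thin network call keeps the request shape testable without a GPU or daemon.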

It is quite interesting that this censorship survives quantization, perhaps the larger versions censor even more. But yes, there probably is an extra step that detects "controversial content" and then overwrites the output.

Since the data feeding DeepSeek is public, you can correct the censorship by building your own model. For that you need considerably more compute power though. Still, for the "small man", what they released is quite helpful despite the censorship.

At least you can retrace how it ends up in the model, which isn't true for most other open-weight models, which cannot release their training data for numerous reasons beyond "they don't want to".


Mastodon mods being that ban-happy is the reason I never even bothered. And their behavior reflects on every instance, technically correct or not.

I can just as well use a Discord channel, because economic interests are at least predictable and the same rules apply to everyone, more or less. Or I could use a Reddit sub matching my political alignment, because the mods there are equally ban-happy.

It would take a lot to convince me that members of prominent Mastodon instances are curious about other opinions; something in the larger picture just doesn't add up.

That is fine though, but Mastodon currently cannot be a place for everyone, regardless of technical possibilities. And it should be said that a lot of Mastodon users were involved in the witch hunt against Stallman, so at least some of the prominent users seem to be toxic.


Nostr to the rescue?


Mastodon is "choose your dictator" type of setup. On nostr you can't technically be banned, but it attracts a lot of freedom focused people and bitcoiners.


Surprised they didn't rename it to "Gulf Of Peace And Freedom And Cuba".


> Your idea of a perfect date is explaining why everyone should run their own email server

That is the price for living in a better world!

I have a young account because I forgot the password of my old one, and I can't recover it, probably because for my old account it says...

> You probably use a flip phone and tin foil hat to avoid big tech surveillance

...I didn't provide an email address to HN, famously part of big tech.


The Chinese company also released the means to correct this though.


The CEO did give a statement about their motivation. Could be a lie, but he delivered, and it is also vastly more sensible than what we often hear from other companies. Google and Meta are exceptions in this space though.

Also, because not only the weights but also the data are open, any propaganda can be identified and corrected. This is not the case for other models, and from what we have seen from Gemini, there certainly are "adaptations". I don't think Google had ill intent here, but this would fit what some would classify as propaganda.


Which is ironic, because Google needs to improve its reputation for sunsetting products early. This is one of the main arguments many businesses give for why they do not adopt its alternatives.


Or any leading CEO in recent times. Could of course be the usual deceit, but at least in this case he already delivered.

All I heard from OpenAI was that we need regulation, which maybe just happens to fit their business interests.


Oh yes, I am firmly on Team China here because US companies got too greedy. Meta is an exception here though and they also propelled AI development massively.

DeepSeek is awesome. Every AI task we have implemented in our business so far can be run from my local PC with just the smaller models. And my PC is fairly crappy to begin with.

OpenAI looks quite silly with their "we have to close everything".


Can you elaborate which models you are using? I‘m running an R1 distilled Qwen coder with 32B Q4, and while it’s giving useful answers, it‘s quite slow on my M1 Max. Slow enough that I keep reaching for cloud models.


Not on my machine currently; I think I use the 14B Q4 model, which delivers very good answers. I run a 4060 with 16 GB of memory and performance is quite good. I used the largest model that was recommended for this amount of VRAM.
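The back-of-the-envelope math for matching model size to VRAM can be sketched like this; the 20% overhead factor is an assumption, and real usage also depends on context length:

```python
# Rough rule of thumb for picking the largest model a card can hold:
# weights take roughly (parameters * bits-per-weight / 8) bytes, plus
# headroom for the KV cache and activations. The 1.2x overhead factor
# below is an assumption for illustration, not a measured number.

def approx_vram_gb(params_billion: float, quant_bits: int,
                   overhead: float = 1.2) -> float:
    """Approximate VRAM in GB for a quantized model's weights + headroom."""
    bytes_for_weights = params_billion * 1e9 * quant_bits / 8
    return bytes_for_weights * overhead / 1e9

# A 14B model at Q4 needs on the order of 14e9 * 0.5 bytes = 7 GB for
# the weights alone, which is why it fits comfortably on a 16 GB card.
print(round(approx_vram_gb(14, 4), 1))  # 8.4 with the assumed headroom
```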

I do have some applications that process images, text, and PDF files, and I use smaller models for extracting embeddings. I think my system wouldn't be able to handle it at decent speed otherwise.

I do run LLMs on an M1 16 GB MacBook Air and performance is surprisingly good. Not for image synthesis though, and a PC with a dedicated GPU is still significantly faster for LLM responses as well. Haven't tried running DeepSeek on the MacBook yet.


Interesting, I didn’t like the quality of the output of the 14B models. Could be the quantization though, apparently some are a bit broken.


I'm on team open source. To me the exciting thing was ollama downloading the 7B and running it on a five-year-old cheap Lenovo and getting a token rate similar to the first release of ChatGPT.

Running locally on CPU opens up so many possibilities for smart, privacy-focused home devices that serve you.

In my test it hallucinated confidently, but my interest is in a simple second-brain-style RAG: "Hey thingy, what is my schedule today?"

Need it to be a bit faster though as the thinking part adds a lot of latency.
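The retrieval step of such a second brain can be sketched in a few lines: embed the question, find the closest stored note, and hand only that note to the model as context. The toy vectors, the in-memory note store, and the `nomic-embed-text` model name in the comment are all assumptions for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_note(query_vec: list[float], notes) -> str:
    """notes: list of (text, embedding) pairs; returns the closest text."""
    return max(notes, key=lambda n: cosine(query_vec, n[1]))[0]

# In a real setup the vectors would come from a local embedding model,
# e.g. ollama.embeddings(model="nomic-embed-text", prompt=text) --
# the model name is an assumption. With toy 2-d vectors:
notes = [("dentist at 10:00", [1.0, 0.0]), ("buy milk", [0.0, 1.0])]
print(best_note([0.9, 0.1], notes))  # -> dentist at 10:00
```

Because only the retrieved note goes into the prompt, the model has less to "think" about, which also helps with the latency issue.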


The thinking is quite fascinating though, I love reading it. Especially when it notices something must be wrong. It will probably be very helpful for refining answers for itself and other models.

It does add latency of course, but I still think I could cover all the AI needs of my company (industrial production) with a simple, older off-the-shelf PC. My GPU is decently recent, but the smallest model of its series, and otherwise the machine is a rusty bucket.

I haven't tested it thoroughly yet, but I have some invoices where I need to extract info, and it has done a perfect job so far. But I don't think there is any LLM yet that can do that without someone checking the output.


The US companies got too greedy? How? They invented this entire space, literally. DeepSeek built their base models off Llama releases and OpenAI outputs (or so it’s thought), and while they added some optimizations on top, it seems like they’ve lied about the costs to produce their models by simply being vague about their base model and training data, and quoting the cost of their final training run.

And then there’s all the dystopian propaganda baked into these models, which threatens to misinform users at scale based on a government driven agenda. Hard to be on that team, let alone firmly, knowing that it’s giving power to a dictatorial regime.


The US models are also full of censorship. For example the US is much more sensitive to anything related to sexuality and here in Europe it's quite frustrating to deal with that censorship.


I think we will find that each region will have its own flavor of censorship. The only reason it stands out more from the Chinese side is the requirement to align with PRC/CCP rhetoric.


Yes that's what I mean. I wish all models were uncensored and it would just be up to the implementer to decide how to finetune on top of that. Save for the super crazy stuff of course.


> The US companies got too greedy? How? They invented this entire space, literally

And when they thought they were the only game in town, they tried to corner the market in GPUs and lock out any users who can't pony up £200/mo. Reminds me of when the likes of Oracle and IBM had companies by the balls buying bigger and bigger servers and then Google came along and showed everyone how to do horizontal scaling of cheap hardware.


That was perhaps a bit too general, but aside from Meta and Google they didn't share their research, tried to sell AI products as fast as possible, and lobbied for legislation to keep their head start. I would also include Nvidia here, which has some moat through software integrations.

I haven't tested DeepSeek for censorship yet, but they shared their release and even their input data. And in this case you could correct its shortcomings, so propaganda would be difficult.


>DeepSeek built their base models off Llama releases and OpenAI outputs (or so it’s thought)

The first one is definitely not true, and the second one is not necessarily true in the way you imagine, i.e. crawls of the internet will contain GPT chat logs now.


> DeepSeek built their base models off Llama releases and OpenAI outputs

Those models are also trained on data that was ignoring licenses / copyrighted content.


That's a problem for sure. But why would that argument play in favour of China, which would be even less constrained by licenses/copyright?


I didn't mean it as being in China's favour. Just that this industry is unfortunately like that.

