maven29 · 2025-08-02T17:52:25 1754157145

It's mostly the US and a few other small markets that even have millimeter wave 5G NR. This is mostly because the FCC had not wound down analog broadcasts in time, so mmWave/FR2 was the only way to do 5G in the US initially; the lower C-band frequencies were not freed up until 2021. Deployments of mmWave persist solely because of the sunk cost of existing equipment and narrow use cases like stadiums and concerts.

The article predates our current reality where C-band (3.5GHz) is available for 5G.

maven29 · 2025-08-02T11:21:07 1754133667

There is an a16z company that does exactly this, called yupp.ai. They need genuine labelling/feedback in return, but you get to either spend credits on expensive APIs or cash out. Likewise, OpenRouter has free endpoints from some providers who will retain your sessions for training.

maven29 · 2025-08-01T18:11:49 1754071909

A warning shot to guard against an AT&T Bell-style forced divestiture?

imchillyb · 2025-08-01T18:37:08 1754073428

I believe this is the simplest and most succinct answer, given the current anti-monopoly climate in the courts and among prosecutors.

maven29 · 2025-07-28T06:50:54 1753685454

How do you do abuse detection for free-tier without these?

nottorp · 2025-07-28T12:58:56 1753707536

Provide a light version of your app for the free tier that does not use any remote resources ofc.

Then you don't have to worry about "abuse".

Y_Y · 2025-07-28T16:38:07 1753720687

Counter-abuse is hardly the answer

bravesoul2 · 2025-07-28T08:40:45 1753692045

Same way Linux does it.

maven29 · 2025-07-25T14:16:04 1753452964

This is mostly licensed from IBM research, and IBM research already has significant BSPDN IP, so I don't see why they couldn't also license that.

maven29 · 2025-07-11T16:51:32 1752252692

32B active parameters with a single shared expert.

JustFinishedBSG · 2025-07-11T16:54:05 1752252845

This doesn’t change the VRAM usage, only the compute requirements.

selfhoster11 · 2025-07-11T19:11:08 1752261068

It does not have to be VRAM, it could be system RAM, or weights streamed from SSD storage. Reportedly, the latter method achieves around 1 token per second on computers with 64 GB of system RAM.

R1 (and K2) is MoE, whereas Llama 3 is a dense model family. MoE actually makes these models practical to run on cheaper hardware. DeepSeek R1 is more comfortable for me than Llama 3 70B for exactly that reason - if it spills out of the GPU, you take a large performance hit.

If you need to spill into CPU inference, you would much rather multiply a different set of 32B weights for every token than the same 70B (or more) weights every time, simply because the computation takes so long.

refulgentis · 2025-07-11T19:31:12 1752262272

The number of people who will be using it at 1 token/sec because there's no better option, and who have 64 GB of RAM, is vanishingly small.

IMHO it sets the local LLM community back when we lean on extreme quantization & streaming weights from disk to say something is possible*, because when people try it out, it turns out it's an awful experience.

* the implication being, anything is possible in that scenario

selfhoster11 · 2025-07-12T06:34:07 1752302047

Good. Vanishingly small is still more than zero. Over time, running such models will become easier too, as people slowly upgrade to better hardware. It's not like there aren't options for the compute-constrained either. There are lots of Chinese models in the 3-32B range, and Gemma 3 is particularly good too.

I will also point out that having three API-based providers deploying an impractically large open-weights model beats the pants off having just one. Back in the day, this was called second-sourcing IIRC. With proprietary models, you're at the mercy of one corporation and their Kafkaesque ToS enforcement.

refulgentis · 2025-07-12T12:06:19 1752321979

You said "Good." then wrote a nice stirring bit about how having a bad experience with a 1T model will force people to try 4B/32B models.

That seems separate from the post it was replying to, about 1T param models.

If it is intended to be a reply, it hand waves about how having a bad experience with it will teach them to buy more expensive hardware.

Is that "Good."?

The post points out that if people are taught they need an expensive computer to get 1 token/second, much less try it and find out it's a horrible experience (let's talk about prefill), it will turn them off against local LLMs unnecessarily.

Is that "Good."?

jimjimwii · 2025-07-13T17:18:48 1752427128

Had you posted this comment in the early 90s about Linux instead of local models, it would have made about the same amount of sense but aged just as poorly as this comment will.

I'll remain here happily using my 2.something tokens/second model.

apitman · 2025-07-14T03:29:09 1752463749

But local aka desktop Linux is still an awful experience for most people. I use Arch btw

selfhoster11 · 2025-07-15T08:15:52 1752567352

I'd rather use Arch over a genuine VT100 than touch Windows 11, so the analogy remains valid - at least you have a choice at all, even if you are in a niche of a niche.

homarp · 2025-07-11T21:56:27 1752270987

An agentic loop can run all night long. It's just a different way to work: prepare your prompt queue, set it up, check the results in the morning, adjust. 'Local vibe' in 10 h instead of 10 min is still better than 10 days of manual side coding.

hereme888 · 2025-07-12T09:14:39 1752311679

Right on! Especially if its coding abilities are better than Claude 4 Opus. I spent thousands on my PC in anticipation of this rather than to play fancy video games.

Now, where's that spare SSD...

maven29 · 2025-07-11T16:56:34 1752252994

You can probably run this on CPU if you have a 4090D for prompt processing, since 1TB of DDR4 only comes out to around $600.

For GPU inference at scale, I think token-level batching is used.

zackangelo · 2025-07-11T17:45:36 1752255936

Typically a combination of expert level parallelism and tensor level parallelism is used.

For the big MLP tensors they would be split across GPUs in a cluster. Then for the MoE parts you would spread the experts across the GPUs and route to them based on which experts are active (there would likely be more than one if the batch size is > 1).
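To make the routing step concrete, here is a minimal sketch of expert-level parallelism. It assumes a round-robin placement of experts on GPUs and top-2 routing; the expert count, GPU count, router scores, and all function names are hypothetical and not taken from any particular model or serving stack.

```python
from collections import defaultdict

NUM_GPUS = 4   # hypothetical cluster size
TOP_K = 2      # hypothetical number of routed experts per token


def expert_to_gpu(expert_id: int) -> int:
    """Round-robin placement of experts on GPUs (one of many possible layouts)."""
    return expert_id % NUM_GPUS


def top_k_experts(router_scores: list, k: int = TOP_K) -> list:
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda e: router_scores[e], reverse=True)
    return sorted(ranked[:k])


def dispatch(batch_scores: list) -> dict:
    """Group (token_index, expert_id) pairs by the GPU hosting that expert."""
    per_gpu = defaultdict(list)
    for token_idx, scores in enumerate(batch_scores):
        for expert_id in top_k_experts(scores):
            per_gpu[expert_to_gpu(expert_id)].append((token_idx, expert_id))
    return dict(per_gpu)


# Made-up router scores for a batch of two tokens over eight experts.
batch = [
    [0.1, 0.9, 0.0, 0.3, 0.2, 0.8, 0.05, 0.0],  # token 0 picks experts 1 and 5
    [0.7, 0.1, 0.6, 0.0, 0.2, 0.1, 0.05, 0.0],  # token 1 picks experts 0 and 2
]
plan = dispatch(batch)
for gpu, work in sorted(plan.items()):
    print(f"GPU {gpu}: {work}")
```

With a batch of more than one token, different tokens activate different experts, so several GPUs end up with work in the same step, which is the point made above.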

t1amat · 2025-07-11T17:13:21 1752254001

With 32B active parameters it would be ridiculously slow at generation.

selfhoster11 · 2025-07-11T19:15:28 1752261328

DDR3 workstation here - R1 generates at 1 token per second. In practice, this means that for complex queries, the speed of replying is closer to an email response than a chat message, but this is acceptable to me for confidential queries or queries where I need the model to be steerable. I can always hit the R1 API from a provider instead, if I want to.

Given that R1 uses 37B active parameters (compared to 32B for K2), K2 should be slightly faster than that - around 1.15 tokens/second.
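The estimate above can be reproduced with a one-line scaling rule. This sketch assumes generation speed is inversely proportional to the number of active parameters, with the same hardware and quantization for both models; that proportionality is an assumption, and the function name is made up for illustration.

```python
def scaled_tokens_per_second(measured_tps: float,
                             measured_active_b: float,
                             target_active_b: float) -> float:
    """Scale a measured rate by the ratio of active parameters (in billions).

    Assumes throughput is inversely proportional to active parameter count.
    """
    return measured_tps * measured_active_b / target_active_b


# R1: 1 token/second at 37B active parameters; K2 has 32B active parameters.
k2_estimate = scaled_tokens_per_second(1.0, 37, 32)
print(f"{k2_estimate:.2f} tokens/second")  # prints "1.16 tokens/second"
```

The ratio 37/32 is about 1.16, in line with the "around 1.15" figure.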

CamperBob2 · 2025-07-12T18:56:30 1752346590

That's pretty good. Are you running the real 600B+ parameter R1, or a distill, though?

selfhoster11 · 2025-07-14T01:54:07 1752458047

The full thing, 671B. It loses some intelligence at 1.5 bit quantisation, but it's acceptable. I could actually go for around 3 bits if I max out my RAM, but I haven't done that yet.
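For a rough sense of scale, weight storage can be approximated as parameters times bits per weight. This is a back-of-the-envelope sketch that treats the quoted bit width as an average over all weights and ignores file metadata, the KV cache, and activations; the function name is hypothetical.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Treats bits_per_weight as an average across the whole model.
    """
    return params_billion * bits_per_weight / 8


for bits in (1.5, 3, 8, 16):
    print(f"{bits:>4} bits: {weight_size_gb(671, bits):7.1f} GB")
```

Under these assumptions, 671B parameters come to roughly 126 GB at 1.5 bits and roughly 252 GB at 3 bits.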

apitman · 2025-07-14T03:32:05 1752463925

I've seen people say the models get more erratic at higher (lower?) quantization levels. What's your experience been?

selfhoster11 · 2025-07-15T08:14:36 1752567276

If you mean clearly, noticeably erratic or incoherent behaviour, then that hasn't been my experience for >=4-bit inference of 32B models, or in my R1 setup. I think the others might have been referring to this happening with smaller models (sub-24B), which suffer much more after being quantised below 4 or 5 bits.

My R1 most likely isn't as smart as the output coming from an int8 or FP16 API, but that's just a given. It still holds up pretty well for what I did try.

maven29 · 2025-05-08T23:25:44 1746746744

Enforcement of Copilot premium request limits moved to June 4, 2025 https://github.blog/changelog/2025-05-07-enforcement-of-copi...

maven29 · 2025-01-30T15:57:56 1738252676

They're both European. Look at the author names on the llama paper.

resource_waste · 2025-01-31T09:44:28 1738316668

That is a very European thing to say/do/claim.

cpldcpu · 2025-01-31T15:40:36 1738338036

A good part of the team is actually located in Europe.

resource_waste · 2025-01-31T17:13:26 1738343606

I said it was a very European thing to say, because only a European would stretch that hard.

A Merikan company with Merikan investment gets the credit. No one except Europeans cares what the interchangeable workers' residency is.

I'm trying to remember the other case where people lol'd at Europe/Italy for taking credit for something that was clearly invented in the US. I think the person was born there, and moved to the US, but Italy still took credit.

lol no. It's probably even more embarrassing that they left Europe.

cpldcpu · 2025-02-01T16:39:48 1738427988

I thought now everything is about meritocracy? Have we been duped?

maven29 · on Nov 19, 2024

You could also check the world catalog to see if a library near you offers the ebook for lending. Universities typically allow the general public to walk in and look at books without registration.

https://search.worldcat.org/title/1409698868

maven29 · on Nov 13, 2024

Just like the heart rate monitor, this was also moved into smartwatches for optimizing revenue from those who truly need it.

resoluteteeth · on Nov 13, 2024

Are you saying that smartphones used to have heart rate monitors but they were removed to force people to buy smart watches?

astrange · on Nov 13, 2024

Any phone that can record 120fps video has a heart rate monitor, if you can calibrate it.

connicpu · on Nov 13, 2024

I forget which generation it was now, but many years ago I had a Samsung Galaxy phone that had a sensor that could measure your heart rate if you put your finger over it

superhuzza · on Nov 13, 2024

The Galaxy S7 definitely had this feature.

The problem is, it was basically useless. The main use case for heart rate monitoring is continuously throughout the day/night, or during exercise. A watch is very good at this. An optical sensor on the back of your phone is not.

Periodically checking your heart rate by holding your phone in a specific way is not a useful feature for that many people.

chucksmash · on Nov 13, 2024

+1. If a phone has a stopwatch, you can get your bpm at a given moment with a finger and a multiplication (or patience). Given the limited real estate on a mobile phone it's crazy to devote any space for something so trivial.

6510 · on Nov 13, 2024

For me, just having something that you can check 1 time per year is much better than nothing. The anecdote: I was working two heavy manual labor jobs 7 days per week and felt absolutely fantastic, glowing with power. Turned out the heart rate at rest was 180 lmao. Like a rabbit. Took a few days off, it dropped below 70.

idiot900 · on Nov 13, 2024

If your heart rate at rest is truly 180, you have an arrhythmia and need to see a doctor.

6510 · on Nov 13, 2024

It should decline some 60 sec after physical activity. I think I measured after 10-15 minutes. Fatigue, dehydration, lack of sleep, a diet with lots of coffee. I just added those to the todo list. I went home to sleep (and confiscated the heart rate meter), the next morning it was around 100 bpm which is still terrible shape. Over the day it sunk to 80, over the 2nd day to 70, 3rd a bit lower. Back at work it would barely elevate. The moral of the story: Don't work out 11 hours per day for 6 months straight.

telgareith · on Nov 13, 2024

It's not a separate sensor; 30 fps on an iPhone was more than enough.

Turns out, when you have a known luminance, white balance, and frame-rate... the DSP to grab heart rate from a finger is trivial.
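As an illustration of how small that DSP can be, here is a minimal sketch that picks the strongest frequency in a per-frame brightness series. It assumes the mean brightness of each frame has already been extracted and that the frame rate is constant; the synthetic signal, the 40 to 200 bpm search range, and the function name are all assumptions for the example, not how any phone actually does it.

```python
import math


def estimate_bpm(brightness, fps, lo_bpm=40, hi_bpm=200):
    """Return the integer bpm whose frequency carries the most signal power."""
    n = len(brightness)
    mean = sum(brightness) / n
    centered = [b - mean for b in brightness]  # drop the constant offset
    best_bpm, best_power = lo_bpm, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        freq_hz = bpm / 60.0
        # Single-frequency discrete Fourier transform at freq_hz.
        re = sum(x * math.cos(2 * math.pi * freq_hz * i / fps)
                 for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq_hz * i / fps)
                 for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Synthetic data: 10 seconds at 30 fps with a small 72 bpm ripple
# on top of a constant brightness level.
fps = 30
true_bpm = 72
frames = [128 + 2.0 * math.sin(2 * math.pi * (true_bpm / 60) * i / fps)
          for i in range(fps * 10)]
print(estimate_bpm(frames, fps))  # prints 72
```

Real recordings are noisier than this clean sine wave, so a practical version would likely need filtering and a longer window.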

ssl-3 · on Nov 14, 2024

Cameras can do it, yes.

But on the Galaxy S7 that I think is being alluded to here, it was definitely a separate sensor -- a MAX86902, IIRC.

varelse · on Nov 13, 2024

The heart rate monitors in Google and Fitbit devices are insanely inaccurate during exercise. I had suspected as much already, whenever a brisk walk indicated my pulse was 150 or so. But first, I could not reproduce this wearing a Holter monitor, and later I could not reproduce it wearing a Polar heart rate monitor or a Frontier X2 ECG.

Conclusion: the Fitbit and Google heart rate monitors on those wearables are hot garbage. Cue some snooty Googler insisting I'm doing it wrong somehow.