MaKey's comments | Hacker News

> But also, google spent a mountain of money advertising chrome.

That money was also used to grow the user base via drive-by installations: e.g., while installing Adobe Reader you had to deselect the bundled Chrome install, otherwise you'd find yourself with a new default browser afterwards.


Maybe this incident will make people rethink blindly putting Cloudflare in front of every website.


In theory even a single company's service could be distributed so that only a fraction of websites would be affected by any one failure; it doesn't have to be a single point of failure. So I still don't like the argument "you see what happens when over half of the internet relies on Cloudflare". And yes, I'm writing this as a Cloudflare user whose blog is down right now because of this. Cloudflare is convenient and accessible for many people; no wonder it's so popular.

But, yeah, it's still a horrible outage, much worse than the Amazon one.


The "omg centralized infra" cries after every such event kind of misses the point. Hosting with smaller companies (shared, vps, dedi, colo whatever) will likely result in far worse downtimes, individually.

Ofc the bigger perception issue here is many services going out at the same time, but why would (most) providers care if their annual downtime does or doesn't coincide with others? Their overall reliability is no better or worse had only their service gone down.

All of this can change ofc if this becomes a regular thing, the absolute hours of downtime does matter.


Exactly.


I think you're being overly dramatic. In practice I've seen the complexity that HA setups often introduce cause downtime far more often than hosting a service on a single instance does.


You'll have planned downtime just for upgrading the MongoDB version or rebooting the instance, and I don't think that's something you'd want. Running MongoDB in a replica set is really easy, much easier than running Postgres or MySQL in an HA setup (sketch below).

No need for SREs. Just add 2 more Hetzner servers.
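
To illustrate how little is involved (a minimal sketch, assuming three hosts each running mongod started with --replSet rs0; the hostnames are hypothetical):

    // Run once from mongosh, connected to any one of the three nodes.
    // Each host's mongod must have been started with --replSet rs0.
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "db1.example.com:27017" },
        { _id: 1, host: "db2.example.com:27017" },
        { _id: 2, host: "db3.example.com:27017" }
      ]
    })

The driver connection string then lists all three hosts and fails over automatically when the primary goes away.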


The sad part is that 3 Hetzner servers still cost less than 20% of the equivalent AWS resources (3 × 37.5 EUR ≈ 112.5 EUR vs 585 USD). This was already pretty bad when AWS started, but now it's reaching truly ridiculous proportions.

from the "Serverborse": i7-7700 with 64GB ram and 500G disk.

37.5 euros/month

This is ~8 vcpus + 64GB ram + 512G disk.

585 USD/month

It gets a lot worse if you include any non-negligible internet traffic. How many machines does your company need before a team of SREs is worth it? I think the number has actually dropped to 100.


Sure, I am not against Hetzner, it's great. I just find that running something in HA mode is important for any service that is vital to customers; I am not saying you need HA for a website. I also run many applications NOT in HA mode, but those are single-customer applications where it's totally fine to do maintenance at night or on the weekend. For SaaS, though, that's probably not a very good idea.


Yes, any time someone says "I'm going to make a thing more reliable by adding more things to it" I either want to buy them a copy of Normal Accidents or hit them over the head with mine.


How bad are the effects of an interruption for you? Google has servers failing every day just because it runs so many, but you are not Google: with one server you can afford to gamble, since it most likely won't fail for years. Whatever the hardware, though, keep a backup, because data loss is permanent. Would you lose millions of dollars a minute, or would you just have to send an email to customers saying "oops"?

Risk management is a normal part of business - every business does it. Typically the risk is not brought down all the way to zero, but to an acceptable level. The milk truck may crash and the grocery store will be out of milk that day - they don't send three trucks and use a quorum.

If you want to guarantee above-normal uptime, feel free, but it costs you. You might not need redundancy for your online systems; it depends on what your business does.



HA can be hard to get right, sure, but you at least have to have a (TESTED) plan for what happens when the single instance dies.

"Run a script to deploy a new node and load the last backup" can be enough, but then you have to plan what to tell customers when the last few hours of their data are gone.


Why would you get this when a Ryzen AI Max+ 395 with 128 GB is a fraction of the price?


Theoretically it has slightly better memory bandwidth, (you are supposed to get) the Nvidia AI software ecosystem support out of the box, and you can use the 200G NIC to stick 2 together more efficiently.

Practically, if the goal is 100% about AI and cloud isn't an option for some reason, both options are likely "a great way to waste a couple grand trying to save a couple grand" as you'd get 7x the performance and likely still feel it's a bit slow on larger models using an RTX Pro 6000. I say this as a Ryzen AI Max+ 395 owner, though I got mine because it's the closest thing to an x86 Apple Silicon laptop one can get at the moment.


Because the ML ecosystem is more mature on the Nvidia side. Software-wise the CUDA platform is more advanced, and it will be hard for AMD to catch up. It is good to see competition, though.


But the article shows that the Nvidia ecosystem isn't that mature either on the DGX Spark with ARM64. I wonder if Nvidia is still ahead for such use cases, all things considered.


On the DGX Spark, yes. On ARM64, Nvidia has been shipping drivers for years now. The rest of the Linux ecosystem is going to be the problem, most distros and projects don't have anywhere near the incentive Nvidia does to treat ARM like a first-class citizen.


CUDA


WOULDA

SHOULDA


Complete computer with everything working.


The complete Framework Desktop with everything working (including said Ryzen AI Max 395+ and 128 GB of RAM) is 2500 EUR. In Europe the DGX Spark listings are at 4000+ EUR.


It's a different animal. The Ryzen wins on memory bandwidth and has an 'AI' accelerator (my guess: matrix multiplication). The Spark has several times less bandwidth, but much better and more generic compute. Add to that the CUDA ecosystem with its libs and tools. I'm not saying the Ryzen is bad; actually it's a great poor man's Mac substitute. $2K for the 128 GB version on Amazon right now.


The Macs are indeed the best consumer hardware out there, but they have a big downside: macOS only.

The reason we use Ryzens is that we run Linux on them with almost no problems.


Framework doesn't sell in Europe and they are sponsoring the wrong kind of folks nowadays.


Framework does absolutely sell in several countries in Europe.


MediaMarkt, Coolblue, FNAC, Saturn, Publico... where?


Online only at https://frame.work AFAIK. I don't think people shelling out 2-4k for an AI training machine are concerned whether or not they can find it at a hardware store locally or online, but I may be wrong.


The vast majority of Ryzen AI Max+ 395s (by volume at least) are sold as complete systems as well. About as far as you can go the other way is getting one without an SSD, as the motherboard + RAM + CPU are an "all or nothing" bundle anyway.


Including a Linux distribution with working drivers?


Fortunately, AMD upstreams its changes so no custom distro is required for Strix Halo boxes. The DGX is the platform more at risk of being left behind on Linux - just like Jetson before it, which also had a custom, now-abandoned distro.


This right here. Jetson is abandoned, while Strix Halo is x86 and will run new Linux distributions for years (decades?).


Does NVIDIA really not have a defined support lifetime/cycle?



Needing a customized spin of Ubuntu to have working video drivers is an Nvidia thing. One can also choose Windows and run AI from there, as it's just a standard x86 PC. That might actually be the best option for those worried about pre-installed OSes for AI tinkering.

The userspace side is where AI is difficult with AMD. Almost all of the community is built around Nvidia tooling first, others second (if at all).


I can't overstate how much I despise this "old Ubuntu needed" state of affairs with the AI stuff.


AMD works with recent kernels out of the box. The DGX runs on a custom Ubuntu with a year-old kernel.


It is not what the Romc experience tells.


Does Romc=ROCm, or something else? If the former, ROCm is just a userspace compute library for the in-kernel amdgpu driver. The "kernels" it runs are GPU compute programs, not customized Linux kernels.


> Same meme would work for Aws today.

Not really, there are enough alternatives.


How many just run on AWS underneath, though?

And it's not like there aren't other brands of chocolate either…


What's the reason for not considering Proxmox?


They seriously need to invest in a well-engineered multi-node cluster filesystem. VMFS made VMware into the behemoth it is.

Without that, your options for HA shared storage are Ceph (which Proxmox makes decently easy to run) or NFS.


My 2 cents: Proxmox is too rigid. For example:

1. Proxmox cannot even join a network using DHCP; it requires manual IP configuration.

2. Disk encryption is hell instead of a checkbox in the installer.

3. Wi-Fi: no luck (rarely used for real servers, but frequently for R&D racks).

Of course, it is a Debian core underneath and a lot of things are possible given enough time and persistence, but other solutions have these out of the box.


My Proxmox seems to use DHCP just fine by putting "iface eno1 inet dhcp" in /etc/network/interfaces
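
For reference, the full ifupdown stanza that makes this work (the interface name eno1 varies per machine):

    # /etc/network/interfaces -- minimal DHCP setup for one NIC
    auto eno1
    iface eno1 inet dhcp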


[flagged]


If you were more polite, you could have made a good entry into the discussion.

Yes, Proxmox is built on Debian, so anything Debian can do, Proxmox VE can mostly do as well without major issues.


Proxmox wasn’t considered because of the audience (leadership) and Proxmox’s perceived market (SMBs/homelabs). I couldn’t even get them to take Virtuozzo seriously, so Proxmox was entirely a non-starter, unfortunately.

FWIW, I use Proxmox at home. It’s a bit obtuse at times, but it runs like a champ on my N100-based NUCs.


Interesting that Aiven is still around after losing customer data a few years back.


I believe you're referring to our January 2020 Kafka incident where a logic bug caused data loss for a customer. It was a serious failure and a huge learning moment for us.

The platform we operate today is fundamentally different and far more resilient than it was five years ago. We've scaled significantly (recently passing $100M ARR) because we took those early lessons seriously and continue to prioritize reliability.


They are pretty strict with account sign-ups because they are so cheap, which attracts abuse. Unfortunately this prevents some regular sign-ups too. You could try Netcup, I've also had a good experience with them.


Terraform/OpenTofu is good for infrastructure, but it becomes a pain if you use it for k8s deployments. I suggest using GitOps (Argo CD / Flux) for everything inside the cluster and OpenTofu for the cluster itself.
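
As a sketch of the GitOps half (the repo URL, names, and paths are hypothetical), a minimal Argo CD Application that keeps a directory of manifests in sync:

    # Argo CD watches the Git repo and applies whatever is in the path.
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app            # hypothetical name
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example/deployments  # hypothetical
        targetRevision: main
        path: apps/my-app
      destination:
        server: https://kubernetes.default.svc
        namespace: my-app
      syncPolicy:
        automated:
          prune: true      # delete resources removed from Git
          selfHeal: true   # revert manual drift in the cluster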


The downside with that is having DNS, managed databases, and cloud storage (S3) outside k8s. When deployments are also managed with tf, it's easy to connect a deployment in k8s to resources outside k8s (via tf state).


You can still do that by reading the values you need directly from the state file. I suggest defining outputs and accessing them through the state file from other projects; otherwise the external dependency will be hidden.
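
A minimal sketch of that pattern (the resource, bucket, and output names are hypothetical; adjust the backend to wherever your state lives):

    # Infrastructure project: expose what the k8s side needs.
    output "db_host" {
      value = aws_db_instance.main.address  # hypothetical resource
    }

    # Consuming project: read the output via a remote state data source.
    data "terraform_remote_state" "infra" {
      backend = "s3"
      config = {
        bucket = "my-tf-state"
        key    = "infra/terraform.tfstate"
        region = "eu-central-1"
      }
    }

    locals {
      db_host = data.terraform_remote_state.infra.outputs.db_host
    }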


And splice into deployment definitions via something like kustomize?


Which option is best depends on your use case. You can put variables into kustomize templates and replace them via envsubst; for Helm charts you can just supply the values during install / upgrade.

Another possibility would be to create a ConfigMap with the values you read from the state file (sketch below).
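
A sketch of the ConfigMap route (the output and ConfigMap names are hypothetical; tofu output -raw mirrors terraform output -raw):

    # Read one output from the state and publish it to the cluster
    # as a ConfigMap that deployments can reference via envFrom.
    DB_HOST=$(tofu output -raw db_host)
    kubectl create configmap infra-endpoints \
      --from-literal=db_host="$DB_HOST" \
      --dry-run=client -o yaml | kubectl apply -f -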


> Using the internet in the UK/EU is such a horrible experience, every cookie pop-up is a reminder how badly thought out these rules are.

Technical cookies don't require any consent, so every time you see a cookie banner, the website owner wants to gather more data about you than necessary. Furthermore, these rules don't require cookie banners; banners are just what the industry has chosen as the way to get consent to track users.


Or the website owner doesn't want to take the risk and adds a banner even though the site strictly doesn't need one.


So when I see a tracking cookie dialog on a website, either 1. the site collects more data than it needs to run, or 2. it doesn't, and the site's management is incompetent. Both are pretty good reasons to avoid that particular website.


There's no risk; they know what they are doing, because the law doesn't just mandate the banner, it mandates that you disclose which third-party services you're sharing the data with.

Check the banner next time; you'll see how many "partners" they sell your data to.


That seems like an issue with the website owner to me.


A lot of websites for smaller businesses will not be run by technical people; they'll be run by business people who don't understand cookies beyond "I see cookie banners on every website I visit, therefore to avoid legal trouble I need one too". You can't expect someone like that to understand the difference between tracking cookies and technical cookies.


We're a small business, <10 FTE, and have no cookie notice at all. We don't track people.


Ah yes of course. How could I forget about poor Mom & Pop Co. and their 186 business partners that they want to share my personal data with. Surely we can't expect such a small operation to know what they are doing.


That's not the point I'm making. I'm saying that whoever in Mom & Pop Co. set up the website may well not understand the difference between cookie types, and even if they use no tracking cookies and share no data, they may put a cookie notice on the site anyway: banners are so common they seem normal, the law allows for huge fines, and they act out of an abundance of caution.

