
Well, only if the Zastava had 5-10x the horsepower and storage space of the BYD for the same amount of money. Because that's often the reality: bare metal is unreasonably efficient compared to cloud services, for not that much more know-how.

I do technical due diligence work for investment funds etc., and one thing I often see is slow, complex, and expensive AWS-heavy architectures that optimize for problems the company doesn't have and often never will have - in theory to ensure stability and scalability. They are usually costly to run and come with nightmarish configuration complexity.

In practice, that complexity tends to lead to more outages and performance issues than a much simpler (rented) bare metal setup with some spare capacity and a better architecture design would. More than half of the serious outages I have seen documented in these reviews came from configuration mistakes or from bugs in the very software that is supposed to manage your resources.

Never mind that companies invest serious amounts of time in trying to manage that complexity rather than remove it.

A few years ago I worked for a company that had two competing systems. One used AWS sparingly: just EC2, S3, RDS and load balancers. The other went berserk in the AWS candy shop and was this monstrosity that used 20-something different AWS services glued together by lambdas. This was touted as “the future”, and everyone who didn’t think it was a good idea was an idiot.

The simple solution cost about the same to run for a few thousand (business) customers as the complex one cost for ONE customer. The simple solution also cost about 1/20 as much to develop, and it had roughly 1/2500 the average latency because it wasn't constantly enqueuing and dequeuing data through a slow maze of SQS queues.

And best of all: you could move the simpler solution to bare metal servers. In fact, we ran all the testing on clusters of six RPis. The complex solution was stuck in AWS forever.





All AWS is selling is a web GUI on top of free software. You still have to know the ins and outs of that software to manage it properly.

Heck, their support is shit too. I have talked to them to figure out an issue in their own in-house software, and they couldn't help. My colleague happened to know what was wrong and fixed the issue by flipping a checkbox.


Hetzner doesn't even have an RDS equivalent. I've heard rumors for years, but they haven't done it. Also, while I agree that leaning too much on the cloud leads to lock-in - an abstract risk that always needs to be guarded against when managing technology anyway - and to vendor-driven hellish architectures, "vanilla cloud" offers conveniences beyond compute, buckets, managed storage, and load balancers: things like IAM, good CLIs, secrets management, etc. Only Scaleway and OVH seem to be timidly developing what I would consider "vanilla cloud".

Here’s a key question most people don’t actually ask: do you need RDS? If you’re on AWS, the answer is fairly simple because it boils down to what IO capacity you are willing to pay for. Decent performance will cost you, so you might as well pay the premium and have AWS manage it too.

But if you have bare metal with fast disk drives, everything changes. You can get decent performance at a lower price in exchange for taking on a bit more responsibility. So then the question becomes how much of a burden it is to manage what is essentially just another application.

AWS doesn’t just rent computers; it rents relief from responsibility, and it prices raw performance to make that trade feel inevitable.

Most people do not operate services that can’t tolerate very occasional downtime. But they have been conditioned to think they do, or to not consider the other factors that influence their actual downtime.

For instance: we ran a service that, in itself, achieved 99.99% uptime (allowing about 52 minutes of downtime per year). We even survived a big AWS outage that took out almost everyone else, because we had as much redundancy as we could afford. However, the service depended completely on a system entirely outside our control that would, on average, have multiple outages every day (usually at night, but not always), ranging from 30-second blips to an hour. Meaning the customers would have to deal with this anyway, no matter how stable our systems were.

And yet, for years we obsessed about uptime needlessly. Our customers didn’t care. They had to deal with the unreliability of the upstream system anyway. It didn’t cost us that much money, but it did make everything more complex.

Now, back to the question: do you need RDS? When was the last time you set up and ran Postgres? When was the last time you set up replication and live backups? How hard was it the first time? How hard was it to repeat after doing it once?

If you are already on bare metal servers, you may want to at least try to set up Postgres a few times and track the cost in terms of time, money, and complexity. Because if you use RDS, chances are it isn’t the only thing you are managing in the cloud.
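To give a sense of what "managing it yourself" looks like day to day, here is a minimal sketch (in Go, since that comes up later in this thread) of the kind of replication-lag check you end up owning instead of reading it off the RDS console. The hostname and user are placeholders; it just queries pg_stat_replication on the primary.

  // Minimal sketch, not production code: report how far each standby lags
  // behind the primary. Connection details below are placeholders.
  package main

  import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq" // Postgres driver
  )

  func main() {
    // DSN for the primary; adjust host/user/sslmode for your own setup.
    db, err := sql.Open("postgres",
      "host=db1.internal user=monitor dbname=postgres sslmode=require")
    if err != nil {
      log.Fatal(err)
    }
    defer db.Close()

    // pg_stat_replication lists connected standbys; pg_wal_lsn_diff gives
    // the replay lag in bytes (PostgreSQL 10+).
    rows, err := db.Query(`
      SELECT application_name, state,
             pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
      FROM pg_stat_replication`)
    if err != nil {
      log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
      var name, state string
      var lag sql.NullInt64 // replay_lsn can be NULL while a standby catches up
      if err := rows.Scan(&name, &state, &lag); err != nil {
        log.Fatal(err)
      }
      fmt.Printf("standby=%s state=%s lag_bytes=%d\n", name, state, lag.Int64)
    }
    if err := rows.Err(); err != nil {
      log.Fatal(err)
    }
  }

Wire something like that into cron or whatever monitoring you already run, plus pg_basebackup/WAL archiving for backups, and you have covered a good chunk of what you were paying RDS for.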


Out of interest, what control plane do you use for a Hetzner/metal setup? Kubernetes ecosystem?

I use Coolify for side projects; I haven’t investigated whether I’d want to use it for bigger/more important stuff.


A surprising number of solutions can be realized in ways that don't actually need much of a control plane if you introduce a few design constraints.

But if you do need one, Kubernetes is probably the safe bet. Not so much because I think it is better or worse than anything else, but because you can easily find people who know it and it has a big ecosystem around it. I'd recommend Kubernetes if I were forced to make a general recommendation.

That being said, this is something I've been playing with over the years, exploring both ends of the spectrum. What I've realized is that we tend to waste a lot of time on this with very little to show for it in terms of improved service reliability.

On one extreme, we built a system that had most of the control plane as a layer in the server application itself. External to that, we monitored performance and essentially had one lever: add or remove capacity. The coordination layer in the service figured out what to do with additional resources, or how to deal with resources disappearing. There was only one binary, and the service would configure itself to take on one of several roles as needed - all the way down to taking on every role if it was the last process running. (Almost nobody cares about the ability to scale all the way down, but it is nice when you can demo your entire system on a portable rack of RPis and then turn them off one by one without the service going down.)
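To make the coordination idea concrete, here is a toy sketch of that role-assignment pattern. Everything in it (the role names, the member list, the assignment rule) is invented for illustration; the real system did something more involved, but the shape is the same: every node runs the same binary and derives its duties deterministically from whoever is currently alive, down to one node doing everything.

  // Hypothetical sketch: one binary, roles derived from the live member list.
  package main

  import (
    "fmt"
    "sort"
  )

  type Role string

  const (
    Coordinator Role = "coordinator"
    Ingest      Role = "ingest"
    Query       Role = "query"
  )

  // rolesFor returns the roles this node should take on, given the current
  // set of live members. With one member left, it takes every role.
  func rolesFor(self string, members []string) []Role {
    sort.Strings(members)
    if len(members) == 1 {
      return []Role{Coordinator, Ingest, Query} // last node standing does everything
    }
    idx := sort.SearchStrings(members, self)
    switch {
    case idx == 0:
      return []Role{Coordinator} // lowest ID coordinates
    case idx%2 == 1:
      return []Role{Ingest}
    default:
      return []Role{Query}
    }
  }

  func main() {
    members := []string{"rpi-1", "rpi-2", "rpi-3"}
    for _, m := range members {
      fmt.Println(m, rolesFor(m, members))
    }
  }

The external "control plane" then shrinks to health checks plus one lever: add or remove a machine from the member list.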

On the other extreme is taking a critical look at what you really need and realizing that if the worst case means a couple of hours of downtime a couple of times per year, you can make do with very little. Deb packages, systemd units, and SSH access are sufficient for an awful lot of the more forgiving cases.
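For those forgiving cases, that really can be the whole deployment story. A hypothetical helper, just to show how little machinery is involved (host, package, and unit names are made up; in practice this is often just a shell script):

  // Hypothetical deploy helper: copy a .deb over SSH, install it, restart the unit.
  package main

  import (
    "fmt"
    "log"
    "os"
    "os/exec"
    "path/filepath"
  )

  func run(name string, args ...string) {
    cmd := exec.Command(name, args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Run(); err != nil {
      log.Fatalf("%s %v failed: %v", name, args, err)
    }
  }

  func main() {
    host := "deploy@app1.internal"       // placeholder host
    pkg := "./myservice_1.2.3_amd64.deb" // placeholder package from your build
    unit := "myservice.service"          // placeholder systemd unit

    remote := "/tmp/" + filepath.Base(pkg)
    run("scp", pkg, host+":"+remote) // copy the package over
    run("ssh", host, fmt.Sprintf("sudo dpkg -i %s && sudo systemctl restart %s",
      remote, unit)) // install it and restart the service
  }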

I also dabbled in running systems by having a smallish piece of Go code remote-manage a bunch of servers running Docker. People tend to laugh at this, but it was easy to set up, easy to understand, and it took care of everything the service needed. The Kubernetes setup that replaced it has had 4-5 times as much downtime. But to be fair, the person who took over the project went a bit overboard and probably wasn't the best qualified to manage Kubernetes to begin with.

It seems silly not to take advantage of the fact that Docker has an API that works perfectly well. (I'd look into Podman if I were to do this again.)
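As a rough illustration of what that kind of Go glue can look like (not the actual code from that project): the Docker Engine speaks plain HTTP, so listing the containers on a remote host and starting any that have stopped is a couple of requests. The address is a placeholder, and you would only expose the daemon like this on a private network or through an SSH/TLS tunnel.

  // Sketch: talk to a remote Docker daemon over its HTTP API.
  package main

  import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
  )

  type container struct {
    ID    string   `json:"Id"`
    Names []string `json:"Names"`
    State string   `json:"State"`
  }

  func main() {
    host := "http://10.0.0.12:2375" // placeholder; daemon listening on TCP

    // GET /containers/json?all=true lists every container, stopped ones included.
    resp, err := http.Get(host + "/containers/json?all=true")
    if err != nil {
      log.Fatal(err)
    }
    defer resp.Body.Close()

    var containers []container
    if err := json.NewDecoder(resp.Body).Decode(&containers); err != nil {
      log.Fatal(err)
    }

    for _, c := range containers {
      fmt.Printf("%s %v %s\n", c.ID[:12], c.Names, c.State)
      if c.State != "running" {
        // POST /containers/{id}/start brings a stopped container back up.
        r, err := http.Post(host+"/containers/"+c.ID+"/start", "application/json", nil)
        if err != nil {
          log.Println("start failed:", err)
          continue
        }
        r.Body.Close()
      }
    }
  }

From there, "everything the service needed" is mostly a loop: poll state, compare against what should be running, and create/start/stop containers accordingly.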

I don't understand why more people don't try the simple stuff first when the demands they have to meet easily allow for it.


+1 for bare metal! I wish I could convince more C-level people that that's what we need most of the time.


