
For me personally, I get a little salty about it because the justification is usually an imagined, theoretical business need: being multi-cloud, or being able to deploy on-prem someday if needed. It's tough to explain just how much longer it'll take, how much more expertise is required, how much more fragile it'll be, and how much more money it'll take to build out on Kubernetes instead of your AWS deployment model of choice (VM images on EC2, or Elastic Beanstalk, or ECS / Fargate, or Lambda).

I don't want to set up or maintain my own ELK stack, or Prometheus. Or wrestle with CNI plugins. Or Kafka. Or high availability Postgres. Or Argo. Or Helm. Or control plane upgrades. I can get up and running with the AWS equivalent almost immediately, with almost no maintenance, and usually with linear costs starting near zero. I can solve business problems so, so much faster and more efficiently. It's the difference between me being able to blow away expectations and my whole team being quarters behind.

That said, when there is a genuine multi-cloud or on-prem requirement, I wouldn't want to do it with anything other than k8s. And it's probably not as bad if you do actually work at a company big enough to have a lot of skilled engineers that understand k8s--that just hasn't been the case anywhere I've worked.



Genuine question: how are you handling load balancing, log aggregation, failure restart + readiness checks, deployment pipelines, and machine maintenance schedules with these “simple” setups?

Because as annoying as it is to get the Prometheus + Loki + Tempo + Promtail stack going on k8s, I don't really believe that writing it from scratch is easier.


* Load balancing is handled pretty well by ALBs, and there are integrations with ECS autoscaling for health checks and similar

* Log aggregation happens out of the box with CloudWatch Logs and CloudWatch Logs Insights. It's configurable if you want different behavior

* On ECS, you configure a "service" which describes how many instances of a "task" you want to keep running at a given time. It's the abstraction that handles spinning up new tasks when one fails

* ECS supports readiness checks, and (as noted above) integrates with ALB so that requests don't get sent to containers until they pass a readiness check (a rough sketch of that service-to-ALB wiring follows this list)

* Machine maintenance schedules are non-existent if you use ECS / Fargate, or at least they're abstracted from you. As long as your application is built such that it can spin up a new task to replace your old one, that will happen automatically when AWS decommissions the hardware it's running on. If you're using ECS without Fargate, it's as simple as changing the autoscaling group to use a newer AMI. By default, this won't replace all of the old instances, but it will use the new AMI when spinning up new instances.
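
To give a sense of how little glue that takes, here's roughly what creating an ECS service that registers with an existing ALB target group looks like with boto3. This is just a sketch; the cluster, task definition, network IDs, and target group ARN are all placeholders:

    import boto3

    ecs = boto3.client("ecs")

    # Keep two copies of the task running, register them with an existing ALB
    # target group, and give new tasks 60 seconds before health checks count.
    ecs.create_service(
        cluster="web-cluster",          # placeholder
        serviceName="web",
        taskDefinition="web:1",         # placeholder
        desiredCount=2,
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-aaaaaaaa"],     # placeholder
                "securityGroups": ["sg-aaaaaaaa"],  # placeholder
                "assignPublicIp": "DISABLED",
            }
        },
        loadBalancers=[{
            "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/web/abc123",  # placeholder
            "containerName": "web",
            "containerPort": 8080,
        }],
        healthCheckGracePeriodSeconds=60,
    )

The ALB only routes to a task once its target group health check passes, and ECS replaces the task if it stops, which covers most of the list above.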

But again, the biggest selling point is the lack of maintenance / babysitting. If you set up your stack using ECS / Fargate and an ALB five years ago, it's still working, and you've probably done almost nothing to keep it that way.

You might be able to do the same with Kubernetes, but your control plane will be out of date, your OSes will have many missed security updates. Might even need a major version update to the next LTS. Prometheus, Loki, Tempo, Promtail will be behind. Your helm charts will be revisions behind. Newer ones might depend on newer apiVersions that your control plane won't support until you update it. And don't forget to update your CNI plugin across your cluster, too.

It's at least one full time job just keeping all that stuff working and up-to-date. And it takes a lot more know-how than just ECS and ALB.


It seems like you are comparing ECS to a self-managed Kubernetes cluster. Wouldn't it make more sense to compare to EKS or another managed Kubernetes offering? Many of your points don't apply in that case, especially around updates.


A managed Kubernetes offering removes only some of the pain, and adds more in other areas. You're still on the hook for updating whatever add-ons you're using; how painful that is depends on how many you run and how well your cloud provider handles it.

Most of my managed Kubernetes experience is through Amazon's EKS, and the pain I remember included frustration from the supported Kubernetes versions being behind upstream, lack of visibility into the control plane for troubleshooting, and having to explain / understand delays in ENI and EBS volume provisioning / attachment for pods. Also, the ALB ingress controller was something I needed to install and maintain independently (though that may be different now).

Though that was also without us going neck-deep into being vendor agnostic. Using EKS just for the Kubernetes abstractions without trying hard to be vendor agnostic is valid--it's just not what I was comparing above because it was usually that specific business requirement that steered us toward Kubernetes in the first place.

If you ARE using EKS with the intention of keeping as much as possible vendor agnostic, that's also valid, but then you're back to a lot of the stuff I complained about in my other comment: your own metrics stack, your own logging stack, your own alerting stack, your own CNI configuration, etc.


(Apologies for the snark, someone else made a short snarky comment that I felt was also wrong and I thought this thread was in reply to them before I typed it out -- thank you for the reply)

- ALBs -- yeah, this is correct. However, ALBs have much longer target registration / health check times than Envoy or Traefik.

- CloudWatch -- this is true, however the "configurable" behavior makes CloudWatch trash out of the box: with the default configuration you get, e.g., exceptions split across multiple log entries

- ECS tasks -- yep, but the failure behavior of tasks is horrible because there are no notifications out of the box (you can configure them)

- Fargate does allow you to avoid maintenance, however it has some very hairy edges, e.g. you can't use any container that expects to know its own IP address on a private VPC without writing a custom script (a rough sketch of that kind of script is below). Networking in general is pretty arcane on Fargate, and you're going to have to manually write and maintain workarounds for all of this
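
For what it's worth, the "custom script" for that IP-address case is short, but it's one more thing to own. A rough sketch, assuming Fargate platform 1.4+ where the task metadata endpoint is available:

    import json
    import os
    import urllib.request

    # Fargate exposes a per-task metadata endpoint through this env var.
    metadata_uri = os.environ["ECS_CONTAINER_METADATA_URI_V4"]

    with urllib.request.urlopen(f"{metadata_uri}/task") as resp:
        task = json.load(resp)

    # With awsvpc networking, every container in the task shares one ENI,
    # so the first container's first network entry has the private IPv4.
    private_ip = task["Containers"][0]["Networks"][0]["IPv4Addresses"][0]
    print(private_ip)  # e.g. export it for the app before it starts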

> You might be able to do the same with Kubernetes, but your control plane will be out of date, your OSes will have many missed security updates. Might even need a major version update to the next LTS. Prometheus, Loki, Tempo, Promtail will be behind. Your helm charts will be revisions behind. Newer ones might depend on newer apiVersions that your control plane won't support until you update it. And don't forget to update your CNI plugin across your cluster, too.

I think maybe you haven't used K8s in years. Karpenter + EKS + a GitOps tool (Flux or Argo) gives you the same hands-off machine maintenance as ECS, but on K8s, without any of the annoyances of dealing with ECS. All your app versions can be pinned or set to follow latest, as you prefer. You get rolling updates each time you switch machines (same as ECS, and if you really want to, you can run on top of Fargate).

By contrast, if your ECS/Fargate task fails, you haven't mentioned any notifications in your list -- so if you forgot to configure and test that correctly, your service could legitimately be stuck on a version of your app code that is three years old, and you might not know unless you've inspected the correct part of Amazon's arcane interface.
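
And "configuring it" means yet another piece of AWS-specific glue you have to remember to set up and test -- roughly an EventBridge rule that forwards stopped-task events somewhere you'll actually see them. A sketch, with the SNS topic ARN as a placeholder:

    import json
    import boto3

    events = boto3.client("events")

    # Match any ECS task that reaches STOPPED (crash, OOM, failed deploy, ...).
    events.put_rule(
        Name="ecs-task-stopped",
        EventPattern=json.dumps({
            "source": ["aws.ecs"],
            "detail-type": ["ECS Task State Change"],
            "detail": {"lastStatus": ["STOPPED"]},
        }),
    )

    # Forward matching events to an existing SNS topic (ARN is a placeholder).
    events.put_targets(
        Rule="ecs-task-stopped",
        Targets=[{
            "Id": "ops-alerts",
            "Arn": "arn:aws:sns:us-east-1:111111111111:ops-alerts",
        }],
    )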

By the way, you're paying per use for all of this.

At the end of the day, I think modern Kubernetes is strictly simpler, cheaper, and better than ECS/Fargate out of the box, and it has the benefit of not needing to rely on 20 other AWS-specific services that each have their own unique ways of failing and running up a bill if you forget to do "that one simple thing everyone who uses this niche service should know".


ECS+Fargate does give you zero maintenance, both in theory and in practice. As someone who runs k8s at home and manages two clusters at work, I still recommend our teams use ECS+Fargate+ALB when it satisfies their requirements for stateless apps, and they all love it because it is literally zero maintenance, unlike the k8s upkeep you just described.

Sure, there are a lot of great features in k8s that ECS cannot match, but when ECS does satisfy the requirements, it will require less maintenance, no matter what kind of k8s you compare it against.


Depending on use case specifics, Elastic Beanstalk can do that just fine.


He named the services. Go read about them.


I'm not sure which of the named services you think solve the problems I mentioned, but none of them do. You're welcome to go read about them; I do this for a living.


I think you're just used to AWS services and don't see the complexity there. I tried running some stateful services on ECS once and it took me hours to have something _not_ working. In Kubernetes it takes me literally minutes to achieve the same task (+ automatic chart updates with renovatebot).


I'm not saying there's no complexity. It exists, and there are skills to be learned, but once you have the skills, it's not that hard.

Obviously that part's not different from Kubernetes, but here's the part that is: maintenance and upgrades are either completely out of my scope or absolutely minimal. On ECS, it might involve switching to a more recently built AMI every six months or so (roughly the sketch below). AWS is famously good about not making backward-incompatible changes to their APIs, so for the most part, things just keep working.
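
For a sense of scale, that AMI swap is roughly this much automation. A sketch only, assuming the ASG's launch template tracks $Latest, with placeholder names:

    import boto3

    ssm = boto3.client("ssm")
    ec2 = boto3.client("ec2")
    autoscaling = boto3.client("autoscaling")

    # AWS publishes the current ECS-optimized AMI ID at a well-known SSM path.
    ami_id = ssm.get_parameter(
        Name="/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"
    )["Parameter"]["Value"]

    # Add a launch template version that only swaps the AMI (names are placeholders).
    ec2.create_launch_template_version(
        LaunchTemplateName="ecs-cluster-nodes",
        SourceVersion="$Latest",
        LaunchTemplateData={"ImageId": ami_id},
    )

    # Roll instances gradually; the ECS scheduler restarts tasks elsewhere
    # as old instances are terminated.
    autoscaling.start_instance_refresh(
        AutoScalingGroupName="ecs-cluster-asg",
        Strategy="Rolling",
    )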

And don't forget you'll need a lot of those AWS skills to run Kubernetes on AWS, too. If you're lucky, you'll get simple use cases working without them. But once PVCs aren't getting mounted, or pods are stuck waiting because you ran out of ENI slots on the box, or requests are timing out somewhere between your ALB and your pods, you're going to be digging into the layer between AWS and Kubernetes to troubleshoot those things.

I run Kubernetes at home for my home lab, and it's not zero maintenance. It takes care and feeding, troubleshooting, and resolution to keep things working over the long term. And that's for my incredibly simple use cases (single-node clusters with no shared virtualized network, no virtualized storage, no centralized logs or metrics). I've been in charge of much more involved setups at work, and the complexity ceiling is almost unbounded. Running a distributed, scalable container orchestration platform is a lot more involved than piggybacking on ECS (or Lambda).


I hear a lot of comments that sound like they come from people who used K8s years ago and not since. The cloud providers have made K8s management stupidly simple at this point; you can absolutely get up and running immediately, with no worry about upgrades, on a modern provider like GKE.



