Alas, we’ve found the first thing I _don’t_ like about Fly. They’ve sadly bought into the TSDB model where everything is a counter, instead of the more modern model where everything is a histogram.
Google has long since abandoned the Borgmon data model for histograms with Monarch. The closest non-Google implementation is probably Circonus. Sadly, neither is available as open source software.
I can’t really blame Fly for not single-handedly building a modern open source metrics db. But it’s sort of sad that the infra team I’m most impressed with has to use metrics systems from 15 years ago when the rest of their stack is so cutting edge.
So, most shops (at least the ones that have been around for a few years) are just starting their move to Prometheus (or are in the middle of it), and we already have a newer, better implementation to migrate to? I love the tech world! (I'm half ironic and half serious with that last sentence.)
Well, yeah, because Prometheus was inspired by Google's Borgmon, and soon there will be another product inspired by Google's Monarch.
Google's SRE stack is light-years ahead of anything else out there, simply because they're at a scale where they can afford to hire dedicated developers just to write internal ops software, while most other SREs are understaffed and overworked on purely operational projects.
Vicky is about the best-case scenario for a counter-based metrics db. I am routinely impressed by the things it accomplishes, but Monarch and IRONdb from Circonus use histograms as their base data structure, which means they avoid all the hacks that counter-based TSDBs have to deal with. The obvious one in the Prometheus stack is having to pre-declare your histogram bucket bounds.
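To make that concrete, here is a minimal sketch of what pre-declared bounds look like with the community `prometheus` crate for Rust; the metric name and bucket edges are made up for illustration, and this is not Fly's code:

```rust
// Minimal sketch, not Fly's code: pre-declared buckets with the Rust
// `prometheus` crate. Names and bucket edges are illustrative.
use prometheus::{Histogram, HistogramOpts};

fn main() {
    // The bucket bounds (in seconds) must be chosen up front, at registration
    // time; the stored distribution can never be re-binned afterwards.
    let opts = HistogramOpts::new("http_request_duration_seconds", "request latency")
        .buckets(vec![0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]);
    let latency = Histogram::with_opts(opts).unwrap();
    prometheus::default_registry()
        .register(Box::new(latency.clone()))
        .unwrap();

    // Only the per-bucket counters are kept, not the raw observation.
    latency.observe(0.023);
}
```

Guess the edges badly and there is no going back; a histogram-native store sidesteps that choice entirely.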
I recently started using Fly to host the anycast DNS name servers for my DNS hosting service SlickDNS (https://www.slickdns.com/).
There were some initial teething issues as Fly only recently added UDP support, but they were very responsive and fixed the various bugs I reported. My name servers have been running problem-free for several weeks now.
The Fly UX via the flyctl command-line app is excellent, very Heroku-like.
For apps that need anycast, the only real alternative to Fly that I found is AWS Global Accelerator, but it limits you to two anycast IP addresses, it's much more expensive than Fly, and you're fighting the AWS API the whole time.
You can get a reasonable idea of the problems people run into on our community forum. I use Fly for everything but that may not be the best signal for you: https://community.fly.io/
I'm not a heavy cloud services user, so the prices aren't that important to me, but fly.io seems to be on par with other providers.
I think of Heroku as Fly's most direct competitor. Heroku charges $50/mo for a dedicated VM with 1 GB RAM, whereas Fly charges $31/mo for a dedicated VM with 2 GB RAM.
> Telegraf, which is to metrics sort of what Logstash is to logs: a swiss-army knife tool that adapts arbitrary inputs to arbitrary output formats. We run Telegraf agents on our nodes to scrape local Prometheus sources, and Vicky scrapes Telegraf. Telegraf simplifies the networking for our metrics; it means Vicky (and our iptables rules) only need to know about one Prometheus endpoint per node.
Normally you just use a regular Prometheus server to do this. Why add another, different technology to the stack?
> We spent some time scaling it with Thanos, and Thanos was a lot, as far as ops hassle goes.
It really isn't -- assuming you're not trying to bend Prometheus into something it isn't. Prometheus works using a federated, pull-based architecture. It expects to be near the things it's monitoring, and expects you to build out a hierarchy of infrastructure, in layers, to handle larger scopes.
This is structurally different to what I'll call the "clustering" model of scale, where you have all your data sources pushing their data, aggregating maybe on the machine or datacenter level, but then shuttling everything to a single central place, which you scale vertically from the perspective of your users. This appears to be what you want to do, based on the prevalence of push-based tech in your stack.
Prometheus doesn't work this way. Some people really want it to work this way, and have even created entire product lines that make it look as if it works this way (Cortex, M3DB), but it's fundamentally just not how it's designed to be used. If you try to make it work this way yourself, you'll certainly get frustrated.
> Normally you just use a regular Prometheus server to do this. Why add another, different technology to the stack?
Our physical hosts have hundreds of services exporting metrics. And many of those exported metrics are from untrusted sources. So we can both rewrite labels and decrease the scrape endpoint discoverability problem by aggregating them in one place.
> Not actually Prometheus-compatible, sloppy code, spotty docs. I have no idea why this dumb product continues to attract users.
Because it works incredibly well, it's easy to operate, and handles multi tenancy for us.
> Our physical hosts have hundreds of services exporting metrics. And many of those exported metrics are from untrusted sources. So we can both rewrite labels and decrease the scrape endpoint discoverability problem by aggregating them in one place.
OK, but Prometheus can do all of this just fine?
> Because it works incredibly well, it's easy to operate, and handles multi tenancy for us.
Again, Prometheus itself ticks all of these boxes, too, if you're not trying to force it to be something it's not.
We're not forcing Prometheus to be anything, since we're not using it. What Prometheus wants to be is not really a relevant constraint in our design space. A topologically simple, scalable, multi-tenant cluster that presents as just a giant bucket of metrics to our users is what we wanted, and we got it.
There's an interesting discussion to be had about how our infrastructure works; for example, in the abstract, I'd prefer a "pure" pull-based design too. But things appear and disappear on our network a lot, and remote write simplifies a lot of configuration for us, so I don't think it's going anywhere.
I think you're reading a critique of Prometheus that isn't really present in what we're writing. Prometheus is great! Everyone should use it! Our needs are weird, since we're handling metrics as a feature of a PaaS that we're building.
> I think you're reading a critique of Prometheus that isn't really present in what we're writing.
I'm observing that you've used pull-based, horizontally-scaled tools to build a push-based, vertically-scaled telemetry infrastructure. It can be made to work, sure, but the solution is an impedance mismatch to the problem.
I agree with you here. Using Prometheus, federated Prometheus, and Thanos on top of it for good measure, would probably get you better results without using a hodge-podge of non-Prometheus-compatible tools.
So, just so you understand where our heads are at: we want our users to light their apps up with lots of Prometheus metrics. Then we want them to be able to pop up a free-mode Grafana Cloud site, aim it at our API, add a dashboard, start typing a metric name and have it autocomplete to all possible metrics.
That pretty much works now?
I see the ideological purity case you two are making for "true Prometheus", but it is not at all clear to me how doing a purer version of Prometheus would make any of our users happier.
Well, with the requisite glue code that would inform each user's Prometheus instance how to scrape the service instances -- yes, more or less.
> is not at all clear to me how doing a purer version of Prometheus would make any of our users happier.
If the only things you care about when you build systems are "works" and "direct impact on customers" then there's not really a point to this conversation. The things I'm speaking about, the architectural soundness of a distributed system, are largely orthogonal to those metrics, at least to the first derivative.
Oh, sorry, I misunderstood your meaning when you wrote "That pretty much works now?" — I thought it was a question as to whether a more traditional Prom architecture could do it, but I see now you're just saying you already have this set up.
Right. But also: I'm not trying to be dismissive. We both know that we're looking at this through different lenses. I'm genuinely curious how your lens could inform mine; like, is there something I'm missing? Where, by deploying a much more conventional Prometheus architecture, I could somehow make our users happier? I don't see it, but I'm a dummy; if there's something for me to learn, I'm happy to learn it.
I'm pretty confident that the end result would be simpler in an architectural sense (i.e. fewer components), it would be easier to understand and maintain, and it would behave both more predictably and more reliably.
But these are subjective claims! Not everyone thinks the same way!
How big is the Fly eng team at this point? You all seem to be doing a ton; I’m always kind of surprised these posts don’t end with the usual “we’re hiring” blurb that’s become the norm on these sorts of tech posts.
Hadn't heard of promxy before. In the past, to reduce cardinality/deduplicate metrics, I've just run another instance of Prometheus entirely in-memory and used rewrite rules.
Exposing a metrics endpoint for customers is nice. How do you manage the cardinality? I haven't used Victoria before; is it just better at high-cardinality time series?
They mentioned that they decided against Thanos for the storage of metrics, but I would be curious to hear if other TSDBs were considered. It is a hot space; I know about M3DB, ClickHouse, Timescale, InfluxDB, QuestDB, OpenTSDB, etc.
We kind of naturally went from Prometheus -> VictoriaMetrics. Vicky is very simple to operate, which was a win. I'm a huge fan of Timescale, though, and we have big Postgres plans that I hope include Timescale. :)
> When it comes to automated monitoring, there are two big philosophies, “checks” and “metrics”.
There's a third, "events". Just push an event out whenever something interesting happens, and let the monitoring tool decide whether to count, aggregate, histogram, alert, etc.
Events require less code in the app (no storage, no aggregation, no web server), and allow more flexibility in processing. I have used events to great effect. I am baffled as to why monitoring people still only talk about metrics.
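To make that concrete, here is a minimal sketch of the event approach, assuming the `serde_json` crate; the event shape and field names are made up, not from any particular monitoring system:

```rust
// Minimal sketch of "just emit events": no in-process counters, buckets, or
// /metrics endpoint; downstream tooling decides how to count, aggregate,
// histogram, or alert. Field names here are illustrative.
use serde_json::json;

fn record_request(path: &str, status: u16, duration_ms: f64) {
    let event = json!({
        "event": "http_request",
        "path": path,
        "status": status,
        "duration_ms": duration_ms,
    });
    // Ship it however you like: stdout for a log shipper, UDP, a queue, etc.
    println!("{}", event);
}

fn main() {
    record_request("/checkout", 200, 37.2);
}
```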
> If you’re an Advent of Code kind of person and you haven’t already written a parser for this format, you’re probably starting to feel a twitch in your left eyelid. Go ahead and write the parser; it’ll take you 15 minutes and the world can’t have too many implementations of this exposition format. There's a lesson about virality among programmers buried in here somewhere.
Huh? Who gets excited about writing a parser?
What was wrong with "${key} ${value}" on separate lines?
Is "excited" the word? I don't know. I think "obsessively compelled" is more what I was going for. It's one of those formats where --- despite apparently being originally intended for human consumption --- you can immediately see the rule of construction for, like you're not just reading the data, but also the pseudocode for how it's formatted.
Well, I suppose what's wrong with it is that these metrics are more complicated than key => value. The metric referred to in your quote contains a metric name, multiple attributes (status, app ID, etc.), and a value.
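For the curious, here is a minimal sketch of a parser for that shape of line (name, optional labels, value) in Rust; it ignores comments, label-value escaping, and optional timestamps, and the sample line is invented:

```rust
// Minimal sketch of an exposition-format parser for lines like
//   http_responses_total{status="200",app="demo"} 1027
// Comments, escaped label values, and timestamps are not handled.
#[derive(Debug)]
struct Sample {
    name: String,
    labels: Vec<(String, String)>,
    value: f64,
}

fn parse_line(line: &str) -> Option<Sample> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None; // skip blank lines and comments
    }
    // The value is the last space-separated token; everything before it is
    // the metric name plus an optional {label="value",...} block.
    let (metric, value) = line.rsplit_once(' ')?;
    let value: f64 = value.trim().parse().ok()?;

    let (name, label_body) = match metric.split_once('{') {
        Some((name, rest)) => (name, rest.strip_suffix('}')?),
        None => (metric, ""),
    };

    let labels: Vec<(String, String)> = label_body
        .split(',')
        .filter(|kv| !kv.is_empty())
        .filter_map(|kv| {
            let (k, v) = kv.split_once('=')?;
            Some((k.to_string(), v.trim_matches('"').to_string()))
        })
        .collect();

    Some(Sample { name: name.to_string(), labels, value })
}

fn main() {
    let line = r#"http_responses_total{status="200",app="demo"} 1027"#;
    println!("{:?}", parse_line(line));
}
```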
I’m trying to understand the market they’re operating in. Big ol’ enterprises would probably want to run on AWS/GCP, right? So would startups? What’s the long game? Genuine question.
It all starts with a developer at BigCo who is frustrated with the complexity of LegacyCo’s stack... so she builds it on NewCo’s coolness. And then suddenly BigCo wants enterprise features from NewCo, and so it goes...
We looked at Cortex but running a Cassandra cluster was a little too much for us. I don't think we saw M3 until after we'd started using VictoriaMetrics.
Not to pick on Fly (seems nice), but on the trend for containers:
>if you’ve got a Docker container, it can be running on Fly in single-digit minutes.
I used to laugh at the old Plan 9 fortune, "... Forking an allegro process requires only seconds... -V. Kelly". Guess I'm not laughing anymore?
FWIW, performance of components is the barrier to composition in system design and development. You can't compose modules that take seconds to act and still have something that is usable in real time.
I can see how someone would misunderstand that claim though: the idea of getting up and running on a new hosting provider that quickly seems so unlikely that thinking "well they must be talking about container launch times here instead" is an understandable mistake.
Container launch times are frequently in minutes if we include pulling the images from a repo, which we reasonably should.
In either case, that doesn't mean, as the OP suggests, that we can't compose them into a performant product; it just means you run them as daemons so you can amortize the startup cost across many invocations. This isn't specific to containers; we do the same thing for web servers, databases, virtual machines, physical machines, etc. Anything with a startup cost that you don't want to pay each time.
Why would you include pulling the container image as something that should be part of the container launch time? It’s a one-time event after a code push.
It's not a one time event, each host that runs a container image needs to pull it at least once. Minimizing pulls is a good optimization but you have to work pretty hard to really cut them down.
fwiw, I moved my prod env from a VPS (at linode) to fly.io twice in one week. Once to try it out and get a feel for what I was getting into and the second time for real. Took about 90 minutes each (plus some thinking time and doc reading). It's a small app without a ton of data to move nor a lot of traffic at this point, so take my experience with a grain of salt.
> Fly.io transforms container images into fleets of micro-VMs running around the world on our hardware.
Oh boy!
> None of us have ever worked for Google, let alone as SREs. So we’re going out on a limb
Oh.... boy.
> We spent some time scaling it with Thanos, and Thanos was a lot, as far as ops hassle goes.
You know, they have these companies now that will collect your metrics for you, so that you don't have to deal with ops hassle.
> In each Firecracker instance, we run our custom init,
... in Rust. Yes, the thing that is normally a shell script is now a compiled program in a new language that mostly just runs mkdir(), mount(), and ethtool() (https://github.com/superfly/init-snapshot/blob/public/src/bi...). In a few years, when that component is passed off to a dedicated Ops team and they find it hard to hire a sysadmin who also knows Rust, there will be some poor intern who learned Rust over the summer whose job is to rewrite that thing back into a shell script.
Now I like Fly even more. I used to be an SRE for AMZN, and I'm 100% on board with replacing shell scripts and init with typed configuration files (like Dhall, for example) and a system written in Rust for parsing and understanding those files. I think this is the best part of systemd. There are other projects like s6, for example. I need to understand more about Fly's implementation of init; I am very curious.
Not literally, no, as you need an init program that performs certain steps first which bash can't do. But once those few operations are done you can pass execution off to a script.
It looks like your init is running the gamut of typical Linux init steps: handling signals, spawning processes, handling TTYs, running typical system start-up commands. Then it gathers host and network metrics, reports results, and operates as some kind of WebSocket server? All of that other stuff should live in a separate dedicated program that the init script runs. Unless I missed something and that's not possible?
We can't assume any particular environment inside a VM; we don't own the "distribution", or really anything other than init itself, including libraries. We could make something work, but it'd be more effort than what we do now, which is just some trivial straight-line Rust code, in an init that was going to be Rust anyways.
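To give a sense of what that straight-line code looks like, here is a minimal sketch of a VM init that mounts the usual pseudo-filesystems and then hands off to the guest's entrypoint. It assumes the `nix` crate, the paths are illustrative, and it is not Fly's actual init:

```rust
// Minimal sketch, not Fly's init: mount a few pseudo-filesystems, then run
// the guest's entrypoint as PID 1's child. Signal handling, zombie reaping,
// and network setup are omitted. Assumes the `nix` crate; paths are made up.
use nix::mount::{mount, MsFlags};
use std::process::Command;

fn main() {
    // The pseudo-filesystems most userlands expect to find.
    std::fs::create_dir_all("/proc").expect("mkdir /proc");
    std::fs::create_dir_all("/sys").expect("mkdir /sys");
    mount(Some("proc"), "/proc", Some("proc"), MsFlags::empty(), None::<&str>)
        .expect("mount /proc");
    mount(Some("sysfs"), "/sys", Some("sysfs"), MsFlags::empty(), None::<&str>)
        .expect("mount /sys");

    // Hand off to the guest image's entrypoint and mirror its exit code.
    let status = Command::new("/app/entrypoint")
        .status()
        .expect("spawn entrypoint");
    std::process::exit(status.code().unwrap_or(1));
}
```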
Yeah, so you'd use an initrd as "your" environment, run your programs, pivot_root (or the modern equivalent) and then pass execution to your guest's environment, which can have its own init.
I say it will eventually be rewritten as a shell script (minus the stats, which will be replaced by a monitoring agent) because people messing with embedded Linux often try writing their own compiled init to bundle with an initrd. It eventually becomes a hassle, so they either make their own feature-filled init replacement, or they go back to a shell script. Most go for the shell script.
I mean, it's a DSL for composing programs. It's too powerful for what you need to do, but all its other advantages (portability, flexibility, simplicity, tracing, environment passing, universal language, rapid development, cheaper support, yadda yadda) make it a win over time. The only reason I can see not to use a shell script is if speed is your highest priority.
I like shell scripts too. I once wrote a shell script that converted X.509 certificates into shell scripts that generated the same X.509 certificate, field by field. Shell scripts are great. Not the right tool here, but, great.