
This is cool! I’m always excited by people trying simpler things, as a big fan of using Boring Technology.

But I have some bad news: you haven’t built a system without a database, you’ve just built your own database without transactions and with weak durability properties.

> Hold on, what if you’ve made changes since the last snapshot? And this is the clever bit: you ensure that every time you change parts of RAM, we write a transaction to disk.

This is actually not an easy thing to do. If your shutdowns are always clean SIGTERMs, yes, you can reliably flush writes to disk. But if you get a SIGKILL at the wrong time, or don’t handle an I/O error correctly, you’re probably going to lose data. (Postgres’ 20-year fsync issue was one of these: https://archive.fosdem.org/2019/schedule/event/postgresql_fs...)
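To make the failure window concrete, here’s a minimal Python sketch of the append-then-fsync pattern being described; the class name and file path are illustrative, not from the project under discussion. A record is only durable once fsync returns, and a SIGKILL between the write and the fsync (or a mishandled fsync error) can lose it.

```python
import os
import tempfile

class WriteAheadLog:
    """Append-only log: a record is durable only after fsync returns."""

    def __init__(self, path):
        self.f = open(path, "ab")

    def append(self, record: bytes):
        # Length-prefix the record so a torn tail can be detected on replay.
        self.f.write(len(record).to_bytes(4, "big") + record)
        self.f.flush()               # userspace buffer -> kernel page cache
        os.fsync(self.f.fileno())    # kernel page cache -> disk; may raise OSError
        # If fsync raises and you retry it, Linux may report success even
        # though the dirty pages were already dropped -- this is exactly the
        # trap behind the Postgres fsync issue linked above.

path = os.path.join(tempfile.gettempdir(), "demo.wal")
log = WriteAheadLog(path)
log.append(b"set balance=42")
```

A process killed between `write` and `fsync` leaves a record that the application thinks it wrote but the disk never saw, which is why recovery code has to treat the log tail as suspect.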

The open secret in database land is that for all we talk about transactional guarantees and durability, those properties only start to matter in the very, very, _very_ long tail of edge cases, many of which are easily remedied by some combination of humans getting paged and end users developing workarounds (e.g. double-entry bookkeeping). This is why MySQL’s default isolation level can lose writes: there are usually enough safeguards in any given system that it doesn’t matter.
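A toy illustration of the kind of lost write being alluded to: two concurrent transactions each read from their own snapshot, both compute an increment, and the second commit silently discards the first. This is a deterministic simulation of the interleaving, not actual MySQL behavior; in a real database you would avoid it with an atomic `UPDATE` or `SELECT ... FOR UPDATE`.

```python
# Two "transactions" each read a counter, compute, then write back.
# With snapshot reads and no write locking, the later write wins and
# the earlier increment vanishes: a classic lost update.

counter = {"hits": 100}

snap_a = dict(counter)   # tx A reads its own snapshot
snap_b = dict(counter)   # tx B reads its own snapshot

a = snap_a["hits"] + 1   # tx A computes 101
b = snap_b["hits"] + 1   # tx B also computes 101

counter["hits"] = a      # A commits
counter["hits"] = b      # B commits, overwriting A's increment

print(counter["hits"])   # 101, not 102: one increment was lost
```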

A lot of what you’re describing as “database issues” doesn’t sound to me like DB issues so much as latency issues caused by not colocating your service with your DB. By hand-rolling a DB implementation using Raft, you’ve also colocated storage with your service.

> Screenshotbot runs on their CI, so we get API requests 100s of times for every single commit and Pull Request.

I’m sorry, but I don’t think this was as persuasive as you meant it to be. This is the type of workload that, to be snarky about it, I could run off my phone[0].

[0]: https://tailscale.com/blog/new-internet



> This is actually not an easy thing to do. If your shutdowns are always clean SIGTERMs, yes, you can reliably flush writes to disk. But if you get a SIGKILL at the wrong time, or don’t handle an I/O error correctly, you’re probably going to lose data.

Thanks for the comment! This is handled correctly by Raft/Braft: before a transaction is considered committed, it must be persisted by a majority of nodes. So if one node’s transaction log gets corrupted, that node will restore and fetch the latest transaction logs from the other nodes.
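The commit rule being described can be sketched as follows; the class and function names are illustrative, not Braft’s actual API. An entry counts as committed only once a majority of nodes have persisted it, which is why a single node’s lost or corrupted log can be rebuilt from its peers.

```python
# Toy model of Raft's commit rule: an entry is committed once a
# majority of the cluster has persisted it.

class Node:
    def __init__(self, alive=True):
        self.alive = alive
        self.log = []

def majority(cluster_size):
    return cluster_size // 2 + 1

def replicate(leader, followers, entry):
    """Append on the leader, ship to followers, commit on quorum ack."""
    leader.log.append(entry)
    acks = 1  # the leader's own durable write counts
    for f in followers:
        if f.alive:              # real Raft: AppendEntries RPC + fsync
            f.log.append(entry)
            acks += 1
    return acks >= majority(1 + len(followers))

# 3-node cluster: a write commits despite one failed follower...
print(replicate(Node(), [Node(), Node(alive=False)], "set x=1"))   # True
# ...but not despite two: no quorum, so the write is not durable.
print(replicate(Node(), [Node(alive=False), Node(alive=False)], "set x=1"))  # False
```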

> I’m sorry, but I don’t think this was as persuasive as you meant it to be.

I wasn't trying to be persuasive about this. :) I was trying to drive home the point that you don't need a massively distributed system to make a useful startup. I think some founders go the opposite direction and try to build something that scales to a billion users before they even get their first user.


Wait, so you’re blocking on a Raft round-trip to make forward progress? That’s the correct decision wrt durability, but…

I’m now completely lost as to why you believe this was a good idea over using something like MySQL/Postgres/Aurora. As I see it, you’ve added complexity in three different dimensions (novel DB API, novel infra/maintenance, and novel oncall/incident response) with minimal gain in availability and no gain in performance. What am I missing?

(FWIW, I worked on Bigtable/Megastore/Spanner/Firestore in a previous job. I’m pretty familiar with what goes into consensus, although it’s been a few years since I’ve had to debug Paxos.)

> I was trying to drive home the point that you don't need a massively distributed system to make a useful startup. I think some founders go the opposite direction and try to build something that scales to a billion users before they even get their first user.

This reads to me as exactly the opposite: overengineering for a problem that you don’t have.

For exactly the reasons you describe, I would argue the burden of proof is on you to demonstrate why Redis, MySQL, Postgres, SQLite, and other comparable options are insufficient for your use case.

To offer you an example: let’s say your Big Customer decides “hey, let’s split our repo into N micro repos!” and they now want you to create N copies of their instance so they can split things up. As implemented, you’ll now need to implement a ton of custom logic for the necessary data transforms. With Postgres, there’s a really good chance you could do all of that by manipulating the backups with a few lines of SQL.
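To illustrate the kind of split being described, here’s a sketch using SQLite as a stand-in for Postgres, with a hypothetical `screenshots` table; the schema, column names, and repo prefixes are invented for the example, and the prefixes are trusted constants here (no user input is interpolated).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE screenshots (id INTEGER, repo_path TEXT, name TEXT)")
db.executemany("INSERT INTO screenshots VALUES (?, ?, ?)", [
    (1, "frontend/app", "login.png"),
    (2, "backend/api", "health.png"),
    (3, "frontend/web", "signup.png"),
])

# "Split the repo": copy each slice into its own table with one
# statement per new repo. Against Postgres you'd run the same SQL on a
# restored backup, then dump each slice into its own instance.
for prefix in ("frontend", "backend"):
    db.execute(
        f"CREATE TABLE repo_{prefix} AS "
        f"SELECT * FROM screenshots WHERE repo_path LIKE '{prefix}/%'"
    )

print(db.execute("SELECT COUNT(*) FROM repo_frontend").fetchone()[0])  # 2
```

The point isn’t the specific statements but that the relational model plus SQL gives you this kind of ad-hoc restructuring for free, whereas a bespoke in-memory store needs custom code for each transform.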


> As implemented, you’ll now need to implement a ton of custom logic for the necessary data transforms. With Postgres, there’s a really good chance you could do all of that by manipulating the backups with a few lines of SQL.

Isn’t writing “a few lines of SQL” also custom logic? The difference is just the language.

It is also possible that the custom data store is more easily manipulated with other languages than SQL.

SQL really is great for manipulating data, but not all relational databases are easy to work with.


> Wait, so you’re blocking on a Raft round-trip to make forward progress? That’s the correct decision wrt durability, but…

Yeah. I hope it was clear in my post that the goal was developer productivity, not performance.

The round trip is only an issue on writes; reads are super fast. At least in my app, this works out great. The writes also parallelize nicely with respect to the round trips, since the underlying Raft library just bundles multiple transactions together. Where it does become a bottleneck is when you’re writing multiple times sequentially on the same thread.

The solution there is to create a single named transaction that does the multiple writes. Then the only thing that needs to be replicated is that one transaction, even though you might be writing multiple fields.
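A rough sketch of that pattern (all names are illustrative, not the author’s actual API): register a named transaction, replicate one log entry naming it plus its arguments, and have every node apply the function locally, so several field writes cost a single consensus round trip.

```python
# Toy model: a registry of named transactions. Replication ships one
# (name, args) log entry instead of one entry per field write.

replicated_log = []   # stand-in for entries shipped through Raft
TRANSACTIONS = {}
state = {}

def deftransaction(fn):
    """Register fn so every node can replay it by name."""
    TRANSACTIONS[fn.__name__] = fn
    return fn

def execute(name, *args):
    replicated_log.append((name, args))  # ONE replicated entry...
    return TRANSACTIONS[name](*args)     # ...applied locally on each node

@deftransaction
def move_funds(src, dst, amount):
    # Three field writes below, but only one log entry above.
    state[src] = state.get(src, 100) - amount
    state[dst] = state.get(dst, 0) + amount
    state["last_op"] = "move_funds"

execute("move_funds", "alice", "bob", 30)
print(len(replicated_log))  # 1
```

The design choice this models: the replicated log records *intent* (which transaction, with which arguments) rather than the resulting state changes, so the round-trip cost is per transaction, not per field.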

> it’s been a few years since I’ve had to debug Paxos

And this is why I wouldn't have recommended this with Paxos. Raft, on the other hand, is super easy for anyone to understand.



