
I’m probably alone here and this goes against the HN consensus but, as great a piece of tech as Postgres is, I’ve often found it to be a premature optimization to pick as the main DB for newer companies. If you don’t know what your data models are yet, you’re going to end up in worlds of pain constantly running schema migrations like the OP describes.


It's not Postgres specifically, so much as any SQL (or otherwise schema-enforcing) database, right?

And if we're saying that's a problem, then sounds like we're going back into the NoSQL debates from a decade ago.

Hopefully not.

I think it's better to understand your schema as much as possible, and have a sane process for applying changes when needed. Defining a schema forces you to think about what you're building.

OTOH, the idea that developers on a project are just going to throw whatever new attributes they need into a document as they go along is a recipe for bugs and inefficiency. Also, near-instant technical debt, as early work frequently survives longer than anticipated.

You also don't completely escape painful data changes when using a NoSQL database. If, for instance, you change a string to an int, you'd still need to figure out what to do with existing data, either by converting it or handling it in code.
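
For comparison, the relational version of that change at least makes the conversion explicit; a minimal Postgres sketch, assuming a hypothetical orders.quantity column that was originally stored as text:

    -- Rewrites the column in place; fails loudly on rows that don't convert
    ALTER TABLE orders
      ALTER COLUMN quantity TYPE integer
      USING quantity::integer;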


> Defining a schema forces you to think about what you're building.

YES. Thank you. Sit down with pencil and paper, write down a table name, and start putting attributes into it. Then define a PK, and ask yourself if every attribute is directly related to the PK (a user named foo has an id of 1, and lives in country bar). Repeat. Then ask yourself how you’d join the tables. If you find that something _could_ be represented as a join, but isn’t, consider doing so.
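
To make that concrete, a minimal sketch of the exercise (table and column names are just illustrative): country describes the country, not the user, so it gets its own table and is reached via a join:

    CREATE TABLE countries (
      id   serial PRIMARY KEY,
      name text NOT NULL UNIQUE
    );

    CREATE TABLE users (
      id         serial PRIMARY KEY,
      name       text NOT NULL,
      country_id integer NOT NULL REFERENCES countries (id)
    );

    -- "a user named foo has an id of 1, and lives in country bar"
    SELECT u.id, u.name, c.name AS country
    FROM users u
    JOIN countries c ON c.id = u.country_id;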


Not quite any - in CockroachDB, all schema changes are non-blocking: https://www.cockroachlabs.com/docs/stable/online-schema-chan... . Yugabyte seems to be getting there as well.

There are still risks involved in migrations (mostly from a migration executing too quickly and creating high load in the cluster; the admission control system should have reduced this), and we have extra review steps for them, but it's been very useful to be able to migrate large tables without any extra application-level work.
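
On stock Postgres, the usual way to approximate this for index builds is the CONCURRENTLY variant; a minimal sketch (table and index names are hypothetical):

    -- Builds the index without holding a long write lock on the table.
    -- Note: cannot run inside a transaction block.
    CREATE INDEX CONCURRENTLY users_email_idx ON users (email);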


I'd say Postgres is very often the right choice for a newer company, as it's well understood, easy to operate, and you know you won't have to switch to a new DB because the vendor got acquired / merged / shut down after 2 years, or is going through growing pains and deprecations themselves.

If you give your schema some good thought (the one place where you shouldn't rush or take shortcuts at the beginning), and for example use jsonb columns, moving data out of them later if you notice you need to query it more performantly, you can get very far.
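
A rough sketch of that pattern (the settings column and the theme key are made up for illustration):

    -- Park not-yet-stable attributes in a jsonb blob on the row itself
    ALTER TABLE users
      ADD COLUMN settings jsonb NOT NULL DEFAULT '{}'::jsonb;

    -- Query an experimental setting without committing to a schema for it
    SELECT id FROM users WHERE settings ->> 'theme' = 'dark';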

The pain of data model migrations is also usually not that big as long as the company isn't very large and doesn't have a lot of legacy yet.


> give your schema a good thought

> use jsonb columns

These two statements are mutually exclusive in most cases. If you want JSON, don’t use a relational database.

IME, the “we’ll normalize this later” event never occurs.


Not really, it doesn't mean you put everything into a jsonb field. It could mean, for example, that if you have some user-specific settings, you just drop them in a jsonb column on the user itself instead of building out a schema with mapping tables, permissions, etc., since you don't know yet which options you want to support. Once things stabilize, you can pull them out of there.

From my personal experience this works really well and strikes a nice balance between a strict schema and some flexibility for experiments.
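
And once an option stabilizes, pulling it out of the blob is mechanical; a minimal sketch, continuing the hypothetical settings/theme example:

    -- Promote the stabilized key to a real, typed, indexable column
    ALTER TABLE users ADD COLUMN theme text NOT NULL DEFAULT 'light';

    UPDATE users
    SET theme = settings ->> 'theme'
    WHERE settings ? 'theme';

    -- Then drop the key from the jsonb blob
    UPDATE users SET settings = settings - 'theme';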


This will work, but you risk referential integrity violations, as well as performance problems at scale.

The main issue is what I said in “IME…” – tech debt builds, and people never want to go back and fix it. Just upsize the hardware, easy-peasy.

I would rather see a wide table with a bunch of bools indicating various options, personally. When that gets annoying or unperformant, split it out into logical groups. Color schemes can be their own table, email marketing preferences can be their own table, etc.
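
A sketch of what that split might look like (all names hypothetical):

    -- Before: one wide table of flags
    CREATE TABLE user_prefs (
      user_id          integer PRIMARY KEY REFERENCES users (id),
      dark_mode        boolean NOT NULL DEFAULT false,
      marketing_emails boolean NOT NULL DEFAULT false,
      weekly_digest    boolean NOT NULL DEFAULT false
    );

    -- After: each logical group gets its own table once the wide one hurts
    CREATE TABLE email_preferences (
      user_id          integer PRIMARY KEY REFERENCES users (id),
      marketing_emails boolean NOT NULL DEFAULT false,
      weekly_digest    boolean NOT NULL DEFAULT false
    );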


Which is precisely the caveat I mentioned at the beginning:

> and for example use jsonb columns, moving data out of them later if you notice you need to query it more performantly, you can get very far

If you put data into jsonb where you want the database to enforce integrity... then that's the wrong place for the data. If you're running into scaling problems, you're relying on jsonb columns for heavy queries, which you shouldn't be; in that case the data should've been moved out already.

As always, it's about tradeoffs and being pragmatic. There's no 100% rule that says jsonb is always the wrong choice, or always the right one. You still have to be smart about when to reach for it.

> I would rather see a wide table with a bunch of bools indicating various options, personally. When that gets annoying or unperformant, split it out into logical groups. Color schemes can be their own table, email marketing preferences can be their own table, etc.

The point is exactly to avoid this kind of overhead when you have zero paying customers, as that's premature optimization. Of course it's nice from a pure data-model perspective, but from a business perspective you don't need that until it hurts and you have to split it out.


> The point is exactly to avoid this kind of overhead when you have zero paying customers, as that's premature optimization.

It should take a day at most to work out the data model for a startup. I don’t see that as a lot of overhead.

Do as you will, but from my perspective, I’d rather do it correctly from the start.


> premature optimization

I'm extremely curious to hear what you consider a better/simpler choice. At least Postgres gives you the tools to do schema migrations, and if you're operating at a scale where such migrations become a problem (i.e. probably not for a while), you really ought to know what you're doing.
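
One of those tools worth calling out: DDL in Postgres is transactional, so a multi-step migration either applies atomically or not at all. A minimal sketch (the column name is hypothetical):

    BEGIN;
    ALTER TABLE users ADD COLUMN signup_source text;
    UPDATE users SET signup_source = 'unknown' WHERE signup_source IS NULL;
    ALTER TABLE users ALTER COLUMN signup_source SET NOT NULL;
    COMMIT;  -- or ROLLBACK, and none of it happened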


I'm confused by the use of premature optimization here. What exactly are you suggesting is the better default choice?



