The API database architecture – Stop writing HTTP-GET endpoints (fabianzeindl.com)
93 points by fzeindl on May 10, 2024 | 164 comments


In reality postgREST sucks... it's fine for simple apps, but for anything bigger it's a pain in the butt.

* there is no column level security – e.g. I want to show payment_total to admin, but not to user (granted a feature for postgres likely).
* with the above, you need to create separate views for each role or maintain complex functions that either render a column or return nothing.
* then when you update views, you need to write SQL... and you can't just write SQL, restart the server and see it applied. You need to execute that SQL, meaning you probably need a migration file for the prod system.
* with each new migration it's very easy to lose context of what's going on and who changed what.
* my colleague and I have been making changes to the same view and we would override each other's changes, because our changes would get lost in migration history – again it's not one file which we can edit.
* writing functions in PL/pgSQL is plain ass; testing them is even harder.

I wish there were some tool w/ DDL that you could use to define schemas, functions and views, which would automatically sync changes to the staging environment and then properly apply these changes in production.

Only when you can have a Flask-style app built in whatever metalanguage, w/ the ability to easily write tests, would postgREST be useful for large-scale systems.

For us, it's just easier to build factories that generate collection/item endpoints w/ a small config change.


> I wish there were some tool w/ DDL that you could use to define schemas, functions and views, which would automatically sync changes to the staging environment and then properly apply these changes in production.

A long time ago, I joined a company where the "DB Migration Script" was an ever-growing set of migrations that took close to an hour to run. There were the same issues of lack of context, history, etc.

Since we were a Microsoft shop, I took the approach of using Visual Studio DB projects and the SQL Server sqlpackage tool. Every table, stored procedure, schema, user, index, etc. was represented by a single file in the project, so it would have full git history, and DB changes and the code that relied on them would be in the same commit. (Data migrations still had to be stored separately in up/down migration files.)

The "build" was to create the SQLPackage dacpac file from the DB project, and deploy was to apply the dacpac to the target database and then run data migrations (which were rare). Since the dacpac represented the desired end state of the database, it didn't require a DB to be in a specific state first, and it allowed the same deploy to run as part of CI, manual testing, staging deploy, and production deploys. It also generally took less than 5 seconds.


Why wouldn't the "script" be all of the necessary commands to create the entire database?

If any migration was necessary to transform one table structure to another, that wouldn't be useful to keep around long term, nor interesting once the new table is established. It might be kept as a historical artifact, but why would you on average care beyond what the current schema is now, along with its documentation?


> Why wouldn't the "script" be all of the necessary commands to create the entire database?

That pre-dated me, so I have no idea. It's also why I simply jettisoned the schema migration script altogether, since the dacpac covered both migrations and creating new databases.

Some other "fun" things in the database were "expansion columns" just in case new columns were needed. Many tables had "int_1", "int_2", "int_3" etc that were not used, but ready to be used should the need arise.


> If any migration was necessary to transform one table structure to another, that wouldn't be useful to keep around long term, nor interesting once the new table is established.

This bears repeating.

Migrations are transient. You run them once, you update your backups, and you get rid of them. That's it.

The only scenario where I ever needed to have multiple migration scripts at hand was when working on client-side databases, where we had to support users running old versions of their clients who were prompted to upgrade while skipping multiple releases. Even for that scenario, I also had a "delete and start from scratch" path in place.


> * with each new migration it's very easy to lose context of what's going on and who changed what.

> * my colleague and I have been making changes to the same view and we would override each other's changes, because our changes would get lost in migration history – again it's not one file which we can edit.

> I wish there were some tool w/ DDL that you could use to define schemas, functions and views, which would automatically sync changes to the staging environment and then properly apply these changes in production.

Declarative schema management solves this, and allows you to manage your database definitions (SQL CREATE statements) just like any other codebase: https://www.skeema.io/blog/2023/10/24/stored-proc-deployment...

With a declarative system, the tooling is responsible for diff'ing the CREATE statements between your current state and desired state, and then generating the appropriate DDL to transition between those states.

My tool Skeema is specific to MySQL/MariaDB, but I've linked some solutions for Postgres here: https://news.ycombinator.com/item?id=39236111


> there is no column level security – e.g. I want to show payment_total to admin, but not to user (granted a feature for postgres likely).

It's a feature Postgres has.

  GRANT SELECT 
    ON public_table
    TO webuser;
  REVOKE SELECT 
    ON public_table (payment_total) 
    FROM webuser;
Not sure why you think it doesn't exist or doesn't work with PostgREST.


The Postgres documentation seems to disagree:

> A user may perform SELECT, INSERT, etc. on a column if they hold that privilege for either the specific column or its whole table. Granting the privilege at the table level and then revoking it for one column will not do what one might wish: the table-level grant is unaffected by a column-level operation.


That doesn't mean there isn't column-level security. It just means that `grant` and `revoke` alone are not the way to do it. The response to this criticism is the same as the response to many other criticisms among these comments: use views.

https://dba.stackexchange.com/a/239656/228983
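For example, a rough sketch (table, column, and role names are made up here):

  -- expose everything except payment_total through a view
  CREATE VIEW invoices_public AS
    SELECT id, customer_id, status, created_at  -- no payment_total
      FROM invoices;

  -- the view runs with its owner's privileges by default, so webuser
  -- needs no grant on the underlying table at all
  REVOKE ALL ON invoices FROM webuser;
  GRANT SELECT ON invoices_public TO webuser;
PostgREST serves views like any other relation, so the endpoint webuser hits simply never has the column.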


You're right, it's a feature I don't use, and I didn't read enough of the docs to get the right grants.


This is exactly why I've never used postgrest


This is also how permissions work in most other systems (eg with Unix permissions, chmod u-r matters only if there isn't a group or world read permission).


While the OP was wrong on that, the problem I found is that Postgrest threw only slightly helpful errors when someone would try to access a column that they didn't have permission to, rather than exclude that column from the result set and maybe throw a warning, which meant that you had to maintain multiple different versions of queries. This was a while ago, but coupled with my dislike of the query string syntax and some other stylistic choices I migrated to Hasura.


So you have different queries per access level/role? Like userProducts, managerProducts, supportProducts, adminProducts...?


Yep, if the products table had columns that were only accessible to some users, SELECT * would throw an error in those cases.


In my case I use Postgraphile. If you select a column that you can't access, you get an error - I don't see a problem there.

PostgREST doesn't support column selection?


It does, but if you `SELECT *` and you don't have access to all the columns, you will get an error. You need to select only the columns that you have access to. I think it's the same in Postgraphile, no?


So either the web app has one policy as a whole, or you have to pass authentication to the database?


I don’t think passing auth to the database is crazy. It’s almost an exact parallel to filesystems, which everyone is fine with.

It’s far weirder to me that the accepted status quo is to run transactions as a user that can read anything, and then try to have the app filter out stuff the user shouldn’t see.

Just make the transaction run as a user that only has permissions to read what you want, and stop worrying about whether your auth system handles edge cases.

This status quo is like having a filesystem where everything is owned by root, then writing a FUSE driver that lets users read those files by trying to infer permissions from the path. It’s a weird thing.

I’ve always assumed the reason databases had such poor access control (or little usage of the good access control) was because it was too slow in ye olden days when disks were slow, clock cycles were limited, and read replicas were not yet common.

In the modern world, I don’t know why anyone would prefer that their database can leak data in SQL injection attacks. The costs seem low, and the benefits seem high.


I'd love to hear an explanation of how a modern web or mobile app backend could reasonably be expected to create a new native DB role for every new end user and manage their DB access credentials securely, such that the overall risk of a SQL injection attack is lessened compared to the single-role-per-app model.


> reasonably be expected to create a new native DB role for every new end user and manage their DB access credentials securely, such that the overall risk of a SQL injection attack is lessened compared to the single-role-per-app model.

They don't have to be native DB roles. Row security policies extend far beyond an "owner".

A naive version would use SQL to set a local variable to the user's ID, with row level security policies on each row that check that local variable. This is still very vulnerable to SQL injection, though, because if attackers can execute arbitrary SQL then they can also set that variable.

A less naive version would involve setting a local variable to a JWT or other client-side secret, and having RLS validate access against a stored hash of that JWT or other secret. The app's DB account has no access to anything other than SELECTs on the JWT hash -> user ID mapping table, with RLS on all the other tables. JWTs would need to be generated by another service using its own DB account that only has access to check passwords and write into the JWT table, so that an attacker can't use the compromised connection to just generate a new JWT they can hash themselves.

That login service can be written to be virtually immune to SQL attacks because it doesn't have to handle generalized queries. Ban all the SQL control characters in usernames/passwords and 400 any request that contains them. Hash and base64 both the username and password so they get converted to something where SQL injections are impossible because of the character set used. There's a bunch of options; this doesn't have to handle general-purpose SQL queries, so it can afford to do stuff that would break normal queries like b64'ing the inputs.

You end up with a system where you need a valid JWT to hit the API, but you also need to include a valid JWT in any SQL injection attacks for them to work. There's no point in SQL injection attacks; one of the prerequisites is "a valid authentication token for the user", at which point the attacker could just connect to the API as them.
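Untested sketch of that setup (table, column, and role names here are invented; digest() comes from the pgcrypto extension):

  -- the app role can read the session table but never write it
  CREATE TABLE sessions (
    token_hash text PRIMARY KEY,  -- e.g. sha256 of the bearer token
    user_id    bigint NOT NULL
  );
  GRANT SELECT ON sessions TO app_role;

  ALTER TABLE todos ENABLE ROW LEVEL SECURITY;
  GRANT SELECT ON todos TO app_role;

  -- each request does: SET LOCAL app.token = '<raw token>';
  CREATE POLICY todos_owner ON todos
    FOR SELECT TO app_role
    USING (
      user_id = (
        SELECT s.user_id
          FROM sessions s
         WHERE s.token_hash =
               encode(digest(current_setting('app.token', true), 'sha256'), 'hex')
      )
    );
Without a token whose hash exists in the sessions table, every row is invisible, which is the "no point in SQL injection" property described above.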


> Ban all the SQL control characters in usernames/passwords and 400 any request that contains them. Hash and base64 both the username and password so they get converted to something where SQL injections are impossible because of the character set used.

The problem that this tries to solve has been solved by every SQL database for a long time. Bind-parameters in queries are your friend, never build a query using string concatenation.


Yeah, and I use them, but I still get paranoid. Maybe that's due to my lack of understanding of them; my mental model is that it still resembles string concatenation on the backend. Now that I type it out, that sounds wrong, so I probably need to take a look at that.


You definitely need to spend a little time with them, they are safe and don't require all the crazy workarounds you've detailed to solve a non-issue.


This feels like a custom solution, which will require some code in the database. It might work, and it might even be a good idea if the database is shared between many apps, but if the DB serves one app, I'd rather write code in a coding language than in (PL/)SQL.


> That login service can be written to be virtually immune to SQL attacks because it doesn't have to handle generalized queries. Ban all the SQL control characters in usernames/passwords

Why? If that login service is a third-party provider, what does this accomplish?


I tend to think of the database less as a filesystem and more as a block store. The filesystem layer in a real system is in fact implemented by the driver (and OS), and it might be implemented in terms of primitives that are persisted in the block device, but the block device doesn’t have a magical internal scripting language or query engine that the driver delegates to. The driver is in charge, and the persistence layer stores and loads what the driver tells it to.


> I'd love to hear an explanation of how a modern web or mobile app backend could reasonably be expected to create a new native DB role for every new end user and manage their DB access credentials securely.

Create all the individual user roles as NOLOGIN and then grant a single authenticator role, used by the app, the right to assume the user roles.

It's exactly as secure as the “single app account with authz managed by the app” model, but lets the DB deal with authz. In either case, authn is on the app side, except for authn of the app to its DB account.

But it also means that, since the app role itself has virtually no access and can only transition into a user role, you can't compromise multiple users worth of data at a time via an injection attack.
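Roughly (role names and the password are placeholders):

  -- one login role for the app, one NOLOGIN role per end user
  CREATE ROLE authenticator LOGIN NOINHERIT PASSWORD 'app-secret';
  CREATE ROLE user_42 NOLOGIN;
  GRANT user_42 TO authenticator;

  -- per request, after the app has authenticated the user itself:
  BEGIN;
  SET LOCAL ROLE user_42;
  -- queries here run under user_42's grants and RLS policies
  COMMIT;
SET LOCAL means the role switch ends with the transaction, so a pooled connection drops back to the (nearly powerless) authenticator role afterwards.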


I responded to the other user, but you can get around the app role being able to transition to other roles by passing the session auth token to the DB via a local variable in the transaction/query.

Create an extra session table with a column that stores the hashes of session auth tokens (JWT, shared secret, whatever) and another column that stores the user ID that owns it. Give the app's DB account access to read from, but not write to, that table.

Then set up RLS on each row with a policy that reads that JWT/etc local variable, hashes it, looks up the hash in that table and compares the user ID to the "owner" of the row.

The app account is unable to create JWTs, so the attacker will need either a valid JWT or a hash collision (which means you have bigger problems anyway).

Then isolate the "login service" into its own application with its own DB account that has access to check passwords and write sessions, and that's it. Make it super picky about what it will run; it only handles usernames and passwords, so it can probably just 4xx any requests that contain SQL characters, or just base64 all of its inputs so the character set doesn't even include injection as a possibility.

It ends up with an app account that basically only has CONNECT permissions unless the attacker gives it a valid JWT, at which point they could just impersonate the user anyways.


You'd need another role to create the user roles and grant them to the app role. It should be possible, though you'd need to use a separately authenticated connection if not a separate app to ensure proper isolation.


I'm getting lost. How do the DB and app know about each other in this scenario?


Not OP, but I'm presuming they mean something like adding a "db_account_password" column to the users table to store the password for that user's database account (i.e. not their password, but the password the app uses to log in to the database as them), so the app can get the credentials to log in to a user's database account before running queries as them.

You'd configure it like a normal webapp with the host, port, username, etc. It wouldn't be something you could just add to the config for an existing app, it would take custom code.


No app I've ever deployed does passthrough authz to the filesystem; the app itself runs in the context of a particular user and manipulates files as its own principal, which is basically the same as what happens with databases.


I have, fairly frequently in systems that are designed to be able to propagate auth like NTLM. The app itself runs as a user that has virtually no permissions, and attaches the auth context to file access/database queries/HTTP calls/etc.

Last I checked, it was also the suggested auth implementation for "platform as a service" stuff like Firebase/Supabase. Saving people from having to write their own authz is a huge selling point for those platforms.


+1 to this


What happens when I’m allowed to view a column for some entities but not others?

In the real world authorization logic can get very complicated.

And who wants to migrate this kind of stuff once you’ve implemented it one way?

Honestly it’s best to keep your database doing what it does best: storing data.


> What happens when I’m allowed to view a column for some entities but not others?

Column-level rules are a DB design smell, because if some people have a reason to view a table that is coherent without a column, that suggests that you've got at least two different fact shapes communicated by the table that should be normalized into separate tables.

Each of those tables then needs appropriate RLS rules. Sometimes you do a join (OUTER on one side) of these tables, so that they look like one, and this also solves the combined row/column security challenge that comes up when you have the less normalized table.
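A contrived sketch of that shape (all names invented):

  -- split the payment facts out of the invoice table
  CREATE TABLE invoices (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    status      text   NOT NULL
  );

  CREATE TABLE invoice_payments (
    invoice_id    bigint PRIMARY KEY REFERENCES invoices (id),
    payment_total numeric NOT NULL
  );

  GRANT SELECT ON invoices TO webuser, admin;
  GRANT SELECT ON invoice_payments TO admin;  -- only admin sees payment facts

  -- for roles allowed to see both, join them back into one relation
  CREATE VIEW invoices_with_payments AS
    SELECT i.*, p.payment_total
      FROM invoices i
      LEFT OUTER JOIN invoice_payments p ON p.invoice_id = i.id;
  GRANT SELECT ON invoices_with_payments TO admin;
Row-level policies then go on each base table separately, which covers the combined row/column case mentioned above.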


Views seem much more appropriate a tool for customizing/restricting column access but I'm no DBA.


I'm looking at a table with columns intentionally removed because they're none of my business, but with calculations I'd want to do on those columns provided in a separate table as anonymity allows. Column-level rules seem like they'd make generating this export fairly easy, no?


The first is an option, but the second isn't if you choose not to do the first, because PostgREST is designed to handle this:

https://postgrest.org/en/v12/references/auth.html#jwt-impers...


I admit, I am partially wrong.

I was working with supabase and in there you can't set a role for the user, it's always `authenticated`.

I was looking for something more like

  create policy "No payment_total for user" on todos
    for select (payment_total, invoice_number, payment_hash)
    using (
      -- Dynamic code example
      (select auth.uid()) = user_id
      AND get_user_role(auth.uid()) != 'admin'
    );


Ok, this is not provided in the UI but why don't you use something like this?

- https://github.com/point-source/supabase-tenant-rbac - https://github.com/vvalchev/supabase-multitenancy-rbac


There's this extension https://postgresql-anonymizer.readthedocs.io/en/stable/ which lets you do this, e.g. `MASKED WITH NULL` - though I don't have much hands-on experience with it / haven't looked into its implementation details, so I'm not sure what trade-offs it might come with.

My general feeling is that it's an extension you'd apply to your human users, whilst exempting your machine users, but it feels like it could work for both in the context of something like postgrest


> there is no column level security – e.g. I want to show payment_total to admin, but not to user (granted a feature for postgres likely)

So don't put it in the same table.

> my colleague and I have been making changes to the same view and we would override each other's changes

You don't use version control and a test suite?


> So don't put it in the same table.

Nope, this suggestion sucks. It means I have to come up w/ a table structure that caters to the front-end API. A better idea is to create schemas, e.g. `public_users`, and create views inside.

> You don't use version control and a test suite?

that's what I'm saying... we have like 1k migration files already; what I want instead is `my-function.sql` that always has the current code and will automatically generate the necessary migrations and shenanigans, so I don't have to worry about it.


> Nope, this suggestion sucks. It means I have to come up w/ a table structure that caters to the front-end API.

Yes, custom schemas are preferable, but I think you're missing the point. If the only real cost of not having to write the front end manually at all is massaging the db schema a little bit to partition access by role, that's still a pretty great tradeoff for a lot of scenarios. The system is simple to understand/audit and fast to create; the only apparent downside is that it rubs our sense of aesthetics the wrong way.

> we have like 1k migration files already; what I want instead is `my-function.sql` that always has the current code and will automatically generate the necessary migrations and shenanigans, so I don't have to worry about it.

I use entity framework a lot which does this sort of schema migration based on class definitions. It works well like 80% of the time, but when it doesn't it's annoying.

In any case, what I was suggesting to address the specific problem from your original post is that if you have lots of conflicting schema updates, then your table schema should be versioned and any migration should specify what schema version it depends on, e.g. have a TableVersions table that names all of your tables and associates a version number with each. Your migration scripts then specify what version they were written against and fail if the number differs. It's the same basic optimistic concurrency control we use elsewhere in databases, just applied to schemas.
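Something like this, with hypothetical names:

  CREATE TABLE table_versions (
    table_name text PRIMARY KEY,
    version    int  NOT NULL
  );

  -- at the top of a migration written against users @ version 7:
  DO $$
  BEGIN
    IF (SELECT version FROM table_versions WHERE table_name = 'users') <> 7 THEN
      RAISE EXCEPTION 'users schema has moved on; rebase this migration';
    END IF;
  END $$;

  -- ... the actual migration statements ...
  UPDATE table_versions SET version = 8 WHERE table_name = 'users';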


I understand what you mean, however, I still think this is a bad DX overall.

There are a lot of good suggestions in these threads about possible tools and approaches, but they miss the point that these solutions are bad DX and require maintenance.

I think the general idea is good; it's just not ready yet. There needs to be a framework that handles views/functions in Postgres as if they were part of a web app and then applies changes the way Terraform does, all while taking backups in case something goes wrong.

Otherwise it's just more maintenance, and it's easier to build APIs w/ something else.


> but it just misses the point that these solutions are bad DX and require maintenance.

Everything requires maintenance; that's not an argument that this requires more maintenance than anything else. There are fewer moving parts overall, and fewer moving parts generally means less maintenance.


> * there is no column level security – e.g. I want to show payment_total to admin, but not to user

I think what you want is dynamic data masking. RDBMSes like IBM DB2[1] and SQL Server[2] offer it out of the box.

For PostgreSQL, there's the postgresql_anonymizer[3] extension. For which you could do something like:

  SECURITY LABEL FOR user ON COLUMN invoice.payment_total
  IS 'MASKED WITH VALUE $$CONFIDENTIAL$$';
Last time I tried it though, it didn't play well with transaction variables (`current_setting(my.username)`), so you could not combine it with RLS logic and application users[4]. I'll be exploring an alternative on https://github.com/steve-chavez/pg_masking.

To be fair, this discussion is more related to a PostgreSQL feature than PostgREST. PostgREST relies on PostgreSQL for all authorization logic by design.

[1]: https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-...

[2]: https://learn.microsoft.com/en-us/sql/relational-databases/s...

[3]: https://postgresql-anonymizer.readthedocs.io/en/latest/decla...

[4]: https://www.2ndquadrant.com/en/blog/application-users-vs-row...


When I used PostgREST for a readonly b2b API I recall that the recommended way (from the PostgREST docs) to employ it was creating postgresql schemas for different use cases: you create a schema that has views for tables in another schema and PostgREST exposes only the views in that schema. The views defined in the schema can, naturally, control what columns are visible, or provide whatever other abstractions one might want.

I suspect a lot of people ignore this documentation and expose tables directly, and then wonder where/how to control such things.

Yes, my memory is correct: from the PostgREST "Schema Isolation" documentation:

    "A PostgREST instance exposes all the tables, views, and stored procedures of a single PostgreSQL schema (a namespace of database objects). 
    This means private data or implementation details can go inside different private schemas and be invisible to HTTP clients.

    It is recommended that you don’t expose tables on your API schema. Instead expose views and stored procedures which insulate the internal 
    details from the outside world. This allows you to change the internals of your schema and maintain backwards compatibility. It also keeps 
    your code easier to refactor, and provides a natural way to do API versioning."
This has all been embiggened since I last used it. Now you can configure multiple schemas to be exposed. The active user ROLE (as determined by auth) controls access to exposed schemas, and a header in HTTP requests can specify the desired schema.

Given all of this, it is entirely possible to achieve precise control over column visibility, version control, and surface whatever abstraction you can imagine. However, you are expected to fully embrace PostgreSQL schemas, roles and other database affordances. This has always been the mentality of PostgREST: it's supposed to be a "function" that exposes some subset of a database without any magic beyond the database itself. Implicit in this is the need for adequate CI/CD tools to control database objects: you're doomed the minute you have >1 developer involved without such tools.
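The shape of it, roughly (schema, table, and role names invented):

  -- private schema holds the real tables; only the api schema is exposed
  CREATE SCHEMA internal;
  CREATE SCHEMA api;

  CREATE TABLE internal.users (
    id      bigint PRIMARY KEY,
    email   text NOT NULL,
    pw_hash text NOT NULL  -- never exposed
  );

  CREATE VIEW api.users AS
    SELECT id, email FROM internal.users;

  GRANT USAGE  ON SCHEMA api TO webuser;
  GRANT SELECT ON api.users TO webuser;
A later api_v2 schema can present the same internal tables in a different shape, which is the "natural way to do API versioning" the docs allude to.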


Thanks for your response.

You are right that care needs to be taken in the authentication layer.

And yes, developing SQL in a shared environment is a complex task and needs to be accompanied by proper tooling.


> In reality postgREST sucks... it's fine for simple apps, but for something bigger its pain in the butt.

In reality REST sucks. It should only be used for files, as in WebDAV or S3.


I hear this sentiment echoed a fair amount without really saying why it sucks. And sometimes I wonder if the writer omitted it for brevity, or is actually just echoing something they read on the internet without really believing or understanding it themselves, because saying something sucks where clearly other people don’t know that it sucks (or else why would it be so popular) positions the writer in a dialogue as mysteriously knowledgeable. Hard to argue against points you don’t even know the other person is making and all.


> without really saying why it sucks

I think you've heard quite a lot. There is a reason why people invent GraphQL or BFF

In a nutshell, I think today's Web APIs can't simply be described as a bunch of "resources" or "states" (the terms Roy Fielding used in his PhD dissertation about REST). A database system like PG is much more complex. The SQL itself is an API and comes with a data transfer protocol. You get a leaky abstraction by mapping it to REST.


> There is a reason why people invent GraphQL or BFF

Sure, it was because creating servers--REST or otherwise--with code (looking at you, Spring, Django, and RoR) is such a slow, laborious task that rapidly iterating frontend teams got tired of waiting around for ill-fitting backends to catch up, so they built their own abstraction layers: BFF and then GraphQL. That wouldn't be a problem if backend teams were as nimble. Things like PostgREST, Prisma, Hasura, etc. allow them to be that nimble. Ironically, that brings folks like Wundergraph full circle, where they relegate GraphQL to being a development language for defining REST endpoints. If you're going to use a query language like GraphQL to define your REST services, then you might as well use the OG query language SQL to define your REST services, if circumstances allow. That's precisely what PostgREST does.

So, yes, I have heard a lot of arguments for why REST sucks. I just don't think they apply anymore (if they ever did).


so why not just pipeline sql n' everything inside h2? Or else you have to deal with shit like

> Essential for the architecture is the conceptual split between data retrieval via HTTP GET and data-modifying requests which use the HTTP-verbs POST, PUT, PATCH, DELETE

The whole argument from OP's article is about the god damn HTTP verbs; I totally blame REST for this.

Just use the CONNECT or PROXY verb for the sake of it and move on.


But surely that abstraction, leaky or otherwise, is there to prevent (hopefully) the security issues related to SQL?


Yeah, he omitted it for brevity alright & you would just have to deal with it. Some day you might, too, become Mysterio Mister Knowledgeable


what are you supposed to use instead, JSON-RPC?


"I wish there would be some tool w/ DDL that you can use to define schemas, functions and views which would automatically sync changes to staging environment and then properly these changes on production."

There is. Liquibase (or Flyway, or...) Add it to your build/CICD/deployment process.

I have even used it for TDD-style automated integration testing of stored procedures plus CI/CD deployment, alongside regular .jar (JVM/Scala) code and SQL DML/DDL deployment, for a serious production app.


We solved some of this by having a "src" folder with subfolders: "functions", "triggers" and "views". Then an update-src.sql script drops all of those and recreates them from the source files. This way we can track history with git and ensure a database has the latest version of them by running the script and tests (using pgtap and pg_prove).
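For illustration, a stripped-down version with placeholder object names (\i is psql's file-include meta-command):

  -- update-src.sql: drop and recreate everything kept under src/
  DROP TRIGGER  IF EXISTS trg_audit ON invoices;
  DROP FUNCTION IF EXISTS fn_apply_payment(bigint, numeric);
  DROP VIEW     IF EXISTS report_totals;

  \i src/views/report_totals.sql
  \i src/functions/fn_apply_payment.sql
  \i src/triggers/trg_audit.sql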


Many of the migration tools I've worked with include the concept of scripts that are run every time migrations are applied, in addition to your standard migration files. So things like views, functions, SPs can go in these "every time" files rather than being duplicated in migrations every time they change.


The biggest issue with postgREST for me is that it doesn't support transactions well. You can't do two inserts and keep a consistent state if one fails. That alone is a deal breaker.


You have explicit control over transactions with functions https://postgrest.org/en/latest/references/api/functions.htm....

I think this sentiment stems from users of postgrest-js[1], which is a JS library that gives an ORM feel to PostgREST requests. Under that abstraction, users don't realize they're using a REST API, instead of a direct postgres connection.

So in this case users are really asking for "client-side transactions"[2], which are not supported in PostgREST.

[1]: https://github.com/supabase/postgrest-js

[2]: https://github.com/PostgREST/postgrest/issues/286
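A minimal sketch with made-up tables; PostgREST exposes the function at POST /rpc/create_order_with_item and runs the whole body in one transaction:

  CREATE FUNCTION create_order_with_item(p_customer bigint, p_product bigint, p_qty int)
  RETURNS bigint
  LANGUAGE plpgsql
  AS $$
  DECLARE
    v_order_id bigint;
  BEGIN
    INSERT INTO orders (customer_id) VALUES (p_customer)
      RETURNING id INTO v_order_id;
    INSERT INTO order_items (order_id, product_id, qty)
      VALUES (v_order_id, p_product, p_qty);
    -- if either insert fails, both are rolled back
    RETURN v_order_id;
  END;
  $$;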


Sure you can. Just write a function.


You posted a lot of comments in response to complaints or critiques of postgrest (almost every one at the time of my reading).

Most of them are very terse, rude/dismissive, and in my view fall on the wrong side of hackernews etiquette. E.g., "Sure you can. Just X" is not an educational or persuasive construction, and neither are several other comments you've made on this submission.

If nothing else, I'd encourage you to respond to the best version (steelman) of these critiques, which will improve the quality of discussion and be more persuasive.


I encourage you to read the HN guidelines and then read the comments that I've replied to, because many of them could with considerable justification be regarded as unkind, snarky, and smug.

I would also encourage you not to make and take things personally, and instead stick to the substance. Every one of my comments has been about the technological approach under consideration. If they're terse, it's because I don't want to waste people's time. If they're "not educational or persuasive" to you then you're free to disregard them, or say why exactly they fail. Plenty of the comments here, in my view, are "not educational or persuasive", so I said so and gave my reasons for having that view. I would encourage you to do the same.


Yeah, I fully agree. The tooling for putting that much logic into the database is just not great. I've been decently happy with Sqitch[0] for DB change management, but even with that you don't really get a good basis for testing some of the logic you could otherwise test in isolation in app code.

I've also tried to rely heavily on the database handling security and authorization, but as soon as you start to do somewhat non-trivial attribute-/relationship-based authorization (as you would find in many products nowadays), it really isn't fun anymore, and you spend a lot of the time you saved on manually building backend routes on trying to fit your authz model into those basic primitives (and avoiding performance bottlenecks). Especially compared to other modern authz solutions like OPA[1] or oso[2] it really doesn't stack up.

[0]: https://github.com/sqitchers/sqitch

[1]: https://www.openpolicyagent.org

[2]: https://www.osohq.com


REST itself is crap, at least when it comes to "APIs" serving things besides webpages. There's quite literally no obvious reason that an API actually should be RESTful besides that a lot of web developers believe that it's easier to understand. If REST is so great, then why do you keep finding yourself relying on documentation in order to use it?


I won't say REST is perfect, but I much prefer it to an unstructured api where anything goes. You didn't suggest that, but you really didn't suggest any alternative.

What's the alternative to relying on documentation? Is relying on documentation even a bad thing?


I didn't suggest an alternative because, while I have more specific opinions on that matter, almost any alternative a person pulls out of a hat would be superior to REST.

> I much prefer it to an unstructured api where anything goes.

You're entitled to your opinion, and while I'm sure you didn't mean it to be a straw man, it's essentially the type of straw man I hear a lot when I broach this subject.

Whether an API is "unstructured" doesn't depend that much on what said API is advertised to be acting like. Plenty of RESTful APIs in the wild don't completely adhere to REST or supplemental "standards" like JSON:API. My point about documentation is that, because using a REST API inevitably means reading documentation, and because assumptions about a REST API cannot always be made, then one might as well abandon REST and build an API that doesn't include the extras of REST that are rarely necessary. This doesn't imply unstructuredness. Most programmers don't like building things that don't have a useful amount of predictability to them, so to me the worry about structure is actually concern over very junior programmers doing very junior programmery things. I'm just not interested in that problem, and I don't think most programmers need to be.

So let's just say a programmer, or a team of programmers, implement an API that uses their own convention that they invented, and they provide extremely readable and thorough documentation. Where's the problem?

Documentation is a necessity. One of my arguments against REST is that it implies a high amount of intuitiveness that it can only even attempt to possess with extremely simplistic data. As soon as it makes sense to have a POST endpoint that acts more like an RPC than a REST API, that throws the entire decision to adhere to REST under question, and that sort of thing is not uncommon.


This doesn't seem like a good idea when it comes to consistently hardening an access layer. You mean I get to double the ops maintenance cost of my existing service by adding a new one? You mean I need to figure out how to set up RBAC, rate limiting, logging, and error handling in two places instead of just one?

By and large, opinion in API design has advised against directly mirroring table structure for quite some time. The reasons are many, but they include things like migrating data sources, avoiding tight coupling with the database schema, and maintaining ultimate control over what the payload looks like at the API layer. Just in case you want to do something dynamic, or hydrate data from a secondary service. And if you still want to generate your response payloads as if they came straight from the database, there are plenty of code generation or metaprogramming solutions that make providing access via an existing API layer quite simple.

This solution seems simpler only because it ignores the problems of most practical service-oriented API architectures. If end users need a direct line to the database, then get your RBAC nailed down and open the DB's port up to the end user.


> If end users need a direct line to the database, then get your RBAC nailed down and open the DB's port up to the end user

I do, when users are willing to speak libpq. When they insist on speaking HTTP, then I use PostgREST.


> If end users need a direct line to the database, then get your RBAC nailed down and open the DB's port up to the end user.

I'd really consider creating a new table just for this "database interface". That'll let you keep evolving your internal architecture while preserving backwards compatibility for the client. That won't work in all cases obviously, but I think it's suitable for most of the "sensible" ones.


I thought the prevailing advice was to expose views (materialized or not) rather than tables directly, but I could be wrong or outdated.

I still probably wouldn’t do it, though. My “lazy” solution is usually OpenAPI code generation so I basically only have to fill in the SQL queries.

Not having any code gives me the nagging feeling that very soon I will hit a problem that’s trivial to fix in code but impossible or very stupid to do via a SQL query.


> You mean I get to double the ops maintenance cost of my existing service by adding a new one? You mean I need to figure out how to set up RBAC, rate limiting, logging, and error handling in two places instead of just one?

PostgREST has a 10-line config file, so it's trivial to set up. Monitoring, rate limiting, logging and error handling can be done by central tools like API gateways or on the reverse proxy.


I don't understand what the benefit is of adding additional third-party code into the hot path. Adding yet another endpoint is hardly a lot of work, is sometimes even auto-generated, and has the benefit of being available in your existing monitoring/instrumentation environment already.

Also what about caching if people can "craft" random queries and send them straight to your PG instance?


Agree. I've always thought that PostgREST is an interesting project for some niche use-cases and teams. However, his argument about replacing GET request handling with a new tool that lives outside of/alongside your existing application architecture is not particularly compelling. With properly-factored application code, adding a GET (list) or GET-by-id is fairly trivial.

The only complexity I've ever run into there is implementing a pagination scheme that is not the typical "OOTB framework" limit/offset mechanism. I still don't think this makes the argument much stronger.


> a new tool that lives outside of/alongside your existing application architecture

It need not live either outside of or alongside application code. Substituting application code with PostgREST is an option.

> With properly-factored application code adding a GET (list) or GET-by-id is fairly trivial.

If it's trivial then it sounds like needless busywork to me. I'd rather generate it or have ChatGPT write it than pay a developer to write it, if it's being forced on me. I'd rather dispense with it altogether if I'm allowed.


> I don't understand what's the benefit of adding additional third party code into the hot-path

PostgREST doesn't add an additional third-party library. It replaces one or more third-party libraries: Spring, Django, RoR, etc.

> Also what about caching if people can "craft" random queries and send them straight to your PG instance?

Put the SQL that you would've put into your controller code into SQL functions, expose only those through PostgREST, and call it a day.


> I don't understand what's the benefit of adding additional third party code into the hot-path. Adding yet another endpoint is hardly a lot of work, sometimes even auto-generated and has the benefit of being available in your existing monitoring / instrumentation environment already.

If your existing endpoints already support vertical and horizontal filtering, proper pagination, joining, limiting etc. then you won't get much benefit.

> Also what about caching if people can "craft" random queries and send them straight to your PG instance?

Queries can be blocked or rewritten on the reverse-proxy.


Haven't used postgREST, but I have used similar tools, and they can be useful for small/internal apps where you have trusted users.


Coupling your API to your database schema is a bad idea. Once you have clients consuming that API you can no longer make changes to your database schema without also updating all of those clients and coordinating their deployments. Advice like this reads like it's coming from somebody that's never stayed in a role long enough to deal with the consequences of their actions.


> Coupling your API to your database schema is a bad idea. Once you have clients consuming that API you can no longer make changes to your database schema without also updating all of those clients and coordinating their deployments.

This is not correct. Whenever you need to change an API, you either upgrade your clients or add a new version of the API. With PostgREST the versions of the API are served as views or functions that map onto internal tables. It is absolutely possible to change the tables without changing the API.


You can work around this with SQL views. Design your externally facing API using views that expose a subset of your overall schema.

If you need to change that schema you can update the views to keep them working.


> you can no longer make changes to your database schema without also updating all of those clients

and that was a solved problem back in 90s client/server architecture: it's called views.

https://www.postgresql.org/docs/current/tutorial-views.html

Today's B/S shit was a painful and slow reinvention of old things.


Well, the key thing here is writing SQL views to do the access. Once you’re willing to do that, it’s a fairly minor distinction between using PostgREST or writing a thin, possibly even generated, API layer. But that’s exactly what people are typically not willing to do.


100% agree. And once you are writing SQL views, why use PostgREST and not just use your own framework which abstracts away your choice of Postgres as backend database with something more database agnostic that can point to a view in any database technology? Isn't a main part of the value of an API layer to decouple you from a particular database implementation? If so, why use PostgREST?


> Isn't a main part of the value of an API layer to decouple you from a particular database implementation?

I think the value of an API layer is to provide an interface to interact with, which is a) at the right level of abstraction for the task at hand, and b) in a format that is easy to use.


> why use PostgREST and not just use your own framework which abstracts away your choice of Postgres as backend database with something more database agnostic

Because I'm sticking with PostgreSQL and don't need database agnosticism, and because I don't want to write my own framework when a better one already exists?


> Data retrieval generally does not require any custom business logic, while data-modifying requests do

This just doesn't seem to be the case in my experience, at least not in most cases. Perhaps for internal services which are completely locked down and you can freely just expose the data via a REST API, but for public facing services I just haven't found this to be the case in practice, except in extremely limited circumstances.


Yeah this is it. It's all fine until a product manager asks for analytics on this via their analytics tool of choice, and you have to say "sorry can't do that", or you have to build a complex data pipeline all because you can't do an HTTP POST to an external service.

Also database migrations are notoriously hard to get right, at scale of traffic, at scale of development pace, at scale of team, and often require a bunch of tooling. This pattern pushes even more into database transactions.

I'd rather take a boring ORM plus web framework, where you need a little boilerplate, but you get a stateless handler in Python/whatever to handle this. So much more flexibility for very little extra cost.


I don't see what the problem is with producing analytics.

As for database migrations, if anything about them is "notoriously hard" it isn't changing views and procedures.


Analytics would need to be written to a table, then you'd need a batch job to come along and read them out and send them into the analytics system of choice in the company. Now you've got extra database load, write load, you can't do it on a read replica but it needs to be on the database primary, rolling back a transaction now has bigger implications than a read-only process, and you're hosting another job to post to the analytics system. I'm assuming external analytics here because almost every product team I've seen wants them.

As for database migrations, views and procedures are easier than data migrations, sure, but I've not seen anything around progressive rollout, canarying, etc. Do you do that separately on different replicas?

All of these things are kind of solved problems in basic web apps, but all of these would require a bunch of "odd" stuff to make it work hosting in a database. Is any one part bad? Not necessarily, but I generally like to minimise things I have to apologise to new starters for, and this would cause many of those sorts of conversations.


> then you'd need a batch job to come along and read them out and send them into the analytics system of choice in the company

I wouldn't need that. I would just use the logs for analytics.

> I've not seen anything around progressive rollout, canarying, etc. Do you do that separately on different replicas?

No, I would put the new feature into a new view or procedure, and use database roles to manage progressive rollout, canary releases, and feature flags.
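Purely illustrative sketch of that (view and role names invented):

  -- gate the new version of a view behind a canary role
  CREATE ROLE canary_users NOLOGIN;
  GRANT canary_users TO user_42;  -- enrol a user in the canary

  CREATE VIEW api.revenue_report_v2 AS
    SELECT customer_id, sum(total) AS revenue
      FROM internal.invoices
     GROUP BY customer_id;

  GRANT SELECT ON api.revenue_report_v2 TO canary_users;
  -- everyone else keeps hitting revenue_report_v1 until v2 is promoted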

> I generally like to minimise things I have to apologise to new starters for

You do "you" as is your right. But what I hear when I read this is not that there are technical obstacles to the approach under consideration, but rather that there are social obstacles. Using the database this way "isn't how you're supposed to do it." I grant that has been the prevailing view for about thirty years or so, but to me it's unexamined received wisdom that deserves scrutiny every now and then.


Well you do you too, but in an org shipping a product I couldn't see this sort of thing working.

> I grant that has been the prevailing view for about thirty years or so

This is true, and in fact it's something I have tried to push back on. I've tried this sort of thing. I've tried to push more things into the database. I've tried things like pushing more logic into the database, or pushing more into the frontend webserver, etc, and my conclusion is that it doesn't work.

> this is not that there are technical obstacles to the approach under consideration, but rather that there are social obstacles

The way I see it is that there are no technical obstacles if you assume zero cost of development. If we value our time at zero we can make all of this work, and in some cases in nicer ways. This is how I feel as a hobbyist software developer, but I learnt early on in my career as a software engineer that this is not viable at work. Optimising for development time, and optimising for product flexibility or requirements changes, are almost always worth doing (and yes I realise those are in tension with each other).

Moving more logic to the database like this, or eschewing a logic layer entirely in favour of configuration, may feel like an elegant solution, but I've been bitten by these sorts of things too many times. I'll take a little boilerplate, something a junior engineer will understand, and a bunch of flexibility for product direction, any day.


HN guidelines urge us not to cross-examine and I wouldn't want to do that anyway, so let me just say this.

If you want to share the details of how you were bitten by these sorts of things, I'll welcome it as your "lived experience" as they say. My own experience was as a system integrator from 1997 to 1999 for a client-server CRM product called Onyx, which had a VB front-end and a MSSQL back-end with all the logic in sprocs. I didn't write it, so I don't know what that was like, but I customized it, and it worked. I was a Java developer from 2000 to 2002 building first dot-com boom internet applications this way in a small team for admittedly small clients, and it worked. I was a Java developer from 2002 to 2005 building "intranet" applications this way in a small team for a big client (Wells Fargo) with Oracle sprocs, and it worked, though already by then the industry was 5 years into the fashion of "N-tier architectures", "application servers", and "ORMs" so this way was definitely unorthodox. When I came back from grad school this way was history, with Ruby on Rails, Django, and Spring plus Hibernate being the standard. I was again a Java developer at an enterprise back-office software company from 2010 to 2016 and we definitely did not do it this way. We also had 200+ engineers and I'm quite sure it would've been chaos if we'd tried.

The way I see it is that there are few technical obstacles that aren't dwarfed by social obstacles. No software architecture or development practice will succeed whatever its merits if people don't want to use them. In my experience, what engineers have wanted is to program in anything but SQL. Building database-backed web services (for example) in the orthodox way is inefficient but it works, and more important it feels like work. It demands large teams, and then rewards large teams with forward progress and a code base that makes sense, but only in the context of a large team. My hunch is that building applications in the database with large teams would fail, but also that large teams are unnecessary when building applications in the database.

But hey. Maybe I'm wrong. You tell me. How did the projects you built in this way struggle?


Very interesting, I knew about PostgREST, but never checked it out. A lot of the boilerplate can probably be cut away by using PostgREST.

Just wondering, what about security and especially if someone decides to DDOS your server with high-load queries? Do you try to filter/block those with NGINX?


Speaking of postgrest, it looks like the article links to `www.postgrest.org` which has been "hijacked"? The correct url should be https://postgrest.org


Hijacked seems like a strong word for a 404 page. I was expecting a crypto scam or gift voucher directory.


Depends on who is consuming your API. If they're internal, maybe it's OK. For anything else, you likely need a transformation layer and also have to take into account auth, rate limiting etc. PostgREST sounds like giving read-only access to your DB and I doubt it would be as customizable.


> Anything else, you likely need a transformation layer and also take into account Auth, rate limiting etc

You can do flexible auth in PostgREST, but also on an api-gateway. Are you right now doing rate limiting in the backend?


SQL views are a transformation layer.


I have mixed real-world experience with PostgREST's friend: Postgraphile (i.e. like PostgREST, but it generates GraphQL instead of REST).

It's great, but some tools are a bit rough, and people hate you for forcing them to write 8 lines of SQL.

The great: authentication works (both internal and using an external service like Okta), most (if not all) standard operations work: get, post, put, patch, delete, filter, sort, select specific columns, autogenerated doc. You have better RBAC than most (if not all) web frameworks out there.

The good: proper transactions everywhere, especially if you use a database-backed queue. You can use views to evolve your interface independently of your data. You keep data consistency at all times.

The bad: PostgreSQL permissions can be hard, especially the permissions of views that expose underlying data, and around different schemas. Triggers can get complicated. You need a good tool to evolve your data, and most likely a different tool to evolve your views/stored procs, because they change way more often and for different reasons.

The horrible: people. They see SQL and they hate it. They rush to replace 8 lines of SQL with 300 lines of ORM and call it an improvement because there is no longer SQL. "Remove SQL" becomes an objective. It doesn't matter if they replace it with plague-induced gonorrheic syphilis, they think it's better. They see a trigger that inserts into a SQL-based queue, and they try to introduce a Redis queue, backed by nothing (not even writing to disk), plus three endpoints and functions to create and manage it. They claim it to be better, even though the three functions are unrelated and harder to track.

So well, would I do it for my personal projects? Totally. Would I do it in an enterprise project? Nah, they don't deserve such an improvement. Let them have their millions of lines of code to replace thousands of SQL.


Measured by LOC, a lot of code in systems I've worked on is just copying data from one type of object to another. One frustrating bug I've dealt with was due to someone copying the wrong value between two similarly-named fields, but the request went through so many layers of the system before it was processed by the buggy code that it took hours to track down.

I've spent a lot of time thinking about how to write less of this code, and I think what I want is something similar to Postgrest, but with a mechanism for some sort of hook, where I can write some code to manipulate the request/data in a type-safe way.

The closest I've seen to this was early in my career - one of my first jobs was working at a WebObjects consultancy. Because WebObjects provided the full stack from HTML templating engine to ORM - and by that time also community-driven frontend libraries - you had to write very little of this type of code.

I suspect also that some of the resistance to the Postgrest-style approach in enterprise environments comes from tighter controls around data, and requirements/expectations for stricter change control around databases. Buggy code can always be rolled back, but a botched database change could be a much bigger problem. (Of course, the fact that buggy code could corrupt or delete data almost as easily is ignored in this calculation). I still remember weeks of meetings at one employer trying to get a column added with the ultimate answer being 'no'.


The title does not reveal that the suggestion applies only to Postgres. It is a bad idea to tie your architecture to a specific technology.


> The title does not reveal that the suggestion applies only to Postgres.

I don't think the OP intends the suggestion to apply only to Postgres:

> The name-giving PostgreSQL database is often already available and in systems where it is not, *it can be easily introduced with minimal effort.*

… i.e., don't have PG? Not a problem, just introduce it into your architecture! (/s, from me, but I think the OP is serious.)

> It is a bad idea to tie your architecture to a specific technology.

But I agree. These sorts of "just keep wiring boxes together until it works" architectures cause so many headaches IME; so many pieces, each with its own failure modes and bugs. This is where I find the claim the OP makes,

> [this design, with PostgREST] is more flexible than one that is developed manually.

… dubious. No, the advantage of doing it in code I control is that I control that code, and if something needs to change, it can be. With wired-together-boxes-of-random-tooling like this, I have to pray my change fits into the whims of its config files.


> … i.e., don't have PG? Not a problem, just introduce it into your architecture! (/s, from me, but I think the OP is serious.)

I do think this is fairly simple for many companies.

> But I agree. These sort of "just keep wiring boxes together until it works" architectures cause so many headaches IME; so many pieces, each with its own failure modes and bugs. This is where I find the claim the OP makes,

I agree, but PostgREST is a very very simple box...

> With wired-together-boxes-of-random-tooling like this, I have to pray my change fits into the whims of its config files.

... that needs almost no configuration except the database connection string.


Just as you can write an HTTP server in many languages, you can write an HTTP server on many databases.

PostgREST, Oracle APEX, MySQL Reson.

PostgreSQL is the most popular choice among developers who take this approach.


The principle applies outside of PostgREST and PostgreSQL: Hasura, Prisma, Chilicream and several other tools provide automatic web APIs for a variety of databases.


How do you not tie your architecture to a specific technology?


By wrapping an interface around your technologies so you can swap them out?


When is an app ever this simple? 95% of the time, you're transforming or combining data in the backend, between the client and the DB. Tables are never so pristine.


SQL is pretty good at transforming data.
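To give one hedged example of what that can look like (the customers and orders tables here are made up): a single view can do the joining, aggregating and JSON shaping that would otherwise live in a pile of mapping code.

    -- hypothetical tables: customers(id, name), orders(id, customer_id, total, status)
    create view customer_summary as
    select
        c.id,
        c.name,
        count(o.id)               as order_count,
        coalesce(sum(o.total), 0) as lifetime_total,
        coalesce(
            json_agg(json_build_object('id', o.id, 'total', o.total, 'status', o.status))
                filter (where o.id is not null),
            '[]'::json
        )                         as orders
    from customers c
    left join orders o on o.customer_id = c.id
    group by c.id, c.name;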


I have seen and tried enough tools like this to know I should run in the other direction. For rapid prototyping this might be acceptable, as long as you plan to replace every endpoint before launch; for a fully internal tool it might also be OK.

Security is the problem and no, I don’t want to create a bunch of views to attempt to get it right. Different users have different sets of permissions that give them access to different parts of the data in different contexts. You will pull every one of your hairs out trying to make that work with something like this.

I’m not trying to be mean, but I find things like this or even Firebase-style tools to be massive foot-guns. Sure, if you get the permissions/visibility perfect it might work for you (at this point in time; good luck as you modify it over time), but why take that risk? It’s not like CRUD endpoints are hard to write, and I greatly prefer having full control over what I allow in and out of my system in code. That lets me keep all my auth/visibility rules in one place instead of spreading them out over multiple systems, which again is a foot-gun.

I find “I want a tool that does everything for me”-type thinking and “look, it’s magic and it just works”-type tools to be something junior devs flock to (myself included years ago) before realizing they have given up all the control for something that’s really not that hard to do yourself. The same argument applies if you like key-value/document-based data stores because there is no schema. There are valid reasons to use both types of data store, but if your reason is “this way I can easily change my schema” then I question your ability to write stable systems.


> I don’t want to create a bunch of views to attempt to get it right

Then don't.

> You will pull every one of your hairs out trying to make that work with something like this

No. I won't. I've already made this work, several times.

> It’s not like CRUD endpoints are hard to write

Then why write them? I prefer to automate that part, freeing me up to solve harder problems.

> That lets me keep all my auth/visibility rules in one place instead of spreading them out over multiple systems

My auth/visibility rules are all in one place, in one system: the database.
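Concretely, "in the database" usually means grants plus row level security. A rough sketch (the documents table, the tenant_id claim and the role name are made up, and the exact setting PostgREST uses to expose JWT claims depends on its version):

    -- hypothetical multi-tenant table
    alter table documents enable row level security;

    -- every query, from every endpoint, is filtered by the caller's JWT claim
    create policy tenant_isolation on documents
        using (tenant_id = (current_setting('request.jwt.claims', true)::json ->> 'tenant_id')::bigint);

    grant select, insert, update, delete on documents to authenticated_user;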

> I find “I want a tool that does everything for me”-type thinking and “look, it’s magic and it just works”-type tools to be something junior devs

I don't know who you're quoting. I'm not aware of anybody claiming that PostgREST does everything or is magic.


I'm not a big boy architect and I write small software. To me it seems like the "modifying" microservices usually want to read from the database as well, and when I have all my DTOs in place and everything, I might as well implement a GET endpoint?

Is this just for performance considerations? When you have lots of reads and maybe few writes so you need an extra service to handle the load?

Can someone give a practical example?


The GET endpoints generated by PostgREST are very flexible and support limiting, pagination, ordering, selection of fields etc., which typical hand-written APIs do not support.
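Roughly, the idea is that you expose a view and PostgREST derives the query surface from it. The api/app schemas and the products view below are invented; the filter/order/limit operators shown in the comments are the ones PostgREST documents.

    -- expose a view...
    create view api.products as
        select id, name, price, category from app.products;

    -- ...and PostgREST derives filtering, ordering, paging and field selection:
    --   GET /products?select=id,name,price
    --   GET /products?category=eq.books&price=lte.20
    --   GET /products?order=price.desc&limit=25&offset=50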


https://www.postgrest.org links me to a weird and broken WordPress news site.

https://postgrest.org on the other hand links me to the correct documentation site for PostgREST.

Anyone else having the same problem?


Yes, sorry about that. We're looking at it on https://github.com/PostgREST/postgrest/issues/3503.


Don't see any support for CTEs, window functions or joins. Admittedly joins should probably be handled as views, but window functions and CTEs can be very useful in some circumstances.

It does seem like for basic CRUD this should be all you need. But you will probably run into situations where it can't handle something, and then you have to write your own endpoints anyway.

Writing your own endpoints also allows more flexibility if you want to do some pre-/post-processing steps.


> Don't see any support for CTEs, window functions or joins. Admittedly joins should probably be handled as views, but window functions and CTEs can be very useful in some circumstances.

They can be used in a view that is then exposed. Parameterized functions for more complex queries can also be exposed.
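A rough sketch of both (the api schema and all table and column names are invented): a view built on a CTE and a window function, plus a parameterized function that PostgREST exposes under /rpc/.

    -- CTE + window function behind an ordinary view
    create view api.top_orders_per_customer as
    with ranked as (
        select o.*,
               row_number() over (partition by customer_id order by total desc) as rn
        from orders o
    )
    select id, customer_id, total from ranked where rn <= 3;

    -- parameterized query, callable as GET/POST /rpc/orders_between
    create function api.orders_between(p_from date, p_to date)
    returns setof orders
    language sql stable as $$
        select * from orders where created_at::date between p_from and p_to;
    $$;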



Yet another solution that makes easy things easier, while making just about everything else significantly harder if not impossible. RBAC, monitoring, telemetry, merging data from multiple sources, migrating between data stores, database structure optimizations, caching... These are just the issues off the top of my head that this is likely to make harder, all to avoid writing some of the simplest code you can write.


Perhaps it's because ORMs are considered bad by some people, so to them writing these GET API endpoints is very tiresome, having to always construct these complex filter queries by hand. From that vantage point it would make sense to move toward runtime database introspection, but then they realize that the API data model is different from the data layer, so they introduce table views, and then they realize their permission system doesn't quite fit into this, so they introduce row-level security, ...

I don't know why they want to write SQL so much; it's a 50-year-old language, it sucks ass :P


It's like there's a massive army of developers solely focused on writing GET APIs, even though we have the tools to completely automate this and more. But they always find excuses not to use these tools - this thread is full of them. I don't get it; what they're really doing is just translating a to b over and over again. "Same but different". That's really the last thing I want to spend my time on.


> even though we have the tools to completely automate this

PostgREST is literally a tool to automate this.


Well yes but people seem very keen on not using it (or another tool that does the same or similar).


People are free not to use it, of course, but when they offer their reasoning I just don't think it holds up. For example, sometimes they say, "PostgREST doesn't save much because writing REST services is so trivial that it can even be automated." Again, my response to this is: if it's trivial, then why pay an expensive developer to do it? If it can be automated, then why do it at all? If it can be automated, then automate it. Which... is precisely what PostgREST does.


The point I made at the beginning of this thread is that it’s a long-term high cost and a potential dead end that really didn’t save you much time. If you are in an early stage and just need to move fast and find your product fit, it could make sense. But again, the problem it solves isn’t hard, and this approach quickly makes things harder than they need to be.


Yes, that's what I understood your belief to be, right from the beginning of this thread. I just haven't encountered any reason anywhere among these comments to persuade me to share your belief.


PostgREST is not an ORM. I suspect you know that, but others might be confused so I wanted to point this out.


None of those things are made harder by using PostgREST.


Well, database introspection isn't new, we could've done that decades ago, but there are good reasons for the layer of abstraction.

Also if you go down this route, I think Postgraphile is a much better realization of that idea (uses GraphQL, not REST): https://www.graphile.org/postgraphile/


I see no particular reason to believe that Postgraphile, while a good tool, is any better (or any worse) than PostgREST, just as I see no reason to believe GraphQL is better than REST.

As for layers of abstraction, one can have layers of abstraction with PostgREST: views and procedures.


It's not better because it's gql, it's better because it's more feature-complete (like RLS), offers more/easier customization, isn't written in Haskell, etc., but ofc your mileage may vary.

> As for layers of abstraction, one can have layers of abstraction with PostgREST: views and procedures.

I can't even imagine a more closely coupled and leaky abstraction than views and procedures in a database


The database has RLS, and I don't care what PostgREST and Postgraphile are implemented in, so personally I'm unmoved by these two claims.

As for things we're unable to imagine, what I can't imagine is how table structure (for example) could possibly leak through a view or procedure. You're welcome to try to explain it, though you're not obliged to.


Thanks for your comment, I will look into postgraphile.


> I call this approach the "API database architecture"

Why can't people search for an existing term before creating a new one? It just adds confusion to a bucket which already contains "DB as API", "DB over API", "DB 2 API", "DB 2 REST", "DB low-code API".


How can you apply permissions logic that is not easily expressed in Postgres itself? That’s the main reason I want to build an API in a “proper” programming language.

I suppose I could build that in front of Postgrest but why add latency?


What permissions logic is not easily expressed in your database?


Here's an alternative I've built for myself https://github.com/vb-consulting/NpgsqlRest


The GET ones are actually the easiest to write, like 30 seconds or something.


It also suggests your queries are not complex enough.

From my experience the read APIs are quite complex; the write APIs can be separated into different APIs.


If they're so easy to write, then why write them at all? Why aren't they being generated?


I am not convinced. In an endpoint I can do much more than just interrogate the database. And I usually do.

It might work for simple cases, but then why not also provide POST, PUT, DELETE, PATCH?


> in an endpoint I can do much more than just interrogating the database. And I usually do.

Like what?


In my case, it would be executing business logic, like complex calculations, validations, or workflows that go beyond simple database operations.

Another case is integrating external services through API calls: interacting with other systems or services to fetch or send data.

A third case: formatting and transforming data to prepare it for presentation in JSON before sending it back to the client.


Thanks! I ask because I tend to put business logic into a few broad categories: input validation, data transformation, sequencing, integration. The database can usually handle the first three out of these four well. That tracks with your "complex calculations, validations, or workflows" as well as "format and transform to prepare data for presentation in JSON." PostgreSQL is as good as or better at these tasks than any general-purpose programming language, in my view.
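To make that claim a little less abstract, here is a hedged sketch of validation and presentation handled in the database (the api schema, the signups table, the email domain and the stats view are all invented):

    -- input validation: a domain and check constraints instead of handler code
    create domain email_address as text
        check (value ~* '^[^@[:space:]]+@[^@[:space:]]+\.[^@[:space:]]+$');

    create table signups (
        id         bigserial primary key,
        email      email_address not null,
        plan       text not null check (plan in ('free', 'pro', 'enterprise')),
        created_at timestamptz not null default now()
    );

    -- data transformation: shape the JSON for presentation in one query
    create view api.signup_stats as
    select json_build_object(
               'plan',         plan,
               'count',        count(*),
               'first_signup', min(created_at)
           ) as stats
    from signups
    group by plan;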

Where it gets tricky is with "interact with other systems or services to fetch or send data." While I personally would be cautious about this for purely architectural reasons that go well beyond how it's implemented, "the customer is always right." If I were consulting for a customer who wanted to integrate with external systems, I do have a few tricks up my sleeve.

What I hope you'll take away from this is that "business logic" isn't the show-stopper that people tend to think it is when considering building applications in the database. Pretty much all of it can be done. Some kinds of business logic can be handled quite naturally. Other kinds perhaps less so, but it can be done.

People may still choose not to do so for various reasons, and that's fine, but there usually aren't technical barriers to putting business logic in the database.


Note: exposing your database directly to the Internet / external access should always be considered a bad idea, and at best a last resort.


I have noted your opinion and have disregarded it.


This is such a bad idea. Yeah, let's just increase our infrastructure burden 5x so we don't have to write 3 GET endpoints.


Nothing about this imposes an infrastructure burden.


It's interesting reading this because I implemented a Node.js solution for this problem years ago but it fell on deaf ears. GraphQL was getting all the attention at the time.

https://github.com/socketcluster/ag-crud-rethink

I wrote it for RethinkDB but it could be adapted to any database as it doesn't rely on changefeeds.

I then ended up building a complete serverless solution around it: https://saasufy.com/

It borrows many concepts from REST but works over WebSockets. Why WebSockets? Two major reasons:

- It had to support real time subscriptions/updates so that the views could automatically update themselves when data changed (e.g. with concurrent users). I didn't want to force the developer to manage channels manually, as it can be a major headache to get the subscription order right and to recover from disconnections without the possibility of missing any update messages. Also, my library SocketCluster already supported client side pub/sub with clustering/sharding on the back end, so I wanted to leverage that mechanism.

- WebSocket frames are tiny and don't have all the overhead of HTTP requests so it's possible to have field-level granularity which is important for avoiding resource update conflicts. This is something that the GraphQL developers also figured out at some point. But the challenge with true end-to-end field-level granularity is that loading a single resource would require a potentially large number of requests to be made; hence HTTP requests are not suitable for this (imagine HTTP headers containing cookies being sent for every single field of a resource), however, WebSocket frames are ideal for that as they have tiny headers. You can handle maybe 100 WebSocket frames for the same cost as a single HTTP request.

End-to-end, field-level granularity is powerful as it allows subscriptions to be set up automatically per-field and access control can be enforced automatically at both the resource and field level (for each of create, read, update, delete and subscribe operations). It's also very useful for caching because different views may display different fields of the same resources with some shared fields so caching field values allows different views to share cache.

The overarching philosophy behind end-to-end field-level granularity is that it allows the system to treat each field of a resource as an independent entity with its own subscription/synching mechanism, access control and cache.

SocketCluster was designed to facilitate extremely cheap pub/sub channel creation (both in terms of CPU and memory) and automatic cleanup so it seemed like a good use case to build on top of my existing work. The code of ag-crud-rethink is quite simple... Only about 1.5k lines of code and probably could have been a lot smaller without all the bells and whistles.

The view mechanism also supports real time updates. It can be thought of as a parameterized collection. You define a 'view' with one or more parameters to control the filtering and/or sorting (though you can construct essentially any query on the back end). The idea is that the parameters for the view are provided by the client. This means that you can represent any view of a collection as a simple string which can be used as a channel name; this is useful for efficiently delivering real time updates since we only want to deliver resource change notifications to views which include that resource. During writes, field names can be cheaply matched against view parameters to decide which instances of the views need to receive the notification. Only clients which are looking at an affected view right now will receive the notification.


Can you please add rss to your blog?


I will at some point.


The particular tool Fabian recommends may be "PostgREST", but the general approach is the "API database architecture", which has been adopted by a number of other tools as well: PostGraphile, Prisma, and Hasura come to mind. There is a lot of criticism of this approach in the comments here, but the criticisms exhibit a lot of repetition, so let me consolidate them--along with my responses--in one place and be done with it.

> You should never expose your database.

Let me stop you right there. Please don't tell me what I should do, especially if you don't know what my circumstances are. You'll get a less frosty response if you make your criticism impersonal and frame it in terms of trade-offs.

> Fine. 'One' should never expose their database.

Unless it's accompanied by reasoning and evidence, I'm going to regard criticisms like these as received wisdom.

> OK. One should never expose their database because of security concerns.

PostgREST (for instance) addresses these concerns through a combination of web tokens, database roles, database permissions, and row level security. Other tools (Hasura, PostGraphile) are similar. If this strategy is inadequate, well that's an interesting topic. Please elaborate. If there's valid criticism, perhaps there's something the PostgREST team can do about it.
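For anyone who hasn't seen that combination, a rough sketch of the moving parts (the api schema, the exposed views and the role names follow common PostgREST tutorial conventions, but are otherwise arbitrary): unauthenticated requests run as a minimal anonymous role, and a verified JWT whose role claim is, say, web_user makes PostgREST switch to that role for the duration of the request.

    -- the login role PostgREST connects as; it can only switch into the roles below
    create role authenticator noinherit login password 'change-me';

    -- unauthenticated requests
    create role web_anon nologin;
    grant web_anon to authenticator;
    grant usage on schema api to web_anon;
    grant select on api.products to web_anon;

    -- requests carrying a JWT with "role": "web_user"
    create role web_user nologin;
    grant web_user to authenticator;
    grant usage on schema api to web_user;
    grant select, insert, update on api.orders to web_user;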

> Also, one should never expose their database because the API should be decoupled from the data model.

Again with the received wisdom.

> Fine. The API should be decoupled from the data model because of good reasons, like freeing the underlying data model to change without breaking clients.

With this approach, the API can be decoupled from the data model by using schema, views, and procedures in SQL. If that's inadequate, again that's an interesting topic. Let's hear more.
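A small, hypothetical example of that kind of decoupling (api/app schemas and the users table are assumptions): the exposed view preserves the old API shape even after the underlying table has been refactored.

    -- the underlying model split "full_name" into two columns;
    -- the exposed view keeps the old field so clients don't notice
    create or replace view api.users as
    select
        id,
        first_name || ' ' || last_name as full_name,
        created_at
    from app.users;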

> If the database is going to be exposed, then just go all the way and open up the database port.

Remember the thing above about "circumstances"? Sometimes, the circumstances don't allow this. For example, sometimes circumstances demand an HTTP API. That's going to be difficult for many databases without a little help. That's where things like PostgREST come in.

> This approach doesn't save much effort because creating APIs in code is so trivial that it can even be automated.

If it's trivial, then why pay an expensive developer to do it? If it can be automated, well that's exactly what PostgREST is doing: automating APIs.

> This approach adds another layer.

No, this approach replaces one kind of layer with a different kind of layer. It replaces (for example) Spring Boot with PostgREST. If Hibernate is also being used, then arguably PostgREST is replacing two layers.

> ORMs are bad.

I'm sympathetic to that point-of-view but regard it as a non-sequitur since PostgREST isn't an ORM. Neither is Prisma, for that matter, despite what their marketing material says.

> REST is bad.

Traditionally, APIs of any stripe were difficult to code by hand. Web APIs tended to be REST APIs for a long time. Ergo, REST APIs were difficult to code by hand, not because they were REST but just because that was the nature of things. Consequently, REST APIs got a bit of a bad rap. The ease of providing REST APIs with PostgREST, however, warrants revisiting that criticism "REST is bad."

> But this leaves me no place to put my business logic.

Well, some kinds of business logic (input validation, sequencing, and data transformations) can be handled in the database. Other kinds are more challenging (side-effects, for example). I'd love to hear the details.
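For the "sequencing" part specifically, the usual shape is a function exposed as a single RPC call, so all the steps run in one transaction. Everything in this sketch (the api schema, tables, columns and function name) is invented:

    -- callable as POST /rpc/place_order; the steps run atomically
    create function api.place_order(p_customer bigint, p_product bigint, p_qty int)
    returns bigint
    language plpgsql as $$
    declare
        v_order_id bigint;
    begin
        -- step 1: check and lock stock
        perform 1 from inventory
            where product_id = p_product and quantity >= p_qty
            for update;
        if not found then
            raise exception 'insufficient stock for product %', p_product;
        end if;

        -- step 2: decrement stock and create the order in the same transaction
        update inventory set quantity = quantity - p_qty where product_id = p_product;

        insert into orders (customer_id, product_id, quantity)
        values (p_customer, p_product, p_qty)
        returning id into v_order_id;

        return v_order_id;
    end;
    $$;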

> Even if the general approach of "API database architecture" can be good, why use PostgREST instead of writing your own framework, which would be more database-agnostic?

Two reasons: One, I don't want to write the code and the PostgREST developers are probably better than I am anyway. Two, I don't need the database agnosticism. People who do need support for another database might consider Prisma or Hasura. If that doesn't fit the bill then yes, in that case custom code probably is in order.

> Building applications in the database, in SQL, feels awkward.

Many things feel awkward at first.

> Building applications in the database just isn't how it's done at my organization.

Fair enough, but that's a social obstacle to this approach, not a technical obstacle.


Oh great, now you exposed your database schema to outside consumers. Guess you're not making any migrations then :)


You can absolutely have versioned views that you expose. A URL like /api/v1/customers could be rewritten to /views/v1_customers.

That way you can change your schema internally.
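A hypothetical sketch of that versioning idea (the customers table is made up, and how /api/v1/... maps to a view or schema depends on your reverse proxy or PostgREST's schema configuration):

    -- v1 keeps the original shape; old clients keep hitting /api/v1/customers
    create view v1_customers as
        select id, name, email from customers;

    -- v2 renames a field and adds one, without breaking v1 consumers
    create view v2_customers as
        select id, name as full_name, email, created_at from customers;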


I’m mostly with you here, but to be fair to the OP, he does suggest exposing database views which does give a certain amount of decoupling from the schema of your tables.


Speak for yourself. Personally, I've only exposed views and procedures.


Oh this is an ad


Yeah, let's put another server that can fail in the path of your product because more failure points are better.

But at least you're not providing your user a direct link to your database, which is almost always a bad idea.


It's not another server. It's a replacement server. It replaces bespoke code that a developer was going to write.

Hand built servers can fail too, you know.



