This makes no sense and is typically the opposite of what I'd build. The more business logic (various validation rules and processing) you have in the database (via stored procedures), the fewer problems you have, not least because all that processing becomes ACID.
The reducing load on the database claim is nonsense, since the application would have to query the database anyway, and less efficiently than the database itself could: keeping the work in the database avoids numerous round trips between application and database.
If you implement business logic in front end applications, you're asking for trouble. Many places I've worked didn't allow software outside the database to do anything more than call a stored procedure.
Application servers did come into fashion at one point, with predictable results: subtle timing bugs, inconsistent processing with different results depending on which application server processed the data, a general difficulty in finding the source of bugs, and security problems.
Everyone would be a lot less stressed if they did everything in the database.
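To make that concrete, here's a minimal sketch of the pattern, assuming Postgres with PL/pgSQL and node-postgres on the calling side (the tables and the place_order function are invented for illustration). The validation and the writes live together in the database, so every caller gets the same rules and the whole thing is atomic:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection details from PG* environment variables

// Hypothetical database-side function: validates and writes as one ACID unit.
await pool.query(`
  CREATE OR REPLACE FUNCTION place_order(p_customer bigint, p_sku text, p_qty int)
  RETURNS bigint
  LANGUAGE plpgsql
  AS $$
  DECLARE
    v_id bigint;
  BEGIN
    IF p_qty <= 0 THEN
      RAISE EXCEPTION 'quantity must be positive';
    END IF;

    UPDATE stock SET on_hand = on_hand - p_qty
     WHERE sku = p_sku AND on_hand >= p_qty;
    IF NOT FOUND THEN
      RAISE EXCEPTION 'insufficient stock for %', p_sku;
    END IF;

    INSERT INTO orders (customer_id, sku, qty)
    VALUES (p_customer, p_sku, p_qty)
    RETURNING id INTO v_id;

    RETURN v_id;
  END;
  $$;
`);

// The application does nothing clever: it just calls the procedure.
const { rows } = await pool.query("SELECT place_order($1, $2, $3) AS id", [42, "SKU-123", 2]);
console.log("order id:", rows[0].id);
```

Whether the call succeeds or raises, the application never gets a chance to write half-validated data.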
That is the exact opposite of my experience. I try very hard to have the database as dumb as possible.
- Having mixed workloads on the database servers makes it almost impossible to profile and tune. By definition, a database answers a lot of queries from a lot of clients. If those queries are hybrid data-fetching / compute, then it's pretty much impossible to make sense of any system metrics.
- Having hybrid workloads usually leads to treating DB servers as black boxes (because you cannot make sense of their resource usage easily). This in turn leads to hyper-pumped-up DB servers with crazy RAM, crazy CPU, crazy caches, crazy SSDs and crazy HDDs.
- DB vendors of course understand the previous point, which is why Oracle switched to a "pay per core on the machine" model...
- The DB logic-layer ecosystem is, most of the time, subpar compared to a more traditional programming language. If you depend on your DB server to perform a lot of work, you will want to make these awesome queries accessible. That is, you will want stored procs, extensions, etc. Those are a huge pain to test, version and update...
- Overall, all RDBMSs have special SQL extensions all over the place. You're probably using tons of them without even knowing it. With time, vendor lock-in will become an immense burden and you'll be forced to pay millions to Oracle to keep your business running.
>The reducing load on the database claim is nonsense, since the application would have to query the database anyway, and less efficiently than the database itself could: keeping the work in the database avoids numerous round trips between application and database.
Ah, but by shifting the logic to the app, I can now query multiple databases (of disparate types even! vector, document, object, relational, etc.), each optimized for its particular data and query types and combine the result. That's very....not easy in stored procedures. I can also scale out different tiers of the application based on its needs (heavy compute, gpu-based, etc.). How do you do that when everything is a stored proc on a database?
Well, that depends on the requirements, doesn't it? I'm not suggesting you try to stick all your processing of all things inside a database; I'm saying put all critical processing of data stored in said database within it. Put all related data in the same database where it makes sense. Whatever data doesn't belong in that database can be merely referenced from it.
> I can now query multiple databases
This is another thing I disagree with. Wherever possible, related data should be centralised, to improve ACID properties and to create a reliable single "source of truth". I know this is unfashionable, with everyone claiming they need massive scale (which does cause some issues with this design), but much like microservices, it ends up being mostly busy work for programmers and related professions.
Having said that, multiple databases, if largely orthogonal to each other, present no real problems, and the same design concept applies to each.
FWIW I think I agree with you if by "critical processing of data" you mean data validations (IME via constraints, checks, triggers, etc.). I think where things can really go wrong is when the DB accepts anything the application layer thinks is valid.
Hmm well if I understand you correctly that's what I was saying as well:
Things go wrong when the DB trusts the application layer (rather than doing its own validation) -- e.g. the DB should control what is valid data, not the application layer.
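Concretely, something like this (a minimal sketch assuming Postgres and node-postgres; the table and its rules are invented). The constraints live in the schema, so the same rules apply no matter which client is writing:

```typescript
import { Pool } from "pg";

const pool = new Pool();

// The database, not the application layer, decides what counts as valid data.
await pool.query(`
  CREATE TABLE IF NOT EXISTS accounts (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      text NOT NULL UNIQUE CHECK (email LIKE '%@%'),
    balance    numeric(12,2) NOT NULL CHECK (balance >= 0),
    created_at timestamptz NOT NULL DEFAULT now()
  )
`);

// Every client -- app server, batch job, someone in psql -- hits the same rules.
try {
  await pool.query(
    "INSERT INTO accounts (email, balance) VALUES ($1, $2)",
    ["not-an-email", -5]
  );
} catch (err) {
  // check_violation: the bad row never gets in, whatever the app layer thought.
  console.error((err as Error).message);
}
```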
All of the PL/SQL-style dialects are pretty much universally mediocre at best. They don’t (de-)compose well, they don’t version-control well, they don’t document well, they don’t debug well; they pretty much lack all of the learnings and tooling that application languages have accumulated over the last 20 years. You’re basically dealing with all of the problems of a poorly designed DSL made by your favorite coworker.
You could do something more sane like running language extensions on the DB so you have access to a “real” language to specify your business logic in, but then you haven’t really changed anything except to run your app logic on the DB server, which doesn’t really have anything to do with the RDBMS at that point.
Moving business logic into the schema/queries can be fine, but you still have the issues that you can’t document it particularly well, can’t version-control it well, can’t test it well, can’t trust your performance tuning, etc., all of which derives from DBMSs expecting to be treated as “living systems” and has nothing to do with knowing/not knowing SQL.
And the problem with modifying living systems is that you really don’t want to be fucking around with them, because anything could potentially send them reeling (see: surgeons). You really want them to be stable… so moving your unstable business logic out is almost a necessity.
You're making my point here. RDBMSes are not any more difficult or mysterious than any other software except for the fact that you're more familiar with that software and language, thus your 'doesn't X well'.
What you will find, though, is that any sufficiently complicated business application ends up containing an ad hoc, informally-specified, bug-ridden, slow implementation of half of what the RDBMS is giving you for free.
> except for the fact that you're more familiar with that software and language, thus your 'doesn't X well'.
That’s exactly not my point. My point is that, knowing SQL, knowing what RDBMS’s have and what they are, RDBMS’s are a bad place to stuff your business logic. You have a variety of options that don’t work well. Schemas/queries are the only relevant place where RDBMS’s excel, and because it’s a living system, you don’t really want something so unstable as business logic to permeate through your schema significantly.
It’s not a matter of familiarity. It’s the wrong tool for the job being discussed.
What's not a "living system"? You can update RDBMSes much more cleanly and reliably than almost all other software. Stored procedures (in a variety of languages) and the like do not necessitate changes to your data schema, and the data that ends up in your schema belongs there regardless of how it's produced. That is a data design issue.
Being able to roll forward and back with code and data (including an updated schema) that is synchronised and works is a huge win. In most places where I've worked with serious RDBMS systems, developers have no access whatsoever to production. The fact that this isn't the case universally illustrates the strength of these RDBMS approaches.
Why aren't stored procedures on a DB able to be version-controlled well? People already version-control database schemas and migrations; why would stored procedures not be version-controlled just as well? You can test them by creating shadow databases (what Prisma does) to make sure all migrations apply cleanly to an empty database, then add test data, call the procedure, and check the output. Genuinely, please do enlighten me as to why this isn't possible or doable; I would love to learn more about this part of the field.
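For concreteness, here's how I imagine that working (a sketch only; the migrations layout, the order_total function and the app_test shadow database are all invented, and I'm assuming node-postgres). The stored procedure lives in a migration file in the repo, and a test applies everything to a throwaway database and checks the procedure's output:

```typescript
import { readFileSync, readdirSync } from "node:fs";
import { Client } from "pg";

// Apply migrations/0001_*.sql, 0002_*.sql, ... in order; one of them would
// contain CREATE OR REPLACE FUNCTION order_total(bigint) ...
async function applyMigrations(client: Client, dir: string): Promise<void> {
  for (const file of readdirSync(dir).sort()) {
    await client.query(readFileSync(`${dir}/${file}`, "utf8"));
  }
}

async function testOrderTotal(): Promise<void> {
  // Throwaway "shadow" database, recreated for every test run.
  const client = new Client({ database: "app_test" });
  await client.connect();
  try {
    await applyMigrations(client, "migrations");
    await client.query(
      "INSERT INTO order_lines (order_id, qty, unit_price) VALUES (1, 2, 9.99), (1, 1, 5.00)"
    );
    const { rows } = await client.query("SELECT order_total(1) AS total");
    if (Number(rows[0].total) !== 24.98) {
      throw new Error(`expected 24.98, got ${rows[0].total}`);
    }
  } finally {
    await client.end();
  }
}

await testOrderTotal();
```

The procedure's source is reviewed, diffed and rolled back exactly like any other file in the repo.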
A lot of the stuff I see people writing application code for doesn't require any more complex SQL features than joins, aggregations and filters, i.e. the stuff plain old SQL is really good at.
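e.g. (hypothetical schema, using node-postgres) the join, filter and aggregation all stay in one statement instead of being re-implemented as loops over fetched rows:

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Join + filter + aggregate, done where the data lives.
const { rows } = await pool.query(
  `SELECT c.name,
          count(*)     AS orders,
          sum(o.total) AS revenue
     FROM orders o
     JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at >= $1
    GROUP BY c.name
    ORDER BY revenue DESC
    LIMIT 10`,
  [new Date("2024-01-01")]
);
console.table(rows);
```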
That’s just born of either psychosis or naivety; but even when one does know SQL, there’s more to consider. RDBMS’s are awkward systems to work with, and SQL knowledge isn’t the issue
Moving business logic into schema/queries is usually fine (and perhaps the only sane place for the DB to directly interact with business logic), though schema changes have all the issues described. Queries get to live with your application code, and so can be treated sanely, but you only really get to a good setup with a decent query builder library (ORM-haters smashing strings together like an ape is naturally full of issues, and ORM-lovers get all the fun of table/object model-mismatch and poor query generation). But then you run into the issue that every database library tries to be database-agnostic, and you lose access to all the fancy features your database provides
> but you only really get to a good setup with a decent query builder library (ORM-haters smashing strings together like an ape is naturally full of issues, and ORM-lovers get all the fun of table/object model-mismatch and poor query generation). But then you run into the issue that every database library tries to be database-agnostic, and you lose access to all the fancy features your database provides
I've found that smashing strings together works excellently so long as:
- You use something like JavaScript's template strings so that interpolated fragments are inline
- You have named placeholder support (so that you're not trying to count parameter placements which gets hugely complex for optional parameters).
- You mainly use it for SELECTs and have a minimal query builder for INSERTs/UPDATEs/UPSERTs
And you get full access to all of the database's features.
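To make that concrete, here's a toy sketch of the kind of thing I mean (assuming Postgres with node-postgres; the `sql` tag is a helper you'd write yourself or pull from a small library, not a pg API, and the table/columns are invented). Interpolations become parameters automatically and fragments compose, so you never count placeholders by hand:

```typescript
import { Pool } from "pg";

type Query = { text: string; values: unknown[] };

const isQuery = (x: unknown): x is Query =>
  typeof x === "object" && x !== null && "text" in x && "values" in x;

// Tagged template: every interpolated value becomes a $n parameter, and
// fragments built with the same tag are spliced in with their placeholders
// renumbered, so pieces compose without manual counting.
function sql(strings: TemplateStringsArray, ...params: unknown[]): Query {
  let text = strings[0];
  const values: unknown[] = [];
  params.forEach((p, i) => {
    if (isQuery(p)) {
      text += p.text.replace(/\$(\d+)/g, (_, n) => `$${Number(n) + values.length}`);
      values.push(...p.values);
    } else {
      values.push(p);
      text += `$${values.length}`;
    }
    text += strings[i + 1];
  });
  return { text, values };
}

// Optional filters are just ordinary code; the fragments sit inline.
function findUsers(opts: { team?: string; activeSince?: Date }): Query {
  const filters: Query[] = [];
  if (opts.team) filters.push(sql`team = ${opts.team}`);
  if (opts.activeSince) filters.push(sql`last_seen >= ${opts.activeSince}`);
  const where = filters.length
    ? filters.reduce((acc, f) => sql`${acc} AND ${f}`)
    : sql`TRUE`;
  return sql`SELECT id, name FROM users WHERE ${where} ORDER BY name`;
}

const pool = new Pool();
const { rows } = await pool.query(findUsers({ team: "ops" })); // pg accepts { text, values }
console.log(rows);
```

The nice part is that what comes out of findUsers is still just SQL text plus parameters, so every database-specific feature stays available.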
If you’re maintaining a layer between construction and finalization (the query builder), then you’re not really in the string smashing group I was describing. The main thing I was trying to get at there is no one should make the mistake of “ORMs are shit, I can just write SQL strings directly” because that road is plagued with problems (because SQL is not a language that composes well..).
And it’s not just named parameters; with a sane query builder you don’t need to totally rewrite things when dealing with subqueries vs top-level, aggregations, etc.
As long as you don’t go so far as to introduce the horror of an ORM
> The main thing I was trying to get at there is no one should make the mistake of “ORMs are shit, I can just write SQL strings directly”
This has actually been the road I've been heading down on my personal projects. If I were to use this for work I would use something like Kysely so I can have type safety and make sure the application layer is updated to support every migration, but I've taken time to really practice SQL and write queries & migrations manually and avoid ORMs completely. I love writing SQL by hand; however, I know that doesn't scale without some automated checking going on.
This is a common pattern with NoSQL databases, but really, the author of this article should just spend more than $10 a month on their Postgres instance.
Here is a scenario that I've encountered a number of times.
We needed to serve up some data for an API endpoint. Great! We'll write a SQL query for it. Just need a couple of joins and where-clause parameters -- it will be speedy, and easy to maintain.
Before long, we are asked to make our query more robust, including a couple of optional parameters. Some null-coalescing in our join expressions and where-clauses is introduced. Performance immediately suffers. (I've seen the same kind of impact when introducing unions or function calls within such expressions -- the query plan can get very messy when you try to scale in this manner.)
So what do we do? Restructuring our data (maybe with a materialized view or something) is one option.
A reasonable alternative is to break up the single query into a few more specialized queries, and then use application logic to inspect parameters and choose the best query to use for each request.
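A sketch of the two shapes side by side (invented tables, assuming Postgres and node-postgres):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Shape 1: one catch-all query that tolerates missing parameters.
// The ($n IS NULL OR ...) pattern is what tends to wreck the plan.
const catchAll = `
  SELECT o.*
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
   WHERE ($1::text IS NULL OR c.region = $1)
     AND ($2::date IS NULL OR o.created_at >= $2)`;
// e.g. await pool.query(catchAll, ["emea", null]);

// Shape 2: the application inspects the parameters and picks one of a few
// specialized queries, each with a clean, plannable shape.
function ordersQuery(region?: string, since?: Date) {
  if (region && since) {
    return {
      text: `SELECT o.* FROM orders o
              JOIN customers c ON c.id = o.customer_id
             WHERE c.region = $1 AND o.created_at >= $2`,
      values: [region, since],
    };
  }
  if (region) {
    return {
      text: `SELECT o.* FROM orders o
              JOIN customers c ON c.id = o.customer_id
             WHERE c.region = $1`,
      values: [region],
    };
  }
  if (since) {
    return { text: "SELECT * FROM orders WHERE created_at >= $1", values: [since] };
  }
  return { text: "SELECT * FROM orders", values: [] };
}

const { rows } = await pool.query(ordersQuery("emea", new Date("2024-01-01")));
console.log(rows.length, "orders");
```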
I don't think there are any trade-offs in that situation. It doesn't make sense to have your queries handle optional parameters when you already have that info present in the application layer and can choose the specialized query. I suppose if you were really rushed you could end up in that starting scenario.
The last place I worked had some JSON columns and doing lots of JSON operations tended to eat up all the Postgres CPU (the servers were generally 16-32 cores)
We had that too. Partial expression indexes mitigated most of the performance problems and CPU load.
We then created consistently-named functions that made the appropriate JSON query. Then an expression index on the function. If the key was missing (NULL result), it wasn't added to the index, and we stripped all JSON nulls before storing.
It wasn't necessarily pretty, but neither was the client's source data, which we had little to no control over. Best of a bad situation.
Jsonb columns required more steps but afforded us more flexibility and timeliness when the client kept asking for "one more feature" at the last minute.
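The general shape of the technique, sketched with invented names (this is the standard Postgres approach, not our actual code):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// A consistently-named helper that pulls one key out of the jsonb column.
// Declared IMMUTABLE so it can back an expression index.
await pool.query(`
  CREATE OR REPLACE FUNCTION doc_ship_date(doc jsonb) RETURNS text
  LANGUAGE sql IMMUTABLE
  AS $$ SELECT doc->>'ship_date' $$
`);

// Partial expression index: rows where the key is missing (NULL result)
// never enter the index at all.
await pool.query(`
  CREATE INDEX IF NOT EXISTS events_ship_date_idx
      ON events (doc_ship_date(payload))
   WHERE doc_ship_date(payload) IS NOT NULL
`);

// Queries filter through the same function, so they can hit the index
// instead of re-parsing JSON on every row (and burning CPU).
const { rows } = await pool.query(
  "SELECT id FROM events WHERE doc_ship_date(payload) = $1",
  ["2024-06-01"]
);
console.log(rows.length);
```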
Well, no. This is because there's a cap (albeit a very high one these days) on scaling vertically, but the bigger advantage is that you can also variably scale the application. It's a much bigger pain to scale your DB instance up and down with traffic; application pods can just be shut down, etc.
Also you can just wrap your queries in a session/transaction on the client, so no worries there. But yes, don't do this until you need to.
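e.g. the standard node-postgres pattern (table names are placeholders): both statements commit or roll back together, so you don't give up atomicity by keeping the logic in the app:

```typescript
import { Pool } from "pg";

const pool = new Pool();
const client = await pool.connect();
try {
  await client.query("BEGIN");
  await client.query(
    "UPDATE accounts SET balance = balance - $1 WHERE id = $2",
    [100, 1]
  );
  await client.query(
    "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
    [100, 2]
  );
  await client.query("COMMIT");
} catch (err) {
  // Any failure rolls back both updates.
  await client.query("ROLLBACK");
  throw err;
} finally {
  client.release();
}
```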
I’m not convinced by this. What type of logic are we talking about?