REINDEX CONCURRENTLY made a huge difference for us.
We have a main database that started at version 9.6 and was upgraded along the way. The largest table is huge (billions of rows, TBs of disk space) and gets a lot of deletes and updates.
Vacuums could no longer finish on that table (we killed it after ~90 days).
Reindexing (plus a vacuum that skips indexes) dropped our DB load from 60 to 20 and fixed the autovacuums, which now take less than a day. The indexes on that table had accumulated a lot of bloat, and I think newer versions also improved the on-disk index layout.
We now have a monthly cron job to reindex all indexes.
It’ll probably shave 50% off that table’s disk space.
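(For anyone curious, the monthly job boils down to statements like the following; the table and index names are made up, and REINDEX ... CONCURRENTLY needs Postgres 12 or newer.)

    -- Rebuild one bloated index without blocking writes (PG 12+)
    REINDEX INDEX CONCURRENTLY big_table_customer_id_idx;

    -- Or rebuild every index on the table in one pass
    REINDEX TABLE CONCURRENTLY big_table;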
However, we're actually in the process of sharding the db, and by copying data over customer by customer we'll shed the bloat along the way. The resulting shards will be a lot more manageable, so we can run pg_repack there with more confidence.
I think I would recommend setting log_min_duration_statement first and watching the logs for some time before doing that, so that you know what's going to get whacked, have some opportunity to tune it, etc.
Edit: It is mentioned, so perhaps just discuss it before talking about setting the timeout.
Good catch, will definitely update or perhaps even re-order. I'm not sure either in isolation gives you everything you need, but point noted that some sense of what is getting cancelled is equally important.
I think there's a case to be made that a timeout is more important. With an OLTP workload, it's not hard to imagine a runaway query running for days, consuming I/O bandwidth and silently slowing everything down. A timeout may break some things, but it will do so a lot more clearly. (Of course, both settings are a good idea.)
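For reference, both can be set cluster-wide without a restart; the values here are only illustrative, not recommendations:

    -- Log any statement that runs longer than one second
    ALTER SYSTEM SET log_min_duration_statement = '1s';

    -- Cancel any statement that runs longer than 30 seconds
    -- (the docs suggest setting this per role/session rather than globally)
    ALTER SYSTEM SET statement_timeout = '30s';

    -- Reload the configuration so the new values take effect
    SELECT pg_reload_conf();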
How often would a runaway query be able to run for days without the connection being dropped? I would expect either the incoming client request or the background job that started the query to have their own timeouts, and to dispose of their connections/transactions when killed. If a request or job does last for hours, it's probably easier to notice compared to a query, due to generally better observability tools.
The only time I've seen a query running for days in prod was when a human database toucher had forgotten about their open DBeaver tabs after running a complex query with too few filters.
Due to some Postgres limitations, a query can keep running even after the connection drops (the backend process only checks whether there's still a client to send results to after query completion). I think I saw some work being done on this in 14 or the upcoming 15 (i.e., re-checking periodically during query processing), but a lot of Postgres versions are still affected. I worked on the Heroku Data team and we regularly saw queries running for days. I think starting with an aggressive timeout to fail fast where you can't afford a slow query anyway, and then making exceptions where it makes sense, is easier to work with than tracking down problem cases and putting in limits after they cause issues.
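One way to carve out those exceptions is per role; the role names below are hypothetical:

    -- Aggressive default for the application's role
    ALTER ROLE app_user SET statement_timeout = '30s';

    -- A reporting role that is explicitly allowed to run long queries
    ALTER ROLE reporting_user SET statement_timeout = '15min';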
Weekly VACUUM ANALYZE on some of the busiest write tables made me go from sleepless nights to bliss.
And every 6 months we noticed certain queries getting slower and slower. That's because the data growth means we need a different approach to indexing or to how we access it. So our queries change a few times a year.
Otherwise, on a 650 GB database, there is remarkably little maintenance needed, except testing restores of daily backups and testing replicas.
The bugginess of Oracle surprised me. I expected complicated (but underestimated how complicated they make things), but what I did not expect was the bugs.
After having worked with PostgreSQL, MySQL and Oracle I cannot understand why anyone would pick Oracle. Its advantages can't be worth the hassle even if we ignore the license fee.
The biggest Postgres-related lesson I learnt recently is the importance of running ANALYZE and having up-to-date table statistics. In our case it was the difference between queries taking several seconds to run and taking <100ms.
We get much higher traffic during the daytime, so we now have a cronjob that runs VACUUM ANALYZE each night.
We also have some small metadata tables that are very update heavy as we sync these from another data store, replacing all rows each time. We now run VACUUM FULL on these after each sync (this locks the table but is fast (20-40ms) on such small tables) to avoid bloating them over time.
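Concretely, those two jobs boil down to something like the following (table names made up):

    -- Nightly cron: reclaim dead tuples and refresh statistics on the busy tables
    VACUUM (ANALYZE) orders, order_items;

    -- After each metadata sync: fully rewrite the tiny, update-heavy table
    -- (takes an exclusive lock, but only for a few tens of milliseconds here)
    VACUUM FULL sync_metadata;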
It seems to me that there's something major missing from Postgres if you need to manually run something daily or else the queries can go from 100ms to "several seconds."
If this is such a standard requirement and the statistics are so vital to performance, why isn't something built-in to the engine to keep these up-to-date without a user intervention?
I don't have to run "ANALYZE" on DynamoDB daily to ensure performance doesn't tank.
This is usually done automatically in Postgres in the background by the autovacuum process. For high volumes of traffic or certain usage patterns you might have to tune the autovacuum settings, since it might not be able to keep up with the more conservative default settings.
One issue is that there is some counter-intuitive behaviour here: if you see autovacuum taking significant resources, the worst thing you can do is let it run less often. You actually need to make it more aggressive in that kind of situation, and/or fix your usage pattern or add more resources.
If a manual ANALYZE is necessary, this can often indicate a misconfiguration of Postgres, e.g. someone reducing the Autovacuum frequency or disabling it entirely. Postgres also got a lot better at this, so it also matters how old your Postgres version is.
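To give an idea of what "more aggressive" looks like, here's a sketch with illustrative numbers (not recommendations) for a large, churn-heavy table:

    -- Default scale factor is 0.2, i.e. 20% of the table must change before
    -- autovacuum kicks in - far too lazy for a table with billions of rows
    ALTER TABLE big_table SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_analyze_scale_factor = 0.005
    );

    -- Allow autovacuum to do more work per cycle, cluster-wide
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
    SELECT pg_reload_conf();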
Agreed, I never faced these issues as our database is relatively small, but it's weird to see so many manual commands used for maintenance when these could be built-in, ongoing PG processes.
It is built-in. The autovacuum process does this. When you have a large database you may need to tune some config parameters, as the conservative defaults, aimed at small/mid-sized databases, are naturally different.
Running these manual commands is just shifting the config tuning for large databases to periodic "manual" operations.
+1 to both of these approaches. Maybe once per year we would have a query plan regress because of out-of-date column statistics. This happened in some pathological cases like tables that had old data evicted frequently. Once we committed to running VACUUM ANALYZE nightly on a cron, we never saw query plans regress out of the blue again.
ANALYZE is the first thing I run when looking at a problematic slow query; if you aren't aware of it, you could spend hours trying to figure out why the planner was doing A when it should be doing B.
It should, but it may need to be tuned to run aggressively enough, which can be tricky. Sometimes it's easier to just schedule a manual job that runs during your off hours. There are some interesting discussions on the Postgres development list on improving VACUUM performance [1] and other improvements to the overall MVCC mechanism to avoid the need for VACUUM in the first place if possible [2]. Postgres, like any complex system, does have pathological cases. It helps to keep an eye on performance and learn the system as your database grows. But it's good to see the community is thinking about how to improve the situation.
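A quick way to check whether autovacuum is keeping up is to look at when it last touched your busiest tables:

    -- Last (auto)vacuum / (auto)analyze per table, plus accumulated dead tuples
    SELECT relname,
           n_dead_tup,
           last_vacuum,
           last_autovacuum,
           last_analyze,
           last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;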
> If you see idle is above 20 it's recommended to explore using PgBouncer.
Exploring pgbouncer when you have lots of idle connections is a great tip, but 20 idle connections feels _extremely_ low to me. I've seen postgres databases on AWS Aurora serving over 13,000 transactions per second with hundreds of idle connections (because of client side pooling with a few dozen backend clients) just fine. In fact, around that scale is when we switched _from_ pgbouncer to client side pooling to simplify our architecture, and we noticed no degradation in any major metrics.
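For anyone wondering where their own number sits, the state breakdown is easy to pull from pg_stat_activity:

    -- Count current connections by state (active, idle, idle in transaction, ...)
    SELECT state, count(*)
    FROM pg_stat_activity
    WHERE backend_type = 'client backend'
    GROUP BY state
    ORDER BY count(*) DESC;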
Isn't AWS Aurora PostgreSQL a completely different (but compatible) product from PostgreSQL? If so, it would make sense that its ability to handle many connections is just an implementation detail.
Aurora forked PostgreSQL very early on (before 9.6) and has been developed independently since, with a lot of changes. They are effectively different databases now.
They changed the storage layer and parts of the query planner to work correctly with their new storage layer. So it's a fork and doesn't behave exactly identically. I have seen different execution plans for the same model and data because of the implications of the changed storage layer and query optimizer.
Do you know of any credible source for Amazon forking postgresql? I did some googling and can certainly find lots of people saying it, but nothing authoritative from Amazon themselves.
AWS basically never shares any details, only when it's beneficial in terms of marketing. The changes they made would not have been possible with an extension in the past, and they are most probably still not fully possible without forking. But that's not a problem; they are free to fork and merge upstream changes.
For MySQL, the big users like Facebook & co. are running (heavily) forked MySQL versions with the changes they need.
Nope. There is an AWS blog post [0] from September 2021 about setting up pgbouncer in front of Aurora Postgres, and that blog post references the AWS RDS Proxy service [1], but Aurora doesn't have pgbouncer in front by default. For what it's worth, we also handled hundreds of idle connections just fine on vanilla RDS postgres.
I would have said: check that your shared_buffers memory and concurrency settings are configured right for the environment you are running on. The defaults are wrong for most production environments, and the right values are highly server-dependent. If you haven't specifically tuned them for your production server, chances are they are wrong and you are leaving substantial memory and/or concurrency on the table.
I don't have links right now, but if you search for something like "online postgres config tool", you should find two different pages that do the same thing - you enter some data about your environment, and it spits out recommended settings.
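The output of those tools tends to look roughly like this; the numbers are just an illustration for a dedicated 16 GB box, not advice for your hardware:

    ALTER SYSTEM SET shared_buffers = '4GB';           -- ~25% of RAM is a common rule of thumb
    ALTER SYSTEM SET effective_cache_size = '12GB';    -- what the planner assumes the OS is caching
    ALTER SYSTEM SET work_mem = '32MB';                -- per sort/hash node, per connection
    ALTER SYSTEM SET max_worker_processes = 8;
    ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
    -- shared_buffers and max_worker_processes need a restart; the rest apply on reload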
Ran into a pathological case where a query that usually took 10-20ms would sometimes, semi-randomly, take >15 minutes (while the corresponding server-side PG process ran at 100% CPU).
EXPLAIN showed a dramatic query plan change, which led us towards bad table statistics.
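If you suspect the statistics, you can inspect what the planner currently believes about the relevant column and then refresh it; table and column names here are hypothetical:

    -- The planner's current view of the column behind the bad estimate
    SELECT n_distinct, null_frac, most_common_vals
    FROM pg_stats
    WHERE tablename = 'events' AND attname = 'account_id';

    -- Rebuild statistics for just that table, then re-run EXPLAIN
    ANALYZE events;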
There seems to be a common lifecycle of indexes within applications. First you start off with almost none, maybe a few on primary keys.
Not to be rude or anything but I hope he doesn't mean this.
Every single postgres primary key (and unique constraint) automatically gets an index. That's how the unique constraint is implemented. Primary keys being naturally unique.
I would agree IFF there wasn't the 'maybe'. If that word wasn't there I would read it as having not many tables in the beginning and thus not many primary keys and thus not many primary key indices.
I don't disagree, but Mr. Kerstiens has quite the track record of Postgres excellence, so he gets every benefit of the doubt from me. I feel quite certain he understands how primary keys work! I'm sure this was just a miswording.
Fair enough. I don't know him though (never heard the name), and after skimming the first paragraphs of each item and reading this I closed the window instantly, as for me it invalidated the information where I had no in-depth knowledge myself. As in, 'how can I trust any of the rest if something so fundamental is off?'
Very unfortunate if what you say is true. I guess I'll give him the benefit of the doubt then and go read the rest.
I don't know the author, but when I read an article by someone who obviously knows something about Postgres (knowing something about Postgres myself), I feel it's much more likely that they made a slight language error.
Some schema-changing statements drop the statistics.
We learned this the hard way. We altered some integer columns to bigint. That cleared the statistics for those columns and caused terrible query plans. An ANALYZE fixes this, but it took us a few days to notice.
Is there any documentation on which statements do this? I've seen this in the past too, but can't remember which one was responsible for it and whether it changed in recent versions.
I haven't found any. In our case, we did an alter table .. alter column, which I assumed would be fine. In retrospect, it does make sense that PG recreates that column from scratch and thus doesn't keep the statistics.
Since then we typically include an ANALYZE statement whenever we make a large change or rewrite a lot of rows.
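In migration terms that pattern is simply (hypothetical table name):

    -- The type change rewrites the column and drops its statistics
    ALTER TABLE payments ALTER COLUMN amount_cents TYPE bigint;

    -- Rebuild statistics right away so the planner isn't flying blind
    ANALYZE payments;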
I would love to drop MySQL/MariaDB and go with PostgreSQL, but information like this makes me nervous that I would set up a footgun and be constantly debugging things because I am not a DB admin expert.
Anyone care to comment on how PostgreSQL works out of the box without being an expert?
(I've run MySQL/MariaDB for almost 20 years and there are very few issues I've been surprised by)
I converted a large MySQL DB to PostgreSQL in 2004 and never looked back. Think about how many improvements have occurred in Pg releases since that time and the availability of information. It took my team about six months to do the conversion as there were any number of type conversion problems due to programming assumptions and loose typing in MySQL at that time. There's also an excellent support community on IRC at Freenode (you can google it) and they are very helpful to people of any level of experience.
Optimizations as presented by many are not needed right up front and you'll learn them as you go along. Good luck!
I’m definitely not an expert and I haven’t found running postgres any more difficult than mysql. There are some differences that take a bit of getting used to, though.
There are reindexdb and vacuumdb in the bin directory of a PostgreSQL installation. They cover 99% of maintenance use cases when run from cron, mainly those mentioned in the comments. For a general-purpose workload I find the default settings reasonable.
How would I know when to run these? Or if I need them?
I know I can do research, get training, etc... but I have never done maintenance like this with MySQL/MariaDB, so it's the unknown unknowns that worry me.
Like a lot of other complex systems, there's a lot of tweaking you can do in the config, but generally postgres works out of the box pretty well. There's a few gotchas on setup (postgres has a very conservative default configuration for things like memory usage and parallelism), but it's still pretty good.
I'd say the biggest footgun is really not knowing about the process-per-connection limitation, which is why the article mentions PgBouncer. Everything else in the article is pretty geared towards setting your db up for monitoring so you can head off issues.
> I would love to drop MySQL/MariaDB and go with PostgreSQL, but information like this makes me nervous that I would setup a footgun and be constantly debugging things because I am not a DB admin expert.
Has MySQL gotten... a lot better, or are you just used to its quirks? I feel like the UTF encoding and ... date format? timezones? are/were all really big footguns.
The encoding stinks, but it seems "fixed". (utf8mb4) I just migrated a few hundred databases between servers with the old dbs not having proper/modern encoding, and it was a bit messy to clean up.
Date format... not sure, I just use datetime now, timezones I manage in app (never trusted db engine for that...) Maybe I avoided those by accident with unix time until I moved to datetime.
I kind of wish the title was "Healthier and Happier".
Despite this calamitous oversight (wink), articles such as this are a great source to draw on others' experience and reap the benefits of others' hindsight without experiencing the outages or time-consuming problems yourself.
It would be awkward if pgbouncer were "included" in postgres as the topology of bouncers you have might involve multiple layers running on different computers to minimize things like connection setup latency.
As someone who's used MySQL for 17 years and dabbled a little in Mongo, is there any reason to try and switch to Postgres?
Obviously I love learning new things, but I've just never felt inclined to try it out, whereas I'm always jumping between other languages and frameworks within them. Not sure why that is.
I always get flamed when I mention this (I don't mind it, btw) but...
I find it pretty interesting that for the DB write-intensive portion[0] of the TechEmpower web framework benchmarks, you must go further than entry 150 (sorted by most requests-per-second) to find MySQL used, compared to PostgreSQL or Mongo. For the DB read-intensive portion[1], only 15 of the top 100 use MySQL (the rest in that bracket use PostgreSQL), and the first entry there comes in at number 44. Important: if you examine the entries, you'll find that some of the frameworks have multiple listings for different configurations, including the use of MySQL vs PostgreSQL (in other words: same framework, but only the DB is different, in the same test).
IMO there has to be a solid business case to switch a database layer. If MySQL serves your business well, I’d not switch. I like MySQL and PostgreSQL. I’ve used both. I felt the decision to go one way or the other depended more on how the company could maintain it after all decision makers were no longer there. Where MySQL was chosen, getting tooling and support from Oracle or Percona was seen as the biggest benefit. Where PostgreSQL was chosen, quality optimizer and open source was seen as the biggest benefit.
I'd highly recommend trying PostgreSQL yourself and learning its config, permissions, replication, administration and querying capabilities. You'll appreciate PostgreSQL's casting with :: for example. If you had to start from scratch, you'll be better informed about PostgreSQL and can include it in your selection process.
When we used MySQL, we loved the ease of replication and tooling such as Monyog/Webyog/Workbench. When we used PostgreSQL, we loved the query optimizer and JSON functions.
Killer feature for me was transactional schema changes. We use Flyway for db migrations, which works fine, but a failed migration is left in a half-completed state in MySQL - Postgres can execute the migration in a transaction that rolls back on failure.
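Because DDL in Postgres is transactional, a migration can look like this (table names made up); if anything fails before the COMMIT, the whole thing rolls back:

    BEGIN;

    -- Both changes land together or not at all
    ALTER TABLE accounts ADD COLUMN plan_id bigint;
    CREATE INDEX accounts_plan_id_idx ON accounts (plan_id);

    COMMIT;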
I haven't had a chance to use mysql 8 yet (which I believe narrows the gap between postgres and mysql), but whenever I'm interacting with databases on mysql 5.7 I miss features like common table expressions from postgres
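For anyone still on 5.7 who hasn't used them, a CTE is essentially a named subquery you can reference (and chain) by name; a made-up example:

    -- Latest order per customer, then keep only the large ones
    WITH latest_orders AS (
        SELECT DISTINCT ON (customer_id) customer_id, id, total_cents
        FROM orders
        ORDER BY customer_id, created_at DESC
    )
    SELECT customer_id, total_cents
    FROM latest_orders
    WHERE total_cents > 10000;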