
Postgres did an awful job on that query. Small values of LIMIT should not require a full sort, even when there's no index. I think MySQL has a special case for small values of LIMIT.

Looking forward to the next installment. That query is so simple that you're not seeing what a real database can do. Let's see something with a JOIN and lots of indices, so the optimizer can do some work.



> Postgres did an awful job on that query. Small values of LIMIT should not require a full sort, even when there's no index.

Postgres will not necessarily do a full external merge sort for the query plan shown in the article: when there is a LIMIT k clause, the Postgres sorting code has optimizations to only keep the top k values in memory and do an in-memory sort (obviously, a full seqscan of the input is still needed without an index). Search for TSS_BOUNDED and tuplesort_set_bound() here:

https://github.com/postgres/postgres/blob/master/src/backend...
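The idea behind the bounded sort can be sketched outside Postgres. This is not the actual tuplesort code, just an illustrative Python analogue: every input row is still scanned, but only the top k values are kept in a bounded heap, so memory and comparison work stay proportional to k rather than to the full input.

```python
import heapq
import random

def top_k(rows, k, key):
    """Illustrative analogue of Postgres's bounded sort (TSS_BOUNDED):
    the input must still be scanned in full, but heapq.nsmallest keeps
    only a k-element heap while scanning, instead of sorting all rows."""
    return heapq.nsmallest(k, rows, key=key)

# Simulated table: 100k rows with a random sort column.
random.seed(0)
rows = [{"id": i, "score": random.random()} for i in range(100_000)]

# Equivalent in spirit to: SELECT ... ORDER BY score LIMIT 3
best = top_k(rows, 3, key=lambda r: r["score"])
```

`heapq.nsmallest` returns the k results already in ascending order, which is exactly what a `LIMIT k` query plan needs to emit.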


By the way, I used a simple query on purpose to keep things simple for readers (and for me, the author!). Such a simple SQL statement let me avoid complex optimizations and algorithms that I wouldn't have been able to follow - and that wouldn't have helped readers much anyway. I wanted to get the basic idea across to people with no prior knowledge of DB internals.

Plus I wasn't interested in comparing one DB vs. another, as much as I was interested in understanding how any DB works.


My reading of the cost statements is that the sort doesn't matter at all: the sequential scan is what took up virtually all of the time.

But I'm hardly a PG expert.

Excellent article in any event, it's really interesting to see how things work under the hood.


Indeed: it's estimating that a single row will be returned from the seq scan. That will clearly take no time to sort!


Depends on the contents of the database. The query optimizer doesn't know, at optimization time, how many hits such a statement will produce. Given a database with only a few entries with the same name, the sort cost is tiny. If there are a lot of hits, it's expensive.


It can estimate, though, using per-column statistics. ANALYZE (and the autoanalyze background process) updates these in PostgreSQL.

These stats are a huge part of cost-based query optimisation, which all major DBMSs do these days.

Details here: http://www.postgresql.org/docs/9.3/static/planner-stats.html
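The core idea of those planner statistics can be sketched in a few lines. This is a toy model, not Postgres's actual estimator: it assumes stats consisting of a most-common-values (MCV) list with frequencies plus a distinct-value count, and estimates how many rows a `column = value` predicate will match.

```python
def estimate_rows(total_rows, mcv_freqs, n_distinct, value):
    """Toy cost-based row estimate for `column = value` (illustrative,
    not Postgres's real code). mcv_freqs maps the most common values
    to their fraction of the table, as ANALYZE-style stats would."""
    if value in mcv_freqs:
        # The value is in the MCV list: use its measured frequency.
        selectivity = mcv_freqs[value]
    else:
        # Otherwise assume the remaining rows are spread evenly over
        # the remaining distinct values.
        remaining = 1.0 - sum(mcv_freqs.values())
        others = max(n_distinct - len(mcv_freqs), 1)
        selectivity = remaining / others
    return round(total_rows * selectivity)

# Hypothetical stats: 1M rows, 'alice' in 2% of them per the MCV list.
common = estimate_rows(1_000_000, {"alice": 0.02, "bob": 0.01}, 50_000, "alice")
rare = estimate_rows(1_000_000, {"alice": 0.02, "bob": 0.01}, 50_000, "carol")
```

This is why the planner can decide, before running anything, whether a sort after the filter will be cheap (few estimated matches) or expensive (many).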



