
Postgres did an awful job on that query. Small values of LIMIT should not require a full sort, even when there's no index. I think MySQL has a special case for small values of LIMIT.

Looking forward to the next installment. That query is so simple that you're not seeing what a real database can do. Let's see something with a JOIN and lots of indices, so the optimizer can do some work.



> Postgres did an awful job on that query. Small values of LIMIT should not require a full sort, even when there's no index.

Postgres will not necessarily do a full external merge sort for the query plan shown in the article: when there is a LIMIT k clause, the Postgres sorting code has optimizations to only keep the top k values in memory and do an in-memory sort (obviously, a full seqscan of the input is still needed without an index). Search for TSS_BOUNDED and tuplesort_set_bound() here:

https://github.com/postgres/postgres/blob/master/src/backend...
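The idea behind the bounded sort can be sketched outside Postgres. This is not the actual tuplesort code, just an illustrative Python analogue: every input row is still scanned, but only the top k values are kept in a bounded heap, so memory and comparison work stay proportional to k rather than to the full input.

```python
import heapq
import random

def top_k(rows, k, key):
    """Illustrative analogue of Postgres's bounded sort (TSS_BOUNDED):
    the input must still be scanned in full, but heapq.nsmallest keeps
    only a k-element heap while scanning, instead of sorting all rows."""
    return heapq.nsmallest(k, rows, key=key)

# Simulated table: 100k rows with a random sort column.
random.seed(0)
rows = [{"id": i, "score": random.random()} for i in range(100_000)]

# Equivalent in spirit to: SELECT ... ORDER BY score LIMIT 3
best = top_k(rows, 3, key=lambda r: r["score"])
```

`heapq.nsmallest` returns the k results already in ascending order, which is exactly what a `LIMIT k` query plan needs to emit.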


By the way, I used a simple query on purpose to keep things simple for readers (and for me, the author!). Such a simple SQL statement let me avoid complex optimizations and algorithms that I wouldn't have been able to follow - and that wouldn't have helped readers much anyway. I wanted to get the basic idea across to people with no prior knowledge of DB internals.

Plus I wasn't interested in comparing one DB vs. another, as much as I was interested in understanding how any DB works.


My reading of the cost statements is that the sort doesn't matter at all: the sequential scan is what took up virtually all of the time.

But I'm hardly a PG expert.

Excellent article in any event, it's really interesting to see how things work under the hood.


Indeed: it's estimating that a single row will be returned from the seq scan. That will clearly take no time to sort!


Depends on the contents of the database. The query optimizer doesn't know, at optimization time, how many hits such a statement will produce. Given a database with only a few entries with the same name, the sort cost is tiny. If there are a lot of hits, it's expensive.


It can estimate, though, using per-column statistics. ANALYZE (and the autoanalyze background process) updates these in PostgreSQL.

These stats are a huge part of cost-based query optimisation, which all major DBMSs do these days.

Details here: http://www.postgresql.org/docs/9.3/static/planner-stats.html
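The core idea of those planner statistics can be sketched in a few lines. This is a toy model, not Postgres's actual estimator: it assumes stats consisting of a most-common-values (MCV) list with frequencies plus a distinct-value count, and estimates how many rows a `column = value` predicate will match.

```python
def estimate_rows(total_rows, mcv_freqs, n_distinct, value):
    """Toy cost-based row estimate for `column = value` (illustrative,
    not Postgres's real code). mcv_freqs maps the most common values
    to their fraction of the table, as ANALYZE-style stats would."""
    if value in mcv_freqs:
        # The value is in the MCV list: use its measured frequency.
        selectivity = mcv_freqs[value]
    else:
        # Otherwise assume the remaining rows are spread evenly over
        # the remaining distinct values.
        remaining = 1.0 - sum(mcv_freqs.values())
        others = max(n_distinct - len(mcv_freqs), 1)
        selectivity = remaining / others
    return round(total_rows * selectivity)

# Hypothetical stats: 1M rows, 'alice' in 2% of them per the MCV list.
common = estimate_rows(1_000_000, {"alice": 0.02, "bob": 0.01}, 50_000, "alice")
rare = estimate_rows(1_000_000, {"alice": 0.02, "bob": 0.01}, 50_000, "carol")
```

This is why the planner can decide, before running anything, whether a sort after the filter will be cheap (few estimated matches) or expensive (many).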



