Hacker News | petereisentraut's comments

The problem with this and similar requests is that it would change the identifier scoping in incompatible ways and therefore potentially break a lot of existing SQL code.


The working group also discussed ORDER BY ALL, but for some reason most participants really did not like it.


This was also discussed at the last SQL WG meeting but was postponed for further refinement. But it’s likely to be added soon.


Git 2.38.0 is the version where git archive uses an internal gzip implementation instead of calling the actual external gzip. This internal implementation has two improvements for this purpose: First, it doesn't store the timestamp. You could also get that with gzip -n. (But the old git archive didn't do that, so you have to run the gzip as a separate step after git archive.) Second, it stores the platform identification bits as "UNIX" on all platforms, so the output is identical on all platforms. There is no gzip command-line option for that, unfortunately.


Ah right. My suggestion would have been to use a small Perl or Python script or something, assuming there would be a zip library that would let you set the proper flags.


The configure generated from configure.ac has always been checked into Git for PostgreSQL. So with either the old or the new make dist approach, the configure in the tarball matches the one checked into the source code repository. So this is outside of what this article is discussing.


Got it. I should have checked first.


> - Support for deferring NOT NULL and CHECK constraints to the end of a transaction (just ran into this problem yesterday)

I'm curious what the use case of this is?

Deferrable constraints are usually considered for foreign keys, since there you might have to juggle updates to multiple tables and might violate the constraint in the intermediate states. But that doesn't appear to apply in that way to CHECK constraints.


2 tables that reference each other and the foreign key columns both have a NOT NULL constraint.
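
NOT NULL itself can't be deferred today, but making one of the two foreign keys deferrable is the usual way out of the cycle. A rough sketch with hypothetical tables a and b:

    CREATE TABLE a (
        id   int PRIMARY KEY,
        b_id int NOT NULL            -- FK added below, deferred
    );
    CREATE TABLE b (
        id   int PRIMARY KEY,
        a_id int NOT NULL REFERENCES a (id)
    );
    ALTER TABLE a ADD FOREIGN KEY (b_id) REFERENCES b (id)
        DEFERRABLE INITIALLY DEFERRED;

    BEGIN;
    INSERT INTO a VALUES (1, 1);     -- a's FK check waits until COMMIT
    INSERT INTO b VALUES (1, 1);     -- b's FK sees a(1) already in place
    COMMIT;                          -- deferred check on a.b_id passes now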


I am the author of this feature. The background here is that the SQL standard was ambiguous about which of the two ways an implementation should behave. So in the upcoming SQL:202x, this was addressed by making the behavior implementation-defined and adding this NULLS [NOT] DISTINCT option to pick the other behavior.

Personally, I think that the existing PostgreSQL behavior (NULLS DISTINCT) is the "right" one, and the other option was mainly intended for compatibility with other SQL implementations. But I'm glad that people are also finding other uses for it.
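
For anyone wanting to see the new option in action, a minimal sketch (PostgreSQL 15 syntax, hypothetical table t):

    CREATE TABLE t (x int, UNIQUE NULLS NOT DISTINCT (x));

    INSERT INTO t VALUES (NULL);  -- ok
    INSERT INTO t VALUES (NULL);  -- rejected: duplicate key value

    -- With the default behavior, plain UNIQUE (x), i.e. NULLS DISTINCT,
    -- both inserts would succeed.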


> Personally, I think that the existing PostgreSQL behavior (NULLS DISTINCT) is the "right" one

SQL is old enough, and this debate still so unsettled, that I think it should be clear there isn't a categorically right behavior here, any more than there is a clear winner between tabs and spaces.

As an example of the irreconcilable weirdness of NULL, consider that "NULL = NULL" is false, and so is "NULL != NULL", while rows with NULL still group together in GROUP BY.
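
A quick PostgreSQL illustration of that asymmetry (what exactly the comparisons yield is picked apart in the replies below):

    SELECT NULL = NULL, NULL != NULL;    -- neither result is true

    SELECT x, count(*)
    FROM (VALUES (NULL::int), (NULL)) AS v(x)
    GROUP BY x;                          -- one group: the NULLs group together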

I appreciate you giving folks the option.


> As an example of the irreconcilable weirdness of NULL, consider that "NULL = NULL" is false, and so is "NULL != NULL"

This isn't quite true - these comparisons don't evaluate to false, they evaluate to NULL.

I intuitively think of NULL as "unable to compute," a generalized NaN.


> these comparisons don't evaluate to false, they evaluate to NULL

In the databases I've used, such as MSSQL[1], they evaluate to UNKNOWN.

[1]: https://docs.microsoft.com/en-us/sql/t-sql/language-elements...


The document you link uses unknown as a synonym for null. If you inserted the result of such a comparison into a table, the value inserted would be NULL.


I'm self-taught; could you show how to insert the result of such a comparison into a table in MSSQL? I can only think of using CASE or similar, which would mean no direct conversion.


    insert into table (c) values (null != null)


Can you provide a complete working example? I'm just getting "incorrect syntax near !" with both MSSQL and SQL Anywhere.

    create table #nulltest (a bit null);
    insert into #nulltest(a) values (0),(1);
    insert into #nulltest(a) values (null != null);


Some SQL dialects don't recognize != and use <> instead


True, I'll blame vacation mode for not catching that one.

However, that doesn't change things, I can't get any of these to parse:

    insert into #nulltest(a) values (1 <> 0);
    insert into #nulltest(a) values (null <> null);
    insert into #nulltest(a) values (true and true);
    insert into #nulltest(a) values (null and null);
So my question remains: how do you insert an UNKNOWN?

The point of me asking is that it was my belief that, contrary to what was suggested, UNKNOWN is not the same as NULL and, crucially, does not evaluate as NULL.
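
For contrast, a sketch of the same experiment in PostgreSQL, where boolean is a first-class value type so the bare predicate parses; what lands in the table is NULL:

    CREATE TEMP TABLE nulltest (a boolean);
    INSERT INTO nulltest VALUES (NULL <> NULL);
    SELECT a, a IS NULL FROM nulltest;  -- one row: (NULL, true)

T-SQL has no boolean value type, only boolean predicates, which is why those INSERT statements fail to parse rather than producing an UNKNOWN to store.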


It's good to see that everyone agrees on the definition of NULL.


Yes: a billion-dollar mistake.


This is a misunderstanding of Hoare's “billion dollar mistake”. The mistake was not the existence of nulls; the mistake was type systems where every reference could be null.

SQL does not have this problem.


SQL absolutely does have that problem. You have no way of enforcing or checking that an expression is supposed to be non-NULL; at best you get an error at runtime when you try to insert it into a given column, which is usually far too late.


This is only a problem insofar as any data constraint is a problem.


No, it's a specific problem of SQL that you can apply constraints to stored or quasistored columns but not to expressions. In a real typed language every expression has a type.


And with RDBMSs, a multi-trillion dollar success!


This is a good way to remember the SQL behavior. Though I do hope it'll change in the future. So many other languages interpret null as a comparable value, and it will be less surprising to folks coming from those.


I personally don't have issues with languages that have incomparable values, just with languages that have a generic equality and incomparable values.

If every type has its own equality operation (e.g. Number.eq) then I have no issue with it being weird in unexpected types (e.g. Number.eq(NULL, NULL) = false).

For example in JavaScript almost every use of === should be replaced by Object.is.


I agree; however, that's all fine and well but it turns the language's syntax into dogshit. A babel plugin (transpiling) would be ideal.

On the same note, the original intent was for NaN to be insidious and propagate into every derived calculation or comparison, thus making it easy to tell if a mathematical function had failed, without having to add conditionals at every step within it. Which makes a lot of sense. However, not being able to know exactly where the NaN was produced is kind of shit. Now, if NaN had a stack trace property like Error (exception) objects...it would make this behavior much more useful. Makes me wonder if the prototype of NaN can be extended, hm.


NaNs cannot be extended in any way; in many interpreters NaN is used to "hide" pointers in 64-bit floats using the 52 free bits in NaN values.

IMO NaN should never have existed, like signed zeros (I am more agnostic about [signed] Infinity). I think they were implemented in IEEE 754 because the encoding had space left for them.

I'm not an expert in any way on this; this is literally a personal opinion.


Right. I was thinking that maybe NaN in JS was more than just the IEEE version we see in languages closer to the metal.


> This isn't quite true - these comparisons don't evaluate to false, they evaluate to NULL.

Which large parts of SQL treat as equivalent to false.

    WITH vars AS (
        SELECT CAST(null AS INT) as X
    )
    SELECT 'yes this is a tautology' as "is this a tautology"
    FROM vars
    WHERE X = X
The above query returns nothing, not an error, because the value of X is unknowable. In a sane language, X=X and so it would yield "yes this is a tautology". SQL standard wants it both ways - it wants to say "boolean null isn't false" except when for its own convenience it decides that yes, null is false.

The truth, imho, is that SQL has an archaic type system. In this day and age it should be offering algebraic types where we can implement our own "empty" values and the like that may not use three-value logic as appropriate, so that I can say that for example End Date is blank instead of either unknown or Date.MaxValue.


> SQL standard wants it both ways - it wants to say "boolean null isn't false" except when for its own convenience it decides that yes, null is false.

I don't see it that way. Null cast to boolean is always false. Null equality is undefined. There is no contradiction there.


I think that this topic gets tripped up with people wanting to assume boolean means two possible states... NULL adds an extra state to every type much like the "bottom" type/value in some functional programming languages or type systems.

It's not the `=` operator that is special or significant here. It is the `WHERE` clause semantics, which are defined as returning rows when the boolean expression is true. This treats both false and null cases as the other possible outcome, when rows are not returned. Not understanding this distinction is also where people write buggy queries using WHERE NOT ..., because like equality the NOT operator will return NULL for a NULL input (so NULL is not like false and does not negate to true).

    SELECT 'hidden by null result' WHERE CAST(NULL AS boolean);

    SELECT 'also hidden by null result' WHERE NOT CAST(NULL AS boolean);

    SELECT 'shown by true result' WHERE CAST(NULL AS boolean) IS NULL;

    SELECT 'hidden by false result' WHERE NOT (CAST(NULL AS boolean) IS NULL);

The existing GROUP BY behavior is different from WHERE because it does not have a special rule for handling true values. It just returns a group for each distinct value.

The prior (new default) UNIQUE constraint behavior made sense when you think of NULL as this bottom value and meaning "unknown", while also thinking that a uniqueness constraint is only applied to complete keys. The check is effectively deferred, and the partial or incomplete tuple might become valid and unique once its missing part(s) are filled in with real values. The new optional behavior makes more sense if you think that the uniqueness constraint enforces distinct records even with partial keys, just as the GROUP BY clause will determine distinct groups.

There is also a similar issue with FOREIGN KEY constraints. The default MATCH SIMPLE behavior is consistent with the default UNIQUE constraint behavior. An incomplete foreign key is essentially ignored and not subject to integrity checks. The MATCH FULL behavior is almost a style checker to say that a compound foreign key should be all null or all non-null, while still ignoring incomplete foreign keys. As far as I know, the MATCH PARTIAL case is not implemented, but I think it might complement this new UNIQUE NULLS NOT DISTINCT behavior?
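
A sketch of the compound-key cases described above, with hypothetical parent/child tables:

    CREATE TABLE parent (a int, b int, PRIMARY KEY (a, b));
    CREATE TABLE child (
        a int,
        b int,
        FOREIGN KEY (a, b) REFERENCES parent (a, b) MATCH FULL
    );
    INSERT INTO child VALUES (NULL, NULL);  -- ok: fully null key is ignored
    INSERT INTO child VALUES (1, NULL);     -- rejected: mixed null/non-null
    -- Under the default MATCH SIMPLE, (1, NULL) would be accepted
    -- with no integrity check at all.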

One can imagine a further branching point in behaviors for foreign keys, and I'm not sure how MATCH PARTIAL will be defined. One mode might be easily implementable with the new index method, if we want partial foreign keys to be able to reference partial keys in the referred table, i.e. the partial key with NULLs is considered an actual row key. But we might also imagine another partial matching mode where the non-null components of a compound foreign key have to all exist in at least one key in the referred table, while ignoring the null fields as incompletely formed references. This would be an improvement on MATCH SIMPLE, which allows the non-null columns in a foreign key to hold values which could not possibly match the referred table no matter which values are filled in for the remaining null fields of the foreign key. However, this enforcement would not be easily implemented without a powerset of indexes for all partial combinations allowed in the referencing tables!


Er, sorry, but I thought that the answer to tabs vs spaces was use tabs for semantic/logical indent, use spaces for physical/alignment indent.


Yes, mixing invisible characters will solve all our problems.


They don't have to be invisible if you use a good editor.


> anymore than there is a clear winner between tabs and spaces.

But there is a clear winner here. Spaces won; almost all code files use spaces. Of course everyone uses the tab key on their keyboard, and the IDE makes spaces for you; no one is going to manually press space 4 times.


There is no winner; both tabs and spaces are widely used. However, with the proliferation of better tools the question thankfully became mostly irrelevant. I don't care if it is spaces or tabs, as long as it is consistent across the codebase (which CI enforces with autoformatters and linters) and as long as my editor (which obeys the project's editorconfig) uses the same convention.


As a savage, I use 4 tabs now.


I hit space 4 times...


...which is clear evidence that hitting space 4 times is the wrong thing to do :P


> SQL is old enough and this debate so unsettled

Is the debate really unsettled?

MSSQL considers nulls to be equal, everyone else considers them to be distinct (aside from a few DBs which just refuse to create UNIQUE indexes on nullable columns), and Oracle is apparently huffing glue in a corner.

> As an example of the irreconciliable weirdness of NULL, consider that "NULL = NULL" is false, and so is "NULL != NULL", while rows with NULL still group together in GROUP BY.

Group by is the exception here. And arguably it is so because the alternative is never useful, or close enough as to not be worth it.


So of the three major SQL implementations, you have one of them following the results of the settled debate, one of them being an exception, and one of them huffing glue (whatever that means). In addition, there's a language exception even in the settled group, which you have to handwave away.

Sounds pretty unsettled to me.


Oracle considers null to be equal to empty string.


That is truly huffing glue.


And Oracle considers null to be equal to the empty string. My understanding is Oracle was created before the SQL standard existed, and they don’t want to change because of backwards compatibility. I don’t understand why they can’t add a db parameter to allow you to set how nulls are handled.


IDK if you ever saw the HN thread where they talked about oracle and change tests from a few years back, but if I had to guess, the pain just isn't worth the overall cost.

Overall I remember running into this but finding it fairly simple to add a couple specific checks around the condition where it came up in our code.


They would need to change their storage format to be able to distinguish between NULL and empty string, and probably a lot of in-memory structures and even more code logic as well. It would be hugely expensive to support both.

Oracle introduced VARCHAR2 in the 90s(?) to mean “Oracle semantics” as opposed to VARCHAR which is documented as “may change to standard SQL semantics in a future version”. However I don’t think they’ll ever materialize that distinction.


A slight tangent, but do you have any pointers on how one might get involved in either:

- The SQL standardisation process (or whether this is even feasible as someone who isn't involved in the development of a major database engine)

- Postgres development

The feature I am particularly keen to get accepted is trailing commas in SELECT column lists (potentially other places too, but SELECT lists would be a very good start). And there are potentially a bunch of other improvements in the syntax sugar department that I might be interested in spearheading if I could successfully get this accepted (e.g. being able to refer to select list column aliases in GROUP BY clauses).

I don't have much experience with C, but I would potentially be up for actually implementing the change if I could get buy in from the maintainers.


> The feature I am particularly keen to get accepted is trailing commas in SELECT column lists

Not "GROUP BY THE OBVIOUS LIST OF EXPRESSIONS, YOU KNOW, THE ONES FROM THE SELECT CLAUSE THAT DON'T CONTAIN AGGREGATE FUNCTIONS"? Aka "GROUP BY ALL" from DuckDB.


The Postgres project is always looking for contributors. The project has very nice, detailed, well-defined documentation on how to contribute [1]. The community is very welcoming, and has a process (CommitFest [2]) in place to ensure all submissions get their due attention.

FWIW, I am starting to make an effort (e.g. [3]) towards helping newcomers get their work into shape to be acceptable to the committers.

Note: I'm biased towards the Postgres community, since I've worked with them for many years. So others' opinions _may_ differ, but it's highly unlikely.

[1]: https://wiki.postgresql.org/wiki/Development_information

[2]: https://wiki.postgresql.org/wiki/Development_information#Com...

[3]: https://www.postgresql.org/message-id/CABwTF4Us8WGMffNGTNY1X...


> Personally, I think that the existing PostgreSQL behavior (NULLS DISTINCT) is the "right" one, and the other option was mainly intended for compatibility with other SQL implementations.

Care to explain why you think NULLS DISTINCT is the "right" default behavior? What problems does it solve to warrant additional complexity by default?


I think it's worth pointing out what the opposite means - to say that "this field might have a value, or it might be null - but only one tuple/row can set this field to be null" implies that you have given the null option some kind of real-world value.

That is to say, you are using null to encode some kind of meaning, when really this is not what null is supposed to be used for.

That's not to say I think we should be morally disapproving of people who do that - I use things for their unintended purpose all the time, and it bugs me when people get on a high horse. Use what makes sense to you - and I love this change for exactly that reason.

But the general theoretical approach is that if you want to care about the value in a field, you need to give it a value. Null is for the valueless, and if that isn't an allowed state, you should simply set the field as not-nullable. Theory is fine in theory. For a practical database that exists in the real world, this option is a good addition.


Fauna has an interesting approach to null that I've grown to like:

Null is a real value that can be compared intuitively like any other value (`Equals(null, null)` returns true). [1]

However, in indexes, any term that evaluates to null is simply not stored in the index. So we can create a unique index with multiple null values because they simply won't exist in the index, so won't violate the unique constraint.

If we do care about null values in a particular index, we can handle that by mapping the null value into another value that will get stored in indexes using a "binding", and then use that index to query for or constrain uniqueness by that mapped value [2].

This is not the most convenient thing in the world, but at the end of the day, it feels like an edge case, so I'm happy with having to jump through some hoops to handle it in exchange for making the much more common cases (null comparisons and unique indexes with nullable values) more intuitive and less error prone.

[1] https://fauna.com/blog/understanding-nothing-or-null-in-faun...

[2] https://docs.fauna.com/fauna/current/learn/cookbook/fql/sear...


I think a lot of the trouble stems from the fact that databases do not have sum types and there is no way to encode a "None" type without hacks. NULL is the only reasonable option in a lot of cases.


Yeah, in hindsight the world would be a slightly better place if SQL didn't include NULL, but instead "UNKNOWN" and "NONE", with "NONE = NONE" being true, and "UNKNOWN = UNKNOWN" being unknown.


Or have an Option (or Maybe) type. This is how modern programming languages solve the nil issue, I don't see a reason why a database couldn't take the same approach.


When would NONE show up and when would UNKNOWN show up?


NONE would mean “there is no value”, and UNKNOWN “there might be a value, but I don’t know what it is”.


I'm not the person you're responding to, but also think NULLS DISTINCT makes sense in many cases. NULLs often represent missing data. Imagine storing a customer's address and the street name is NULL. If several customers have a street=null, it doesn't mean that they have the same street. So from a data perspective, it makes sense to treat these unknown values as distinct.

For filtering and aggregation I still welcome the change, as it makes sense in many cases to treat them as not distinct.


Thanks. It seems that the issue arises from value equality vs optional value state equality.

To me, a more natural way to treat NULLs is to think of NULL not as a value, but as a state. Several customers with a street=null all have street property in an equal state. However, an equal state doesn't mean the value is also equal, as there is no value in that state. Option type in functional languages models this perfectly:

  'a option = 
    |None       // no value here
    |Some of 'a // value of type 'a
So, when checking if customers have the same street, we need to check the value. Comparison is only valid if street has a value defined. Adding a null check in addition to the equality check to ensure that the actual values are equal feels most natural to me (as this is how it's done in most languages).

Unique constraints should not be a problem. Unique constraints are about values. So if the state is NULL there is no value, and so the constraint does not treat NULL states as equal values. Adding a keyword to change the constraint behavior when needed would be best IMO, as it would rarely need to be used.


> So, when checking if customers have the same street, we need to check the value. Comparison is only valid if street has a value defined. Adding a null check in addition to the equality check to ensure that the actual values are equal feels most natural to me (as this is how it's done in most languages).

Is it tho?

Languages with ubiquitous nullability will accept both being null or both being non-null and the same value. Languages with option types do the same (as long as the value is equatable obviously), this includes OCaml: https://v2.ocaml.org/api/Option.html#preds.

The only languages I can think of which seriously diverge would be C and C++ where this is UB.

Maybe if you way stretch it Java because it doesn't have operator overloading, but even there there's Object.equals which is null-safe... and returns `true` if both parameters are `null`.


I completely agree. I tried to explain basically the same thing as you, but it seems there is some ambiguity in my example I didn't intend.

Comparing two nullable/option values when both are in a "null state" will return true in most languages. Which, to me, is the most logical result, as they have the same state. So, if you want to check whether two people actually live on the same street, it's not enough to check p1.street == p2.street; you need to also check that at least one of the streets is not null. Two nulls would return true, but if equality returns true and we know that one of the streets isn't null, we know we are comparing actual street values (not null states). Sadly, in SQL, the result of NULL = NULL is not true, but NULL.


It seems SQL NULL means both undefined and none, which causes confusion.


Not OP, but this constraint means "the data must either be empty or unique", which is an extremely common constraint. In contrast, I've never encountered "only 1 entry is allowed not to have the data, all others must be unique".


"only 1 entry is allowed not to have the data, all others must be unique" makes sense in larger schemas as the database grows. It replaces is_default_address, is_primary_child etc. fields and constraints on relational tables. In the perfect world all the data would be normalized not to have such columns and cases, but in real life it just grows that way. So for some cases NULLS NOT DISTINCT will be a welcome addition.


Not parent commenter, but to me it seems "obviously correct".

As explained in the article NULL means "unknown". Let's say I have two people, and a "tax identifier" column. Let's say two people can't have the same tax id. Obviously "don't have one" or "unknown" fit very well in NULL, and two people can both be missing a tax id, even if those who have it need it to be unique.


Except that's the ambiguity of NULL in the standard. It doesn't just mean "unknown". It can also mean "no value". And you can clearly compare two elements with no value.

Which is why nulls are so incredibly confusing.


Yeah that's true. All the database can say is "I don't have a value here". Not "there exists no value for this column".

But this doesn't mean that two NULLs are the same. Two people with NULL tax IDs are not the same (one turns out to be unknown but existing, the other is a toddler without a tax ID). Ten people with NULL address don't live in the same house. Half declined to give address, the other half are homeless. Either way we can't send mail to them.

If you use the database to have a UNIQUE constraint of "only one purchase per household" then it makes no sense to allow NULL addresses, but only allow the first NULL address purchaser to buy the thing.

Or "sorry, in this hotel we only allow one guest at a time without a car, and someone else already doesn't have a car".

Does that guest without a car actually have a car? I don't think that's something that the database can solve.

Should databases have two separate values for unknown or no value? That sounds like a world of hurt, with two types of null.


> Should databases have two separate values for unknown or no value? That sounds like a world of hurt, with two types of null.

It’s also a non-problem in practice, because if an application needs to distinguish between multiple types of nulls, it can very easily just use an extra column holding the information needed to disambiguate.


Similarly, the need for non-boolean columns is a non-problem in practice, because an application can just make an extra column for every bit it needs to store. If it needs strings or arbitrary-width numbers it can just join on other tables that store those bits.

(Existence of a workaround doesn't turn a problem into a non-problem)


I stand by it being a non-problem in practice because the workaround is trivial, and the supposed solution adds complexity everywhere.

There isn’t even a universally agreed methodology for how to handle nulls in the relational model in theory.

It’s a semantic problem that can only be solved with reference to the requirements of a particular use case, and when you do that using explicit extra columns, the problem evaporates.


Which happens in the world of JavaScript. It has null and undefined. A null variable is a variable/property that's defined and set to null, while undefined is a variable/property that's not defined yet (or maybe not initialized, I forget).

I hope SQL will at least give another comparison operation, such that the current equals sign (=) means a.prop is not null && b.prop is not null && a.prop equals b.prop (for when the values must be defined).

Let's say another sign like a.prop *= b.prop, meaning (a.prop is null && b.prop is null) || a.prop = b.prop (for when our intention is that both values may be empty).


Such a comparison operator exists: IS (NOT) DISTINCT FROM.
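
A quick sketch of its semantics, which match the *= operator wished for above:

    SELECT NULL IS NOT DISTINCT FROM NULL;  -- true
    SELECT 1    IS NOT DISTINCT FROM NULL;  -- false
    SELECT 1    IS NOT DISTINCT FROM 1;     -- true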


> It doesn't just mean "unknown". It can also mean "no value".

Isn't the canonical representation of known-no-value an absence of a tuple? Like as opposed to saying "There exists an employee X who works in department NULL", you simply don't make any claim about employee X working in a department? After all, when enumerating the members of a set, you're also omitting the enumeration of non-members of a set, and the law of excluded middle applies.


Nulls will reappear in your queries due to joins even if none exists in your data model.


Ah yes, they're the curse of SQL. Still doesn't seem to be a reason to not squash them out as much as feasible within the constraints of SQL.


It's probably a good default because it's more consistent with how NULL equality is handled in SQL generally. From the article:

> In Postgres 14 and prior, unique constraints treated NULL values as not equal to other NULL values.

...

> This is consistent with the SQL Standard handling of NULL in general, where NULL is unknown. It is impossible to determine if one unknown is equal to another unknown. Because NULL values are of unknown equality to one another, they do not violate UNIQUE constraints.


Your last statement would be better written as: Because NULL values are of unknown equality, they may or may not violate UNIQUE constraints.

The safest implementation would assume that they do violate the constraint, rather than the current behavior assuming they don’t.

In practice I don’t think I’ve ever added a unique index on a nullable column where null might imply unknown. I have used it in cases where null meant “none”.


> Care to explain why you think NULLS DISTINCT is the "right" default behavior? What problems does it solve to warrant additional complexity by default?

It's the most consistent with the equality behavior of null values elsewhere.


I generally agree that, in most cases, you want the NULLS DISTINCT behavior. But thank you for providing such a developer-friendly feature that allows flexibility here!

NULL in SQL is not terribly consistent overall. Sometimes NULL is treated like "unknown" and sometimes more like "n/a". And the interaction with non-scalar types (like records) is pretty strange. Also, it's common for ORMs to map the app language's NULL (or None/Nil/whatever) to SQL NULL, which adds its own nuance. So I can see this being a useful feature.


Well, NULL is most often meant to signify an unknown value. You can't compare one unknown value with another and say they are the same.

The standard is, I believe, ambiguous about NULL because it can also mean an absence of data. In other words, it is "this is not yet determined" (e.g. a CAR table has a COLOR column, but the paint is applied as the final operation, so NULL could be used while the car is not at this stage yet and the color will be chosen later). In this case, you can compare NULLs against other NULLs, because you can have two cars that are in an indeterminate state. In that case, it is useful to compare NULL values as it means the same thing.


An example where NULLS DISTINCT might make sense in a constraint is when a table is used to store an analytical cube roll-up (eg output of GROUP BY ROLLUP), where NULL in a dimension column has special meaning (it indicates subtotals). Having multiple rows with the same NULL key could, in some of those cases, be an error leading to double-counting.
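
A sketch of the scenario, with a hypothetical sales table:

    -- NULL in "product" marks the per-region subtotal rows:
    SELECT region, product, sum(amount) AS total
    FROM sales
    GROUP BY ROLLUP (region, product);

    -- If that output is stored in a table, a UNIQUE NULLS NOT DISTINCT
    -- (region, product) constraint would reject a second subtotal row
    -- for the same region.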


Once you've accepted that X = X doesn't return TRUE when X is NULL, NULLS DISTINCT is consistent with that. ANSI NULL three-valued boolean algebra is insane, but it's insane in a pretty consistent way.


So, we have a "standard" which says that both behaviors should be supported, and there should be two different ways to express them, but explicitly declines to say which does what?

I'm genuinely not sure what purpose this particular standard serves, but it looks like it's not allowing people to write portable code...


I can't imagine the standard does not specify that NULLS DISTINCT should treat nulls as distinct and NULLS NOT DISTINCT should treat nulls as not distinct.


Ah. So there are three ways to write it, and explicit ways to call for either behavior. That's sane -- just wasn't what I got from the original writeup.


No, it's not as bad as that.

The thing that's implementation defined is what happens if you don't explictly select either option.


Playing around in a docker image of the beta build, it looks like this allows you to add a `unique nulls not distinct` constraint to a composite set of fields, but still does not allow you to specify those same fields a primary key. For example

    -- This works
    alter table announcement_destinations
        add constraint announcement_destinations_pk unique nulls not distinct (sender, receiving_group, optional_filter);
but

    -- Still does not work
    alter table announcement_destinations
        add constraint announcement_destinations_pk2 primary key (sender, receiving_group, optional_filter);
fails with a message like "ERROR: column "optional_filter" of relation "announcement_destinations" contains null values". Is there a motivation for this distinction?


Currently nulls are considered not distinct in postgres, no? Doesn't this mean anyone upgrading will have to fix up all their table definitions? Or am I wrong about this?

I just care about the backwards compat. To be clear, the second I saw this I thought "finally!!!"


> Or am I wrong about this?

This one. Currently nulls are always distinct (different from one another).


> Personally, I think that the existing PostgreSQL behavior (NULLS DISTINCT) is the "right" one

I find this to be at odds with IS DISTINCT FROM, which is false for two NULLs.


It's a bit more complicated than that. You will also notice another release note item in PG13 that says "Allow inserts, not only updates and deletes, to trigger vacuuming activity in autovacuum", which is because vacuum is not only about cleaning up deleted or rolled-back rows, but also transaction ID wraparound. Making that go away is also a dream of many, but it's a different (or additional) project than a new storage system.



This is also much faster than the equivalent using a PL/pgSQL trigger.

