CREATE TABLE person (
id INT PRIMARY KEY,
name VARCHAR NOT NULL
);
CREATE TABLE pet (
id INT PRIMARY KEY,
name VARCHAR NOT NULL,
owner_id INT NOT NULL
);
It is common to want to join them, like so:
SELECT owner.id, owner.name, COUNT(pet.id) AS numberOfPets
FROM person AS owner
LEFT OUTER JOIN pet ON owner.id = pet.owner_id
GROUP BY owner.id
This doesn't work in standard SQL, because all columns in the SELECT list have to either be aggregated or included in the GROUP BY. owner.name is neither. That is a bit silly though because we know each result row will only have one unambiguous value for the owner name, since the GROUP BY is on a unique column from the same table as the owner name.
We can solve this with ANY_VALUE:
SELECT owner.id, ANY_VALUE(owner.name) AS name, COUNT(pet.id) AS numberOfPets
FROM person AS owner
LEFT OUTER JOIN pet ON owner.id = pet.owner_id
GROUP BY owner.id
I run into the equivalent problem in Kotlin sometimes.
I want to group things by a key, but that results in type `Map<K, List<V>>` because the compiler can't be sure that there won't be more than one `V` for the field being used for grouping... which is why there's also `associateBy` which does return `Map<K, V>`, but will throw if there happens to be more than one value for each K.
Not relevant, but I thought it may help other readers understand the problem.
Yes, but all these solutions require the optimizer to be smart enough to be smart enough to optimize them away: "oh, owner.id is the primary key, so any value from owner will appear at most once, I don't have to group by them or track the maximum value, I can pick and store any value".
I think all optimizers are smart enough already, as this is a super common pattern. I honestly would have preferred this logic to be made more explicit: if you group by the primary key (or by the table name, as someone suggested), you can use all columns from the table without aggregate functions.
This kinda breaks down when you introduce subqueries and CTEs, which is where myopic optimizers can screw the pooch as well, so any_value can be of use there.
Given the following tables:
It is common to want to join them, like so: This doesn't work in standard SQL, because all columns in the SELECT list have to either be aggregated or included in the GROUP BY. owner.name is neither. That is a bit silly though because we know each result row will only have one unambiguous value for the owner name, since the GROUP BY is on a unique column from the same table as the owner name.We can solve this with ANY_VALUE: