> if you give it a query that only requires certain result rows from one of its mat views, then Materialize is only going to compute the intermediate rows
This is absolutely correct!
> You can just have a bunch of “the same” Materialize node (i.e. every node just freestanding clone of a template node, with exactly the same sources and matviews) and then hit them with the parts of a map-reduce query
This should work, but we have been thinking about (and testing) it a bit differently internally. In general you should be able to create materialized views on different "shards" that have different `where` conditions, allowing you to control memory that way. This technique does require data that is actually partitionable along those lines, just as it must be in MapReduce.
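A minimal sketch of that sharding approach, assuming every clone ingests the same Kafka topic and only the `where` predicate differs per node. The broker, topic, schema registry, and view names are all made up for illustration, and the source syntax here is the pre-cloud `materialized` flavor, so exact clauses may differ by version:

```sql
-- Same source definition on every node (old `materialized` syntax;
-- details like FORMAT/ENVELOPE vary across versions).
CREATE SOURCE page_views
FROM KAFKA BROKER 'kafka:9092' TOPIC 'page_views'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://schema-registry:8081';

-- Node 0 only keeps state for its slice of the key space;
-- the other nodes use mod(page_id, 4) = 1, 2, 3.
CREATE MATERIALIZED VIEW views_by_page_shard_0 AS
SELECT page_id, count(*) AS views
FROM page_views
WHERE mod(page_id, 4) = 0
GROUP BY page_id;
```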
> this is all irrelevant the moment you write a query that needs a pure reduce
Of course, with Materialize's sinks you can spin up a bunch of `materialized`s and connect them for a final reduce after the data has gone through e.g. Kafka or shared files. Being able to write joins and aggregates across heterogeneous sources makes this kind of workload actually pretty pleasant.
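For the pure-reduce case, a rough sketch of what that wiring could look like, again with made-up topic/view names and the old `materialized` Kafka source/sink syntax (the envelope and format clauses in particular depend on the version and on how the sink encodes its updates):

```sql
-- On each shard node: push the partial aggregates out through a sink.
CREATE SINK partials_out
FROM views_by_page_shard_0
INTO KAFKA BROKER 'kafka:9092' TOPIC 'partial_counts'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://schema-registry:8081';

-- On the reducer node: read every shard's partials back in
-- (envelope choice must match what the sink emits).
CREATE SOURCE partial_counts
FROM KAFKA BROKER 'kafka:9092' TOPIC 'partial_counts'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://schema-registry:8081'
ENVELOPE DEBEZIUM;

-- Finish the reduce over all shards' partial counts.
CREATE MATERIALIZED VIEW total_views AS
SELECT page_id, sum(views) AS views
FROM partial_counts
GROUP BY page_id;
```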