The answer I’ve seen is “just pass structs of functions around”, which is only one step more explicit than the implicit version we’re all used to, but honestly I kinda like that it frees us from all the ceremony around generics.
It’s discouraged to pass around structs of functions to replicate type classes in Gleam. The preference is to avoid type-class-style patterns in your projects, favouring a concrete style instead.
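For anyone who hasn't seen the pattern: here's a minimal sketch of that "struct of functions" (dictionary-passing) style, written in Python purely for illustration - the names (Eq, member) are made up and not any library's API. The "instance" is just a value you pass explicitly.

  from dataclasses import dataclass
  from typing import Callable, Generic, TypeVar

  T = TypeVar("T")

  @dataclass
  class Eq(Generic[T]):
      # The "type class instance": a plain struct of functions.
      equals: Callable[[T, T], bool]

  int_eq = Eq(equals=lambda a, b: a == b)

  def member(eq: Eq[T], x: T, xs: list[T]) -> bool:
      # The instance is passed explicitly instead of being resolved implicitly.
      return any(eq.equals(x, y) for y in xs)

  member(int_eq, 3, [1, 2, 3])  # True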
At least half of those languages (Elixir and OCaml) have some sort of mechanism for ad hoc polymorphism (Elixir has behaviours and protocols, OCaml has higher-order modules), so I feel like the comparison doesn't work that well, personally.
OCaml's modules are not implicitly instantiated, so they provide the same DX and APIs as you would get in Gleam.
Elixir does have protocols, but they are extremely limited compared to type classes, traits, etc, and they're uncommonly used compared to writing concrete code.
I'm working on a partition-oriented declarative data build system. The inspiration comes from working with systems like Airflow and AWS Step Functions, where data orchestration is described explicitly, and the dependency relationships between input and produced data partitions are complex. Put simply, writing orchestration code for this case sucks - the goal of the project is to enable whole data platforms to be made up of jobs that declare their input and output partition deps, so that they can be automatically fulfilled, enabling Kubernetes-like continuous reconciliation of desired partitions.
This means, instead of the answer to “how do we produce this output data” being “trigger and pray everything upstream is still working”, we can answer with “the system was asked to produce this output data partition and its dependencies were automatically built for it”. My hope is that the interface to the system becomes continuously telling it what partitions we want to exist and letting it figure out the rest, rather than the byzantine DAGs that get built in Airflow and the like.
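To make the shape concrete, here's a rough sketch in Python - hypothetical names (Partition, Job, reconcile), not the project's actual API: jobs declare which input partitions each output partition needs, and a reconciler works backwards from the wanted partitions.

  from dataclasses import dataclass
  from typing import Callable

  @dataclass(frozen=True)
  class Partition:
      dataset: str
      key: str  # e.g. a date like "2024-06-01"

  @dataclass
  class Job:
      name: str
      deps: Callable[[Partition], list[Partition]]   # inputs needed for an output partition
      build: Callable[[Partition], None]

  def reconcile(wanted: list[Partition], jobs: dict[str, Job],
                existing: set[Partition]) -> None:
      # Depth-first: fulfil a partition's dependencies before building it.
      # (A real system would also handle cycles, failures, concurrency, etc.)
      for p in wanted:
          if p in existing:
              continue
          job = jobs[p.dataset]
          reconcile(job.deps(p), jobs, existing)
          job.build(p)
          existing.add(p)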
This comes out of a big feeling that even more recent orchestrators like Prefect, Dagster, etc are still solving the wrong problem, and not internalizing the right complexity.
Very much agree that this is the direction data orchestration platforms should go - basic DAG creation can be straightforward, depending on how you do the authoring (parsing SQL is always tempting, but always the wrong answer) - but backfills, code updates, etc. are when it starts to get spicy.
I think this is where it gets interesting. With partition dependency propagation, backfills are just “hey this range of partitions should exist”. Or, your “wants” partitions are probably still active, and you can just taint the existing partitions. This invalidates the existing partitions, so the wants trigger builds again, and existing consumers don’t see the tainted partitions as live. I think things actually get a lot simpler when you stop trying to reason about those data relationships manually!
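In the same hypothetical sketch as above, a taint is nothing more than removing a partition from the live set; the standing wants then rebuild it on the next reconcile pass, and readers never treat the tainted partition as live.

  def taint(partitions: list[Partition], existing: set[Partition]) -> None:
      for p in partitions:
          existing.discard(p)  # no longer live; standing wants will rebuild it

  # A backfill is just a declared range of wanted partitions:
  backfill = [Partition("daily_revenue", f"2024-06-{d:02d}") for d in range(1, 31)]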
This is true, but you can get combinatorial complexity explosions, especially with the efficiency-driven data modeling patterns common at some companies - e.g. a mix of latest dimensions and historical snapshots, without always having clear delineations about when you're using what. A common example is something like a recursive incremental table that needs to be rebuilt from the first partition seed. Some SQL operations can also be very opaque (syntactically, or in terms of special DB features) as to which partitions are being referenced, especially again when aggregates get involved.
It's absolutely solvable if you're building clean; retrofitting onto existing dataflow is when things get messy, and then there's managing user/customer expectations of a stricter system. People like to be able to do wild things!
It sounds like you implicitly delegated many important design decisions to Claude? In my experience it helps to first discuss the architecture and core components of the problem with Claude, then either tell it what to do for the high-leverage decisions, or provide it with the relevant motivating context to allow it to make the right decisions itself.
I've used both, and while I think Vega has its uses, it's not nearly as web-developer friendly. Frontend engineers want a clear delineation between logic, composition and styling. By combining everything into a JSON document, you sacrifice that developer experience while introducing a lot of bespoke approaches.
That said, I absolutely love the idea that a blob of JSON living in my database contains everything I need for my visualization. The reality is that not enough other people are willing to put in the effort to learn that syntax, making it somewhat of a selfish tech choice.
As a big user of Vega-Lite I think that's fair. I think it really shines when used by data vis experts, where charts need to be precise, such as in research and analysis contexts. For something like a simple metrics dashboard I'd agree that it may be difficult for devs.
The hard part about documentation is that it requires you to have a component that can be comprehensibly and sufficiently documented. So much of the software we write is seen as provisional that even its authors think “well, we’ll document the v1”, not realizing that their prototype is just that.
Completely - they reduce the act of transforming data into visualizations to the essential association of data dimensions with visualization concerns like axes, color, and so on.
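For example, this is roughly what that association looks like in a Vega-Lite spec (written here as a Python dict; the field names are invented and I'm recalling the spec shape from memory):

  spec = {
      "data": {"values": [
          {"region": "EU", "revenue": 120},
          {"region": "US", "revenue": 200},
      ]},
      "mark": "bar",
      "encoding": {
          # Data dimensions mapped straight onto visual channels:
          "x": {"field": "region", "type": "nominal"},
          "y": {"field": "revenue", "type": "quantitative"},
          "color": {"field": "region", "type": "nominal"},
      },
  }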