Show HN: Kùzu: An Embeddable GDBMS like DuckDB/SQLite from UWaterloo

omatkafa · on Nov 15, 2022

This looks cool, and thanks for the MIT licence. I like easy `pip install ...`-style installations and will test its performance on very large graphs. Will keep an eye on it.

guodong · on Nov 15, 2022

Thanks!

xkcd99 · on Nov 15, 2022

When you say scalability, how much are we talking about here? Can it handle datasets of around 50 GB or 100 GB?

Also, going through the docs, it seems you currently support only CSV data ingestion, right? Any plans to support JSON, Parquet, or other formats?

Another thing: do you have support for any built-in graph algorithms (like path finding, shortest path, centrality)?

guodong · on Nov 15, 2022

Thanks for your comment.

For scalability, we can scale to several hundred GBs, and we routinely test on LDBC datasets of up to 300 GB. Our goal is to support efficient querying over TB-scale data.

Right now, we only support CSV import. We are currently working on integrating Arrow and aim to support more data formats through it. Hopefully that will let us support Parquet, JSON, etc.
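Until Arrow-based loaders arrive, one workaround (our assumption, not something suggested in this thread) is to flatten newline-delimited JSON into CSV before importing it. A minimal standard-library sketch; the function name `json_lines_to_csv` and the choice to emit a header row are hypothetical, so check Kùzu's documentation for the CSV layout its importer actually expects:

```python
from __future__ import annotations

import csv
import io
import json


def json_lines_to_csv(json_text: str, columns: list[str]) -> str:
    """Convert newline-delimited JSON records into CSV text with a header row.

    Hypothetical helper: a missing key becomes an empty field, and keys not
    listed in `columns` are ignored.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        restval="",             # value written when a record lacks a column
        extrasaction="ignore",  # silently drop keys that are not in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    for line in json_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        writer.writerow(json.loads(line))
    return out.getvalue()


records = '{"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob", "age": 30}\n'
print(json_lines_to_csv(records, ["id", "name"]))
```

Python's `csv` module takes care of quoting values that contain commas or quotes; nested JSON objects would still need to be flattened by hand.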

Built-in graph algorithms are coming, step by step. We are focusing on shortest path queries for now.
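For readers new to the topic: on an unweighted graph, a single-pair shortest-path query is classically answered with breadth-first search. The sketch below runs over an in-memory adjacency dict; it is not Kùzu's operator, and the `shortest_path` name and input format are hypothetical:

```python
from __future__ import annotations

from collections import deque


def shortest_path(adj: dict[int, list[int]], src: int, dst: int) -> list[int] | None:
    """Return one shortest path from src to dst in an unweighted directed graph.

    Returns None if dst is unreachable. BFS visits nodes in order of hop
    distance, so the first time dst is reached, the path is shortest.
    """
    if src == dst:
        return [src]
    parent: dict[int, int | None] = {src: None}  # doubles as the visited set
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        for nbr in adj.get(node, []):
            if nbr in parent:
                continue  # already discovered at an equal or shorter distance
            parent[nbr] = node
            if nbr == dst:
                # Walk parent pointers back to src, then reverse.
                path = [dst]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append(nbr)
    return None


graph = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
print(shortest_path(graph, 1, 5))  # [1, 2, 4, 5]
```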

As always, any suggestions and discussions on these are welcome.

smartyboi · on Nov 15, 2022

Different wrapper for the same underlying techniques. Might be good to write some papers, but in my opinion, very unlikely to have any impact in practice.

guodong · on Nov 15, 2022

Thanks for your comment. We don't want Kùzu to serve only as a basis for our research; instead, we are focused on implementing Kùzu seriously and supporting actual users. So expect a few, but not many, papers in the upcoming years.

Also, I'm not sure which techniques you had in mind, but our position is that graph DBMSs should be built on relational principles and state-of-the-art analytical data management techniques (e.g., that's why Kùzu is a columnar system). But we also have many new techniques (e.g., factorization, new join algorithms, new storage designs) that are optimized for graph data with lots of many-to-many connections between nodes/entities. These techniques are optimized for finding patterns over such data. We described prototype implementations of these techniques in many previous research papers, and now we are focusing on implementing them seriously in Kùzu.
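To make "factorization" a bit more concrete: for a many-to-many pattern such as the 2-hop path a->b->c, the flat join result for each middle node b contains |in(b)| × |out(b)| tuples, whereas a factorized form can store the two neighbor lists and leave their Cartesian product implicit. A toy sketch of that idea only (not Kùzu's actual data structures; all function names are hypothetical):

```python
from __future__ import annotations

from collections import defaultdict
from itertools import product


def two_hop_factorized(edges: list[tuple[int, int]]) -> dict[int, tuple[list[int], list[int]]]:
    """Group 2-hop paths a->b->c by middle node b as (incoming list, outgoing list)."""
    incoming: dict[int, list[int]] = defaultdict(list)
    outgoing: dict[int, list[int]] = defaultdict(list)
    for src, dst in edges:
        outgoing[src].append(dst)
        incoming[dst].append(src)
    # Only nodes with both an incoming and an outgoing edge can be a middle node.
    return {b: (incoming[b], outgoing[b]) for b in incoming if b in outgoing}


def flatten(factorized: dict[int, tuple[list[int], list[int]]]) -> list[tuple[int, int, int]]:
    """Expand the factorized form into explicit (a, b, c) tuples."""
    return [(a, b, c) for b, (ins, outs) in factorized.items() for a, c in product(ins, outs)]


def factorized_size(factorized: dict[int, tuple[list[int], list[int]]]) -> int:
    """Number of node IDs actually stored in the factorized form."""
    return sum(len(ins) + len(outs) for ins, outs in factorized.values())


def flat_size(factorized: dict[int, tuple[list[int], list[int]]]) -> int:
    """Number of tuples the flat join result would contain."""
    return sum(len(ins) * len(outs) for ins, outs in factorized.values())


# A hub node 0 with 3 incoming and 4 outgoing edges.
edges = [(1, 0), (2, 0), (3, 0), (0, 4), (0, 5), (0, 6), (0, 7)]
f = two_hop_factorized(edges)
print(factorized_size(f), flat_size(f))  # 7 12
```

Here the factorized form stores 3 + 4 = 7 values where the flat result has 3 × 4 = 12 tuples, and the gap widens quickly as node degrees grow.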

Hope this clarifies things a bit. You're welcome to share your opinions in more detail.

smartyboi · on Nov 15, 2022

Thanks for the reply. I don't see any operators in the codebase for BFS, DFS, APSP, etc. Shouldn't your graph querying be built around these fundamental operators?

guodong · on Nov 15, 2022

No, not at all! It is a big misconception that implementing a high-level graph DBMS query language requires BFS/DFS-style "traversals", which is just another term for joining node records with each other. Systems that adopt these "traversal" algorithms to do joins end up committing to a specific type of join (what an RDBMS would call an index nested loop join), and that's usually not very efficient (no matter what those systems might claim). Instead, it's better to accept that these are simply joins and use relational join operators that are, however, optimized for many-to-many joins and cyclic joins (e.g., when searching for triangles). That's what we do in Kùzu. You can read about some of these join algorithms in our CIDR paper (https://cs.uwaterloo.ca/~ssalihog/papers/kuzu-tr.pdf) and some earlier papers (http://www.vldb.org/pvldb/vol12/p1692-mhedhbi.pdf). We have always explained what we do in terms of fast joins, and we strongly believe that's the right way to evaluate graph patterns in a language like Cypher! That said, when we move to other computations, e.g., shortest paths, we will also use more specialized graph algorithms as operators.
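To illustrate the difference between traversal-style probing and a join designed for cyclic patterns, here is the triangle query evaluated one attribute at a time by intersecting adjacency sets, in the style of Generic Join multiway joins, instead of expanding every 2-hop path and then probing for the closing edge. This is a simplified sketch under our own assumptions (undirected edges, integer node IDs, in-memory sets), not the algorithm from the papers linked above, and the `triangles` name is hypothetical:

```python
from __future__ import annotations


def triangles(edges: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """List each triangle of an undirected graph once, as (a, b, c) with a < b < c.

    Each edge is oriented from the lower to the higher ID. For every a, each
    candidate b comes from higher[a]; candidates for c are the intersection
    higher[a] & higher[b], so no open 2-hop path is ever materialized.
    """
    higher: dict[int, set[int]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        lo, hi = min(u, v), max(u, v)
        higher.setdefault(lo, set()).add(hi)  # sets also absorb duplicate edges
    result = []
    for a, nbrs in higher.items():
        for b in nbrs:
            for c in nbrs & higher.get(b, set()):
                result.append((a, b, c))
    return sorted(result)


print(triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]))  # [(1, 2, 3), (2, 3, 4)]
```

Orienting each edge from lower to higher ID means every triangle is produced exactly once.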

smartyboi · on Nov 15, 2022

Interesting. Part of my pessimism stems from seeing bad graph engines over the decades, but perhaps you are here to change exactly that. I will keep track of the latest developments in your Git repo. I wish you the very best!

guodong · on Nov 16, 2022

Thanks! Yes, we really want to change that situation, and we think we have some good ideas. If you've used graph databases, we would love to hear about and learn from your use cases too; you can reach us at contact@kuzudb.com.

erdemsalihoglu · on Nov 15, 2022

Promising work!

guodong · on Nov 16, 2022

Thanks!