Hacker News

Other than MotherDuck, is anyone aware of any good models for running multi-user, cloud-based DuckDB?

i.e., running it like a normal database and getting to take advantage of all of its goodies



For pure DuckDB, you can put an Arrow Flight server in front of DuckDB[0] or use the httpserver extension[1].
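A minimal sketch of the httpserver route, based on the extension's README (the bind address, port, and credentials here are placeholders; check [1] for the current API):

```sql
-- install the community extension and expose the running
-- DuckDB instance over HTTP for remote clients
INSTALL httpserver FROM community;
LOAD httpserver;
SELECT httpserve_start('0.0.0.0', 9999, 'user:password');
```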

Where you store the .duckdb file will make a big difference in performance (e.g. S3 vs. Elastic File System).

But I'd take a good look at ducklake as a better multiplayer option. If you store `.parquet` files in blob storage, it will be slower than `.duckdb` on EFS, but if you have largish data, EFS gets expensive.

We[2] use DuckLake in our product and we've found a few ways to mitigate the performance hit. For example, we write all data into DuckLake in blob storage, then create analytics tables and store them on faster storage (e.g. GCP Filestore). You can have multiple storage methods in the same DuckLake catalog, so this works nicely.

0 - https://www.definite.app/blog/duck-takes-flight

1 - https://github.com/Query-farm/httpserver

2 - https://www.definite.app/
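To make the DuckLake route above concrete, a minimal sketch of attaching a catalog with blob storage as the data path (bucket paths and names are hypothetical, and the metadata catalog could also live in Postgres or MySQL instead of a local file):

```sql
INSTALL ducklake;
LOAD ducklake;

-- catalog metadata in a DuckDB file, table data as Parquet in S3
ATTACH 'ducklake:my_lake.ducklake' AS lake (DATA_PATH 's3://my-bucket/lake/');

-- writes land as Parquet under DATA_PATH, tracked by the catalog
CREATE TABLE lake.events AS
    SELECT * FROM read_parquet('s3://my-bucket/raw/*.parquet');
```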


I wonder if anyone has experimented with "Mountpoint for S3" + DuckDB yet

https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountp...


The DuckDB httpfs extension reads from S3-compatible object stores.
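For example, querying Parquet straight out of a bucket with httpfs (bucket path and credentials are placeholders):

```sql
INSTALL httpfs;
LOAD httpfs;

-- register S3 credentials; works with S3-compatible stores too
CREATE SECRET my_s3 (
    TYPE s3,
    KEY_ID 'my-key-id',
    SECRET 'my-secret',
    REGION 'us-east-1'
);

SELECT count(*) FROM read_parquet('s3://my-bucket/data/*.parquet');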


that looks neat - how do you handle failover/restarts?


in which one? restarts are no problem on DuckLake (ACID transactions in the catalog)

for the others, I haven't tried handling it.


GizmoSQL is definitely a good option. I work at GizmoData and maintain GizmoSQL. It is an Arrow Flight SQL server with DuckDB as a back-end SQL execution engine. It can support independent thread-safe concurrent sessions, has robust security, logging, token-based authentication, and more.

It also has a growing list of adapters, including ODBC, JDBC, ADBC, dbt, SQLAlchemy, Metabase, Apache Superset, and more.

We also just introduced a PySpark drop-in adapter, letting you run your Python Spark DataFrame workloads with GizmoSQL, for dramatic savings compared to Databricks on sub-5TB workloads.

Check it out at: https://gizmodata.com/gizmosql

Repo: https://github.com/gizmodata/gizmosql


Oh, and GizmoData Cloud (SaaS option) is coming soon - to make it easier than ever to provision GizmoSQL instances...


Feels like I keep seeing "Duckdb in your postgres" posts here. Likely that is what you want.




