Hacker News | thecalebf's comments

The graphics in it make it so much better.


The Chrome extension is open source and is powered by DuckDB-Wasm, which loads the Parquet conversions of the datasets as views so you can query them with SQL.

Here's the repo: https://github.com/cfahlgren1/hf-data-explorer
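For readers curious what "loading Parquet files as views" means in SQL terms, here is a minimal sketch, assuming DuckDB's `read_parquet()` table function. It only builds the statement as a string. The helper names, view name, and URLs are hypothetical placeholders, and this is not the extension's actual code (see the repo for that). Remote URLs also need DuckDB's HTTP support (the httpfs extension in native builds).

```python
# Hypothetical helper (not the extension's code): builds the DuckDB SQL
# that exposes one or more Parquet files as a single queryable view.


def quote_identifier(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_view_sql(view_name: str, parquet_urls: list[str]) -> str:
    """Return a CREATE VIEW statement over the given Parquet files.

    read_parquet() accepts a list of files and reads them as one table.
    A view stores the query rather than a copy of the data, so the row
    data is read when you query the view.
    """
    if not parquet_urls:
        raise ValueError("at least one Parquet file is required")
    files = ", ".join(quote_literal(url) for url in parquet_urls)
    return (
        f"CREATE OR REPLACE VIEW {quote_identifier(view_name)} AS "
        f"SELECT * FROM read_parquet([{files}]);"
    )


if __name__ == "__main__":
    # Placeholder URLs; substitute the dataset's real Parquet files.
    print(build_view_sql("train", [
        "https://example.com/train/0000.parquet",
        "https://example.com/train/0001.parquet",
    ]))
    # After running that statement in DuckDB, ordinary SQL works, e.g.
    # SELECT COUNT(*) FROM "train";
```

Keeping the data behind a view means a multi-file split behaves like one table, and nothing is copied into the browser's database up front.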


Yes, please do! Looks awesome; I'd love to help any way I can as well.


Not OP, but would it be possible to use a standard license? Every time a special-purpose license is used for software that gains adoption, the lawyers at hundreds or thousands of companies must spend a lot of time, and several iterations with the team, figuring out whether they can actually use the model. There is something magical about the GPL, MIT, Apache, and similar licenses: lawyers have already reviewed them once, so they no longer create a bottleneck.


Sorry, it seems complicated. Since it is a fine-tune of DeepSeek-Coder, I had to include their license.

DeepSeek's license is pretty open; it just says not to use the model for:
- military purposes
- exploiting vulnerabilities, etc.

I'm just trying to carry over as much information as possible from the base model.


Yes, it is designed for users without SQL knowledge. However, it still performs fairly well on questions that are on the difficult side (for non-technical users), producing queries with multiple joins, aggregations, ratios, and subqueries.
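To make "multiple joins, aggregations, ratios, and subqueries" concrete, here is a hand-written example of the kind of query such a question maps to. The schema, data, and question are hypothetical, it is not output from the model, and it runs on SQLite through Python's `sqlite3` module for convenience rather than on PostgreSQL. Question: "For customers with more orders than average, what fraction of their orders were refunded?"

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
CREATE TABLE refunds (order_id INTEGER REFERENCES orders(id));
"""

DATA = """
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 3), (7, 3), (8, 3);
INSERT INTO refunds VALUES (1), (3), (6);
"""

# Two joins, COUNT aggregations, a refund ratio, and a subquery that
# computes the average order count among customers who have ordered.
# Assumes at most one refund row per order, so the LEFT JOIN does not
# duplicate orders.
QUERY = """
SELECT c.name,
       COUNT(o.id) AS order_count,
       ROUND(1.0 * COUNT(r.order_id) / COUNT(o.id), 2) AS refund_ratio
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
LEFT JOIN refunds AS r ON r.order_id = o.id
GROUP BY c.id, c.name
HAVING COUNT(o.id) > (
    SELECT AVG(n)
    FROM (SELECT COUNT(*) AS n FROM orders GROUP BY customer_id) AS per_customer
)
ORDER BY refund_ratio DESC
"""


def refund_ratios(conn: sqlite3.Connection) -> list[tuple]:
    """Run the example query and return (name, order_count, refund_ratio) rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + DATA)
    for row in refund_ratios(conn):
        print(row)
    conn.close()
```

With this data the average is 8/3 orders per customer, so only Ada (4 orders, half refunded) and Cy (3 orders, one refunded) pass the `HAVING` filter.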

The next step is fine-tuning and leveraging larger models that can handle very complex questions, reasoning, and data schemas, since this one is only 7B parameters.


Neat! Would you ever use a local model for that if it could work with ORMs?


That is really interesting. I took note of this. That would be really cool!


Great call out. Will definitely focus on that in the next iteration!


No, those are benchmark evaluation questions. The fine-tuning dataset was a custom, synthetically generated set of ~20k PostgreSQL text-to-SQL pairs covering different SQL categories and question types.

I mention a little more about it here https://x.com/calebfahlgren/status/1754247740291207198?s=20
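As a rough illustration of what a text-to-SQL pair can look like, here is a sketch of a hypothetical JSONL record layout (question, schema, target SQL, category) plus one common sanity check for synthetic data: run each query against its schema and drop the ones that fail. The field names and the filtering step are assumptions, not a description of how the ~20k-pair dataset was built, and SQLite stands in for PostgreSQL so the sketch runs with the standard library only.

```python
import json
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextToSQLPair:
    question: str  # natural-language question
    schema: str    # CREATE TABLE statements the query runs against
    sql: str       # target query
    category: str  # hypothetical label, e.g. "join" or "aggregation"


def parse_jsonl(text: str) -> list[TextToSQLPair]:
    """Parse one JSON object per non-empty line into pairs."""
    return [TextToSQLPair(**json.loads(line)) for line in text.splitlines() if line.strip()]


def executes(pair: TextToSQLPair) -> bool:
    """Return True if the SQL runs against an empty database built from the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(pair.schema)
        conn.execute(pair.sql).fetchall()
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


def summarize(pairs: list[TextToSQLPair]) -> tuple[int, Counter]:
    """Count the pairs that execute, broken down by category."""
    valid = [p for p in pairs if executes(p)]
    return len(valid), Counter(p.category for p in valid)


# Hypothetical sample records; the last one references a missing column.
SAMPLE = "\n".join(json.dumps(r) for r in [
    {"question": "How many users are there?",
     "schema": "CREATE TABLE users (id INTEGER, name TEXT);",
     "sql": "SELECT COUNT(*) FROM users", "category": "aggregation"},
    {"question": "How many orders has each user placed?",
     "schema": "CREATE TABLE users (id INTEGER, name TEXT); "
               "CREATE TABLE orders (id INTEGER, user_id INTEGER);",
     "sql": "SELECT u.name, COUNT(o.id) FROM users AS u "
            "LEFT JOIN orders AS o ON o.user_id = u.id GROUP BY u.id, u.name",
     "category": "join"},
    {"question": "List every user's email.",
     "schema": "CREATE TABLE users (id INTEGER);",
     "sql": "SELECT email FROM users", "category": "filter"},
])

if __name__ == "__main__":
    print(summarize(parse_jsonl(SAMPLE)))
```

Execution only proves a query is valid against its schema, not that it answers the question, so a real pipeline would need further checks.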


So this is essentially Postgres-only? How would it handle, e.g., MS SQL schemas and output?


Currently, yes, Postgres only. I'm already working on a dataset with DDL for more dialects, such as MySQL, DuckDB, and MSSQL, for a second iteration.
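To show why Postgres-targeted output may not run elsewhere, here is a small illustrative sketch of two well-known dialect differences: identifier quoting (double quotes in PostgreSQL and DuckDB, backticks in MySQL by default, square brackets in SQL Server) and row limiting (`LIMIT n` versus SQL Server's `TOP (n)`). The dialect table and function names are hypothetical; this is not how the model or the upcoming dataset handles dialects.

```python
# Hypothetical per-dialect settings: identifier delimiters and row-limit style.
DIALECTS = {
    "postgres": {"quote": ('"', '"'), "limit": "suffix"},
    "duckdb": {"quote": ('"', '"'), "limit": "suffix"},
    "mysql": {"quote": ("`", "`"), "limit": "suffix"},
    "mssql": {"quote": ("[", "]"), "limit": "top"},
}


def quote(dialect: str, name: str) -> str:
    """Quote an identifier, escaping the closing delimiter by doubling it."""
    left, right = DIALECTS[dialect]["quote"]
    return left + name.replace(right, right * 2) + right


def top_n(dialect: str, table: str, order_col: str, n: int) -> str:
    """Build 'the n rows with the largest order_col' for the given dialect."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    t, c = quote(dialect, table), quote(dialect, order_col)
    if DIALECTS[dialect]["limit"] == "top":
        # SQL Server has no LIMIT clause; TOP goes right after SELECT.
        return f"SELECT TOP ({n}) * FROM {t} ORDER BY {c} DESC"
    return f"SELECT * FROM {t} ORDER BY {c} DESC LIMIT {n}"


if __name__ == "__main__":
    for d in DIALECTS:
        print(f"{d:9} {top_n(d, 'orders', 'total', 5)}")
```

Differences like these are why a training set needs examples per dialect rather than relying on the model to translate Postgres syntax on its own.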


Updated the title since it may have been confusing, appreciate the feedback!

