Show HN: Open-source ETL framework to sync data from SaaS tools to vector stores

binarymax · on March 30, 2023

Cool project! It's not clear to me in the code from where you are getting embeddings. Are all your embeddings coming from OpenAI? If so, that sounds expensive for personal use.

jasonwcfan · on March 30, 2023

You can use any embeddings you want! We normally use OpenAI's ada which is $4 per 10 million tokens, which is fine for now. But eventually we'll need to figure out a way to incrementally sync data from SaaS tools instead of re-vectorizing all the content when the vector store needs to be updated.

ayanb9440 · on March 30, 2023

+1, here's the line in the codebase: https://github.com/ai-sidekick/sidekick/blob/main/sidekick-s...

We'll also add support for huggingface embeddings like MPNET and SBERT in the future!

jn2clark · on March 30, 2023

Looks really interesting! Are you looking for more vector search integrations? we have one here https://github.com/marqo-ai/marqo which includes a lot of the transformation logic (including inference). If so, we can do a PR

jasonwcfan · on March 30, 2023

Love the BAYC demo! One of the first companies we signed was actually OpenSea :)

ayanb9440 · on March 30, 2023

Is Marqo a vector database? If so that sounds great, feel free to put up a PR!

jn2clark · on March 30, 2023

search engine. it has all the embedding operations (text and images) and optimized inference. lots of other options like multi-modal queries (including negative queries) and multi-modal representations for documents.

ayanb9440 · on March 30, 2023

Nice, quite a feature set. Add a PR and ping us on slack, we'll merge it in