With binary representations you still get 2^D possible configurations, so it's entirely possible from a representation perspective. The main issue (I think, at least) is around determining similarity. Hamming distance gives an output space of only D+1 possible scores. As mentioned in the article, going to 0/1 with cosine gives better granularity, as it now penalizes embeddings that have differing numbers of positive elements (i.e. that live on different hyper-spheres). It is probably well suited to retrieval where there is a 1:1 query-document correspondence, but if the degeneracy of queries is large then there could be issues discriminating between similar documents. Mixed regimes of binary and (small) dense embeddings could be quite good. I expect a lot more innovation in this space.
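A toy sketch of that granularity point (assuming NumPy; the vectors and dimensionality are made up for illustration): Hamming distance on D-bit vectors can only take D+1 distinct integer values, while cosine on the same 0/1 vectors also reflects how many positive elements each embedding has.

    import numpy as np

    D = 8
    a = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    b = np.array([1, 1, 1, 1, 0, 0, 1, 0])  # one bit different from a
    c = np.array([1, 1, 1, 1, 1, 1, 1, 0])  # three bits different from a

    def hamming(x, y):
        # integer in [0, D] -> only D+1 distinct scores
        return int(np.sum(x != y))

    def cosine(x, y):
        # continuous score that also depends on how many 1s each vector has
        return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

    print(hamming(a, b), hamming(a, c))                     # 1 3
    print(round(cosine(a, b), 3), round(cosine(a, c), 3))   # ~0.894 ~0.756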
That's a great question. I think regimes like that could offer better trade-offs of memory/latency/retrieval performance, although I don't know what they are right now. It also assumes that going to larger dimensions can preserve more of the full-precision performance, which is still TBD. The other thing is how the binary embeddings play with ANN algorithms like HNSW (i.e. recall). With Hamming distance the space of similarity scores is quite limited.
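To make the memory side of that trade-off concrete, here is a rough sketch (assumptions: NumPy, float32 full-precision embeddings, sign-based binarisation packed to one bit per dimension; the corpus size and dimensionality are made up):

    import numpy as np

    num_docs, D = 100_000, 768
    full = np.random.randn(num_docs, D).astype(np.float32)     # full precision
    binary = np.packbits((full > 0).astype(np.uint8), axis=1)  # 1 bit per dim

    print(f"{full.nbytes / 1e6:.0f} MB full precision")  # ~307 MB
    print(f"{binary.nbytes / 1e6:.0f} MB binary")        # ~10 MB (32x smaller)

The latency and recall sides (e.g. how HNSW behaves when distances collapse to a handful of integer values) would have to be measured on an actual index.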
I would love an LLM agent that could (reliably) generate small API examples from a repo like this for the various models and ways to use them.
We (Marqo) are doing a lot on 1 and 2. There is a huge amount to be done on the ML side of vector search, and we are investing heavily in it. I think it has not quite sunk in that vector search systems are ML systems, with everything that comes with that. I would love to chat about 1 and 2, so feel free to email me (email is in my profile).
That does look pretty interesting. But I still feel it's not very convenient for use in a desktop environment with local files. That's of course not the project's fault, since I assume it simply targets different use cases and audiences.
In the meantime I also looked into whether such functionality could be implemented at all for the Gnome Shell and, more specifically, its file browser. But the search and extension APIs either wouldn't allow it at all or would require many hacks.
Can anyone comment on an open-source multi-modal LLM that can produce structured outputs based on an image? I have not found a good open-source one yet (this one included); only closed-source models seem to do this reliably well. Any suggestions are very welcome!
That sounds much longer than it should be. I am not sure of your exact use case, but I would encourage you to check out Marqo (https://github.com/marqo-ai/marqo - disclaimer, I am a co-founder). All inference and orchestration is included (no API calls), and many open-source or fine-tuned models can be used.
> That [pgvector index creation time] sounds much longer than it should... I would encourage you to check out Marqo
Your comment makes it sound like Marqo is a way to speed up pgvector indexing, but to be clear, Marqo is just another Vector Database and is unrelated to pgvector.
Try https://github.com/marqo-ai/marqo, which handles all the chunking for you (and is configurable). It also handles chunking of images in an analogous way. This enables highlighting in longer docs, and also within images, in a single retrieval step.
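A minimal sketch of what that looks like with the Marqo Python client (assumptions: a Marqo instance already running locally on port 8882, made-up index and field names; exact parameters such as tensor_fields and the highlight format can differ between Marqo versions):

    import marqo

    mq = marqo.Client(url="http://localhost:8882")  # assumes a local Marqo instance
    mq.create_index("my-docs")

    # Long text fields are chunked automatically at index time.
    mq.index("my-docs").add_documents(
        [{"Title": "Example doc", "Text": "A long document goes here ..."}],
        tensor_fields=["Text"],
    )

    results = mq.index("my-docs").search(q="what does this say about chunking?")
    # Each hit includes the best-matching chunk, which is what gets highlighted.
    print(results["hits"][0]["_highlights"])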