It is a custom search engine, written in Python. It works surprisingly well: most queries are answered in under 200ms, and tens of milliseconds are common once caches are warm.
Hundreds of milliseconds across how large an index?
My email corpus is on the order of ten million messages. Needless to say, seeing the words "can index roughly four messages per second" made me cringe rather hard.
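To put that in perspective, here is a back-of-the-envelope sketch. It assumes the quoted rate of four messages per second holds steadily and that the corpus is exactly ten million messages; both are simplifications, and the function name is just illustrative.

```python
SECONDS_PER_DAY = 86_400


def full_index_days(messages: int, rate_per_second: float) -> float:
    """Estimate how many days a one-off full index would take at a fixed rate."""
    seconds = messages / rate_per_second
    return seconds / SECONDS_PER_DAY


# Assumed inputs: ~10 million messages, ~4 messages indexed per second.
print(round(full_index_days(10_000_000, 4), 1))  # -> 28.9
```

So at that rate, an initial index of a corpus this size would take roughly a month of continuous running.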
Any plans for Solr support? There's a lot of knowledge out there on how to scale Solr for large webmail deployments, e.g. inside a large company, or for forensic mail analysis.
I was wondering about that as well. Why not go with an existing search engine? Having checked out the code, it seems that the project is still in its infancy. A lot of effort has probably gone into building the search engine though, which kind of seems like re-inventing the wheel, but I'm sure you have reasons that I don't see. By the way, as a test-infected coder, my first move was to launch the test suite, and it was a bit surprising that it wanted to run sudo :) I didn't go further for now because I don't have time to set up fetchmail; will do later. You guys seem to have a lot of potential and good intentions :) In my end of the world I'm also dreaming of "taking back email" and I happen to write Python for a living, so I'll be watching you guys very closely. Quite frankly though, I won't consider contributing until you have a proper test suite :) Keep it up!
Contributing increases the odds we'll be able to make the test suite. ;-) But yes, the code is very immature; right now it is basically a proof-of-concept in the middle of being refactored into its first sensible iteration (as mentioned at the top of the README).
I built the search engine from scratch initially because it was just an interesting problem I wanted to experiment with. However, now that I have one that works, I see massive benefits to not having too many external dependencies. It makes integration and packaging ever so much more pleasant.
Is Java really that much more effort to set up than Python? Solr's problem isn't Java, it's the bloaty mess of XML configuration files it insists on having - something like Elasticsearch pretty much Just Works.