One fundamental difference between ElasticSearch and Solr

mumrah · on Jan 22, 2013

This seems to be comparing ElasticSearch to Solr 3.x which is disingenuous. Solr 4.0 has been released for months now and includes many new features around distributed search and indexing.

Just to point out a few corrections:

* Solr does have a transaction log. When using soft commits, the tx log is used to recover any volatile writes in the event of a crash.
* Solr replicates in a similar manner to ES. It no longer does full-index replication as indicated by the article. A write comes in and is routed to the shard leader and replicas.
* Solr too has "High Availability built-in" with automatic failover of shards.
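To illustrate the transaction-log point, here is a minimal sketch of the general write-ahead-log idea: each write is appended durably to a log before it becomes visible in memory (the "soft-committed" state), so after a crash the volatile state can be rebuilt by replaying the log. The class and method names (`TxLog`, `Index`, `replay`) are hypothetical, and this is not Solr's actual tlog format or API; real implementations also truncate the log after a hard commit, which is omitted here.

```python
import json
import os
import tempfile


class TxLog:
    """Hypothetical append-only log: each write is fsync'd before it is applied."""

    def __init__(self, path):
        self.path = path

    def append(self, op, doc_id, doc=None):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"op": op, "id": doc_id, "doc": doc}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before the write is acknowledged

    def replay(self):
        """Rebuild the volatile (soft-committed) state from the log."""
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                if entry["op"] == "add":
                    state[entry["id"]] = entry["doc"]
                elif entry["op"] == "delete":
                    state.pop(entry["id"], None)
        return state


class Index:
    """Hypothetical index whose searchable state lives only in memory."""

    def __init__(self, log):
        self.log = log
        self.visible = {}

    def add(self, doc_id, doc):
        self.log.append("add", doc_id, doc)  # log first, then apply
        self.visible[doc_id] = doc

    def delete(self, doc_id):
        self.log.append("delete", doc_id)
        self.visible.pop(doc_id, None)


path = os.path.join(tempfile.mkdtemp(), "tlog")
idx = Index(TxLog(path))
idx.add("1", {"title": "solr"})
idx.add("2", {"title": "es"})
idx.delete("1")

# Simulate a crash: the in-memory state is lost, so recover it from the log.
recovered = Index(TxLog(path))
recovered.visible = recovered.log.replay()
print(recovered.visible)
```

The key design choice is ordering: the log append happens before the in-memory update, so anything a client saw acknowledged can be reconstructed.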

One fundamental difference the article does not address is the robustness of Solr's cluster management. Solr uses ZooKeeper under the hood which implements the Paxos algorithm to avoid issues like "split-brain syndrome". ElasticSearch has implemented its own distributed coordination and is susceptible to such issues.

Xylakant · on Jan 22, 2013

> ElasticSearch has implemented its own distributed coordination and is susceptible to such issues.

Only if configured wrong: setting discovery.zen.minimum_master_nodes to N/2 + 1 where N is the number of master nodes in the cluster prevents split-brain. Nodes that don't see enough master nodes will go into a catatonic state and won't accept writes, effectively preventing a split-brain syndrome.
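The N/2 + 1 rule is a strict-majority quorum. A minimal sketch of why it helps, assuming a clean two-way partition in which each side sees exactly its own master-eligible nodes: two disjoint groups cannot both hold a strict majority, so at most one side can elect a master. The helper names below are hypothetical; only the setting name `discovery.zen.minimum_master_nodes` comes from the comment above. This model does not capture partial or asymmetric partitions.

```python
def minimum_master_nodes(n_master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: floor(N / 2) + 1."""
    return n_master_eligible // 2 + 1


def can_elect_master(visible_master_eligible: int, quorum: int) -> bool:
    # A node that sees fewer master-eligible nodes than the quorum
    # (counting itself) refuses to elect a master or accept writes.
    return visible_master_eligible >= quorum


N = 5
quorum = minimum_master_nodes(N)

# Split the N nodes into two sides in every possible way: at most one side
# can reach quorum, because two disjoint majorities would need more than N nodes.
for side_a in range(N + 1):
    side_b = N - side_a
    assert not (can_elect_master(side_a, quorum) and can_elect_master(side_b, quorum))

print(quorum)
```

With 5 master-eligible nodes the quorum is 3, so a 3/2 split leaves only the 3-node side operational and the 2-node side goes read-only.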

teraflop · on Jan 22, 2013

That's only sufficient if your network is perfectly reliable. Otherwise, you can still get split-brain in situations like this:

https://github.com/elasticsearch/elasticsearch/issues/2488

This particular bug is probably fixable, but reaching consensus on a set of master nodes in the general case, without race conditions, is quite hard.

Argorak · on Jan 22, 2013

ElasticSearch just handles split-brain differently, by shutting down nodes that are cut off from the cluster and marking them as such. The problem is present in the _default_ configuration for clusters of more than 2 nodes, but ES is highly tunable to your specific cluster layout (which you should do anyway).

redbad · on Jan 22, 2013

    > Solr uses ZooKeeper under the hood which implements the
    > Paxos algorithm

ZooKeeper does not implement Paxos.

mumrah · on Jan 22, 2013

You are correct. It is not an implementation of Paxos, but a Paxos-inspired protocol. http://wiki.apache.org/hadoop/ZooKeeper/PaxosRun

alexdong · on Jan 22, 2013

@mumrah Thank you for the correction. You were right. I apologize for missing important points like the two you mentioned.

steeve · on Jan 22, 2013

There is a ZooKeeper plugin for ElasticSearch: https://github.com/sonian/elasticsearch-zookeeper

fnl · on Jan 22, 2013

Alex provides a summary of the main difference between ES and Solr, i.e., distributed, dynamic indexes vs. fast search on static indexes. Here is a somewhat older article, from about two years ago, that highlights this difference in numbers (search time) and explains what percolation is and why it matters for ES (and not so much for Solr):

http://blog.socialcast.com/realtime-search-solr-vs-elasticse...

It should probably be pointed out, however, that as of Solr 4 this comparison is slightly unfair and needs to be redone against SolrCloud. But given that SolrCloud relies on ZooKeeper, it is not nearly as easy to set up as ES.

LiveTheDream · on Jan 22, 2013

Alex mentions that blog post in his second sentence and suggests that it was a poor comparison because of bad configuration for both Solr (calling commit too often) and ElasticSearch (using the 5-shard index setting). Granted, the point of that comparison was to test each with its default settings.