That dynamic resharding looks very nice. The big issue I see for using this as a real datastore is the apparent lack of queries and indexes on the data. Keeps it a lot simpler I guess, but so many workloads require the use of queries. I guess you'd load the data into some other system for querying and just use this for storage? Or would you use another database for storing the data, and load it into zBase for quick access to buckets?
It's a distributed key-value store with durability. If you're looking for something to run ad-hoc queries against, this is not for you.
Think of it as memcache + disk persistence. (So rather than erasing things by purging the cache when a memory slab fills, you just evict them from memory and read them from disk if they're needed again.)
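To make the "memcache + disk persistence" idea concrete, here is a toy sketch in Python. It is not zBase code: the class name, the LRU policy, the synchronous write to "disk", and the plain dict standing in for disk are all assumptions made for illustration.

```python
from collections import OrderedDict


class PersistentCache:
    """Toy model: a bounded in-memory LRU in front of a durable store."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # hot items, least recently used first
        self.disk = {}               # stand-in for on-disk storage (assumption)
        self.disk_reads = 0          # counts reads that had to go to "disk"

    def set(self, key, value):
        # Every write is persisted; done synchronously here for simplicity.
        self.disk[key] = value
        self._cache(key, value)

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as recently used
            return self.memory[key]
        if key in self.disk:
            # Evicted earlier, but not lost: read it back and re-cache it.
            self.disk_reads += 1
            value = self.disk[key]
            self._cache(key, value)
            return value
        return None

    def _cache(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            # Eviction only drops the in-memory copy; the disk copy remains.
            self.memory.popitem(last=False)
```

With `capacity=2`, setting three keys pushes the oldest one out of memory, but a later `get` on it still succeeds by reading from the disk stand-in, which is the difference from a plain cache.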
I get that - but the usual implementation would be to have a set of databases with indexes (maybe mysql or mongodb) where you could store all the data and run ad-hoc queries against. You'd then put memcache in front of that for fast access to repeated queries where you already know which data you want. If the data isn't in the memcache, it would fall through to the underlying DB that is already on disk.
zBase would have its own full copy of the data already on distributed disks, so it wouldn't need to fall through to some other database. That seems to be the entire point there - but surely you'd still need to store the data in some place where you could run ad-hoc queries on it? That means the data is duplicated in two places that would need to be kept in sync. If a transaction fails on one of the data stores, don't you have inconsistent data now?
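For reference, the "usual implementation" described above (an indexed database with a cache in front of it) can be sketched roughly as follows. This is only an illustration: SQLite stands in for MySQL or MongoDB, a dict stands in for memcache, and the table, column, and function names are made up.

```python
import sqlite3

# Indexed database that holds the data of record and answers ad-hoc queries.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE players (id TEXT PRIMARY KEY, level INTEGER, state TEXT)")
db.execute("CREATE INDEX idx_level ON players (level)")
db.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("p1", 3, "farm-a"), ("p2", 7, "farm-b")],
)

cache = {}  # stand-in for memcache (assumption)


def get_state(player_id):
    """Known-key lookup: try the cache first, fall through to the database."""
    if player_id in cache:
        return cache[player_id]
    row = db.execute(
        "SELECT state FROM players WHERE id = ?", (player_id,)
    ).fetchone()
    if row is None:
        return None
    cache[player_id] = row[0]  # populate the cache for the next read
    return row[0]


def players_above(level):
    """Ad-hoc query: only the database can answer this, via its index."""
    rows = db.execute(
        "SELECT id FROM players WHERE level > ? ORDER BY id", (level,)
    )
    return [r[0] for r in rows]
```

Here there is one authoritative copy of the data, so the cache can always be rebuilt from the database. The consistency question in the comment arises once the key-value store and the queryable store are both written to independently.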
Currently zBase does not have any indexing capabilities. However, its design lets you use the incremental replication protocol to build indexing outside of zBase.
zBase is used as a highly available key-value store for reads and writes. It also offers a few fancier operations, such as get-lock.
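As a rough illustration of building an index outside the store from a replication feed, here is a sketch in Python. The mutation format `(seq, op, key, doc)` and the `ExternalIndex` class are hypothetical; this does not reflect zBase's actual replication protocol or wire format.

```python
from collections import defaultdict


class ExternalIndex:
    """Secondary index kept up to date from an ordered stream of mutations."""

    def __init__(self, field):
        self.field = field
        self.by_value = defaultdict(set)  # indexed value -> set of keys
        self.current = {}                 # key -> value it is indexed under
        self.last_seq = 0                 # highest sequence number applied

    def apply(self, mutation):
        # Hypothetical mutation shape: (seq, "set" | "delete", key, doc or None)
        seq, op, key, doc = mutation
        if seq <= self.last_seq:
            return  # already applied; replays are ignored
        # Remove the key's previous index entry, if it had one.
        old = self.current.pop(key, None)
        if old is not None:
            self.by_value[old].discard(key)
        # Add the new entry for sets that carry the indexed field.
        if op == "set" and self.field in doc:
            value = doc[self.field]
            self.by_value[value].add(key)
            self.current[key] = value
        self.last_seq = seq

    def lookup(self, value):
        return sorted(self.by_value.get(value, ()))
```

Because the index is derived from the stream rather than written separately by the application, a failed or delayed update leaves the index stale rather than contradictory, and it can catch up by resuming from `last_seq`.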
Game workloads don't require ad hoc queries, typically - you're usually just stuffing a save state into the database every few seconds, and you route everything that you might need to query through analytics (except maybe billing, which you'd probably handle another way). Zynga pushes analytics data into their biggish (a few hundred nodes IIRC) Vertica cluster, which is far better for that sort of thing than most random access databases.
In any case, most game state data wouldn't be very useful on its own; you typically want to look at event streams and histories of certain values, not snapshots of the current state.
When Cassandra came out, before they built a query language, their suggestion was to build indices manually. Your data would be in one key-value store, and an index on that data was just another key-value store, with the indexed value as the key and your data keys as the value.
It works out well enough if you never need ad hoc queries.
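A minimal sketch of that manual-index pattern, using two Python dicts as stand-ins for the two key-value stores. The `city` field and the function names are made up for the example.

```python
data = {}   # primary store: record key -> record
index = {}  # index store: indexed value (city) -> list of record keys


def put(key, record):
    """Write the record and keep the city index in step with it."""
    old = data.get(key)
    if old is not None:
        # The record may have moved to another city; drop the stale entry.
        index[old["city"]].remove(key)
    data[key] = record
    index.setdefault(record["city"], []).append(key)


def find_by_city(city):
    """Two lookups: index store for the keys, then primary store for records."""
    return [data[k] for k in index.get(city, [])]
```

Every indexed lookup has to be planned in advance like this, which is why it only works out if you never need ad hoc queries. Note also that `put` performs two separate writes, so a real deployment has to decide what happens if the second one fails.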