Hacker News

I've been really interested in this architecture since Jay Kreps' blog post on it. One part that I'm less clear on is how this fits in with request-response style communication between, say, a Web browser and a Web server.

In a simple Web-app-writes-to-DB scenario, it's easy to read my writes, but with an async log-processing system, how am I supposed to organize my code so I can read my writes and respond with useful information?

Maybe the solution is to eschew request-response entirely and have all requests return 200, then poll or use two-way communication?

Alternatively, I could have my log-appending operation return a value indicating the position in the totally-ordered log, which I could pass to the query interfaces as a way of indicating "don't return until you've processed at least to here." Does anyone do that?
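To make the offset idea above concrete, here's a minimal single-process sketch (an in-memory stand-in for the log and the materialized view; this is illustrative, not any real client API): the append returns a log position, and reads block until the consumer has applied at least that far.

```python
from threading import Condition

class LogBackedView:
    """Toy sketch: an append-only log plus a view that a consumer
    applies sequentially. Not a real Kafka/log client."""

    def __init__(self):
        self.log = []          # totally ordered log of (key, value) writes
        self.state = {}        # materialized view built by the consumer
        self.applied = -1      # highest log position applied to the view
        self.cond = Condition()

    def append(self, key, value):
        """Append a write; return its position in the log."""
        self.log.append((key, value))
        return len(self.log) - 1

    def apply_next(self):
        """Consumer side: apply the next log entry to the view."""
        with self.cond:
            pos = self.applied + 1
            key, value = self.log[pos]
            self.state[key] = value
            self.applied = pos
            self.cond.notify_all()

    def read(self, key, min_position):
        """'Don't return until you've processed at least to here.'"""
        with self.cond:
            self.cond.wait_for(lambda: self.applied >= min_position)
            return self.state[key]
```

The handler would do `pos = view.append("cart", ...)`, hand `pos` back to the client (or pass it straight to the query side), and `view.read("cart", pos)` is then guaranteed to see the write.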

Am I totally off base here? I'd love to hear from anyone who is using these kinds of systems today.



You're quite right. Switching from a db to a log as the "master" loses the simplicity of the app-to-db model.

As the article says: "For now, a better option is to extract the log from a database" - i.e. you use some tooling to generate a log from the db.

Indeed, you can now see tools that go in this direction usually by using the replication stream. (eg https://github.com/shyiko/mysql-binlog-connector-java and https://github.com/xstevens/decoderbufs)
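The linked tools tail the replication stream. The cruder "extract the log from the db" fallback can be sketched as a polling loop over a table ordered by primary key (toy Python; names are illustrative):

```python
def poll_changes(db_rows, last_seen_id, log):
    """Toy change-data-capture by polling: copy rows newer than the
    last seen id from a table into a log. Real tools tail the
    replication stream (binlog) instead, which avoids polling lag
    and captures deletes/updates properly."""
    for row in db_rows:
        if row["id"] > last_seen_id:
            log.append(row)
            last_seen_id = row["id"]
    return last_seen_id  # checkpoint for the next poll
```

Each poll resumes from the returned checkpoint, so the log stays an ordered, at-least-once copy of the table's inserts.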


Here is how I think about this. There are three high-level paradigms you see for processing:

1. Request/response (e.g. most UI actions, REST, etc.)

2. Stream (e.g. subscribing to a Kafka topic)

3. Batch/periodic (e.g. Hadoop, DWH)

You actually need all of these at least somewhere in a company, and they each have their place.

The dividing line for request/response is that someone is waiting on the other end, and that if the request fails you can just show them an error and move on. That's why a web service that gets overwhelmed usually just times out requests.

Consider an e-commerce site as an example:

1. Request/response: displaying a product, making a sale

2. Stream or batch: restocking, logistics and shipping, price adjustments, analytics, product catalog import, search index updates, etc.

The latter category is asynchronous, so it can be done in a batch fashion (once an hour or day) if latency is not a concern, or in a streaming fashion if it needs to be faster.


To connect a request/response system to a log-oriented system such as the one presented here, one option is to use the request/response server as a proxy for the log-oriented one:

  - producing the requests to the log-oriented system,

  - consuming the responses at some endpoint/topic of the dataflow,

  - linking responses to requests using some id propagated along the whole dataflow,

  - dealing with a cache of pending requests, asynchronous responses, and timeouts.

In practice, to keep latency and the count of pending connections manageable, we can't always wait for the very end of the request-processing dataflow; but we can at least choose an intermediate log/topic where progress is sufficient to construct a response, or reply with 202 Accepted.


Typically, you cache the last write somewhere close so that you can manually splice it into the corresponding page render. Then you engineer the rest of the system such that the log processing system is rarely/never falling behind.
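That splicing amounts to overlaying a per-session cache of recent writes on the asynchronously built view; a minimal sketch (toy Python; names are illustrative):

```python
class ReadYourWrites:
    """Toy sketch: splice a user's recent writes over a possibly
    stale view that is built asynchronously from the log."""

    def __init__(self, materialized_view):
        self.view = materialized_view   # built by the log consumer
        self.recent = {}                # per-session cache of last writes

    def write(self, key, value):
        self.recent[key] = value        # cache close to the user
        # ...the same (key, value) would also be appended to the log...

    def render(self, key):
        # Prefer the cached write if the log pipeline hasn't caught up.
        return self.recent.get(key, self.view.get(key))
```

Cache entries would be evicted once the view has provably caught up (or after a TTL), which is why the rest of the system is engineered to rarely fall behind.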


You do req/ack, i.e., reply with 202 Accepted.

Afterwards, either poll or communicate over sockets back to the client side. If there are workers involved on the server, add a socket id to the message envelope so the reply can be matched on its way back.
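A minimal sketch of that envelope matching (toy Python; a dict of connections stands in for real sockets, and the names are illustrative):

```python
def make_envelope(socket_id, payload):
    """Tag the message with the originating connection's id."""
    return {"socket_id": socket_id, "payload": payload}

def worker(envelope):
    """A worker does the actual work and propagates the socket id
    so the front-end can route the reply."""
    result = envelope["payload"].upper()   # placeholder for real work
    return {"socket_id": envelope["socket_id"], "result": result}

def route_reply(reply, sockets):
    """Front-end: match the reply to its client connection by id."""
    sockets[reply["socket_id"]].append(reply["result"])
```

The front-end never blocks on the worker; it just keeps the socket open and pushes the result down the matching connection when the reply arrives.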




