Hacker News

I've been really interested in this architecture since Jay Kreps' blog post on it. One part that I'm less clear on is how this fits in with request-response style communication between, say, a Web browser and a Web server.

In a simple Web-app-writes-to-DB scenario, it's easy to read my writes, but with an async log-processing system, how am I supposed to organize my code so I can read my writes and respond with useful information?

Maybe the solution is to eschew request-response entirely and have all requests return 200, then poll or use two-way communication?

Alternatively, I could have my log-appending operation return a value indicating the position in the totally-ordered log, which I could pass to the query interfaces as a way of indicating "don't return until you've processed at least to here." Does anyone do that?
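To make the offset idea above concrete, here's a minimal single-process sketch (an in-memory stand-in for the log and the materialized view; this is illustrative, not any real client API): the append returns a log position, and reads block until the consumer has applied at least that far.

```python
from threading import Condition

class LogBackedView:
    """Toy sketch: an append-only log plus a view that a consumer
    applies sequentially. Not a real Kafka/log client."""

    def __init__(self):
        self.log = []          # totally ordered log of (key, value) writes
        self.state = {}        # materialized view built by the consumer
        self.applied = -1      # highest log position applied to the view
        self.cond = Condition()

    def append(self, key, value):
        """Append a write; return its position in the log."""
        self.log.append((key, value))
        return len(self.log) - 1

    def apply_next(self):
        """Consumer side: apply the next log entry to the view."""
        with self.cond:
            pos = self.applied + 1
            key, value = self.log[pos]
            self.state[key] = value
            self.applied = pos
            self.cond.notify_all()

    def read(self, key, min_position):
        """'Don't return until you've processed at least to here.'"""
        with self.cond:
            self.cond.wait_for(lambda: self.applied >= min_position)
            return self.state[key]
```

The handler would do `pos = view.append("cart", ...)`, hand `pos` back to the client (or pass it straight to the query side), and `view.read("cart", pos)` is then guaranteed to see the write.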

Am I totally off base here? I'd love to hear from anyone who is using these kinds of systems today.



You're quite right. Switching from a db to a log as the "master" loses the simplicity of the app-to-db model.

As the article says: "For now, a better option is to extract the log from a database" - i.e. you use some tooling to generate a log from the db.

Indeed, you can now see tools that go in this direction usually by using the replication stream. (eg https://github.com/shyiko/mysql-binlog-connector-java and https://github.com/xstevens/decoderbufs)
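The linked tools tail the replication stream. The cruder "extract the log from the db" fallback can be sketched as a polling loop over a table ordered by primary key (toy Python; names are illustrative):

```python
def poll_changes(db_rows, last_seen_id, log):
    """Toy change-data-capture by polling: copy rows newer than the
    last seen id from a table into a log. Real tools tail the
    replication stream (binlog) instead, which avoids polling lag
    and captures deletes/updates properly."""
    for row in db_rows:
        if row["id"] > last_seen_id:
            log.append(row)
            last_seen_id = row["id"]
    return last_seen_id  # checkpoint for the next poll
```

Each poll resumes from the returned checkpoint, so the log stays an ordered, at-least-once copy of the table's inserts.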


Here is how I think about this. There are three high-level paradigms you see for processing:

1. Request/response (e.g. most UI actions, REST, etc.)

2. Stream (e.g. subscribing to a Kafka topic)

3. Batch/periodic (e.g. Hadoop, DWH)

You actually need all of these at least somewhere in a company, and they each have their place.

The dividing line for request/response is that someone is waiting on the other end, and that if the request fails you can just show them an error and move on. That's why a web service that gets overwhelmed usually just times out requests.

Consider an e-commerce site as an example:

1. Request/response: displaying a product, making a sale

2. Stream or batch: restocking, logistics and shipping, price adjustments, analytics, product catalog import, search index updates, etc.

The latter category is asynchronous, so it can be done in a batch fashion (once an hour or day) if latency is not a concern, or in a streaming fashion if it needs to be faster.


To connect a request/response system to a log-oriented system such as the one presented here, one option is to use the request/response server as a proxy for the log-oriented one:

  - producing the requests to the log-oriented system,

  - consuming the responses at some endpoint/topic of the dataflow,

  - linking responses to requests using some id propagated along the whole dataflow,

  - dealing with a cache of pending requests, asynchronous responses, and timeouts.

In practice, to keep latency and the count of pending connections manageable, we can't always wait for the very end of the request-processing dataflow; but we can at least choose an intermediate log/topic where progress is sufficient to construct a response, or reply with 202 Accepted.


Typically, you cache the last write somewhere close so that you can manually splice it into the corresponding page render. Then you engineer the rest of the system such that the log processing system is rarely/never falling behind.
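That splicing amounts to overlaying a per-session cache of recent writes on the asynchronously built view; a minimal sketch (toy Python; names are illustrative):

```python
class ReadYourWrites:
    """Toy sketch: splice a user's recent writes over a possibly
    stale view that is built asynchronously from the log."""

    def __init__(self, materialized_view):
        self.view = materialized_view   # built by the log consumer
        self.recent = {}                # per-session cache of last writes

    def write(self, key, value):
        self.recent[key] = value        # cache close to the user
        # ...the same (key, value) would also be appended to the log...

    def render(self, key):
        # Prefer the cached write if the log pipeline hasn't caught up.
        return self.recent.get(key, self.view.get(key))
```

Cache entries would be evicted once the view has provably caught up (or after a TTL), which is why the rest of the system is engineered to rarely fall behind.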


You do req/ack, i.e., reply with 202 Accepted.

Afterwards, either poll or communicate over sockets back to the client side. If there are workers involved on the server, add a socket id to the message envelope so the reply can be matched on its way back.
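A minimal sketch of that envelope matching (toy Python; a dict of connections stands in for real sockets, and the names are illustrative):

```python
def make_envelope(socket_id, payload):
    """Tag the message with the originating connection's id."""
    return {"socket_id": socket_id, "payload": payload}

def worker(envelope):
    """A worker does the actual work and propagates the socket id
    so the front-end can route the reply."""
    result = envelope["payload"].upper()   # placeholder for real work
    return {"socket_id": envelope["socket_id"], "result": result}

def route_reply(reply, sockets):
    """Front-end: match the reply to its client connection by id."""
    sockets[reply["socket_id"]].append(reply["result"])
```

The front-end never blocks on the worker; it just keeps the socket open and pushes the result down the matching connection when the reply arrives.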




