I agree. But it can be useful to have a guarantee, even for a specified period of time, that the message will only be seen once. For example, if the processing of that message is very expensive, such as if that message results in API requests to a very expensive SaaS service. It may be idempotent to process that message more times than necessary, but doing so may be cost prohibitive if you are billed per request. I think this is a case where using the VT to help you only process that message one time could help out quite a bit.
Agreed that this property is useful for efficiency. The danger comes from users believing this property is a guarantee and making correctness decisions based on it. There is no such thing as a distributed lock or exclusive hold.
This is more or less where my brain was heading. The utility of exactly once system is quite high. But the utility of coding knowing that it is at least once, hopefully, and if not we'll do a catch-up, means that I will code knowing that my work must be safe in an environment with these constraints.
The best argument I have for a postgres background worker queue is simplicity. You're already managing a db, why not have the db simplify your startup's infrastructure. Honestly, with heroku, sidekiq, and redis it isn't even simplifying. But for startups simplicity is useful so they can scale when they need to and avoid complexity when possible. But that's the only argument I'd make.
IMO, it is most valuable when you are looking for ways of reducing complexity. For a lot of projects, if you're already running Postgres then it is maybe not worth the added complexity of bringing in another technology.
If the message never reaches the queue (network error, database is down, app is down, etc), then yes that is a 0 delivery scenario. Once the message reaches the queue though, it is guaranteed that only a single consumer can read the message for the duration of the visibility timeout. FOR UPDATE guarantees only a single consumer can read the record, and the visibility timeout means we don't have to hold a lock. After that visibility timeout expires, it is an at-least-once scenario. Any suggestion for how we could change the verbiage on the readme to make that more clear?
You are talking about distributed systems, nobody expects to read "exactly once" delivery. If I read that on the docs, I consider that a huge red flag.
And the fact is that what you describe is a performance optimization, I still have to write my code so that it is idempotent, so that optimization does not affect me in any other way, because exactly once is not a thing.
All of this to say, I'm not even sure it's worth mentioning?
You should let the Apache Flink team know, they mention exactly-once processing on their home page (under "correctness guarantees") and in their list of features.
Exactly once delivery is not the same as exactly once processing.
Exactly once delivery is an impossibility; see the two generals problem. Exactly once processing is usually at-least-once delivery coupled with deduplication or an otherwise idempotent action
You CAN have "at most once" processing, since there is always the option of processing zero messages due to systems being down, but it's a joint effort, nobody is able to offer it "for free" (as in, no effort to implement it)
> Once the message reaches the queue though, it is guaranteed that only a single consumer can read the message for the duration of the visibility timeout.
But does the message get persisted and replicated before any consumer gets the message (and after the submission of the message is acked)? If it's a single node, the answer is simply "no": the hard drive can melt before anyone reads the message and the data is lost. It's not "exactly once" if nobody gets the message.
And if the message is persisted and replicated, but there's subsequently a network partition, do multiple consumers get to read the message? What happens if writing the confirmation from the consumer fails, does the visibility timeout still expire?
> After that visibility timeout expires, it is an at-least-once scenario.
That's not really what "at least once" refers to. That's normal operation, and sets the normal expectations of how the system should work under normal conditions. What matters is what happens under abnormal conditions.
> FOR UPDATE guarantees only a single consumer can read the record, and the visibility timeout means we don't have to hold a lock
This is not always true!
Postgres does snapshot isolation within transactions that begins on the first statement within a transaction. This means that some isolation levels like REPEATABLE READ can feel misleadingly “safer” but actually break your guarantee. From the docs:
> Note also that if one is relying on explicit locking to prevent concurrent changes, one should either use Read Committed mode, or in Repeatable Read mode be careful to obtain locks before performing queries. A lock obtained by a repeatable read transaction guarantees that no other transactions modifying the table are still running, but if the snapshot seen by the transaction predates obtaining the lock, it might predate some now-committed changes in the table. A repeatable read transaction's snapshot is actually frozen at the start of its first query or data-modification command (SELECT, INSERT, UPDATE, DELETE, or MERGE), so it is possible to obtain locks explicitly before the snapshot is frozen.
Well that only is a problem with locks if you actually need locks to satisfy your guarantee and you acquire the lock after your first read. However repeatable reads are never a good guarantee without locking
As u/Fire-Dragon-DoL says, you can have an exactly-once guarantee within this small system, but you can't extend it beyond that. And even then, I don't think you can have it. If the DB is full, you can't get new messages and they might get dropped, and if the threads picking up events get stuck or die then messages might go unprocessed and once again the guarantee is violated. And if processing an event requires having external side-effects that cannot be rolled back then you're violating the at-most-once part of the exactly-once guarantee.
> you can have an exactly-once guarantee within this small system
It's actually harder to do this in a small system. I submit a message to the queue, and it's saved and acknowledged. Then the hard drive fails before the message is requested by a consumer. It's gone. Zero deliveries, or "at most once".
> If the DB is full, you can't get new messages and they might get dropped
In this case, I'd expect the producer to receive a failure, so technically there's nothing to deliver.
> if the threads picking up events get stuck or die
While this obviously affects delivery, this is a concurrency bug and not a fundamental design choice. The failures folks usually refer to in this context are ones that are outside of your control, like hardware failures or power outages.
pgmq.archive() gives us an API to retain messages on your queue, its an alternative to pgmq.delete(). For me as a long-time Redis user, message retention was always important and was always extra work to implement.
DLQ isn't a built-in feature to PGMQ yet. We run PGMQ our SaaS at Tembo.io, and the way we implement the DLQ is by checking the message's read_ct value. When it exceeds some value, we send the message to another queue rather than processing it. Successfully processed messages end up getting pgmq.archive()'d.
This is exactly how we do this in our SaaS at Tembo.io. We check read_ct, and move the message if >= N. I think it would be awesome if this were a built-in feature though.
We talk a little bit in https://tembo.io/blog/managed-postgres-rust about how we use PGMQ to run our SaaS at Tembo.io. We could have ran a Redis instance and used RSMQ, but it simplified our architecture to stick with Postgres rather than bringing in Redis.
As for scaling - normal Postgres scaling rules apply. max_connections will determine how many concurrent applications can connect. The queue workload (many insert, read, update, delete) is very OLTP-like IMO, and Postgres handles that very well. We wrote some about dealing with bloat in this blog: https://tembo.io/blog/optimizing-postgres-auto-vacuum
Simplicity is one of the reasons we started this project. IMO, far less maintenance overhead to running Postgres compared to RabbitMQ, especially if you are already running Postgres in your application stack. If PGMQ fits your requirements, then you do not need to introduce a new technically.
There's definitely use cases where PGMQ wont compare to RabbitMQ, or Kafka, though.
PGMQ doesn't give you a way to deliver the same message to concurrent consumers the same way that you can with Kafka via consumer groups.
To get this with PGMQ, you'd need to do something like creating multiple queues, then send messages to all the queues within a transaction. e.g. `begin; pgmq.send('queue_a'...); pgmq.send('queue_b'...); commit;`
PGMQ doesn't seem to have functionality for returning a payload to the task submitter.
For example, lets say the task is something like:
Run a SELECT query on database FOO, returning the results
There would be workarounds (ie store the results in a "results" table or similar) but just being able to directly return the result in a message to the caller is conceptually simpler.
I think it would be tough to compare. There are client libraries for several languages, but the project is mostly a SQL API to the queue operations like send, read, archive, delete using the same semantics as SQS/RSMQ.
Any language that can connect to Postgres can use PGMQ, whereas it seems River is Go only?
Take a look at https://github.com/tembo-io/pg_vectorize. It makes it a lot easier to get started. It runs on pgvector, but as a user, its completely abstracted from you. It also provides you with a way to auto-update embeddings as you add new data or update existing source data.
This is good until it isn’t. Tried to get it working for 4 hours and it just did not.
And then I had an important architectural gotcha moment: I want my database to be dump. Its purpose is to store and query data in an efficient and ACID way.
Adding cronjobs and http calls to the database is a bad idea.
I love the simplicity and that it helps to keep embedding a up to date (if it works), but I decided to not treat my database as application.
check out https://github.com/tembo-io/pg_vectorize - we're taking it a little bit beyond just the storage and index. The project uses pgvector for the indices and distance operators, but also adds a simpler API, hooks into pre-trained embedding models, and helps you keep embeddings updated as data changes/grows