Using PostgreSQL as a Dead Letter Queue for Event-Driven Systems

(diljitpr.net)

Comments

TexanFeller 25 January 2026
Ofc I wouldn't use it for extremely high-scale event processing, but it's a great default for a message/task queue for 90% of business apps. If you're processing under a few hundred million events/tasks per day, with fewer than ~10k concurrent processes dequeuing from it, it's what I'd default to.

I work on apps that use such a PG-based queue system, and it provides indispensable features we couldn't achieve easily or cleanly with a normal queue system, such as being able to dynamically adjust the priority/order of tasks being processed and easily query/report on the contents of the queue. We have many other interesting features built into it that are more specific to our needs, which I'm more hesitant to describe in detail here.
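
As a rough illustration of what I mean by adjusting priority and reporting (a hypothetical tasks table, not our real schema):

    -- Hypothetical schema; ours differs.
    -- Bump the priority of all pending tasks for one customer:
    UPDATE tasks
       SET priority = priority + 100
     WHERE customer_id = 42
       AND status = 'pending';

    -- Ad-hoc reporting on queue contents, which an opaque broker can't do:
    SELECT task_type, status, count(*), min(created_at) AS oldest
      FROM tasks
     GROUP BY task_type, status
     ORDER BY count(*) DESC;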

rbranson 25 January 2026
Biggest thing to watch out for with this approach is that you will inevitably have some failure or bug that multiplies the rate of dead messages by 10x, 100x, or 1000x, and that will overload your DLQ database. You need a circuit breaker or rate limit on it.
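
You can even sketch a crude backstop in Postgres itself, though a real system should shed load in the producer (illustrative only; the table name and threshold here are made up):

    -- Illustrative only: reject DLQ inserts once the last minute has
    -- already seen more than 10k dead letters. The per-row count(*) is
    -- expensive; real systems would rate-limit in the producer instead.
    CREATE OR REPLACE FUNCTION dlq_rate_limit() RETURNS trigger AS $$
    BEGIN
      IF (SELECT count(*) FROM dead_letters
           WHERE created_at > now() - interval '1 minute') > 10000 THEN
        RAISE EXCEPTION 'DLQ rate limit exceeded';
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER dlq_rate_limit_trg
      BEFORE INSERT ON dead_letters
      FOR EACH ROW EXECUTE FUNCTION dlq_rate_limit();
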
exabrial 25 January 2026
> FOR UPDATE SKIP LOCKED

Learned something new today. I knew what FOR UPDATE did, but somehow I've never RTFM'd hard enough to know about the SKIP LOCKED directive. That's pretty cool.
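
For anyone else catching up, the canonical claim-one-job pattern is roughly this (hypothetical jobs table):

    -- Claim one pending job. Rows locked by other workers are skipped
    -- rather than waited on, so concurrent workers never block each other.
    UPDATE jobs
       SET status = 'running', started_at = now()
     WHERE id = (
           SELECT id
             FROM jobs
            WHERE status = 'pending'
            ORDER BY created_at
            LIMIT 1
            FOR UPDATE SKIP LOCKED
           )
    RETURNING *;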

with 25 January 2026
Great application of first principles. I think it's also totally reasonable at even most production loads. (Example: my last workplace had a service that constantly roared at 30k events per second, and our DLQs would at most have on the order of hundreds of messages in them.) We would get paged if a message sat in the queue for more than an hour.

The idea is that if your DLQ has consistently high volume, there is something wrong with your upstream data or data-handling logic, not the architecture.
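
That kind of age alert is a one-liner when the DLQ is just a table (a sketch, assuming a created_at column):

    -- Page if anything has sat in the DLQ for more than an hour.
    SELECT count(*) AS stale
      FROM dead_letters
     WHERE created_at < now() - interval '1 hour';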

jeeybee 26 January 2026
I maintain a small Postgres-native job queue for Python called PGQueuer: https://github.com/janbjorge/pgqueuer

It uses the same core primitives people are discussing here (FOR UPDATE SKIP LOCKED for claiming work; LISTEN/NOTIFY to wake workers), plus priorities, scheduled jobs, retries, heartbeats/visibility timeouts, and SQL-friendly observability. If you’re already on Postgres and want a pragmatic “just use Postgres” queue, it might be a useful reference / drop-in.
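
For those who haven't seen the LISTEN/NOTIFY half, the core of it is just this (a generic sketch; not PGQueuer's actual channel names):

    -- Worker: subscribe, then sleep until the server pushes a notification.
    LISTEN job_ready;

    -- Producer: wake all listeners after committing new work.
    NOTIFY job_ready;
    -- or with a payload:
    SELECT pg_notify('job_ready', '42');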

kristov 25 January 2026
Why use ShedLock and SELECT ... FOR UPDATE SKIP LOCKED together? ShedLock stops things running in parallel (sort of), but the other makes parallel processing possible.
cmgriffing 26 January 2026
Only slightly related, but I have been using Oban as a Postgres-native message queue in the Elixir ecosystem and loving it. For my use case, it's so much simpler than spinning up another piece of infrastructure like Kafka or RabbitMQ.
shoo 25 January 2026
re: SKIP LOCKED, introduced in Postgres 9.5: here's an archived copy [†] of the excellent 2016 2ndQuadrant post discussing it

https://web.archive.org/web/20240309030618/https://www.2ndqu...

corresponding HN discussion thread from 2016 https://news.ycombinator.com/item?id=14676859

[†] it seems that all the old 2ndquadrant.com blog post links have been broken since their acquisition by EnterpriseDB

nottorp 26 January 2026
Hmm, that raises a question for me.

I haven't done a project that uses a database (be it SQL or NoSQL) where the number of deletes is comparable to the number of inserts (and far more than, like, tens per day, of course).

How does your average DB server handle that, performance-wise? Intuitively I'd think it's optimized more for inserts than for deletes, but of course I may be wrong.

branko_d 26 January 2026
Why use a string as the status, instead of a boolean? That just wastes space for no discernible benefit, especially since the status is indexed. Also, consider turning event_type into an integer if possible, for similar reasons.

Furthermore, why have two indexes with the same leading field (status)?
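
If the hot query is "find pending work", a single partial index would likely serve better than either (a sketch; I'm guessing at the article's column names):

    -- Guessing at the schema: one small partial index covers the
    -- "fetch pending work ordered by age" query and skips all the
    -- processed rows, instead of two full indexes both leading on status.
    CREATE INDEX events_pending_idx
        ON events (created_at)
     WHERE status = 'pending';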

Andys 26 January 2026
We did this at Chargify, but with MySQL. If Redis was unavailable, we would dump the job as a JSON blob into a MySQL table. A cron job would periodically clean it out by re-enqueuing the jobs, and it worked well.
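
The database half of that cron step can be a single atomic statement (Postgres syntax here for illustration; ours was MySQL, and the table name is made up):

    -- Pull a batch of parked jobs out atomically; the returned payloads
    -- go back onto the real queue (Redis, in our case).
    DELETE FROM parked_jobs
     WHERE id IN (SELECT id FROM parked_jobs ORDER BY id LIMIT 100)
    RETURNING payload;
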
nicoritschel 25 January 2026
lol, a FOR UPDATE SKIP LOCKED post hits the HN homepage every few months, it feels like
renewiltord 25 January 2026
Segment uses MySQL as a queue, not even as a DLQ. It works at their scale. So there are many (not all) systems that can tolerate this as a queue.

I have a simple flow: tasks on the order of thousands an hour. I just use PostgreSQL. High visibility, easy requeue, durable store. With an appropriate index, it's perfectly fine. An LLM will write SKIP LOCKED code right the first time. Easy local dev. I always reach for Postgres as the event bus in a low-volume system.
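
Requeue from the DLQ really is one statement (hypothetical table names):

    -- Move a dead letter back onto the live queue atomically.
    WITH moved AS (
      DELETE FROM dead_letters WHERE id = 42
      RETURNING payload
    )
    INSERT INTO tasks (payload, status)
    SELECT payload, 'pending' FROM moved;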

gytisgreitai 25 January 2026
Would be interesting to see the numbers this system processes. My bet is that they are not that high.
awesome_dude 25 January 2026
I think that using Postgres as the message/event broker is valid, and having a DLQ on that Postgres system is also valid, and usable.

Having SEPARATE DLQ and event/message broker systems is not (IMO) valid, because a new point of failure is being introduced into the architecture.

tantalor 26 January 2026
This is logging.
reactordev 25 January 2026
Another day, another “Using PostgreSQL for…” thing it wasn’t designed for. This isn’t a good idea. What happens when the queue goes down and all messages are dead lettered? What happens when you end up with competing messages? This is not the way.
tonymet 25 January 2026
Postgres is essentially a B-tree with a remote interface. Would you use a B-tree to store a dead letter queue? What is the big O of insert and delete? What happens when it grows?

Postgres has a query interface, replication, backup and many other great utilities. And it’s well supported, so it will work for low-demand applications.

Regardless, you’re using the wrong data structure with the wrong performance profile, and at the margins you will spend a lot more money and time than necessary running it. And service will suffer.