Using PostgreSQL as a Dead Letter Queue for Event-Driven Systems

(diljitpr.net)

Comments

TexanFeller 25 January 2026
Ofc I wouldn't use it for extremely high-scale event processing, but it's a great default for a message/task queue for 90% of business apps. If you're processing under a few hundred million events/tasks per day, with fewer than ~10k concurrent processes dequeuing from it, it's what I'd default to.

I work on apps that use such a PG-based queue system, and it provides indispensable features we couldn't achieve easily or cleanly with a normal queue system, such as being able to dynamically adjust the priority/order of tasks being processed and easily query/report on the contents of the queue. We have many other interesting features built into it that are more specific to our needs, which I'm more hesitant to describe in detail here.
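
As a rough illustration of what I mean by adjusting priority and reporting (a hypothetical tasks table, not our real schema):

    -- Hypothetical schema; ours differs.
    -- Bump the priority of all pending tasks for one customer:
    UPDATE tasks
       SET priority = priority + 100
     WHERE customer_id = 42
       AND status = 'pending';

    -- Ad-hoc reporting on queue contents, which an opaque broker can't do:
    SELECT task_type, status, count(*), min(created_at) AS oldest
      FROM tasks
     GROUP BY task_type, status
     ORDER BY count(*) DESC;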

rbranson 25 January 2026
Biggest thing to watch out for with this approach is that you will inevitably have some failure or bug that multiplies the rate of dead messages by 10x, 100x, or 1000x, and that will overload your DLQ database. You need a circuit breaker or rate limit on it.
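
You can even sketch a crude backstop in Postgres itself, though a real system should shed load in the producer (illustrative only; the table name and threshold here are made up):

    -- Illustrative only: reject DLQ inserts once the last minute has
    -- already seen more than 10k dead letters. The per-row count(*) is
    -- expensive; real systems would rate-limit in the producer instead.
    CREATE OR REPLACE FUNCTION dlq_rate_limit() RETURNS trigger AS $$
    BEGIN
      IF (SELECT count(*) FROM dead_letters
           WHERE created_at > now() - interval '1 minute') > 10000 THEN
        RAISE EXCEPTION 'DLQ rate limit exceeded';
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER dlq_rate_limit_trg
      BEFORE INSERT ON dead_letters
      FOR EACH ROW EXECUTE FUNCTION dlq_rate_limit();
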
exabrial 25 January 2026
> FOR UPDATE SKIP LOCKED

Learned something new today. I knew what FOR UPDATE did, but somehow I've never RTFM'd hard enough to know about the SKIP LOCKED directive. That's pretty cool.
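
For anyone else catching up, the canonical claim-one-job pattern is roughly this (hypothetical jobs table):

    -- Claim one pending job. Rows locked by other workers are skipped
    -- rather than waited on, so concurrent workers never block each other.
    UPDATE jobs
       SET status = 'running', started_at = now()
     WHERE id = (
           SELECT id
             FROM jobs
            WHERE status = 'pending'
            ORDER BY created_at
            LIMIT 1
            FOR UPDATE SKIP LOCKED
           )
    RETURNING *;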

with 25 January 2026
Great application of first principles. I think it's also totally reasonable at even most production loads. (Example: my last workplace had a service that constantly roared at 30k events per second, and our DLQs would at most have on the order of hundreds of messages in them.) We would get paged if a message sat in the queue for more than an hour.

The idea is that if your DLQ has consistently high volume, there is something wrong with your upstream data or data-handling logic, not the architecture.
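
That kind of age alert is a one-liner when the DLQ is just a table (a sketch, assuming a created_at column):

    -- Page if anything has sat in the DLQ for more than an hour.
    SELECT count(*) AS stale
      FROM dead_letters
     WHERE created_at < now() - interval '1 hour';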

jeeybee 26 January 2026
I maintain a small Postgres-native job queue for Python called PGQueuer: https://github.com/janbjorge/pgqueuer

It uses the same core primitives people are discussing here (FOR UPDATE SKIP LOCKED for claiming work; LISTEN/NOTIFY to wake workers), plus priorities, scheduled jobs, retries, heartbeats/visibility timeouts, and SQL-friendly observability. If you’re already on Postgres and want a pragmatic “just use Postgres” queue, it might be a useful reference / drop-in.
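
For those who haven't seen the LISTEN/NOTIFY half, the core of it is just this (a generic sketch; not PGQueuer's actual channel names):

    -- Worker: subscribe, then sleep until the server pushes a notification.
    LISTEN job_ready;

    -- Producer: wake all listeners after committing new work.
    NOTIFY job_ready;
    -- or with a payload:
    SELECT pg_notify('job_ready', '42');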

kristov 25 January 2026
Why use ShedLock and SELECT ... FOR UPDATE SKIP LOCKED together? ShedLock stops things running in parallel (sort of), but the other makes parallel processing possible.
cmgriffing 26 January 2026
Only slightly related, but I have been using Oban as a Postgres-native message queue in the Elixir ecosystem and loving it. For my use case, it's so much simpler than spinning up another piece of infrastructure like Kafka or RabbitMQ.
shoo 25 January 2026
re: SKIP LOCKED, introduced in Postgres 9.5: here's an archived copy [†] of the excellent 2016 2ndQuadrant post discussing it

https://web.archive.org/web/20240309030618/https://www.2ndqu...

corresponding HN discussion thread from 2016 https://news.ycombinator.com/item?id=14676859

[†] it seems that all the old 2ndquadrant.com blog post links have been broken since their acquisition by EnterpriseDB

nottorp 26 January 2026
Hmm, that raises a question for me.

I haven't done a project that uses a database (be it SQL or NoSQL) where the number of deletes is comparable to the number of inserts (and far more than, like, tens per day, of course).

How does your average DB server handle that, performance-wise? Intuitively I'd think it's optimized more for inserts than for deletes, but of course I may be wrong.

branko_d 26 January 2026
Why use a string as the status, instead of a boolean? That just wastes space for no discernible benefit, especially since the status is indexed. Also, consider turning event_type into an integer if possible, for similar reasons.

Furthermore, why have two indexes with the same leading field (status)?
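
If the hot query is "find pending work", a single partial index would likely serve better than either (a sketch; I'm guessing at the article's column names):

    -- Guessing at the schema: one small partial index covers the
    -- "fetch pending work ordered by age" query and skips all the
    -- processed rows, instead of two full indexes both leading on status.
    CREATE INDEX events_pending_idx
        ON events (created_at)
     WHERE status = 'pending';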

Andys 26 January 2026
We did this at Chargify, but with MySQL. If Redis was unavailable, we would dump the job as a JSON blob into a MySQL table. A cron job would periodically clean it out by re-enqueuing the jobs, and it worked well.
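
The database half of that cron step can be a single atomic statement (Postgres syntax here for illustration; ours was MySQL, and the table name is made up):

    -- Pull a batch of parked jobs out atomically; the returned payloads
    -- go back onto the real queue (Redis, in our case).
    DELETE FROM parked_jobs
     WHERE id IN (SELECT id FROM parked_jobs ORDER BY id LIMIT 100)
    RETURNING payload;
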
nicoritschel 25 January 2026
lol, a FOR UPDATE SKIP LOCKED post hits the HN homepage every few months, it feels like
renewiltord 25 January 2026
Segment uses MySQL as a queue, not even as a DLQ. It works at their scale. So there are many (not all) systems that can tolerate this as a queue.

I have a simple flow: tasks on the order of thousands an hour. I just use PostgreSQL. High visibility, easy requeue, durable store. With an appropriate index, it's perfectly fine. An LLM will write SKIP LOCKED code right the first time. Easy local dev. I always reach for Postgres as the event bus in a low-volume system.
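
Requeue from the DLQ really is one statement (hypothetical table names):

    -- Move a dead letter back onto the live queue atomically.
    WITH moved AS (
      DELETE FROM dead_letters WHERE id = 42
      RETURNING payload
    )
    INSERT INTO tasks (payload, status)
    SELECT payload, 'pending' FROM moved;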

gytisgreitai 25 January 2026
Would be interesting to see the numbers this system processes. My bet is that they are not that high.
awesome_dude 25 January 2026
I think that using Postgres as the message/event broker is valid, and having a DLQ on that Postgres system is also valid, and usable.

Having SEPARATE DLQ and event/message broker systems is not (IMO) valid, because a new point of failure is being introduced into the architecture.

tantalor 26 January 2026
This is logging.
reactordev 25 January 2026
Another day, another “Using PostgreSQL for…” thing it wasn’t designed for. This isn’t a good idea. What happens when the queue goes down and all messages are dead lettered? What happens when you end up with competing messages? This is not the way.
tonymet 25 January 2026
Postgres is essentially a B-tree with a remote interface. Would you use a B-tree to store a dead letter queue? What is the big O of insert and delete? What happens when it grows?

Postgres has a query interface, replication, backup and many other great utilities. And it’s well supported, so it will work for low-demand applications.

Regardless, you’re using the wrong data structure with the wrong performance profile, and at the margins you will spend a lot more money and time than necessary running it. And service will suffer.