What I wish someone told me about Postgres

(challahscript.com)

Comments

munk-a 12 November 2024
While postgres is indeed case insensitive, writing queries with keywords in all caps is usually an effort to increase legibility for visual pattern matching. It absolutely isn't needed, but if I'm debugging a query of yours I will send it through my prettifier so that I can breeze through your definitions without getting hung up on minor weird syntax things.

It's like prettification in any other language - visual structures that we can quickly recognize (like consistent indentation levels) make us waste less time on comprehension of the obvious so we can focus on what's important.

The only thing I really object to is "actuallyUsingCaseInIdentifiers". I never want to see columns that require double quotes for me to inspect on the CLI.

christophilus 12 November 2024
I’d never stumbled across the “don’t do this” wiki entry[0] before. Very handy.

[0] https://wiki.postgresql.org/wiki/Don%27t_Do_This

pavel_lishin 12 November 2024
A lot of these aren't postgres-specific. (null weirdness, index column order, etc.)

For example, how nulls work - especially how they interact with indexes and unique constraints - is also non-intuitive in MySQL.

If you have a user table with a non-nullable email column and a nullable username column, and a uniqueness constraint on something like (email, username), you'll be able to insert multiple identical emails with a null username into that table - because a null isn't equivalent to another null.
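A quick way to see this without spinning up Postgres is SQLite, which ships with Python and follows the same SQL-standard rule (the table and values below are made up for illustration; Postgres 15+ can opt out of this behavior with `UNIQUE NULLS NOT DISTINCT`):

```python
import sqlite3

# Sketch of the comment's scenario. SQLite is used here only because it
# is bundled with Python; it treats NULLs in unique constraints the same
# way Postgres does by default: NULL is never equal to another NULL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        email    TEXT NOT NULL,
        username TEXT,
        UNIQUE (email, username)
    )
""")

# Both inserts succeed: the constraint never sees the rows as duplicates,
# because the NULL usernames don't compare equal.
conn.execute("INSERT INTO users (email, username) VALUES ('a@example.com', NULL)")
conn.execute("INSERT INTO users (email, username) VALUES ('a@example.com', NULL)")

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2 - both "identical" rows were accepted
```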

jedberg 13 November 2024
My number one tip: Vacuum every day!

I didn't know this when I started, so I never vacuumed the reddit databases. Then one day I was forced to, and it took reddit down for almost a day while I waited for it to finish.

marcosdumay 12 November 2024
> Normalize your data unless you have a good reason not to

Ouch. You don't want to just say that and move on.

The author even linked to a page citing 10 different kinds of normalization (11 with "non-normalized"). Most people don't even know what those are, and have no use for 7 of them. Do not send people on wild-goose chases after those higher normal forms.

ellisv 12 November 2024
I really wish developers cared more about normalization and stopped shoving everything into a JSON(b) column.
avg_dev 12 November 2024
This is really nice; I'm glad the author put it together. I didn't know the pg docs were 3200 pages long! I've been using Postgres for a while and try to learn as I go. I really do like the docs, and I also like to read articles on particular subjects as I find a need to.

I think readers might find it helpful if the author added a note to https://challahscript.com/what_i_wish_someone_told_me_about_... that if someone is selecting on b alone, then an index on columns (b, a) would work fine. I think this is kind of implied when they talk about selecting on a alone, but it wouldn't hurt to be extra explicit.

(I didn't spend much time on the json/jsonb part since I rarely use that stuff.)
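For what it's worth, the leading-column rule generalizes beyond Postgres. Here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN, chosen only because it ships with Python; the table and index names are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
# Composite index with b as the leading column, matching the comment.
conn.execute("CREATE INDEX t_b_a_idx ON t (b, a)")

# A filter on b alone can still use the (b, a) index, because b is the
# leading column of the index's sort order.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT a FROM t WHERE b = 1"
).fetchone()
print(plan[-1])  # typically something like:
                 # SEARCH t USING COVERING INDEX t_b_a_idx (b=?)
```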

tomcam 12 November 2024
I wrote this to help beginners: https://tomcam.github.io/postgres/
the5avage 12 November 2024
From my experience, I've seen a lot of hilarious SQL in the wild.

A good start would be to read Codd's paper and try to understand what the relational model is. It's only 11 pages long, and doing that would reduce the suffering in this world.

nasretdinov 12 November 2024
Nice article! One thing I'd add is that almost all of it applies to other MVCC databases like MySQL too. While some details might be different, it too suffers from long transactions, holds metadata locks during ALTERs, etc. - all the good stuff :).
whiskey-one 12 November 2024
Really great post! It belongs on a reading list somewhere for everyone who is using Postgres independently or as a part of their stack.
ahachete 12 November 2024
> Most notably, 'null'::jsonb = 'null'::jsonb is true whereas NULL = NULL is NULL

Because 'null' in the JSON spec is a literal value (a constant), not SQL's NULL. Nothing to see here.

https://datatracker.ietf.org/doc/html/rfc7159
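A minimal illustration of the two kinds of "null", using Python's json module for the JSON side and SQLite (bundled with Python) as a stand-in for the SQL side:

```python
import json
import sqlite3

# JSON's null is a literal value: it parses to a concrete object (None
# in Python) that compares equal to itself - which is why
# 'null'::jsonb = 'null'::jsonb is true in Postgres.
assert json.loads('null') == json.loads('null')

# SQL's NULL means "unknown": NULL = NULL evaluates to NULL, not true.
# SQLite follows the same SQL-standard behavior here.
result = sqlite3.connect(":memory:").execute("SELECT NULL = NULL").fetchone()[0]
print(result)  # None, i.e. SQL NULL
```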

sroerick 12 November 2024
It’s always a relief to read articles like this, realize I know 90% of it, and feel I’ve deserved the jobs I’ve had.

Great and super useful notes

leoqa 13 November 2024
I had an interesting problem with the pg stats. We were migrating and had a version column, i.e. (key, val, version).

We were migrating from version 1 to version 2, double writing into the same table. An index on (key, val, version) was being hit by our reader process using a where clause like key=k and version=1.

When we flipped the reader to read version 2, the latency jumped from 30ms to 11s. Explain showed a sequential scan even though the index could serve the query. I was able to use MATERIALIZED and reorder CTEs to get the planner to do the right thing, but it caused an outage.

We were autovacuuming as well. I ended up deleting the old version and rebuilding the index.

My theory is that because the read load was on 50% of the data, the stats were super skewed.

jaza 13 November 2024
Only some of these are really Postgres specific (use "text" / "timestamptz"; make psql more useful; copy to CSV). Most of them apply to relational databases in general (learn how to normalise!; yes, NULL is weird; learn how indexes work!; locks and long-running transactions will bite you; avoid storing and querying JSON blobs of doom). Not that that detracts from the usefulness of this article - pretty much all of them are important things to know when working with Postgres.
zmmmmm 12 November 2024
> It’s possible that adding an index will do nothing

This is one of the more perplexing things to me, where Postgres ideology is a bit too strong - or at least, the way it works is too hard for me to understand (and I've tried; I'm not going to claim I'm a genius, but I'm also not a moron). I hear there may finally be support for some kind of hints in upcoming versions, which would be very welcome. I've spent way too much time trying to divine the voodoo of why a slow query is not using indexes when it seems obvious that it should.

nodakai 12 November 2024
That nested JSON query operator chains such as json_col->'foo'->'bar'->>'baz' internally return (copy) entire sub-objects at each level and can be much slower than jsonb_path_query(json_col, '$.foo.bar.baz') for large JSONB data

... although I haven't had the chance to verify this myself

roydivision 13 November 2024
Really like the

Don't <thing>

Why not?

When should you?

format of the Postgres Don't Do This page.

jerrygoyal 13 November 2024
I recently asked a similar question on reddit and got many inputs https://www.reddit.com/r/PostgreSQL/comments/1gbr0it/experie...
vivzkestrel 12 November 2024
Since we are on the topic, and since your article clearly mentions "Normalize your data unless you have a good reason not to", I had to ask. I am trying to build a news aggregator and I have many websites, each with a slightly different format. Even though I use feedparser in Python, that doesn't change the fact that some of them put HTML text inside the content, while some break it down into a separate media XML attribute and retain only a basic textual summary inside a summary attribute.

Do you think it makes more sense to store a single RSS item as a column inside Postgres, or should it be stored after parsing? I can see upsides and downsides to both approaches. Store it as XML and you keep the ability to change your processing logic down the line for each stored item, but you lose the flexibility of querying metadata, and you have to parse it on the fly every single time. Store it in multiple columns after processing, and it may require different logic for different websites, plus changing your overall Python processing logic requires a lot of thinking about how it might affect each source. What do you recommend?
samsartor 12 November 2024
Instead of psql, I really like https://github.com/dbcli/pgcli
deadbabe 12 November 2024
Don’t create views that reference other views.
RedShift1 12 November 2024
I can't horizontally scroll on mobile, so I can't see the full query texts...
cyanydeez 12 November 2024
I'd say I tend to ignore the standard docs because they rarely have examples and rely on the arcane procedure of trying to decipher the super command options with all their "[OR THIS|THAT]".

I assume _someone_ can read this pseudo programming, but it's not me.

abobov 13 November 2024
My tip is a better pager for psql: https://github.com/okbob/pspg
cryptonector 12 November 2024
Watch out: for row/record values, IS NULL is true only when every column in the row/record is NULL, and a partly-NULL row is neither IS NULL nor IS NOT NULL! You want to use IS [NOT] DISTINCT FROM NULL, full stop.
fatih-erikli-cg 13 November 2024
Storing data in text files costs less. A TCP connection to get some blog posts from another process is not necessary.
teddyh 12 November 2024

  \pset null '␀'
5Qn8mNbc2FNCiVV 13 November 2024
Your code sections are almost unscrollable on mobile
mikrotikker 13 November 2024
Shout-out to my postgres server that has been sitting untouched, doing its thing perfectly for 10 years. You're a real one.
philipwhiuk 12 November 2024
> Name your tables in snake_case

This bit me. It's highly tedious that case isn't preserved, honestly.

michaelcampbell 13 November 2024
> null weirdness

Oracle enters the chat... (the last time I used it, it considered an empty string '' the same as NULL)

onoke 12 November 2024
Haha, good job. :)
TwoNineFive 12 November 2024
"going on a journey similar to mine"

At this point "journey" is a cringe word because of its excessive usage in blogspamverts. It tells me you are a gross aspiring influencer.

cynicalsecurity 12 November 2024
People who use PostgreSQL instead of MySQL just want to suffer while pretending "they are better than others".
Jean-Papoulos 13 November 2024
>"Normalize your data"

>"You don't need to write SQL in uppercase"

>"What's an index ?" section

From reading this, it's clear that the author never sat down and learned to use databases from the ground up. The author started using them and learned as he went, so his "tips" include things you'll be told in the first hour of any course.

This doesn't hold any value for someone who's been using databases for almost any length of time.

andrewstuart 12 November 2024
Dump your schema, drop the entire thing into ChatGPT or Claude and ask it to write your Postgres query.

Then ask it to rewrite the query three different ways and explain the pros and cons of each.

Do the same for any existing queries in your system: drop in the schema, then the query, and ask for an analysis and suggested improvements.