SQLite JSON at full index speed using generated columns (dbpro.app)
354 points by upmostly | 12 December 2025 | 105 comments

Comments

It is also possible to encode JSON documents directly as a serialized B-tree. You can then construct iterators over it directly and query internal fields at indexed speed. It is still a serialized document (it can be sent over a network), but you no longer need to do any parsing, since the document itself is already indexed. It is called the Lite³ format: https://github.com/fastserial/lite3

Disclaimer: I am working on this.
I love SQLite, and I'm in no way trying to devalue it; the author's method is an excellent approach to getting analytical speed out of SQLite. But I've been loving DuckDB for similar analytical workloads, since it's built for exactly such tasks. Like SQLite, DuckDB reads from a single file, and it processes large data sets at extreme speed. I work on my MacBook M2, and I've been dealing with about 20 million records; it's fast, very fast.
Loading data into DuckDB is super easy; I was surprised:
    SELECT
        avg(sale_price),
        count(DISTINCT customer_id)
    FROM '/my-data-lake/sales/2024/*.json';
You can also load into a JSON-type column and use Postgres-style syntax: col->>'$.key'
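For illustration, a minimal sketch of that second pattern (table and key names made up):

    CREATE TABLE events (payload JSON);
    INSERT INTO events VALUES ('{"key": "hello", "n": 42}');

    -- Postgres-style operators work directly on the JSON column:
    SELECT payload->>'$.key' FROM events;  -- 'hello', as VARCHAR
    SELECT payload->'$.n' FROM events;     -- 42, still JSON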
I thought this was common practice: generated columns for JSON performance. I've even used this (albeit in Postgres) to maintain foreign key constraints where the key is buried in a JSON column. What we were doing was slightly cursed, but it worked perfectly.
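For anyone curious, a rough sketch of the cursed-but-working pattern in Postgres (names invented; the real schema differed). Note the referencing column is a stored generated column, so cascading referential actions aren't available on it:

    CREATE TABLE customers (id int PRIMARY KEY);

    CREATE TABLE orders (
        payload jsonb NOT NULL,
        -- pull the buried key out into a stored generated column...
        customer_id int GENERATED ALWAYS AS ((payload->>'customer_id')::int) STORED
            -- ...and enforce the foreign key against it
            REFERENCES customers (id)
    );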
Hilariously, I discovered this very technique a couple of weeks ago, when Claude Code presented it out of the blue, implemented example and all, as an option while I was trying to find optimizations for something I'm working on. It turned out to be a really smart and performant choice, one I simply wasn't aware of because I hadn't kept up with new SQLite features at all these last few years.
Lesson learned: even if you know your tools well, periodically go check out updated docs and see what's new, you might be surprised at what you find!
As others mention, you can create indexes directly against the JSON without projecting into a computed column... though the computed column has the added benefit of making certain queries easier.
That said, this is pretty much what you have to do with MS-SQL's limited support for JSON before 2025 (v17). Glad I double-checked, since I wasn't even aware they had added a JSON type in 2025.
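For reference, the pre-2025 T-SQL workaround looks roughly like this (names assumed): the JSON sits in an NVARCHAR(MAX) column, and you promote the hot key to a computed column and index that:

    ALTER TABLE events ADD event_type AS JSON_VALUE(data, '$.type');
    CREATE INDEX idx_events_type ON events (event_type);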
i.e. something like this: CREATE INDEX idx_events_type ON events(json_extract(data, '$.type'))?
I guess the caveat here is that a slight change in JSON path syntax (can't think of one right now) could cause SQLite to not use this index, whereas with explicitly specified virtual generated columns you're guaranteed to use the index.
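For example, with an expression index the query has to repeat the indexed expression essentially verbatim (a sketch against the events table above):

    -- matches the indexed expression, so the index can be used:
    SELECT * FROM events WHERE json_extract(data, '$.type') = 'click';

    -- same result via the ->> operator, but a different expression,
    -- so SQLite will likely fall back to a full table scan:
    SELECT * FROM events WHERE data->>'$.type' = 'click';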
Opening an article on HN, seeing one of my comments quoted at the top, and then finding out the whole article is about that one comment: that's a first!
I wish devs would normalize their data rather than shove everything into a JSON(B) column, especially when there is a consistent schema across records.
It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.
If you replace JSON with XML in this model, it is exactly what the "document store" databases from the '90s and '00s were doing -- parsing at insert and update time, then touching only indexes at query time. It is indeed cool that SQLite does this out of the box.
I was looking for a way to index a JSON column that contains a JSON array, like a list of tags. AFAIK this method won't work for that; you'll either need to use FTS or a separate "tag" table that you index.
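The separate tag table can be populated straight from the JSON with json_each; a sketch (table and column names made up):

    CREATE TABLE doc_tags (
        doc_id INTEGER NOT NULL REFERENCES docs(id),
        tag    TEXT    NOT NULL,
        PRIMARY KEY (tag, doc_id)
    ) WITHOUT ROWID;

    -- project the JSON array out into rows:
    INSERT INTO doc_tags (doc_id, tag)
    SELECT docs.id, je.value
    FROM docs, json_each(docs.data, '$.tags') AS je;

    -- now "documents with tag X" is an index lookup:
    SELECT doc_id FROM doc_tags WHERE tag = 'sqlite';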
Tiny bug report: I couldn't edit text in those SQL editor widgets from my iPhone, and I couldn't scroll them to see text that extended past the width of the page either.
I've been coding a lot of small apps recently, and going from local JSON file storage to SQLite has been a very natural progression as the data ramps up by orders of magnitude. A fully performant database that still feels as simple as opening and reading a plain JSON file. The trick you describe in the article is actually an unexpected performance buffer that'll come in handy when I hit the next bottleneck :) Thank you
Very cool article. To really drive it home, I would have loved to see how the query plan changes. It _looks_ like it should Just Work(tm), but my brain refuses to believe that it's able to use those new indexes so flawlessly.
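For the skeptical, SQLite will tell you directly. Assuming a schema along the article's lines (names approximated):

    EXPLAIN QUERY PLAN
    SELECT * FROM events WHERE event_type = 'click';

    -- should report something like:
    --   SEARCH events USING INDEX idx_events_type (event_type=?)
    -- rather than SCAN events, confirming the index is used.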
In the 2nd section you're using a CREATE TABLE plus three separate ALTER TABLE calls to add the virtual columns. In the 3rd section you're using a single CREATE TABLE with the virtual columns included from the get go.
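That is, the difference between (column names approximated):

    -- 2nd section: bolt the virtual column on afterwards
    ALTER TABLE events ADD COLUMN event_type TEXT
        GENERATED ALWAYS AS (json_extract(data, '$.type')) VIRTUAL;

    -- 3rd section: declare it up front
    CREATE TABLE events (
        data       TEXT,
        event_type TEXT GENERATED ALWAYS AS (json_extract(data, '$.type')) VIRTUAL
    );

Both produce the same schema; SQLite only allows VIRTUAL (not STORED) generated columns to be added via ALTER TABLE.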
Particularly with drizzle, it means I can use SQLite on-device with expo-sqlite, and store our data format in a single field, with very little syntax, and the schema and queries all become fully type safe.
Also, being able to use the same light ORM abstraction server-side with bun:sqlite is huge.
Generated columns are pretty great, but what I would really love is a Postgres-style GIN index, which dramatically speeds up JSON queries for unanticipated keys.
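For comparison, the Postgres feature being wished for (table/column names made up):

    -- one GIN index covers every key, anticipated or not:
    CREATE INDEX idx_events_payload ON events USING gin (payload jsonb_path_ops);

    -- containment queries on arbitrary keys become index scans:
    SELECT * FROM events WHERE payload @> '{"type": "click"}';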
I've been using this trick for a while, and it actually got me to do quite a bit without an ORM (just hacking a sane models.py with a few stable wrappers and calling it a day)
LOL what are the odds: I posted Marmot in `Show HN` today https://github.com/maxpert/marmot/releases/tag/v2.2.0 and in my head I was thinking the exact same thing for supporting MySQL's JSON datatype. At some level I'm starting to feel I might as well expose a full MongoDB-compatible protocol that lets you talk to tables as collections, solving this problem once and for all!
Dude, what? This is incredible knowledge. I had been fearing this exact problem for so long, but there is an elegant out-of-the-box solution. Thank you!!
> So, thanks bambax!
You're most welcome! And yes, SQLite is awesome!!
> It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.
Why?
> Particularly with drizzle, it means I can use SQLite on-device with expo-sqlite, and store our data format in a single field, with very little syntax, and the schema and queries all become fully type safe.
> Also, being able to use the same light ORM abstraction server-side with bun:sqlite is huge.
But I guess this technique is very common now.
You can do the same with DuckDB and Postgres too.
What?