SQLite JSON at full index speed using generated columns (dbpro.app)
354 points by upmostly | 12 December 2025 | 105 comments

Comments

It is also possible to encode JSON documents directly as a serialized B-tree. You can then construct iterators over it directly and query internal fields at indexed speed. It is still a serialized document (it can be sent over a network), but you no longer need to do any parsing, since the document itself is already indexed. It is called the Lite³ format: https://github.com/fastserial/lite3

Disclaimer: I am working on this.
I love SQLite, and I'm in no way trying to devalue it; the author's method is an excellent approach to getting analytical speed out of SQLite. But I've been loving DuckDB for similar analytical workloads, since it's built for exactly such tasks. Like SQLite, DuckDB reads from a single file, and it processes large data sets at extreme speed. I work on my MacBook M2, and I've been dealing with about 20 million records; it's fast, very fast.
Loading data into DuckDB is super easy; I was surprised:
    SELECT
        avg(sale_price),
        count(DISTINCT customer_id)
    FROM '/my-data-lake/sales/2024/*.json';
You can also load into a JSON-type column and use Postgres-style syntax: col->>'$.key'
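For illustration, a minimal sketch of that second pattern (table and key names made up):

    CREATE TABLE events (payload JSON);
    INSERT INTO events VALUES ('{"key": "hello", "n": 42}');

    -- Postgres-style operators work directly on the JSON column:
    SELECT payload->>'$.key' FROM events;  -- 'hello', as VARCHAR
    SELECT payload->'$.n' FROM events;     -- 42, still JSON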
I thought this was common practice: generated columns for JSON performance. I've even used this (albeit in Postgres) to maintain foreign key constraints where the key is buried in a JSON column. What we were doing was slightly cursed, but it worked perfectly.
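For anyone curious, a rough sketch of the cursed-but-working pattern in Postgres (names invented; the real schema differed). Note the referencing column is a stored generated column, so cascading referential actions aren't available on it:

    CREATE TABLE customers (id int PRIMARY KEY);

    CREATE TABLE orders (
        payload jsonb NOT NULL,
        -- pull the buried key out into a stored generated column...
        customer_id int GENERATED ALWAYS AS ((payload->>'customer_id')::int) STORED
            -- ...and enforce the foreign key against it
            REFERENCES customers (id)
    );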
Hilariously, I discovered this very technique a couple of weeks ago, when Claude Code presented it out of the blue, implemented example and all, as an option while I was trying to find optimizations for something I'm working on. It turned out to be a really smart and performant choice, one I simply wasn't aware of because I hadn't kept up with new SQLite features at all these last few years.
Lesson learned: even if you know your tools well, periodically go check out updated docs and see what's new, you might be surprised at what you find!
As others mention, you can create indexes directly against the JSON without projecting into a computed column... though the computed column has the added benefit of making certain queries easier.
That said, this is pretty much what you have to do with MS-SQL's limited support for JSON before 2025 (v17). Glad I double-checked, since I wasn't even aware they had added a JSON type in 2025.
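For reference, the pre-2025 T-SQL workaround looks roughly like this (names assumed): the JSON sits in an NVARCHAR(MAX) column, and you promote the hot key to a computed column and index that:

    ALTER TABLE events ADD event_type AS JSON_VALUE(data, '$.type');
    CREATE INDEX idx_events_type ON events (event_type);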
i.e. something like this: CREATE INDEX idx_events_type ON events(json_extract(data, '$.type'))?
I guess the caveat here is that a slight change in JSON path syntax (can't think of one right now) could cause SQLite to not use this index, whereas with explicitly specified virtual generated columns you're guaranteed to use the index.
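For example, with an expression index the query has to repeat the indexed expression essentially verbatim (a sketch against the events table above):

    -- matches the indexed expression, so the index can be used:
    SELECT * FROM events WHERE json_extract(data, '$.type') = 'click';

    -- same result via the ->> operator, but a different expression,
    -- so SQLite will likely fall back to a full table scan:
    SELECT * FROM events WHERE data->>'$.type' = 'click';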
Opening an article on HN, seeing one of my comments quoted at the top, and then finding out the whole article is about that one comment: that's a first!
I wish devs would normalize their data rather than shove everything into a JSON(B) column, especially when there is a consistent schema across records.
It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.
If you replace JSON with XML in this model, it is exactly what the "document store" databases from the '90s and '00s were doing -- parsing at insert and update time, then touching only indexes at query time. It is indeed cool that SQLite does this out of the box.
I was looking for a way to index a JSON column that contains a JSON array, like a list of tags. AFAIK this method won't work for that; you'll either need to use FTS or a separate "tag" table that you index.
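The separate tag table can be populated straight from the JSON with json_each; a sketch (table and column names made up):

    CREATE TABLE doc_tags (
        doc_id INTEGER NOT NULL REFERENCES docs(id),
        tag    TEXT    NOT NULL,
        PRIMARY KEY (tag, doc_id)
    ) WITHOUT ROWID;

    -- project the JSON array out into rows:
    INSERT INTO doc_tags (doc_id, tag)
    SELECT docs.id, je.value
    FROM docs, json_each(docs.data, '$.tags') AS je;

    -- now "documents with tag X" is an index lookup:
    SELECT doc_id FROM doc_tags WHERE tag = 'sqlite';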
Tiny bug report: I couldn't edit text in those SQL editor widgets from my iPhone, and I couldn't scroll them to see text that extended past the width of the page either.
I've been coding a lot of small apps recently, and going from local JSON file storage to SQLite has been a very natural progression as the data ramps up by orders of magnitude. A fully performant database that still feels as simple as opening and reading a plain JSON file. The trick you describe in the article is actually an unexpected performance buffer that'll come in handy when I hit the next bottleneck :) Thank you
Very cool article. To really drive it home, I would have loved to see how the query plan changes. It _looks_ like it should Just Work(tm), but my brain refuses to believe that it's able to use those new indexes so flawlessly.
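For the skeptical, SQLite will tell you directly. Assuming a schema along the article's lines (names approximated):

    EXPLAIN QUERY PLAN
    SELECT * FROM events WHERE event_type = 'click';

    -- should report something like:
    --   SEARCH events USING INDEX idx_events_type (event_type=?)
    -- rather than SCAN events, confirming the index is used.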
In the 2nd section you're using a CREATE TABLE plus three separate ALTER TABLE calls to add the virtual columns. In the 3rd section you're using a single CREATE TABLE with the virtual columns included from the get go.
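That is, the difference between (column names approximated):

    -- 2nd section: bolt the virtual column on afterwards
    ALTER TABLE events ADD COLUMN event_type TEXT
        GENERATED ALWAYS AS (json_extract(data, '$.type')) VIRTUAL;

    -- 3rd section: declare it up front
    CREATE TABLE events (
        data       TEXT,
        event_type TEXT GENERATED ALWAYS AS (json_extract(data, '$.type')) VIRTUAL
    );

Both produce the same schema; SQLite only allows VIRTUAL (not STORED) generated columns to be added via ALTER TABLE.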
Particularly with drizzle, it means I can use SQLite on-device with expo-sqlite, and store our data format in a single field, with very little syntax, and the schema and queries all become fully type safe.
Also, being able to use the same light ORM abstraction server-side with bun:sqlite is huge.
Generated columns are pretty great, but what I would really love is a Postgres-style GIN index, which dramatically speeds up JSON queries for unanticipated keys.
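For comparison, the Postgres feature being wished for (table/column names made up):

    -- one GIN index covers every key, anticipated or not:
    CREATE INDEX idx_events_payload ON events USING gin (payload jsonb_path_ops);

    -- containment queries on arbitrary keys become index scans:
    SELECT * FROM events WHERE payload @> '{"type": "click"}';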
I've been using this trick for a while, and it actually got me to do quite a bit without an ORM (just hacking a sane models.py with a few stable wrappers and calling it a day)
LOL what are the odds: I posted Marmot in `Show HN` today https://github.com/maxpert/marmot/releases/tag/v2.2.0 and in my head I was thinking the exact same thing for supporting MySQL's JSON datatype. At some level I'm starting to feel I might as well expose a full MongoDB-compatible protocol that lets you talk to tables as collections, solving this problem once and for all!
Dude, what? This is incredible knowledge. I had been fearing this exact problem for so long, but there is an elegant out-of-the-box solution. Thank you!!
> So, thanks bambax!
You're most welcome! And yes, SQLite is awesome!!
> It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.
Why?
> Particularly with drizzle, it means I can use SQLite on-device with expo-sqlite, and store our data format in a single field, with very little syntax, and the schema and queries all become fully type safe.
> Also, being able to use the same light ORM abstraction server-side with bun:sqlite is huge.
But I guess this technique is very common now.
You can do the same with DuckDB and Postgres too.
What?