One of my favorite moments in HN history was watching the authors of the various search tools decide on a common ".ignore" file as opposed to each having their own: https://news.ycombinator.com/item?id=12568245
I’ve read this multiple times over the years and this post is still the most interesting and informative piece describing the problem of making a fast grep-like tool. I love that it doesn’t just describe how ripgrep works but also how all the other tools work and then compares the various techniques. It’s simultaneously a tutorial and an expert deep dive. Just a beautiful piece of writing. In a perfect world, all code would be similarly documented.
Such a good read. I actually went back though it the other day to steal the searching for the least common byte idea out to speed up my search tool https://github.com/boyter/cs which when coupled with the simd upper lower search technique from fzf cut the wall clock runtime by a third.
There was this post from cursor https://cursor.com/blog/fast-regex-search today about building an index for agents due to them hitting a limit on ripgrep, but I’m not sure what codebase they are hitting that warrants it. Especially since they would have to be at 100-200 GB to be getting to 15s of runtime. Unless it’s all matches that is.
I don't know if this is coincidence or not but Cursor just made a post breaking down why they moved to their own solution in place or Ripgrep and it makes a lot of sense from a cursory (haha) read.
Ripgrep is used as the defautl search backend for ArchiveBox, such a good tool. I was on ag (the-silver-searcher) for years before I switched, but haven't gone back since.
When I first heard about ripgrep my reaction was laughing. grep had been too established. No way something that isn't 100% compatible with grep could get any traction.
And I was dead wrong. Overnight everyone uses rg (me included).
I was using ripgrep once and it had a bug that led me downa terrifying rabbit hole - I can't recall what it was but it involved not being able to find text that absolutely should have been there.
Eventually I was considering rebuilding the machine completely but for some reason after a very long time digging deep into the rabbit hole I tried plain old grep and there was the data exactly where it should have been.
So it's such a vague story but it was a while back - I don't remember the specifics but I sure recall the panic.
I don't remember why I didn't switch from ag, but I remember it was a conscious decision. I think it had something to do with configuration, rg using implicit '.ignore' file (a super-generic name instead of a proper tool-specific config) or even .gitignore, or something else very much unwarranted, that made it annoying to use. Cannot remember, really, only remember that I spent too much time trying to make it behave and decided it isn't worth it. Anyway, faster is nice, but somehow I don't ever feel that ag is too slow for anything. The switch from the previous one (what was it? ack?) felt like a drastic improvement, but ag vs. rg wasn't much difference to me in practice.
One thing I learned over the years is that the closer my setup is to the default one, the better. I tried switching to the latest and greatest replacements, such as ack or ripgrep for grep, or httpie for curl, just to always return to the default options. Often, the return was caused by a frustration of not having the new tools installed on the random server I sshed to. It's probably just me being unable to persevere in keeping my environment customized, and I'm happy to see these alternative tools evolve and work for other people.
I don’t understand when people typeset some name in verbatim, lowercase, but then have another name for the actual command. That’s confusing to me.
Programmers are too enarmored with lower-case names. Why not Ripgrep? Then I can surmise that there might not be some program ripgrep(1) (there might be a shorter version), since using capital letters is not traditional for CLI programs.
> Stacked Git, StGit for short, is an application for managing Git commits as a stack of patches.
> ... The `stg` command line tool ...
Now, I’ve been puzzled in the past when inputing `stgit` doesn’t work. But here they call it StGit for short and the actual command is typeset in verbatim (stg(1) would have also worked).
And burntsushi is one of us: he's regularly here on HN. Big thanks to him. As soon as rg came out I was building it on Linux. Now it ships stocks with Debian (since Bookworm? Don't remember): thanks, thanks and more thanks.
It seems to me that `rg` is the number one most important part that enables LLMs to be smart agents in a codebase. Who would have thought that a code search tool would enable AGI?
Faster is not always the best thing. I still remember when vs code changed to ripgrep I had to change my habit using it, before then I can just open vs code to any folder and do something with it, even if the folder contains millions of small text files. It worked fine before, but then rg was picked, and it happily used all of my cpu cores scanning files, made me unable to do anything for awhile.
To be honest I hate all the new rust replacement tools, they introduce new behavior just for the sake of it, it's annoying.
Ripgrep is faster than grep, ag, git grep, ucg, pt, sift (2016)
(burntsushi.net)281 points by jxmorris12 13 hours ago | 121 comments
Comments
There was this post from cursor https://cursor.com/blog/fast-regex-search today about building an index for agents due to them hitting a limit on ripgrep, but I’m not sure what codebase they are hitting that warrants it. Especially since they would have to be at 100-200 GB to be getting to 15s of runtime. Unless it’s all matches that is.
https://cursor.com/blog/fast-regex-search
There's also RGA (ripgrep-all) which searches binary files like PDFs, ebooks, doc files: https://github.com/phiresky/ripgrep-all
And I was dead wrong. Overnight everyone uses rg (me included).
It’s fast even on a 300mhz Octane.
Eventually I was considering rebuilding the machine completely but for some reason after a very long time digging deep into the rabbit hole I tried plain old grep and there was the data exactly where it should have been.
So it's such a vague story but it was a while back - I don't remember the specifics but I sure recall the panic.
https://x.com/CharlieMQV/status/1972647630653227054
Someone please make an awesome new sed and awk.
I don’t understand when people typeset some name in verbatim, lowercase, but then have another name for the actual command. That’s confusing to me.
Programmers are too enarmored with lower-case names. Why not Ripgrep? Then I can surmise that there might not be some program ripgrep(1) (there might be a shorter version), since using capital letters is not traditional for CLI programs.
Look at Stacked Git:
https://stacked-git.github.io/
> Stacked Git, StGit for short, is an application for managing Git commits as a stack of patches.
> ... The `stg` command line tool ...
Now, I’ve been puzzled in the past when inputing `stgit` doesn’t work. But here they call it StGit for short and the actual command is typeset in verbatim (stg(1) would have also worked).
The TUI is great, and approximate matches are insanely useful.
https://reddit.com/r/rust/comments/1fvzfnb/gg_a_fast_more_li...
To be honest I hate all the new rust replacement tools, they introduce new behavior just for the sake of it, it's annoying.