Excerpt: "Some might say that ag and ripgrep and any of the other tools I list on beyondgrep.com are competing projects, but I think that way of thinking is wrong. It’s only a competition if you see it as a competition. I’m not competing against anyone for anything: Clicks, dollars, popularity, etc. If someone uses ripgrep instead of ack, it doesn’t hurt me. It’s the difference between an abundance vs. scarcity view of the world. I choose abundance. I think most of us who work in open source do, too."
I never thought I would see the confluence of the "woo-woo" space abundance mindset and a blog post about an open source command-line utility. I must say I am intrigued.
If this explanatory medium doesn't substantiate the reasonability of a mindset of abundance in the minds of programmers, I don't know what ever will.
I don't read that as a name-it-claim-it spell. I read it as an accuracy claim. That the competition is a self-fulfilling prophecy. I don't know what's "woo-woo" about that.
I’d wager they’re talking about the whole “The Secret” abundance mentality thing. “Believe and you shall receive... somehow”. That sort of interesting New Age stuff. I put no credence in it myself in terms of their stronger claims, but hey, CBT reminds me of it in a lot of ways and helped cure my depression.
Author here. I don't know about "The Secret" other than it being an Oprah thing, what, ten years ago?
Abundance vs. scarcity has nothing to do with “Believe and you shall receive... somehow”.
Scarcity thinking means that you fear giving people credit, and letting others have success. You fear that praising others makes you seem weak. You think that if someone uses a different open source project than yours, that you or your project suffers. You might not even be aware of the thinking. You might just feel it reflexively.
Abundance thinking says that there is more than enough praise to go around. It says that your success doesn't hurt me in mine (unless in some tangible way it does). It says that you can use ag and I can use ack and Susan can use ripgrep and it's all good.
In this specific case, abundance says that I, as the creator of ack, don't need to own the "market". In fact, the creation of other tools only helps ack. It gives the ack team ideas for things we can implement in ack. Who am I to think that I'm the only one with good ideas?
It gives our users a wider choice of tools. I put my work out there publicly to help people. Why would I not want them to have a variety to choose from?
Your view of the world shapes how you perceive it (yes, I've learned that in the tautology club), and since your reality is only what you perceive, your view of the world shapes your reality.
The table is misleading, isn't it? It seems to imply that rg can't work recursively, but [1] states "...ripgrep defaults to recursive directory search...". I understand that the table might mean that rg doesn't have a flag to enable recursive search, but surely we care about features more than flags...
The table doesn't yet distinguish between a command-line flag that is missing because the feature is absent and one that is missing because the feature is the default behavior. This is similar to the situation with case-sensitive search in GNU grep, which ends up with a blank cell even though it is the default. See: https://github.com/beyondgrep/website/issues/72
It's a fair criticism but I don't think it is designed to be misleading.
Maybe not designed to be misleading, but misleading nonetheless. For `grep`, `grep -i needle` is (approximately) equal to `ack/ag needle`, but looking at the table I wouldn't think grep supports case sensitivity.
It's going to be two pages ultimately. If you're comparing features, you don't care about the how.
Right now we're keeping all the data in a JSON file that we massage into a chart. Massaging it into a true feature comparison chart, along with a rosetta stone, should be a simple matter of programming. Same thing with the GNU/POSIX/BSD rosetta stone we're working on.
I agree, but I know from my own project that my "competition" is a moving target, and I don't follow them. Thus any table like the linked one that I create quickly becomes out of date as they add new features.
Here to plug using `--passthru`/`--passthrough`: it will print all lines, but highlight matches. I often do things like this to watch an output log, but highlight all entries with the string PLUGH in them:
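A finite stand-in for that log-watching use case (the input lines and the hypothetical log name `app.log` are invented for illustration). The same effect is also available in plain grep by alternating the pattern with `$`, which matches on every line, so nothing is filtered out:

```shell
# Every line is printed; only the PLUGH matches get highlighted:
printf 'one\nPLUGH two\nthree\n' | grep --color=always -E 'PLUGH|$'

# Typical real-world use (log file name is hypothetical):
#   tail -F app.log | ack --passthru PLUGH
#   tail -F app.log | grep --color=always -E 'PLUGH|$'
```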
That's an orthogonal feature: `--line-buffered` writes output after each line, as opposed to every 4096 bytes, when the output is a pipe instead of a terminal. It's useful when the other end of the pipe still goes to the terminal and you want to see output immediately. If `some-util-with-output` echoes stdin, then without the option the following would not show you the latest grepped lines until the 4096-byte buffer fills.
tail -F output.log | grep --line-buffered TEXT | some-util-with-output
Seems pretty comprehensive. One confusing thing I found was these two:
"Don't respect ignore files (.gitignore, .ignore, etc)" vs "Skip rules found in VCS ignore files (.gitignore, .hgignore, etc)"
Aren't those the same thing? Shouldn't they be grouped for better comparison?
I'm stuck with rg right now because it's the only one which correctly handles gitignore files. Generally quite happy with it, but I wish it could also use a more powerful regex engine for some less common cases.
The most annoying thing is that $ does not work with Windows newlines.
Consider just switching to ripgrep instead. It's faster, and the default flags and interface are more thoughtful. This chart may make it seem less feature rich, but most of the 'features' it's missing are things you'll never need, or things that your shell should be responsible for ("Pipe output through a pager or other command"?).
The only serious feature you might miss is lookahead/lookbehind in regexes - that's missing by design since if you want guaranteed linear time search you can't have those.
Anything that is permissively licensed (like ripgrep) is generally GPL compatible.[1] Note also that ripgrep is dual licensed under the Unlicense or the MIT license, both of which are explicitly GPL compatible according to [1].
And has the advantage of being pluggable rather than a hard-coded list of file types (though IIRC ack has a configuration file whereas with rg you need to use an alias to set up new types for every invocation).
ack has a configuration file, and it's extremely flexible, including the ability to check shebang lines. If you have a shell script without an extension, it's the only way to know what language it is.
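As a sketch of what that looks like, here is a hypothetical `.ackrc` line using ack 2.x's `firstlinematch` type filter (the type name "shell" and the shebang pattern are invented for illustration):

```
# classify extensionless scripts whose shebang line mentions bash
# as the "shell" file type
--type-add=shell:firstlinematch:/^#!.*\bbash\b/
```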
No, the chart doesn't compare speed. Lord knows there have been many comparisons of the speeds of grep-alikes, but nobody has written up a comparison of features.
For me, raw speed is not as important as a rich feature set to support my code spelunking.
ripgrep's introductory blog post[1] includes a perf comparison, which incorporates sift. But sift is too slow to include in several benchmarks. sift's achievement is its fast parallel directory traverser, coupled with Go's vectorized IndexByte[2] function for simple literals. In that case, it is quite fast, but as soon as you enter Go's regex engine, it's game over.
Unfortunately that page does not say when the comparison was made, and which version of each tool was tested. Also, which "grep" is that? I assume GNU grep? There are others, though...
All in all, it'd still be nice to have a more comprehensive performance comparison page, which gets regularly updated. Bonus points if it shows how speed changes over time, similar to http://speed.pypy.org (the code for that is available, by the way).
Wishful thinking, I know, but hey, who knows... :-)
TL;DR ripgrep has gotten faster in important areas since the initial set of benchmarks (the proper comparison there would be https://github.com/BurntSushi/ripgrep/tree/master/benchsuite...). The key reasons why are because it grew a parallel directory traverser, and its line counting got vectorized courtesy of the bytecount[1] crate. ucg has gotten a little faster in some cases, but the general conclusion of "ripgrep is the fastest" is still correct.
Thank you for re-running them. You saved me the trouble. ;)
I would be curious to see ack get into the test suite, however. Even if it is much slower, I'd like to see the results.
And I'd be very curious to hear your reasoning for the different results in the subtitles_ru test cases -- why is rg returning radically different numbers of lines as compared to the other tools?
ack will always be slower than ripgrep, but it shouldn't be as slow as it is in burntsushi's tests. In his tests, he's showing run times where ack takes 25x as long to run as ripgrep, and ack shouldn't be NEARLY that slow.
We think that there's something weird about his Perl installation that is making it so slow, but we haven't been able to figure it out.
I will dig back into this and see if I can figure it out. If you look at the recent commit history for ripgrep, you'll see I updated the timings for ack on my benchmark in my README. I have no explanation for it, but ack isn't as slow as it was when I went through this before.
Anyway, my Perl installation is the standard one on Archlinux. I will try on other systems.
> I would be curious to see ack get into the test suite, however. Even if it is much slower, I'd like to see the results.
You'll need to add it to the benchsuite script (which should be very easy to do, just peruse the source to see other examples), but for me, ack is too slow to benchmark this way. In theory, I'd be fine adding it to the same benchmarks as pt/sift are in, since they are also generally too slow to benchmark, but are at least fast enough in some of them to tolerate it. But ack has different characteristics. While pt/sift have a very high ceiling (like ack), they also have a very low floor in some cases. ack on the other hand has a reasonably high floor compared to the others, even in the simplest searches. This makes all benchmarks on ack take a long time.
I did a couple ad hoc benchmarks on the same machine:
The subtitles benchmarks are perhaps unfair because I think ack is more focused on directory tree search, whereas ripgrep claims to be good at both. I included a few anyway to show the difference though. In general, the benchmark just isn't that interesting, and it makes the benchmark run take a lot longer than it would otherwise (because each command is executed several times).
> And I'd be very curious to hear your reasoning for the different results in the subtitles_ru test cases -- why is rg returning radically different numbers of lines as compared to the other tools?
Because ripgrep correctly supports Unicode, and does it by default because it can generally handle all Unicode features without a corresponding performance loss. GNU grep handles Unicode in general as well (assuming your system's locale settings are up to snuff), but it can pay a huge price for it some cases, although admittedly, I'd consider such cases to be somewhat infrequent in common usage. It's explained in my blog post: http://blog.burntsushi.net/ripgrep/#single-file-benchmarks --- The subtitles_no_literal is particularly interesting, because it shows what happens when you ask GNU grep to do the correct thing. ;-)
Note that both ag and ucg have the opportunity to support Unicode correctly, but they don't twiddle the right flags in their use of PCRE (and PCRE2, respectively). AFAIK, neither expose a flag to twiddle these things. From scanning the ack man page, I don't see any option there either, although I'm sure Perl regexes probably have that option too.
And no, I don't mix money with my free time side projects. Personal choice. Instead, just donate to a charity. My personal favorites are Rails Girls and Wikipedia. The Internet Archive is another good one!
Don't know how you'd pull it off in the current tabular format, but it would be great to include some of the UX side of things: for example, I use rg almost exclusively because it has an easy-to-remember syntax (`rg search-string` is a recursive search), attractively colored and well-organized output, and seems much faster compared to other tools in my use cases.
As a long-time user of `find . -name "*.foo" -exec grep -Hin needle {} \;`, moving to ack has been great! I love the syntax and the speed, and the fact that it actually respects your ignore files. ripgrep is great too. ag, on the other hand, is recommended by everyone but doesn't seem to respect ignores or understand modern ignore syntax. Give it a pass.
Spawning a separate grep for each file? That's terribly inefficient. At least use xargs which will run one process on as many files as possible.
But you know there's a -r for recursive, right? And unless you are using some historic relic of grep that is not GNU or BSD and doesn't understand the --include option you can just do:
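Something like this, with the pattern `needle`, the `*.foo` extension, and the scratch directory all invented for illustration:

```shell
# set up a scratch directory with one matching and one non-matching file
dir=$(mktemp -d)
printf 'a Needle here\n' > "$dir/x.foo"
printf 'needle too\n'    > "$dir/y.txt"

# recursive, case-insensitive, filename + line number,
# restricted to *.foo files; no find needed:
grep -rHin --include='*.foo' needle "$dir"
```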
I don't want to have to learn 10 different command syntaxes for walking directory trees, so find works well. The "+" terminator of find's exec is similar to xargs, but preserves the flexibility of find's exec.
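Both batching styles mentioned above, sketched side by side (the pattern `needle`, the `*.foo` extension, and the scratch files are invented for illustration):

```shell
# scratch files to search
dir=$(mktemp -d)
printf 'needle one\n' > "$dir/a.foo"
printf 'needle two\n' > "$dir/b.foo"

# one grep per batch of files, via xargs (NUL-separated, so safe
# for filenames with spaces):
find "$dir" -name '*.foo' -print0 | xargs -0 grep -Hin needle

# same batching with find's "+" terminator, no xargs needed:
find "$dir" -name '*.foo' -exec grep -Hin needle {} +
```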
At the least, I recommend moving past the `find . -name '*.foo' -exec grep` pattern if you can help it.
Modern file searchers, of which `ag` is one among others, accept file-type arguments and skip the `.git` subdirectory if it exists (by default; this can be toggled), so `ag --python needle` will recursively search for needle in the current working tree in all files whose names end in `.py`.
Yes, you could write a function to do the same in `find`, but then you're just being stubborn. (Which is fine; my .bashrc is littered with aliases and functions of me being stubborn, but if I have to copy my .bashrc file around, I might as well install my preferred searcher to the target system if available.)
Use --include and --exclude-dir to do what you describe. (The latter is a good idea to set in your GREP_OPTIONS, unless you actually search .git directories.)
GREP_OPTIONS has been deprecated in GNU grep since, I think, 2.20. It will be removed in a future version, and until then it prints an irritating warning message every time you use it.
(I don't think there's any such plan for the various BSD greps, so if you use those exclusively you're probably fine.)
Seems everybody here has switched to rg, but I haven't because it's not available in the Debian repos (yet), unlike ack/ag. So, do such folks use Arch, or just download a pre-built binary? How about updating rg when a new version is out?
The easiest way: first, install rust and cargo, either through your distribution, or through rustup if your distribution doesn't have it yet. Then run "cargo install -f ripgrep". It'll download the source code, build, and install to ~/.cargo/bin, which rustup adds to $PATH for you by default.
Edit: the "-f" in the "cargo install" command is for updating; without the -f, it refuses to install over an already installed version. The first time you install, you can omit the -f.
Note that if you install Rust through Debian, it likely won't be new enough to compile the latest version of ripgrep. I believe Debian packages Rust 1.14, and the last version of ripgrep to work on Rust 1.14 was 0.5.2. So, `cargo install --vers 0.5.2 ripgrep` might be what you want on Debian.
rg for the win. It's not some huge improvement, but it's more of a feeling that things just work by default (similar to what I get from tmux vs screen).
`alias rg='rg -S'` in your `.bashrc` will fix that for you.
Out of curiosity, what sort of searches do you do that smartcase is desirable? A meaningful number of people seem to prefer it, but I find that most of the time I want to be able to search for variables etc. case sensitively.
We were talking about defaults and feelings here, I'm aware that it's easy enough to fix (and to be fair, I now often use ripgrep with the inverse alias)…
I'm generally a big fan of smartcase. Most of the time I don't care about case and it's easier to type it that way, and when I care, it's often mixed or upper case, so smartcase Does What I Mean. And for the few times when I explicitly search for all-lowercase, it's easy enough to turn it off (M-c, -s, :set nosmartcase etc.).
I added smartcase in ack because I used it constantly in vim. I don't normally want to have to remember if the function I'm searching for is "format_ISBN" or "format_isbn", for example. Some languages (PHP) aren't always case-sensitive, so you need to search both. I'd rather use a "-I" in the few cases where I don't want case-sensitive, than having to remember to add "-i" in the 99% of the cases where I don't.
Like, a platonic ideal, because it doesn't exist. That kind of ideal.
I don't know why I care, other than that even the super-cut-down perl is a sizable percentage of the install size on some of the very small systems I've worked with.
I understand aliases and wrappers fine, but I like having the environment have some way to contribute. It's just my simple preference.
There's also the related #314, which makes even more sense: per-project configuration. It sure would be great to be able to download a project and have it already set up nicely for ripgrep! https://github.com/BurntSushi/ripgrep/issues/314
Wow, cool. Thanks burntsushi. rg is already amazing. This project really has gotten an enormous amount of love & effort from you & it really shows through & through.
Your implementation of the gitignore algorithm is crazy impressive to me.
The problem with suffix arrays, even with a blazing fast SACA, is that they are slow to build. It will take a long time to generate an index for even a moderately sized code repository.
Typically, if you want an index, you build an inverted index, which maps terms (e.g., n-grams or tokens in your favorite PL) to a postings list. The postings list contains all of the documents in which that term occurs.
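A toy sketch of that structure, using awk (the file names and contents are invented): each word becomes a term mapping to the postings list of files it occurs in.

```shell
# two tiny "documents" to index
dir=$(mktemp -d)
printf 'ripgrep is fast\n'          > "$dir/a.txt"
printf 'ack is flexible and fast\n' > "$dir/b.txt"

# build a toy inverted index: term -> postings list of files,
# deduplicating (term, file) pairs as we go
awk '{ for (i = 1; i <= NF; i++)
         if (!seen[$i, FILENAME]++)
           idx[$i] = idx[$i] " " FILENAME }
     END { for (t in idx) print t ":" idx[t] }' "$dir"/*.txt
```

A real engine would index n-grams or language tokens rather than whitespace-split words, and store positions too, but the term-to-postings-list shape is the same.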