This chart doesn't compare speed. Until rg, I didn't grep much because I general...

petdance · on Jan 8, 2018

No, the chart doesn't compare speed. Lord knows there have been many comparisons of the speeds of grep-alikes, but nobody has written up a comparison of features.

For me, raw speed is not as important as a rich feature set to support my code spelunking.

andersonfreitas · on Jan 8, 2018

The `sift` [1] tool presents a performance comparison, but not against `rg`.

[1] https://sift-tool.org/performance

burntsushi · on Jan 8, 2018

ripgrep's introductory blog post[1] includes a perf comparison, which incorporates sift. But sift is too slow to include in several benchmarks. sift's achievement is its fast parallel directory traverser, coupled with Go's vectorized IndexByte[2] function for simple literals. In that case, it is quite fast, but as soon as you enter Go's regex engine, it's game over.

[1] - http://blog.burntsushi.net/ripgrep/

[2] - https://golang.org/pkg/bytes/#IndexByte
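To illustrate the "fast for simple literals, slow once the regex engine is involved" split described above, here is a minimal Python sketch of a literal fast path. It is not sift's code: the name `find_first` and the metacharacter set are assumptions made for the example, and Python's `bytes.find` merely stands in for a vectorized byte scan such as Go's IndexByte.

```python
import re

# Bytes that make a pattern "not a plain literal" (a rough, assumed list).
_META = set(rb".^$*+?{}[]\|()")


def find_first(pattern: bytes, haystack: bytes) -> int:
    """Return the byte offset of the first match, or -1 if there is none."""
    if not any(b in _META for b in pattern):
        # Literal fast path: a plain substring scan, no regex engine involved.
        return haystack.find(pattern)
    # Anything else falls through to the (slower) general regex engine.
    m = re.search(pattern, haystack)
    return m.start() if m else -1
```

Both paths return the same offsets for the same input; the difference between them is only in how much work is done per byte of haystack.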

BlackFingolfin · on Jan 8, 2018

Unfortunately that page does not say when the comparison was made, and which version of each tool was tested. Also, which "grep" is that? I assume GNU grep? There are others, though...

All in all, it'd still be nice to have a more comprehensive performance comparison page, which gets regularly updated. Bonus points if it shows how speed changes over time, similar to http://speed.pypy.org (the code for that is available, by the way).

Wishful thinking, I know, but hey, who knows... :-)

bradknowles · on Jan 8, 2018

There is more detail available in the benchmark runs at https://github.com/BurntSushi/ripgrep/tree/master/benchsuite...

However, those are from 2016, so it's hard to tell what might have changed in the meantime.

burntsushi · on Jan 9, 2018

The benchmark suite can be run by anyone: https://github.com/BurntSushi/ripgrep/blob/master/benchsuite...

I re-ran them :-) https://github.com/BurntSushi/ripgrep/tree/master/benchsuite...

TL;DR ripgrep has gotten faster in important areas since the initial set of benchmarks (the proper comparison there would be https://github.com/BurntSushi/ripgrep/tree/master/benchsuite...). The key reasons are that it grew a parallel directory traverser and that its line counting got vectorized courtesy of the bytecount[1] crate. ucg has gotten a little faster in some cases, but the general conclusion of "ripgrep is the fastest" is still correct.

[1] - https://github.com/llogiq/bytecount
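For readers wondering why line counting matters: reporting a line number for a match boils down to counting newline bytes before the match offset, which is why a fast byte counter helps. A rough Python sketch of that idea follows; `line_number` is a hypothetical helper, and `bytes.count` only stands in for the SIMD counting that bytecount provides.

```python
def line_number(haystack: bytes, offset: int) -> int:
    """Return the 1-based line number of the byte at `offset`."""
    # Count the newline bytes strictly before the offset, then add one
    # because the first line is line 1.
    return haystack.count(b"\n", 0, offset) + 1


data = b"first\nsecond\nneedle here\n"
match_offset = data.find(b"needle")
print(line_number(data, match_offset))
```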

bradknowles · on Jan 9, 2018

Thank you for re-running them. You saved me the trouble. ;)

I would be curious to see ack get into the test suite, however. Even if it is much slower, I'd like to see the results.

And I'd be very curious to hear your reasoning for the different results in the subtitles_ru test cases -- why is rg returning radically different numbers of lines as compared to the other tools?

Thanks!

petdance · on Jan 9, 2018

ack will always be slower than ripgrep, but it shouldn't be as slow as it is in burntsushi's tests. In his tests, he's showing run times where ack takes 25x as long to run as ripgrep, and ack shouldn't be NEARLY that slow.

We think that there's something weird about his Perl installation that is making it so slow, but we haven't been able to figure it out.

Here's the ticket: https://github.com/beyondgrep/ack3/issues/42

If you have any insight, we'd love to have it. We've been stumped, as you'll see if you read through the issue history.

burntsushi · on Jan 9, 2018

I will dig back into this and see if I can figure it out. If you look at the recent commit history for ripgrep, you'll see I updated the timings for ack on my benchmark in my README. I have no explanation for it, but ack isn't as slow as it was when I went through this before.

Anyway, my Perl installation is the standard one on Archlinux. I will try on other systems.

burntsushi · on Jan 9, 2018

> I would be curious to see ack get into the test suite, however. Even if it is much slower, I'd like to see the results.

You'll need to add it to the benchsuite script (which should be very easy to do, just peruse the source to see other examples), but for me, ack is too slow to benchmark this way. In theory, I'd be fine adding it to the same benchmarks as pt/sift are in, since they are also generally too slow to benchmark, but are at least fast enough in some of them to tolerate it. But ack has different characteristics. While pt/sift have a very high ceiling (like ack), they also have a very low floor in some cases. ack on the other hand has a reasonably high floor compared to the others, even in the simplest searches. This makes all benchmarks on ack take a long time.

I did a couple ad hoc benchmarks on the same machine:

                                   ripgrep         ack
    linux_alternates                0.113s      9.750s
    linux_alternates_casei          0.133s     19.955s
    linux_literal                   0.103s      7.220s
    linux_literal_casei             0.122s      8.025s
    linux_no_literal                0.356s     18.881s
    linux_re_literal_suffix         0.104s      6.778s
    linux_unicode_greek             0.194s      8.537s (ack reports no results)
    linux_unicode_word              0.111s      7.299s
    linux_word                      0.108s      6.763s
    
    subtitles_en_alternate          0.247s      9.829s
    subtitles_en_alternate_casei    0.247s     43.091s
    subtitles_ru_alternate          0.978s     28.134s
    subtitles_ru_alternate_casei    0.978s    107.314s (ack reports incorrect)
    subtitles_ru_surrounding_words  0.245s      6.633s (ack reports no results)

The subtitles benchmarks are perhaps unfair because I think ack is more focused on directory tree search, whereas ripgrep claims to be good at both. I included a few anyway to show the difference, though. In general, the benchmark just isn't that interesting, and it makes the benchmark run take a lot longer than it would otherwise (because each command is executed several times).
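As a sketch of what "each command is executed several times" means in practice, a tiny timing harness might look like the following. This is not the benchsuite script: `time_command` and `summarize` are hypothetical names, and the search command in the comment is only a placeholder.

```python
import subprocess
import time


def time_command(argv, runs=3):
    """Run `argv` `runs` times; return each run's wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal rendering does not skew the timing.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of timings to its minimum and mean."""
    return {"min": min(durations), "mean": sum(durations) / len(durations)}


# Placeholder usage (pattern and path are assumptions):
# print(summarize(time_command(["rg", "-n", "PM_RESUME", "linux/"])))
```

With a tool that takes tens of seconds per search, repeating every command multiplies the total run time accordingly, which is the cost mentioned above.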

> And I'd be very curious to hear your reasoning for the different results in the subtitles_ru test cases -- why is rg returning radically different numbers of lines as compared to the other tools?

Because ripgrep correctly supports Unicode, and does it by default because it can generally handle all Unicode features without a corresponding performance loss. GNU grep handles Unicode in general as well (assuming your system's locale settings are up to snuff), but it can pay a huge price for it in some cases, although admittedly, I'd consider such cases to be somewhat infrequent in common usage. It's explained in my blog post: http://blog.burntsushi.net/ripgrep/#single-file-benchmarks --- The subtitles_no_literal benchmark is particularly interesting, because it shows what happens when you ask GNU grep to do the correct thing. ;-)

Note that both ag and ucg have the opportunity to support Unicode correctly, but they don't twiddle the right flags in their use of PCRE (and PCRE2, respectively). AFAIK, neither exposes a flag to twiddle these things. From scanning the ack man page, I don't see any option there either, although I'd guess Perl regexes have that option too.
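The effect of Unicode-aware versus byte-oriented matching on result counts can be reproduced in miniature with Python's `re` module, which is Unicode-aware for `str` patterns and ASCII-only for `bytes` patterns. This is just an analogy for the behaviour described above, not how any of these tools is implemented, and the sample line is made up for the example.

```python
import re

line = "Шерлок Холмс"        # hypothetical sample line of Russian text
raw = line.encode("utf-8")   # the same line as raw UTF-8 bytes

# str pattern: case-insensitive matching understands Cyrillic case pairs.
unicode_hit = re.search("шерлок", line, re.IGNORECASE) is not None

# bytes pattern: only ASCII letters are case-folded, so the lowercase
# pattern does not match the capitalized word.
bytes_hit = re.search("шерлок".encode("utf-8"), raw, re.IGNORECASE) is not None

# \w is Unicode-aware for str patterns but ASCII-only for bytes patterns.
unicode_words = re.findall(r"\w+", line)
bytes_words = re.findall(rb"\w+", raw)

print(unicode_hit, bytes_hit)
print(len(unicode_words), len(bytes_words))
```

A tool in the second mode silently reports fewer (or no) matching lines on non-ASCII text, which is the kind of discrepancy seen in the subtitles_ru cases.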

bradknowles · on Jan 9, 2018

Awesome! Thanks for all the information!

I feel kinda sad that the only thing I can do in response is to go install ripgrep and use that instead of the alternatives.

Do you have a Patreon page? Or anything similar?

burntsushi · on Jan 9, 2018

Haha go for it! :-)

And no, I don't mix money with my free time side projects. Personal choice. Instead, just donate to a charity. My personal favorites are Rails Girls and Wikipedia. The Internet Archive is another good one!