> I would be curious to see ack get into the test suite, however. Even if it is much slower, I'd like to see the results.
You'll need to add it to the benchsuite script (which should be very easy to do, just peruse the source to see other examples), but for me, ack is too slow to benchmark this way. In theory, I'd be fine adding it to the same benchmarks that pt/sift are in, since they are also generally too slow to benchmark, but are at least fast enough in some of them to tolerate it. But ack has different characteristics. While pt/sift have a very high ceiling (like ack), they also have a very low floor in some cases. ack, on the other hand, has a reasonably high floor compared to the others, even in the simplest searches. This makes all benchmarks on ack take a long time.
I did a couple ad hoc benchmarks on the same machine:
The subtitles benchmarks are perhaps unfair because I think ack is more focused on directory tree search, whereas ripgrep claims to be good at both. I included a few anyway to show the difference, though. In general, the benchmark just isn't that interesting, and it makes the benchmark run take a lot longer than it would otherwise (because each command is executed several times).
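To make the cost of "each command is executed several times" concrete, here is a minimal sketch of a repeated-run timer. It is not the benchsuite script's actual code; `time_command` and the placeholder command are made up for illustration. A tool with a high floor pays that floor on every run, so total time grows with the run count.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run `argv` `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal rendering does not skew the timing.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command (an assumption): substitute the search tool
    # invocation you actually want to measure.
    samples = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"mean={statistics.mean(samples):.3f}s min={min(samples):.3f}s")
```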
> And I'd be very curious to hear your reasoning for the different results in the subtitles_ru test cases -- why is rg returning radically different numbers of lines as compared to the other tools?
Because ripgrep correctly supports Unicode, and does it by default because it can generally handle all Unicode features without a corresponding performance loss. GNU grep handles Unicode in general as well (assuming your system's locale settings are up to snuff), but it can pay a huge price for it in some cases, although admittedly, I'd consider such cases to be somewhat infrequent in common usage. It's explained in my blog post: http://blog.burntsushi.net/ripgrep/#single-file-benchmarks --- The subtitles_no_literal is particularly interesting, because it shows what happens when you ask GNU grep to do the correct thing. ;-)
Note that both ag and ucg have the opportunity to support Unicode correctly, but they don't twiddle the right flags in their use of PCRE (and PCRE2, respectively). AFAIK, neither exposes a flag to twiddle these things. From scanning the ack man page, I don't see any option there either, although I'm sure Perl regexes probably have that option too.
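As a rough illustration of why line counts diverge, here is a small sketch using Python's `re` module as a stand-in (none of the tools above use it, and the sample lines are made up). It counts matching lines with Unicode-aware classes and case folding versus ASCII-only semantics, which approximates a regex engine that has not had its Unicode flags enabled.

```python
import re

# Hypothetical sample "subtitle" lines; not taken from the real benchmark corpus.
lines = [
    "Шерлок Холмс",
    "Sherlock Holmes",
    "доктор Ватсон",
    "---",
]


def count_matching_lines(pattern, flags=0):
    """Count lines containing at least one match, as a line-oriented search tool would."""
    regex = re.compile(pattern, flags)
    return sum(1 for line in lines if regex.search(line))


# \w is Unicode-aware by default for str patterns; re.ASCII restricts it to [a-zA-Z0-9_].
words_unicode = count_matching_lines(r"\w+\s+\w+")
words_ascii = count_matching_lines(r"\w+\s+\w+", re.ASCII)

# Case-insensitive search for a lowercase Cyrillic word: with re.ASCII,
# case folding is applied to ASCII letters only.
ci_unicode = count_matching_lines("холмс", re.IGNORECASE)
ci_ascii = count_matching_lines("холмс", re.IGNORECASE | re.ASCII)

print("word pairs:", words_unicode, "vs", words_ascii)
print("case-insensitive:", ci_unicode, "vs", ci_ascii)
```

The same pattern over the same input reports different counts depending purely on whether Unicode semantics are switched on, which is the kind of discrepancy seen in the subtitles_ru results.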
And no, I don't mix money with my free time side projects. Personal choice. Instead, just donate to a charity. My personal favorites are Rails Girls and Wikipedia. The Internet Archive is another good one!