This is my first time hearing about rr, and I'm very curious. Your comment below about increasing generality/performance with BPF is very interesting.
Can you share a little bit about how some of these features (looking at the first one for example) impact rr's ability to run in cloud environments? Does it work but with performance degradation? Without performance degradation but missing some features? I don't work at quite a low enough level to look at this list and quickly grasp the high level takeaway.
rr works today on cloud instances that expose hardware performance counters, on Intel CPUs. For example, it works on a variety of AWS instances that are large enough that your VM occupies a whole CPU socket; c5(d).9xlarge or bigger works well.
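If you want to check whether a given instance exposes usable counters, a rough probe looks like the sketch below. To be clear, this is not rr's actual detection code, and rr really needs a model-specific "retired conditional branches" event rather than the generic branch counter used here; it just demonstrates the mechanism via perf_event_open:

    /* pmu_probe.c — hypothetical probe, not rr's detection logic.
       Opens a hardware branch counter on the calling thread; if the
       hypervisor doesn't virtualize the PMU, the open fails. */
    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
        attr.exclude_kernel = 1;  /* count user-space only */

        /* pid 0 = this process, cpu -1 = any CPU */
        int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) {
            perror("perf_event_open");
            puts("no usable hardware counters: rr will not work here");
            return 1;
        }
        /* Execute some branches, then read the count back. */
        volatile long long sink = 0;
        for (int i = 0; i < 100000; i++) sink += i & 1;
        long long count = 0;
        read(fd, &count, sizeof(count));
        printf("hardware branch counter works (count=%lld)\n", count);
        close(fd);
        return 0;
    }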
The performance impact of virtualization on rr seems small on AWS, small enough that we haven't tried to measure it.
On AWS, rr is mostly feature-complete. The only missing feature is CPUID faulting, i.e. the ability to intercept CPUID instructions and fake their results. This means that taking rr recordings on one machine and replaying them on an AWS instance with a different CPU does not work. (The other direction does work.)
(Pernosco uses AWS, but we have a closed-source binary instrumentation extension to rr replay that lifts this restriction.)
As I mentioned above, there's no technical reason AFAIK why AWS could not virtualize CPUID faulting; regular Linux KVM supports this.
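For the curious, CPUID faulting is exposed to userspace on Linux via arch_prctl(ARCH_SET_CPUID, ...): setting the flag to 0 makes subsequent CPUID instructions in that thread raise SIGSEGV, which a tracer can intercept. A minimal probe along these lines (my sketch, not code from rr) should succeed on bare metal or under KVM with faulting available, and fail with ENODEV on a VM that doesn't virtualize it:

    /* cpuid_fault_probe.c — hypothetical probe for CPUID faulting
       support. Fails with ENODEV when the hardware or hypervisor
       doesn't support trapping CPUID. x86-64 Linux only. */
    #define _GNU_SOURCE
    #include <asm/prctl.h>   /* ARCH_SET_CPUID */
    #include <errno.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        /* Ask the kernel to trap CPUID in this thread. */
        if (syscall(SYS_arch_prctl, ARCH_SET_CPUID, 0UL) != 0) {
            printf("no CPUID faulting (errno=%d): "
                   "CPUID results cannot be intercepted here\n", errno);
            return 1;
        }
        /* Restore normal CPUID behavior before doing anything else. */
        syscall(SYS_arch_prctl, ARCH_SET_CPUID, 1UL);
        puts("CPUID faulting available: CPUID can be intercepted and faked");
        return 0;
    }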