Sure! I think there's a really subtle point underpinning your question, and I think the answer lies in the different purposes of regression testing and bug finding. In regression testing (CI), you're checking whether the code introduced new problems. At that point you don't really want to learn that someone else's test, downstream of your component, fails when given a new thread schedule it has never seen before. Whereas if you're stress testing (including fuzzing and concurrency testing), you probably want to torture the program overnight to see if you can turn up new failures.
The Coyote project at Microsoft is a concurrency testing project with some similarities to Hermit. For the reasons above, their docs recommend a constant seed for CI regression testing, and random exploration for bug finding.
Still, it does feel like wasted resources to test the same points in the (exponentially large) schedule space again and again. Kind of like some exploration/exploitation tradeoff.
We don't do it yet, but I would consider running randomized exploration during CI while keeping the observable result tied to the fixed seed. If the randomized run fails, send that seed over to the "bug finding" component for further study, while quickly retrying with the known-good seed for the CI-visible regression test result.
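Here's a minimal sketch of that policy, assuming a hypothetical run_test helper; the script name and --seed flag are placeholders for whatever deterministic runner you actually invoke, not Hermit's real CLI:

    import random
    import subprocess
    import sys

    # Seed the regression suite is known to pass with (placeholder value).
    KNOWN_GOOD_SEED = 42

    def run_test(seed: int) -> bool:
        """Run the suite under a deterministic runner with the given schedule seed.
        The command below is a placeholder; substitute your real invocation."""
        result = subprocess.run(["./run_tests_deterministically.sh", f"--seed={seed}"])
        return result.returncode == 0

    def ci_check() -> int:
        # Explore: try one fresh random schedule on every CI run.
        explore_seed = random.randrange(2**32)
        if not run_test(explore_seed):
            # Don't fail the PR on it; hand the seed to the bug-finding pipeline.
            print(f"exploratory seed {explore_seed} failed; queueing it for bug finding",
                  file=sys.stderr)
            # queue_for_bug_finding(explore_seed)  # hypothetical hand-off

        # Exploit: the CI-visible verdict comes from the known-good seed only.
        return 0 if run_test(KNOWN_GOOD_SEED) else 1

    if __name__ == "__main__":
        sys.exit(ci_check())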
I don't think there's one right policy here. But having control over these knobs lets us be intentional about it.
P.S. Taking the random schedules the OS gives us is kind of "free fuzzing", but it is very BAD free fuzzing. It over-samples the probable, boring schedules and under-samples the more extreme corner cases. Hence concurrency bugs lurk until the machine is under load in production and edge cases emerge.
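To make that concrete, here's a toy illustration (not Hermit's or Coyote's actual mechanism): a classic check-then-act race that essentially never fires under the scheduler's natural timing, but fires constantly once a preemption is injected into the narrow window between the check and the update.

    import random
    import threading
    import time

    def trial(perturb: bool) -> bool:
        """Return True if the check-then-act race corrupted the balance."""
        balance = [100]

        def withdraw():
            if balance[0] >= 100:                      # check
                if perturb:
                    # Crude stand-in for a forced preemption in the danger window.
                    time.sleep(random.uniform(0, 0.001))
                balance[0] -= 100                      # act

        threads = [threading.Thread(target=withdraw) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return balance[0] < 0                          # both withdrawals slipped through

    def failure_rate(perturb: bool, trials: int = 2000) -> float:
        return sum(trial(perturb) for _ in range(trials)) / trials

    if __name__ == "__main__":
        print("natural schedules:  ", failure_rate(perturb=False))
        print("perturbed schedules:", failure_rate(perturb=True))

Schedule fuzzers do something analogous but in a principled way: they choose where to preempt, rather than leaving it to load and luck.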