The assumption is that for a given test program, the call stack used at the specific moment where the failure begins is the same each run. If that's not true, there will be problems.

If there are two different call stacks that get to the function in question, but only one of them is where the failure begins, you absolutely want to distinguish those.

In practice we do cut off the stacks, using just 16 frames, but that's to avoid O(stack depth) overhead, not to coalesce different things.
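
Roughly, the idea is something like this Python sketch (purely illustrative; the helper name and frame format here are made up, not the tool's actual code):

    import hashlib, traceback

    def failure_site_hash(limit=16):
        # Hash only the innermost `limit` frames of the current call stack,
        # so the cost is O(limit) rather than O(stack depth). Two different
        # call paths into the same function still get distinct hashes as
        # long as they diverge within those frames.
        frames = traceback.extract_stack()[:-1][-limit:]  # drop this helper's own frame
        text = "\n".join("%s:%s:%d" % (f.filename, f.name, f.lineno) for f in frames)
        return hashlib.sha256(text.encode()).hexdigest()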

i was reading zeller's book today and came to his example of delta-debugging minimization, which his ddmin algorithm solves in 48 tests, and it seemed like your 'easy' algorithm would be more efficient, so i hacked together a test program. your algorithm takes only 29 tests on this case. but because i couldn't remember your algorithm clearly, i first accidentally implemented a different variant that also works, and takes only 21 tests here, though it's not more efficient in the general case. still, it seems interesting, because it is arguably simpler than your 'easy' version (at least if we sweep the call to first_satisfying_bisect under the standard library) yet still preserves logarithmic performance when the satisfying set it finds is of bounded size:

    def bisect_dumb(p, s):
        # repeatedly bisect for the shortest prefix of s that, together with
        # the elements already known to be needed, still satisfies p
        end, need = len(s), []

        while True:
            need.sort()
            needed = [s[j] for j in need]
            last = first_satisfying_bisect(lambda i: p(s[:i] + needed), 0, end)
            if last == 0:
                return need     # the needed elements alone satisfy p

            end = last - 1
            need.append(end)    # s[last - 1] is required; keep it and shrink the prefix
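first_satisfying_bisect is just the usual bisection over a monotone predicate; a minimal sketch of what i mean (the full program linked below has the real, commented version):

    def first_satisfying_bisect(q, lo, hi):
        # smallest i in [lo, hi] with q(i) true, assuming q is monotone
        # (false below some threshold, true from there on) and q(hi) holds
        while lo < hi:
            mid = (lo + hi) // 2
            if q(mid):
                hi = mid
            else:
                lo = mid + 1
        return lo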
the full program, with comments, is at http://canonical.org/~kragen/sw/dev3/bisectreduce.py. the corresponding version of bisect_easy would be

    def bisect_easy(p, s, targets=None, need=[]):
        # returns the indices in targets that must be kept, given that the
        # elements indexed by need are already assumed present
        if targets is None:
            targets = list(range(len(s)))
        if not targets or p([s[i] for i in sorted(need)]):
            return []            # nothing from targets is needed
        if len(targets) == 1:
            return [targets[0]]  # the lone remaining candidate must be needed

        m = len(targets)//2
        left, right = targets[:m], targets[m:]

        # reduce the left half assuming all of the right half is present,
        # then reduce the right half given only the reduced left
        left_reduced = bisect_easy(p, s, left, right + need)
        right_reduced = bisect_easy(p, s, right, left_reduced + need)

        return left_reduced + right_reduced
bisect_easy is much faster if the minimal set it finds, of some size m, is a contiguous substring a long way from the beginning of the input; i think it takes about m + log n tests in that case (each needed element still has to be isolated in its own singleton call, but subtrees that don't touch the substring get pruned with a test apiece), while bisect_dumb takes about m log n

however, in cases where hashing scatters the size-m minimal set all over the search space, instead of compacting it into a little lump, i think it also ends up with m log n performance

i may get to implementing the easy/hard algorithm and zeller's ddmin. also, i have the intuition that, as with quicksort, trisection will have consistently but not overwhelmingly better performance than bisection for this purpose (though not in bisect_dumb)
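
if you want to poke at the test counts yourself, here's a throwaway harness (my own contrived predicates and sizes, not zeller's example; it assumes the first_satisfying_bisect sketch above):

    # the failure reproduces iff every element of `culprits` is present
    def run(alg, s, culprits):
        calls = [0]
        def p(candidate):
            calls[0] += 1
            return culprits <= set(candidate)
        found = sorted(s[i] for i in alg(p, s))
        return found, calls[0]

    s = list(range(1000))
    for culprits in [set(range(700, 710)),            # contiguous lump far from the start
                     {31, 144, 448, 617, 800, 999}]:  # scattered all over
        for alg in [bisect_easy, bisect_dumb]:
            found, tests = run(alg, s, culprits)
            print(alg.__name__, found == sorted(culprits), tests)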


i see, thanks!
