The assumption is that for a given test program, the call stack used at the specific moment where the failure begins is the same each run. If that's not true, there will be problems.

If there are two different call stacks that get to the function in question, but only one of them is where the failure begins, you absolutely want to distinguish those.

In practice we do cut off the stacks, using just 16 frames, but that's to avoid O(stack depth) overhead, not to coalesce different things.
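
Roughly, the idea is something like this Python sketch (purely illustrative; the helper name and frame format here are made up, not the tool's actual code):

    import hashlib, traceback

    def failure_site_hash(limit=16):
        # Hash only the innermost `limit` frames of the current call stack,
        # so the cost is O(limit) rather than O(stack depth). Two different
        # call paths into the same function still get distinct hashes as
        # long as they diverge within those frames.
        frames = traceback.extract_stack()[:-1][-limit:]  # drop this helper's own frame
        text = "\n".join("%s:%s:%d" % (f.filename, f.name, f.lineno) for f in frames)
        return hashlib.sha256(text.encode()).hexdigest()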

i was reading zeller's book today and came to his example of delta-debugging minimization, which his ddmin algorithm solves in 48 tests, and it seemed like your 'easy' algorithm would be more efficient, so i hacked together a test program. your algorithm takes only 29 tests on this case. but because i couldn't remember your algorithm clearly, i first accidentally implemented a different variant that also works, and takes only 21 tests here, though it's not more efficient in the general case. still, it seems interesting, because it is arguably simpler than your 'easy' version (at least if we sweep the call to first_satisfying_bisect under the standard library) yet still preserves logarithmic performance when the satisfying set it finds is of bounded size:

    def bisect_dumb(p, s):
        # repeatedly bisect for the shortest prefix of s that, together with
        # the elements already known to be needed, still satisfies p
        end, need = len(s), []

        while True:
            need.sort()
            needed = [s[j] for j in need]
            last = first_satisfying_bisect(lambda i: p(s[:i] + needed), 0, end)
            if last == 0:
                return need     # the needed elements alone satisfy p

            end = last - 1
            need.append(end)    # s[last - 1] is required; keep it and shrink the prefix
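first_satisfying_bisect is just the usual bisection over a monotone predicate; a minimal sketch of what i mean (the full program linked below has the real, commented version):

    def first_satisfying_bisect(q, lo, hi):
        # smallest i in [lo, hi] with q(i) true, assuming q is monotone
        # (false below some threshold, true from there on) and q(hi) holds
        while lo < hi:
            mid = (lo + hi) // 2
            if q(mid):
                hi = mid
            else:
                lo = mid + 1
        return lo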
the full program, with comments, is at http://canonical.org/~kragen/sw/dev3/bisectreduce.py. the corresponding version of bisect_easy would be

    def bisect_easy(p, s, targets=None, need=[]):
        # returns the indices in targets that must be kept, given that the
        # elements indexed by need are already assumed present
        if targets is None:
            targets = list(range(len(s)))
        if not targets or p([s[i] for i in sorted(need)]):
            return []            # nothing from targets is needed
        if len(targets) == 1:
            return [targets[0]]  # the lone remaining candidate must be needed

        m = len(targets)//2
        left, right = targets[:m], targets[m:]

        # reduce the left half assuming all of the right half is present,
        # then reduce the right half given only the reduced left
        left_reduced = bisect_easy(p, s, left, right + need)
        right_reduced = bisect_easy(p, s, right, left_reduced + need)

        return left_reduced + right_reduced
bisect_easy is much faster if the minimal set it finds, of some size m, is a contiguous substring a long way from the beginning of the input; i think it takes about m + log n tests in that case (each needed element still has to be isolated in its own singleton call, but subtrees that don't touch the substring get pruned with a test apiece), while bisect_dumb takes about m log n

however, in cases where hashing scatters the size-m minimal set all over the search space, instead of compacting it into a little lump, i think it also ends up with m log n performance

i may get to implementing the easy/hard algorithm and zeller's ddmin. also, i have the intuition that, as with quicksort, trisection will have consistently but not overwhelmingly better performance than bisection for this purpose (though not in bisect_dumb)
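
if you want to poke at the test counts yourself, here's a throwaway harness (my own contrived predicates and sizes, not zeller's example; it assumes the first_satisfying_bisect sketch above):

    # the failure reproduces iff every element of `culprits` is present
    def run(alg, s, culprits):
        calls = [0]
        def p(candidate):
            calls[0] += 1
            return culprits <= set(candidate)
        found = sorted(s[i] for i in alg(p, s))
        return found, calls[0]

    s = list(range(1000))
    for culprits in [set(range(700, 710)),            # contiguous lump far from the start
                     {31, 144, 448, 617, 800, 999}]:  # scattered all over
        for alg in [bisect_easy, bisect_dumb]:
            found, tests = run(alg, s, culprits)
            print(alg.__name__, found == sorted(culprits), tests)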


i see, thanks!
