I hope you'll consider writing up your results in a blog post or something! As I just mentioned elsewhere in this thread, I have experimentally found latency with Metal compute to be too high to be feasible, but I would dearly love to be proven wrong, if there's a trick I'm missing.