Can you (or someone else) explain why that simple construction where all outputs are computed in parallel is not believed to work? Is it that quantum computers have bounded parallelism in some sense?
The "tricky bit" is not in branching out, it's in collapsing back down to determinism again.
When you measure at the end of the process, a superposition of many branches just collapses to one of the "parallel computations" at random -- you don't get to "pick the best one". Put simply, on its own it's like a nondeterministic Turing machine that follows a single random branch instead of taking "all paths at once."
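As a minimal sketch of that collapse (plain numpy; the eight-branch uniform superposition and the names are just illustrative, not any real quantum library's API):

```python
import numpy as np

n = 8
amplitudes = np.full(n, 1 / np.sqrt(n))   # uniform superposition over 8 branches
probs = np.abs(amplitudes) ** 2           # Born rule: probability = |amplitude|^2

# Measurement collapses to a single branch, sampled at random --
# there is no operation that says "give me the branch with the answer I want".
outcome = np.random.choice(n, p=probs)
print(outcome)                            # one random branch index, e.g. 5
```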
Where quantum computers actually get their extra power is from what's called "destructive interference". The idea is that the computation branches don't carry classical probabilities; they carry complex "amplitudes" whose squared magnitudes give the probabilities. When your computation is laid out in the right way, the amplitudes of the "bad" paths cancel out, so that when you measure at the end all you're left choosing between are "good" paths.
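The textbook illustration of this cancellation is applying a Hadamard gate twice. A quick numpy sketch (`H` here is just the 2x2 Hadamard matrix written out by hand):

```python
import numpy as np

H = np.array([[1,  1],
              [1, -1]]) / np.sqrt(2)   # Hadamard gate

state = np.array([1.0, 0.0])           # start in |0>
state = H @ state                      # branch out: amplitudes (1/sqrt(2), 1/sqrt(2))
state = H @ state                      # branch again: the two paths into |1>
                                       # carry amplitudes +1/2 and -1/2 and cancel
print(np.round(state, 10))             # [1. 0.] -- back to |0> with certainty
```

Classical probabilities can't do this: mixing two 50/50 coins never yields certainty, but amplitudes with opposite signs can erase each other.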
It is possible to compute all outputs in parallel with a quantum computer, but what you'll end up with is an unintelligibly entangled state. What you really want is a single output, and there's no generic way to disentangle that state usefully.
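To see why that state is no free lunch, here's a toy simulation of preparing "all outputs in parallel" and then measuring (the function `f`, the base `a`, and the modulus `N` are made up for illustration):

```python
import numpy as np

N, a = 15, 7
f = lambda x: pow(a, x, N)                  # toy function computed "in parallel"

pairs = [(x, f(x)) for x in range(8)]       # superposition of all |x, f(x)> pairs
probs = np.full(len(pairs), 1 / len(pairs))

# Measuring the whole register hands you ONE random (x, f(x)) pair --
# exactly what a classical machine gets by evaluating f at a random x.
x, fx = pairs[np.random.choice(len(pairs), p=probs)]
print(x, fx)
```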
The reason numbers can be factored faster on quantum computers is precisely that factoring has an algebraic structure which can be exploited for this: it reduces to finding the period of the function a^x mod N, and that periodicity is exactly what the quantum Fourier transform can turn into the right pattern of interference.
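For the curious: the quantum part of Shor's algorithm only finds that period r; everything after it is classical number theory. A sketch of the classical half, with the period brute-forced here as a stand-in for the quantum subroutine (toy values N = 15, a = 7):

```python
from math import gcd

N, a = 15, 7   # number to factor, and a base coprime to it

# Find the order r of a mod N by brute force -- this is the step
# the quantum period-finding subroutine does exponentially faster.
r = next(k for k in range(1, N) if pow(a, k, N) == 1)   # r = 4

# If r is even and a^(r/2) != -1 (mod N), the factors fall out of gcds.
assert r % 2 == 0 and pow(a, r // 2, N) != N - 1
p = gcd(pow(a, r // 2) - 1, N)   # gcd(48, 15) = 3
q = gcd(pow(a, r // 2) + 1, N)   # gcd(50, 15) = 5
print(p, q)                      # 3 5
```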