I had assumed that it must be Turing complete, because otherwise its utility as an intermediate language seems dramatically reduced (despite its power, no one is compiling to Coq!); but, now that you point it out, I agree that there is no such claim. I originally thought that general recursion followed from folds and unfolds (http://lambda-the-ultimate.org/node/4290), but Ross Angle points out at http://lambda-the-ultimate.org/node/5045#comment-82340 that general recursion arises only when folds and unfolds can be freely mixed, which is apparently not the case here.
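A minimal sketch of why the mixing matters (the function names here are my own illustration, not anything from the linked threads): an unfold builds a structure from a seed and a fold consumes a finite structure, and each is fine in isolation, but composing an unfold with a fold (a hylomorphism) makes termination depend on the coalgebra — which is exactly general recursion.

```python
from functools import reduce

def unfold(coalg, seed):
    # Anamorphism: grow a list from a seed. Each step is productive,
    # but nothing guarantees the process ever stops.
    out = []
    while True:
        step = coalg(seed)
        if step is None:          # None signals the end of the structure
            return out
        value, seed = step
        out.append(value)

def fold(alg, init, xs):
    # Catamorphism over a finite list: always terminates.
    return reduce(alg, xs, init)

# Collatz trajectory length: composing unfold with fold yields a function
# whose termination on all inputs is an open problem.
def collatz_steps(n):
    trajectory = unfold(
        lambda k: None if k == 1 else (k, k // 2 if k % 2 == 0 else 3 * k + 1),
        n)
    return fold(lambda acc, _: acc + 1, 0, trajectory)

print(collatz_steps(6))  # 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1: prints 8
```

In a total language that keeps the two separate, this composition is ruled out: unfolds target codata (with productivity checked), folds consume data, and you cannot fold a codata stream into a result, so the loophole above never opens.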
I don't know how useful it will be in the end. The idea of targeting a powerful (proper) subset of the Turing-computable functions to produce fast executables makes a lot of sense to me; moreover, for reasons you have already brought up, this is the only viable approach if we truly want a mechanized solution to this problem.
My concern on the practicality point is that the superoptimizer here isn't so much "superoptimizing" as super-simplifying. There's a reason modern compilers don't aggressively inline every function call even when they have access to the source code, and for the same reason they don't aggressively unroll every loop whose bounds are known at compile time. The tradeoffs of such optimizations are complicated and hard to reason about, because modern processors are complicated and hard to reason about.