The checksum is validated by redoing the computation. Because the verifier already has the entire response, it can process all positions in parallel instead of generating them one token at a time.
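To make the parallelism point concrete, here is a toy sketch in Python. It is not TOPLOC's actual scheme: `toy_activations`, `top_k_fingerprint`, the seed-as-model idea, and the constants are all hypothetical stand-ins, with a hash playing the role of the model's forward pass. The only thing it is meant to show is that generation has to walk the sequence step by step, while verification can recompute every position independently once the full response is known.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

VOCAB = 16  # toy vocabulary size (assumption for illustration)
K = 4       # how many top activations go into the fingerprint (assumption)

def toy_activations(model_seed, prefix):
    """Hypothetical stand-in for a model's activations: deterministic in (model, prefix)."""
    digest = hashlib.sha256(repr((model_seed, tuple(prefix))).encode()).digest()
    return list(digest[:VOCAB])

def top_k_fingerprint(acts):
    """Indices of the K largest activations, ties broken by lower index."""
    order = sorted(range(len(acts)), key=lambda i: (-acts[i], i))
    return tuple(order[:K])

def generate(model_seed, prompt, n_tokens):
    """Sequential: each step needs the token produced by the previous step."""
    tokens = list(prompt)
    proof = []
    for _ in range(n_tokens):
        acts = toy_activations(model_seed, tokens)
        proof.append(top_k_fingerprint(acts))
        # Greedy decoding: largest activation wins, lower index on ties.
        tokens.append(max(range(VOCAB), key=lambda i: (acts[i], -i)))
    return tokens, proof

def verify(model_seed, prompt_len, tokens, proof):
    """Parallel: all prefixes are known up front, so positions are independent."""
    prefixes = [tokens[:i] for i in range(prompt_len, len(tokens))]
    with ThreadPoolExecutor() as pool:
        recomputed = list(pool.map(
            lambda p: top_k_fingerprint(toy_activations(model_seed, p)),
            prefixes,
        ))
    return recomputed == proof

tokens, proof = generate(model_seed=1, prompt=[3, 1, 4], n_tokens=8)
print(verify(1, 3, tokens, proof))  # same "model" recomputes the same fingerprints
print(verify(2, 3, tokens, proof))  # a different seed stands in for a different model
```

The thread pool is only there to show that the per-position work has no ordering dependency. In a real transformer the analogous step would be a single batched forward pass over the whole sequence, which is where the actual speedup comes from.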
TOPLOC attempts to detect model substitution, i.e. responses being generated by a different model than the one requested. It comes with certain caveats, and as far as I can tell the TOPLOC paper considers verifiable learning / training out of scope.