There are forms of super-resolution that certainly aren't guessing. For example, you can take a video of a subject and integrate over time, so that the motion of the subject over the sensor allows you to infer sub-pixel detail.
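To make that concrete, here is a toy 1D sketch of the shift-and-add idea. It is only an illustration under assumptions that are mine, not from any paper: the sub-pixel shifts are known exactly, each low-res pixel point-samples the scene (a real sensor integrates over its area), and there is no noise. The names `capture` and `shift_and_add` are made up for this example.

```python
# Toy 1D multi-frame super-resolution by shift-and-add.
# Assumptions (hypothetical, for illustration only): shifts are known exactly,
# each low-res pixel point-samples the scene, and there is no noise.

FACTOR = 4  # fine-grid samples per low-res pixel


def capture(scene, shift):
    """Simulate one low-res frame: every FACTOR-th sample, offset by `shift`."""
    return scene[shift::FACTOR]


def shift_and_add(frames, shifts, length):
    """Place each frame's samples back on the fine grid at its known offset."""
    total = [0.0] * length
    count = [0] * length
    for frame, shift in zip(frames, shifts):
        for i, value in enumerate(frame):
            pos = shift + i * FACTOR
            total[pos] += value
            count[pos] += 1
    # Average where frames overlap; positions no frame observed stay None.
    return [t / c if c else None for t, c in zip(total, count)]


# A made-up "scene" on the fine grid, observed through four shifted frames.
scene = [float((3 * i * i + 7 * i) % 11) for i in range(16)]
shifts = [0, 1, 2, 3]
frames = [capture(scene, s) for s in shifts]

recovered = shift_and_add(frames, shifts, len(scene))   # all four frames
single = shift_and_add(frames[:1], shifts[:1], len(scene))  # one frame only
```

With all four shifted frames every fine-grid position gets observed, while a single frame leaves three out of four positions unknown. That is the sense in which the extra detail is measured rather than guessed. Real methods also have to estimate the shifts and undo the sensor blur.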
Their model starts from RAW-format input, so it should have encoded some of the interactions between the red / green / blue light sensors, and that can help generate genuine sub-pixel detail. OTOH, this is machine learning: unless you specifically have something like a discriminator (just an idea) to counteract it, you don't really know how much of the output is genuine sub-pixel detail and how much is hallucination.
https://www.cs.huji.ac.il/~peleg/papers/icpr90-SuperResoluti...