It's not that the physics can't be modeled, but figuring out the appropriate models is probably harder than it looks. Every sensor and actuator has quirks. Does it matter how much your airframe flexes? Does the turbulence caused by some little protrusion matter enough to model? You aren't going to model every individual molecule, but how much detail is enough to be right?
The elegant thing about using machine learning is that you don't need to build any models at all. And you can develop the ML technique once and then reuse it to train policies for different hardware configurations, instead of incurring the cost of modeling every one.
One way around that is to apply small random variations to the simulation (sensor calibration, vehicle performance, etc.) when you generate the training data, so that your system learns to drive a generic vehicle rather than one very specific one.
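To make the idea concrete, here's a rough sketch of what per-episode randomization could look like. Everything in it is made up for illustration: the parameter names (gyro_bias, motor_gain, mass_kg), the ranges, and the nominal 1.2 kg mass are my assumptions, not values from any real simulator or paper.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical per-episode simulator settings (illustrative only)."""
    gyro_bias: float   # additive gyro offset in rad/s (sensor calibration)
    motor_gain: float  # multiplier on commanded thrust (vehicle performance)
    mass_kg: float     # vehicle mass in kg


NOMINAL_MASS_KG = 1.2  # assumed nominal mass, not from any real vehicle


def sample_params(rng: random.Random) -> SimParams:
    """Draw one slightly perturbed vehicle for a training episode."""
    return SimParams(
        gyro_bias=rng.gauss(0.0, 0.02),                  # small zero-mean bias
        motor_gain=rng.uniform(0.9, 1.1),                # +/- 10% thrust
        mass_kg=NOMINAL_MASS_KG * rng.uniform(0.9, 1.1)  # +/- 10% mass
    )


# A different "vehicle" for each episode, reproducible via the seed.
rng = random.Random(0)
episodes = [sample_params(rng) for _ in range(5)]
```

You'd pass each SimParams to whatever simulator you use before rolling out an episode, so the policy never sees exactly the same vehicle twice.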
There are some fairly realistic drone flight simulator video games. It wouldn't surprise me if someone has also done a GTA V mod for something like this.
That's what Sadeghi and Levine [1] do in their work. They train in simulation on a large number of randomly generated scenes. Because the drone is trained on such a diverse set of scenes, the learned policy generalizes to the real world.
Also, note that the physics of the simulation doesn't even need to be realistic. Unless you are doing high-speed control or aggressive maneuvers, the challenging part is the perception and not the control. In the paper from OP, the controls are even high-level discrete actions: left, forward, right.
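For a sense of how little control is involved with discrete actions, here's a minimal sketch of mapping the three actions to a velocity command. This is not the paper's code; the function name, the default speed and yaw rate, and the sign convention (positive yaw turns left) are all assumptions of mine.

```python
from enum import Enum


class Action(Enum):
    LEFT = 0
    FORWARD = 1
    RIGHT = 2


def to_command(action: Action, speed: float = 1.0, yaw_rate: float = 0.5):
    """Map a discrete action to (forward speed in m/s, yaw rate in rad/s).

    Assumed convention: positive yaw rate turns left. The defaults are
    placeholders, not values from the paper.
    """
    yaw = {
        Action.LEFT: yaw_rate,
        Action.FORWARD: 0.0,
        Action.RIGHT: -yaw_rate,
    }[action]
    return speed, yaw


# The perception network only has to pick one of three actions;
# a low-level controller (not shown) tracks the resulting command.
command = to_command(Action.LEFT)
```

With an interface like this, the simulator only needs to move the camera plausibly in response to the command, which is why the dynamics don't have to be accurate.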
Or would the physics be too complex to model well for simulation?