Because you can recover left and right from mid and side, you can also multiply both signals by two so that the division always gives an integer result.
Surely multiplying both signals by two is changing them? What if overflow happens? How do you store the information that it needs to be divided back when decoding?
Then why divide by two at all? Also, adding one bit to every value (or pair of values) means the data size increases significantly from the start.
The point is to decorrelate the channels. The left and right channels are usually mostly the same, so coding them both would basically send the same signal twice. With perfect decorrelation, the side channel would become zero, instantly saving half the bits.
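To make the integer question from upthread concrete, here is a minimal Python sketch of a lossless mid/side round trip. It follows the scheme described in the FLAC format documentation: the sum and the difference always have the same parity, so the bit dropped when halving the sum can be recovered from the low bit of side. The function names are made up for illustration.

```python
def encode_mid_side(left: int, right: int) -> tuple[int, int]:
    """Turn one left/right sample pair into a (mid, side) pair."""
    side = left - right
    # Arithmetic shift right floors the result and drops the low bit of the sum.
    mid = (left + right) >> 1
    return mid, side


def decode_mid_side(mid: int, side: int) -> tuple[int, int]:
    """Invert encode_mid_side exactly, with no extra stored flag."""
    # left + right and left - right share parity, so the dropped low bit
    # of the sum equals the low bit of side.
    total = (mid << 1) | (side & 1)
    left = (total + side) >> 1
    right = (total - side) >> 1
    return left, right


if __name__ == "__main__":
    # Exhaustive round trip over a small signed range.
    for l in range(-8, 8):
        for r in range(-8, 8):
            assert decode_mid_side(*encode_mid_side(l, r)) == (l, r)
```

So nothing needs to be multiplied by two up front, and nothing extra needs storing: the halving is there to keep mid in the same range as the inputs.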
(It's the same idea for images, where the RGB channels mostly all look like grayscale copies of the image, so the YCbCr/YCoCg transform is done to decorrelate them.)
There are actually three methods in FLAC: mid/side, left/side, and right/side. Each frame can use a different method and stores what method it used (if any) in the frame header.
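As a rough illustration of the per-frame choice, here is a sketch that picks a mode by comparing a crude cost estimate (sum of absolute sample values) for each channel pairing. This cost function and the name `choose_stereo_mode` are assumptions for the example; a real encoder would more likely estimate the coded size after prediction.

```python
def choose_stereo_mode(left: list[int], right: list[int]) -> str:
    """Pick the channel pairing with the smallest crude cost for one frame."""
    side = [l - r for l, r in zip(left, right)]
    mid = [(l + r) >> 1 for l, r in zip(left, right)]

    def cost(channel: list[int]) -> int:
        # Hypothetical stand-in for "how many bits would this channel take".
        return sum(abs(sample) for sample in channel)

    candidates = {
        "independent": cost(left) + cost(right),
        "left/side": cost(left) + cost(side),
        "right/side": cost(right) + cost(side),
        "mid/side": cost(mid) + cost(side),
    }
    # On a tie, min() returns the earliest entry in the dict.
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Two nearly identical channels: the side channel is tiny.
    print(choose_stereo_mode([10, 20, 30, 40], [12, 18, 33, 37]))
```

The chosen name is what would then be recorded in the frame header, so the decoder knows which inverse to apply.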
The difference has a range one bit wider than the original samples, but this doesn't matter much since everything gets compressed anyway. The extra bit is only actually used when the side channel is very large, i.e. the correlation was poor, in which case it would be better not to use decorrelation for this frame.
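The extra bit is easy to check by hand. A small sketch, assuming 16-bit signed input samples (the helper name `signed_bits` is made up here):

```python
def signed_bits(lo: int, hi: int) -> int:
    """Smallest two's complement width that can hold every value in [lo, hi]."""
    def magnitude_bits(v: int) -> int:
        # For negative v, ~v == -v - 1, which is the largest magnitude to cover.
        return v.bit_length() if v >= 0 else (~v).bit_length()
    return max(magnitude_bits(lo), magnitude_bits(hi)) + 1  # +1 for the sign bit


LO, HI = -32768, 32767  # 16-bit signed sample range (assumed for the example)

sample_width = signed_bits(LO, HI)
side_width = signed_bits(LO - HI, HI - LO)                # extremes of left - right
mid_width = signed_bits((LO + LO) >> 1, (HI + HI) >> 1)   # extremes of (left + right) >> 1

if __name__ == "__main__":
    print(sample_width, side_width, mid_width)
```

Side only reaches the wider range when left and right are near opposite extremes, which is exactly the poorly correlated case where the frame would be better coded without decorrelation.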