Hacker News

I was trying to reconstruct multi-position "panoramas" from video recorded on a phone as someone walks - like getting one big photo of a street (but I'm indoors, essentially mapping walls). This 2011 talk has some good examples, although they benefit from the camera being relatively far from the subject, so standard feature detectors like SIFT and ORB can do the job well. http://graphics.cs.cmu.edu/courses/15-463/2011_fall/Lectures...

Most phones have an accelerometer and other sensors, so I'm exploring whether those can be used to estimate the phone's motion between frames accurately enough to help me stitch it back together. When the camera is relatively close to the subject, the perspective changes so quickly between frames that matching detected features and then robustly fitting a transform with something like RANSAC really struggles.

I'd voraciously consume any good links you have; I'm happily over my head on this and learning/iterating at every turn. I think I've accidentally given myself a relatively hard problem because of the constraints.
