
On the order of hundreds to low-digit thousands worked well for us. These photos contain a lot of occluders like tourists, and we needed to have enough views of the subject in question to build a good 3D scene representation.



can you elaborate on the key variables for the data? for instance, is it safe to assume 360 photos from the same angle would yield a worse model than 1 photo from 360 different angles?

what does the ideal minimal data set look like (e.g., 5 photos at each 15-degree offset)?

thanks for being so active on this thread.


NeRF's (and all of photogrammetry's) bread and butter is 3D consistency -- that is, seeing the same object from multiple angles. A 360-degree photo from a fixed position just won't do. As to how to select the best camera angles...I'm not sure. I believe there is research in this area for classical photogrammetry techniques, but I'm not familiar enough to point you to a body of work.
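
To make the multi-view requirement concrete, here is a minimal sketch (my own illustration, not anything from the paper) of why different camera positions matter: the same 3D point projects to different pixels in each view, and that disparity is what carries depth information. Rotating in place from a single spot gives no disparity at all.

    import numpy as np

    def project(point_w, K, R, t):
        """Project a 3D world point through a pinhole camera (K intrinsics, [R|t] extrinsics)."""
        p_cam = R @ point_w + t      # world -> camera coordinates
        p_img = K @ p_cam            # camera coordinates -> homogeneous image coordinates
        return p_img[:2] / p_img[2]  # perspective divide -> pixel coordinates

    # One landmark point observed by two cameras one metre apart.
    X = np.array([0.0, 0.0, 5.0])
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    uv1 = project(X, K, np.eye(3), np.zeros(3))                 # camera at the origin
    uv2 = project(X, K, np.eye(3), np.array([-1.0, 0.0, 0.0]))  # camera shifted 1 m to the right
    print(uv1, uv2)  # ~160 px of disparity between the two views is what encodes depth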


How do you remove tourists? Is the network trained to segment and ignore humans?


The model does not explicitly learn to segment images. The answer is unfortunately more involved than an HN comment can convey. I encourage you to read the paper for more details.

https://arxiv.org/abs/2008.02268
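
For anyone skimming rather than reading: my rough take-away from the paper is that it renders a static field plus a per-image transient field, and a learned per-ray uncertainty down-weights pixels explained by transient content like tourists. A simplified sketch of that kind of loss, with variable names and constants of my own choosing rather than the authors' code:

    import numpy as np

    def uncertainty_weighted_loss(c_gt, c_static, c_transient, beta, sigma_transient,
                                  beta_min=0.03, lambda_u=0.01):
        """Hedged sketch of a NeRF-W-style per-ray loss (arXiv:2008.02268),
        not the authors' implementation.

        c_gt, c_static, c_transient: (N, 3) ground-truth / static / transient colors
        beta:            (N,)   per-ray uncertainty predicted alongside the transient field
        sigma_transient: (N, K) transient densities sampled along each ray
        """
        beta = beta + beta_min                        # keep the variance bounded away from zero
        c_pred = c_static + c_transient               # composite of the two fields
        recon = ((c_gt - c_pred) ** 2).sum(axis=-1) / (2.0 * beta ** 2)
        log_term = np.log(beta)                       # stops beta from growing without bound
        sparsity = lambda_u * sigma_transient.mean(axis=-1)  # discourage the transient field
                                                             # from explaining everything
        return (recon + log_term + sparsity).mean()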


Just gotta say: amazing!

My follow-up question would be: are you able to compare your results against actual photogrammetry data to see how well your reconstruction performs?


I'm actually quite new to the field, and I'm not even sure what to compare against nor how to compare it. What's typically measured and how?


Is the model able to capture the underlying geometry? E.g., if I have a pillar, part of which was not visible from any training viewpoint, is it able to reconstruct that part?


The model is trained to reconstruct what is observed, but not what is obscured. If you look closely at our videos, you'll notice some parts of the scene are blurry -- those parts weren't seen often enough to learn well. If you look at parts of the scene not observed at all, I'm not sure what you'd find.


would a sufficiently long video in motion, say from a drone, car or even a walking person, work instead?


Pictures are pictures, even as video frames :)
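
If anyone wants to try it, the obvious route is to sample frames from the video and feed them into the same photo pipeline. A minimal sketch with OpenCV, where the paths and sampling stride are placeholders:

    import os
    import cv2  # pip install opencv-python

    def extract_frames(video_path, out_dir, every_n=30):
        """Dump every Nth frame of a video so it can be treated as a set of photos."""
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = saved = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % every_n == 0:  # roughly 1 frame per second for 30 fps footage
                cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
                saved += 1
            idx += 1
        cap.release()
        return saved

    # extract_frames("walkaround.mp4", "frames", every_n=30)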


Did you consider using movies as a source too?


Consider? Yes. Try? Nope!


Awww! I figured dolly shots and Steadicam shots would fit perfectly into something like that. Especially at 24 frames per second and with mostly known locations. Of course, it would probably bias a lot of the net towards that particular time spot, I guess?


There are problems associated with using video: motion blur, rolling shutter.


Oh I agree. In my head it seems like it should work. I could be wildly wrong though. I am every day :)


It definitely can work, and even has some additional benefits (1), but requires special considerations. You can deblur using global motion vectors (2), or additional hardware like accelerometer readings embedded in the video feed (3).

1) can't find the paper now, but by exploiting predictable rolling shutter you get additional temporal resolution

2) http://users.ece.northwestern.edu/~sda690/MfB/Motion_CVPR08....

3) http://neelj.com/projects/imudeblurring/imu_deblurring.pdf
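
As a toy illustration of the idea behind (2) and (3): once you have an estimate of the blur kernel (from motion vectors or IMU readings), a standard deconvolution recovers much of the detail. A sketch with scikit-image, where the kernel length and angle are made-up values standing in for the estimated motion:

    import numpy as np
    from scipy.ndimage import convolve
    from skimage import data, restoration

    def linear_motion_psf(length=15, angle_deg=0.0, size=31):
        """Build a simple linear-motion blur kernel: a normalized line segment."""
        psf = np.zeros((size, size))
        c = size // 2
        theta = np.deg2rad(angle_deg)
        for s in np.linspace(-length / 2.0, length / 2.0, length * 4):
            row = int(round(c + s * np.sin(theta)))
            col = int(round(c + s * np.cos(theta)))
            psf[row, col] = 1.0
        return psf / psf.sum()

    # Simulate a horizontally motion-blurred image, then deblur it with Wiener deconvolution.
    image = data.camera() / 255.0
    psf = linear_motion_psf(length=15, angle_deg=0.0)
    blurred = convolve(image, psf)
    deblurred = restoration.wiener(blurred, psf, balance=0.01)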



