So, convolution is by itself an attempt to exploit translation-invariance in the visual world, and typical deep convnets end up picking up a certain amount of scaling tolerance (though I would not call it invariance) because features higher up the hierarchy are sensitive to larger and larger patches of the input. This is not real scale-invariance, and when eking out the best possible numbers many people run a Laplacian pyramid of some sort at test time to get it.
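A minimal NumPy sketch of the pyramid idea. It assumes a 5-tap binomial blur and factor-2 decimation; the comment only says "a Laplacian pyramid of some sort", so these choices are illustrative. `laplacian_pyramid` splits an image into band-pass detail levels plus a low-pass residual. `multiscale_score` shows the test-time use: it runs a hypothetical `score_fn` (a stand-in for a trained convnet) on every pyramid level and keeps the best score.

```python
import numpy as np

def blur(img):
    """Separable 5-tap binomial blur ([1, 4, 6, 4, 1] / 16) with edge padding; keeps shape."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    padded = np.pad(img, 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def downsample(img):
    """Low-pass filter, then keep every second row and column."""
    return blur(img)[::2, ::2]

def upsample(img, shape):
    """Nearest-neighbour 2x upsampling cropped to `shape`, then smoothed."""
    big = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(big[: shape[0], : shape[1]])

def gaussian_pyramid(img, levels):
    """The image at full, 1/2, 1/4, ... resolution."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    """Band-pass detail at each scale, plus the coarsest low-pass image as the last entry."""
    gauss = gaussian_pyramid(img, levels)
    bands = [fine - upsample(coarse, fine.shape) for fine, coarse in zip(gauss[:-1], gauss[1:])]
    return bands + [gauss[-1]]

def reconstruct(bands):
    """Invert laplacian_pyramid: upsample the coarse image and add back each detail band."""
    img = bands[-1]
    for band in reversed(bands[:-1]):
        img = band + upsample(img, band.shape)
    return img

def multiscale_score(img, score_fn, levels=3):
    """Test-time scale handling: score every pyramid level and keep the best one."""
    return max(score_fn(level) for level in gaussian_pyramid(img, levels))
```

With `levels=4`, a (37, 50) image gives detail bands of shape (37, 50), (19, 25) and (10, 13) plus a (5, 7) residual. `reconstruct` recovers the input up to floating-point rounding.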

Rotation-invariance is probably not really a thing you want. The visual world is not, in fact, rotation-invariant, and the "up" direction on Earth-bound, naturally-occurring images has different statistics than the "down" direction, and you'd like to exploit these. Animal visual systems are not rotation-invariant either; an entertainingly powerful demo of this is "the Thatcher Effect" (https://en.wikipedia.org/wiki/Thatcher_effect).

Reflection across a vertical axis, on the other hand, often is exploitable, at least in image recognition contexts (as opposed to, say, handwriting recognition). If you look at the features that image recognition convnets learn, they are often symmetric around some axis or other, or sometimes come in "pairs" of left-hand/right-hand twins. As far as I know nobody has tried to exploit this in any way other than plain data augmentation (i.e., nothing architectural), but it's a big world out there and people have been trying this stuff for a long time.
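Both routes can be sketched in a few lines of NumPy. `random_hflip` is the ordinary data-augmentation route. `mirror_paired_layer` is a hypothetical way of hard-wiring the "left-hand/right-hand twins": each filter is used together with its left-right mirror. Mirroring the input then just mirrors every feature map and swaps each twin pair, and pooling over a pair and over space yields features that do not change under mirroring. This is an illustrative construction, not a method described by the commenter or any particular library. `correlate_valid` is a deliberately slow reference cross-correlation.

```python
import numpy as np

def random_hflip(img, rng):
    """Data augmentation: mirror the image left-right with probability 0.5."""
    return np.fliplr(img) if rng.random() < 0.5 else img

def correlate_valid(x, w):
    """Reference 2-D cross-correlation without padding (what convnet 'conv' layers compute)."""
    kh, kw = w.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i : i + kh, j : j + kw] * w)
    return out

def mirror_paired_layer(x, filters):
    """Hypothetical tied layer: channels 0..K-1 use `filters`, channels K..2K-1 their mirrors."""
    bank = list(filters) + [np.fliplr(w) for w in filters]
    return np.stack([correlate_valid(x, w) for w in bank])

def mirror_invariant_features(x, filters):
    """Max over each (filter, mirror) twin pair, then global max over positions."""
    maps = mirror_paired_layer(x, filters)
    k = len(filters)
    twins = np.maximum(maps[:k], maps[k:])
    return twins.max(axis=(1, 2))
```

The mirrored half of the bank adds no parameters. That weight sharing is what the tied layer gives you and plain augmentation does not.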



I was thinking more about a machine vision context with e.g. different parts coming in at any rotation angle.

I know that some translation invariance comes from, e.g., the usual conv+maxpool layer structure, but doesn't the first hidden layer of the network stack still have to hold several representations, one for each translation shift?
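A quick NumPy check of what the first convolutional layer actually stores. It assumes circular (wrap-around) padding so that shifts are exact. There is a single shared filter, not one copy per shift: shifting the input simply shifts the feature map (equivariance). Only a pooling step that discards position, here a global max, turns that into invariance. With zero or "valid" padding the same holds away from the borders. After a strided 2x2 max-pool it holds exactly only for shifts that are multiples of the stride. `correlate_circular` is a hypothetical helper written for illustration.

```python
import numpy as np

def correlate_circular(x, w):
    """2-D cross-correlation with wrap-around borders; w is square, odd-sized, centred."""
    c = w.shape[0] // 2
    out = np.zeros(x.shape)
    for a in range(-c, c + 1):
        for b in range(-c, c + 1):
            # out[i, j] += w[a+c, b+c] * x[(i+a) % H, (j+b) % W]
            out += w[a + c, b + c] * np.roll(x, shift=(-a, -b), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.random((16, 16))
kernel = rng.standard_normal((3, 3))   # the one set of shared weights
shift = (3, 5)

fmap = correlate_circular(image, kernel)
fmap_shifted = correlate_circular(np.roll(image, shift, axis=(0, 1)), kernel)

# Equivariance: the same responses, just at different positions in the map.
print(np.allclose(fmap_shifted, np.roll(fmap, shift, axis=(0, 1))))  # True
# Invariance only appears once position is pooled away.
print(np.isclose(fmap_shifted.max(), fmap.max()))                     # True
```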

Rotation especially looks like something that should produce a lot of symmetry and shared parameters, but it also looks difficult enough that I'd rather hear from someone with mad math/group-theory(?) skills who has looked into it.
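The group-theory version of this idea is usually called a group-equivariant convolutional network (G-CNN). Below is a minimal sketch for the four 90-degree rotations. It assumes square images, square odd-sized filters centred on the pixel, and wrap-around padding so that everything is exact. `p4_lift` applies one filter in all four orientations, so there is still only one set of weights. Rotating the input by 90 degrees then rotates every output map and cyclically shifts the orientation axis by one. Pooling over orientations and positions gives a rotation-invariant feature. The function names are hypothetical. Arbitrary angles, such as parts arriving at any orientation, need interpolated filters, so equivariance there is only approximate.

```python
import numpy as np

def correlate_circular(x, w):
    """Centred 2-D cross-correlation with wrap-around borders (w square, odd size)."""
    c = w.shape[0] // 2
    out = np.zeros(x.shape)
    for a in range(-c, c + 1):
        for b in range(-c, c + 1):
            out += w[a + c, b + c] * np.roll(x, shift=(-a, -b), axis=(0, 1))
    return out

def p4_lift(x, w):
    """One filter, four orientations: output[r] uses w rotated by r * 90 degrees."""
    return np.stack([correlate_circular(x, np.rot90(w, r)) for r in range(4)])

def rotation_invariant_feature(x, w):
    """Pool over orientation and position; unchanged when x is rotated by 90 degrees."""
    return p4_lift(x, w).max()

rng = np.random.default_rng(0)
image = rng.random((16, 16))
kernel = rng.standard_normal((5, 5))

out = p4_lift(image, kernel)
out_rot = p4_lift(np.rot90(image), kernel)

# Rotating the input = rotating each map and shifting the orientation channels by one.
expected = np.rot90(np.roll(out, 1, axis=0), axes=(1, 2))
print(np.allclose(out_rot, expected))  # True
print(np.isclose(rotation_invariant_feature(np.rot90(image), kernel),
                 rotation_invariant_feature(image, kernel)))  # True
```

The parameter sharing is visible in `p4_lift`: four orientation channels for the cost of a single 5x5 filter.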

But thank you for the detailed reply anyways!