Conv layers are strictly special cases of FC layers (with respect to expressive power).
> For any CONV layer there is an FC layer that implements the same forward function. The weight matrix would be a large matrix that is mostly zero except for at certain blocks (due to local connectivity) where the weights in many of the blocks are equal (due to parameter sharing).

(per http://cs231n.github.io/convolutional-networks/)
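To make that concrete, here is a minimal NumPy sketch (my own, not from the notes) for a 1-D convolution with stride 1 and no padding: the conv's forward pass equals a matrix multiply with a mostly-zero weight matrix whose nonzero bands all reuse the same kernel entries. The helper name `conv_as_fc_matrix` is just illustrative.

```python
import numpy as np

def conv_as_fc_matrix(kernel, input_len):
    """Build the dense (input_len - k + 1) x input_len matrix equivalent to a 1-D conv."""
    k = len(kernel)
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        W[i, i:i + k] = kernel          # same kernel at every offset (parameter sharing)
    return W

x = np.random.randn(8)
kernel = np.random.randn(3)

conv_out = np.correlate(x, kernel, mode="valid")   # the conv layer's forward pass
fc_out = conv_as_fc_matrix(kernel, len(x)) @ x     # the equivalent FC forward pass

assert np.allclose(conv_out, fc_out)
```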
In general, if you can make assumptions about your problem and the form the solution takes, you can find a good solution with fewer parameters and less data, at the cost of suboptimal performance on problems that violate your assumptions. Conv layers are an example of this.
> The weight matrix would be a large matrix that is mostly zero except for at certain blocks (due to local connectivity) where the weights in many of the blocks are equal (due to parameter sharing).
It's correct that inference in a convolutional neural network can be reduced to inference in an FC network. However, during training you need to make sure the shared weights stay equal, and FC networks don't enforce that. So you need to train a CNN-embedded-in-an-FC slightly differently from a normal FC, otherwise what you get is no longer a CNN because the weights diverge. Concretely, you need to a) initialize all copies of each kernel entry with the same weight, and b) instead of applying the weight updates directly, average the updates over all offsets and only then apply them to the tied kernel positions.
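Here is a rough 1-D sketch (mine, not from the comment) of that averaging step: keep the FC weight matrix, but after computing its gradient, average the gradient over all offsets that share a kernel entry and scatter the averaged value back, so the tied weights stay identical. The helper `tie_gradient` is hypothetical.

```python
import numpy as np

def tie_gradient(grad_W, kernel_len):
    """Average grad_W over the diagonals corresponding to shared kernel entries."""
    out_len, in_len = grad_W.shape
    grad_kernel = np.zeros(kernel_len)
    for j in range(kernel_len):
        # entry j of the kernel lives at W[i, i + j] for every output position i
        grad_kernel[j] = np.mean([grad_W[i, i + j] for i in range(out_len)])
    # scatter the averaged gradient back into the FC layout,
    # leaving the off-band (non-connected) entries at zero
    tied = np.zeros_like(grad_W)
    for i in range(out_len):
        tied[i, i:i + kernel_len] = grad_kernel
    return tied

# usage: start from a kernel replicated at every offset (identical init),
# then update with the tied gradient so the layer stays convolutional
kernel = np.random.randn(3)
W = np.vstack([np.pad(kernel, (i, 8 - 3 - i)) for i in range(6)])   # 6 x 8, rows share the kernel
grad_W = np.random.randn(*W.shape)        # stand-in for a backprop gradient
W -= 0.1 * tie_gradient(grad_W, kernel_len=3)
```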
Absolutely: if you wanted an FC layer that satisfies the Conv layer constraints, you would need to perform gradient descent subject to those constraints; plain gradient descent won't do that unless the actual optimum happens to be of convolutional form. That's why I said "(with respect to expressive power)" :)