Quick question about your code for the conv net: why do you resize the images down to 32x32? I thought one of the big features of conv nets was that the input does not have to be a fixed size, since the network just slides a window around the image. Am I completely wrong on this one?
Would you be willing to maybe print out the weights for each layer? I'd be interested to see what features your conv net is capturing.
I was (and still am) trying to use an already trained CIFAR10 net in a similar manner to DeCAF/ImageNet. Because CIFAR10 operates on 32x32 color images, I did the same thing for the input of the DeCAF experiment. As far as I know, the inputs to the network need to be identically sized between the train and test sets. They can be zero-padded or color-filled to make the dimensions match, but that may affect results; I haven't tried anything but scaling personally. I am pretty sure there are two rounds of scaling happening in my DeCAF experiment: down to 32x32 with convert, then UP to 512x512, and then the center 256x256 crop is pulled out. I think this may affect my results a little :)
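Roughly, the chain of resizes looks like the sketch below. This is just a minimal PIL version to illustrate the transformation; the real runs used ImageMagick's convert plus DeCAF's own preprocessing, and the file names here are placeholders.

```python
# Rough PIL sketch of the double-scaling pipeline described above
# (the actual experiment used ImageMagick's convert + DeCAF preprocessing;
# file names are placeholders).
from PIL import Image

img = Image.open("input.jpg")

# First scaling: down to CIFAR10-sized 32x32, as the trained net expects.
small = img.resize((32, 32), Image.ANTIALIAS)

# Second scaling: back up to 512x512 before feature extraction.
big = small.resize((512, 512), Image.ANTIALIAS)

# Finally, pull out the center 256x256 crop.
w, h = big.size
left = (w - 256) // 2
top = (h - 256) // 2
center = big.crop((left, top, left + 256, top + 256))
center.save("center_crop.png")
```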
The plan is to operate on 32x32 data for now, then try scaling up the input images or just scaling to 512x512 to see how input data size/resolution affects the DeCAF/pylearn2 classification result, either positively or negatively.
As far as network weights go, I haven't tried to print/plot the DeCAF weights yet (though there are images in the DeCAF paper itself). For pure pylearn2 networks, there is a neat utility called show_weights.py in pylearn2/scripts.
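If you want to try it, something like the sketch below should be in the right ballpark. I'm assuming the pylearn2 checkout is reachable from the working directory and that the trained model was pickled as conv_model.pkl; both paths are placeholders for whatever your setup uses.

```python
# Minimal sketch: invoke pylearn2's show_weights.py on a pickled model.
# "pylearn2/scripts/show_weights.py" and "conv_model.pkl" are placeholder
# paths; adjust them to your checkout and saved model.
import subprocess

subprocess.check_call([
    "python", "pylearn2/scripts/show_weights.py",
    "conv_model.pkl",
])
```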