Shouldn't it be possible to backpropagate those categorical outputs all the way back to the inputs/features (NOT the weights) after a forward pass, to localize their sensitivity with respect to the actual pixels for a prediction? I imagine that would have to give at least some insight.
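If it helps, here's a minimal sketch of that idea (a toy PyTorch CNN and a random tensor standing in for a real image, purely illustrative): run a forward pass, then backprop the top class score into the input tensor rather than the weights, and the gradient magnitude gives a per-pixel sensitivity map.

```python
import torch
import torch.nn as nn

# Toy CNN standing in for whatever trained network you have; the point is
# just that gradients can flow back to the *input*, not only to the weights.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 8 * 8, 10),
)
model.eval()

# Random tensor standing in for a real 3x32x32 image; requires_grad=True so
# the backward pass accumulates d(score)/d(pixel) into image.grad.
image = torch.rand(1, 3, 32, 32, requires_grad=True)

logits = model(image)
top_class = logits.argmax(dim=1).item()
logits[0, top_class].backward()  # backprop the class score, not a loss on weights

# Per-pixel sensitivity: max gradient magnitude across the color channels.
saliency = image.grad.abs().max(dim=1).values  # shape (1, 32, 32)
print(saliency.shape)
```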
Beyond that, the repeated convolution/max-pool steps can be understood as applying something akin to a multi-level wavelet decomposition, which is pretty well understood. It's how classical matched filtering, Haar cascades, and a wide variety of preceding image classification methods operated in their first steps too.
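To make the analogy concrete, here's a rough NumPy sketch of one fixed 2x2 Haar stage applied repeatedly to its own low-pass output, which is the multi-level decomposition; a CNN effectively swaps the fixed filters for learned ones and adds nonlinearities in between.

```python
import numpy as np

def haar_level(x):
    """One level of a 2-D Haar decomposition (averaging convention).

    Analogous to one conv + downsample stage: fixed 2x2 filters followed
    by stride-2 subsampling, instead of learned kernels.
    """
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low-pass "approximation", like a pooled feature map
    lh = (a + b - c - d) / 4.0   # horizontal detail
    hl = (a - b + c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, (lh, hl, hh)

image = np.random.rand(256, 256)              # stand-in for a grayscale image
ll, details = haar_level(image)
ll2, details2 = haar_level(ll)                # repeat on the low-pass band: multi-level
```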
CNNs/deep learning really don't seem like a black box at all when examined in sequence. But to me at least, randomized ensemble methods (random forests, etc.) are actually a bit more mysterious in how well they perform out of the box, with little tuning.
I'm in no way a researcher or even an enthusiast of machine learning, but I'm pretty sure I came across a paper posted on HN a few days ago that did exactly what you and the parent poster are describing: figuring out which pixels contributed most to a machine learning model's prediction. I'll see if I can find it.
Bagging and bootstrap ensemble methods aren't really that confusing. Just think of it as stochastic gradient descent on a much larger hypothetical data set.
The effect is the same one that occurs when you get a group of people together to estimate the number of jelly beans in a jar. Every estimator is biased, but if that bias is drawn from a zero-mean distribution, the deviation of the averaged estimate goes down as the number of estimators increases.
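A quick toy simulation of the jelly-bean effect (made-up numbers: true count 1000, each person's bias drawn from a zero-mean normal), just to show the spread of the averaged guess shrinking roughly like 1/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(0)
true_count = 1000

for n_estimators in (1, 10, 100, 1000):
    trials = []
    for _ in range(2000):
        biases = rng.normal(0, 100, size=n_estimators)  # per-person bias, zero mean
        guesses = true_count + biases
        trials.append(guesses.mean())                   # the crowd's averaged guess
    print(n_estimators, np.std(trials))                 # spread falls ~ 1/sqrt(n)
```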
I think you might be on to something, but the big problem here is that the input is hundreds of GBs or TBs. It's hard to understand what a feature is, or even why it was selected.
I can certainly observe what's being selected once the state machine is generated, but I have no clue how it was constructed to make the features. To determine that, I'd have to watch the state of the machine as it "grows" toward the final result.