Some time ago I was working on an image processing model with a GAN architecture: one model produces output and tries to fool the second, and both are trained together. Simple in principle, but it takes a lot of extra effort to make it work. Training is unstable and falls apart (blows up into an unrecoverable state). I found some ways to make it work: adding new loss functions, changing hyperparameters, changing the models' architectures and sizes, and adjusting coefficients through training to gradually rebalance the loss functions' influence (rough sketch below).
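Roughly like this (a toy sketch; the loss names and the ramp schedule are made up for illustration, not from my actual project):

    # Toy example: gradually shift weight from a stabilizing auxiliary
    # loss (e.g. L1 reconstruction) toward the adversarial loss as
    # training progresses. Names and schedule are illustrative only.
    def total_loss(adv_loss, aux_loss, step, total_steps):
        w = min(1.0, step / (0.5 * total_steps))  # ramp up over the first half
        return w * adv_loss + (1.0 - w) * aux_loss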

The same may work with your problem. If training is unstable, try introducing extra 'brakes' that theoretically aren't required, maybe even ones that are technically incorrect, whatever that looks like in your domain. Another thing to check is the optimizer: try several, and check the default parameters. I've heard Adam's defaults can lead to instability later in training; see the sketch below.
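For example (a minimal PyTorch sketch; the models and learning rate are placeholders): Adam's defaults are betas=(0.9, 0.999), and GAN work such as DCGAN commonly lowers beta1 to 0.5 to reduce momentum-driven oscillation:

    import torch

    # Placeholder generator/discriminator; stand-ins for whatever you train.
    G = torch.nn.Linear(64, 64)
    D = torch.nn.Linear(64, 1)

    # Adam's defaults are betas=(0.9, 0.999); beta1=0.5 is the common
    # GAN tweak (e.g. from DCGAN) that often tames training instability.
    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))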

PS: It would be heaven if models could work at a human expert level. I'm not sure why some really expect this; we are just at the beginning.

PPS: The fact that they can do known tasks with minor variations is already a huge time saver.



Yes, I suspect that engineering the loss and hyperparameters could eventually get this to work. However, I was hoping the model would help me reach a more fundamental insight into why the training falls into bad minima. For example, the Wasserstein GAN is a principled change to the GAN that improves stability, not just fiddling with Adam's beta parameter.
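A minimal sketch of what I mean by that principled change (the critic here is a placeholder, and the hyperparameters are just the ones from the WGAN paper, not anything tuned to my problem): the critic maximizes E[D(real)] - E[D(fake)], and its weights are clipped to keep it roughly 1-Lipschitz:

    import torch

    # Placeholder critic; stands in for whatever architecture you use.
    critic = torch.nn.Linear(64, 1)
    opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)  # WGAN paper's choice

    def critic_step(real, fake, clip=0.01):
        opt.zero_grad()
        # Wasserstein critic loss: maximize E[D(real)] - E[D(fake)],
        # i.e. minimize the negation.
        loss = critic(fake).mean() - critic(real).mean()
        loss.backward()
        opt.step()
        # Weight clipping: the original (crude) way to keep the critic
        # approximately 1-Lipschitz; WGAN-GP replaces this with a
        # gradient penalty.
        for p in critic.parameters():
            p.data.clamp_(-clip, clip)
        return loss.item()

    real = torch.randn(16, 64)
    fake = torch.randn(16, 64)  # would come from the generator
    critic_step(real, fake)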

The reason I expected better mathematical reasoning is that the companies making these models are very loudly proclaiming they are capable of high-level mathematical reasoning.

And yes, the fact that I don't have to look at the matplotlib documentation anymore makes these models extremely useful already, but that's qualitatively different from having Putnam-winning reasoning ability.


One thing I forgot: your solution may never converge. In my case with the GAN, after enough training the models start wobbling around some point, trying to outsmart each other, and then they _always_ explode. So I saved checkpoints periodically and took the best intermediate weights (rough sketch below).
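Roughly like this (a generic PyTorch sketch with a placeholder model and metric, not my actual code; the evaluation would be something like FID, lower is better):

    import torch

    # Placeholder generator and metric, for illustration only.
    G = torch.nn.Linear(64, 64)

    def evaluate(model):
        with torch.no_grad():
            return model(torch.randn(8, 64)).pow(2).mean().item()

    best_score = float("inf")
    for step in range(10_000):
        # ... one training step would go here ...
        if step % 1000 == 0:
            torch.save(G.state_dict(), f"ckpt_{step}.pt")  # periodic snapshot
            score = evaluate(G)
            if score < best_score:  # keep the best intermediate weights
                best_score = score
                torch.save(G.state_dict(), "best.pt")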



