Some time ago I was working on an image processing model with a GAN architecture: one model produces output and tries to fool the second, and both are trained together. Simple in principle, but it takes a lot of extra effort to make it work. Training is unstable and falls apart (blows up into an unrecoverable state). I found some ways to make it work: adding new loss functions, changing hyperparameters, changing the models' architectures and sizes, and adjusting coefficients through training to gradually rebalance the loss functions' influence (rough sketch below).
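Roughly like this (a toy sketch; the loss names and the ramp schedule are made up for illustration, not from my actual project):

    # Toy example: gradually shift weight from a stabilizing auxiliary
    # loss (e.g. L1 reconstruction) toward the adversarial loss as
    # training progresses. Names and schedule are illustrative only.
    def total_loss(adv_loss, aux_loss, step, total_steps):
        w = min(1.0, step / (0.5 * total_steps))  # ramp up over the first half
        return w * adv_loss + (1.0 - w) * aux_loss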

The same may work with your problem. If training is unstable, try introducing extra 'brakes' that theoretically aren't required, maybe even ones that are technically incorrect, whatever that looks like in your domain. Another thing to check is the optimizer: try several, and check the default parameters. I've heard Adam's defaults can lead to instability later in training; see the sketch below.
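For example (a minimal PyTorch sketch; the models and learning rate are placeholders): Adam's defaults are betas=(0.9, 0.999), and GAN work such as DCGAN commonly lowers beta1 to 0.5 to reduce momentum-driven oscillation:

    import torch

    # Placeholder generator/discriminator; stand-ins for whatever you train.
    G = torch.nn.Linear(64, 64)
    D = torch.nn.Linear(64, 1)

    # Adam's defaults are betas=(0.9, 0.999); beta1=0.5 is the common
    # GAN tweak (e.g. from DCGAN) that often tames training instability.
    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))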

PS: It would be heaven if models could work at a human expert level. I'm not sure why some really expect this; we are just at the beginning.

PPS: The fact that they can do known tasks with minor variations is already a huge time saver.



Yes, I suspect that engineering the loss and hyperparameters could eventually get this to work. However, I was hoping the model would help me reach a more fundamental insight into why the training falls into bad minima. For example, the Wasserstein GAN is a principled change to the GAN that improves stability, not just fiddling with Adam's beta parameter.
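A minimal sketch of what I mean by that principled change (the critic here is a placeholder, and the hyperparameters are just the ones from the WGAN paper, not anything tuned to my problem): the critic maximizes E[D(real)] - E[D(fake)], and its weights are clipped to keep it roughly 1-Lipschitz:

    import torch

    # Placeholder critic; stands in for whatever architecture you use.
    critic = torch.nn.Linear(64, 1)
    opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)  # WGAN paper's choice

    def critic_step(real, fake, clip=0.01):
        opt.zero_grad()
        # Wasserstein critic loss: maximize E[D(real)] - E[D(fake)],
        # i.e. minimize the negation.
        loss = critic(fake).mean() - critic(real).mean()
        loss.backward()
        opt.step()
        # Weight clipping: the original (crude) way to keep the critic
        # approximately 1-Lipschitz; WGAN-GP replaces this with a
        # gradient penalty.
        for p in critic.parameters():
            p.data.clamp_(-clip, clip)
        return loss.item()

    real = torch.randn(16, 64)
    fake = torch.randn(16, 64)  # would come from the generator
    critic_step(real, fake)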

The reason I expected better mathematical reasoning is that the companies making these models are very loudly proclaiming they are capable of high-level mathematical reasoning.

And yes, the fact that I don't have to look at the matplotlib documentation anymore makes these models extremely useful already, but that's qualitatively different from having Putnam-winning reasoning ability.


One thing I forgot: your solution may never converge. In my case with the GAN, after enough training the models start wobbling around some point, trying to outsmart each other, and then they _always_ explode. So I saved checkpoints periodically and took the best intermediate weights (rough sketch below).
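Roughly like this (a generic PyTorch sketch with a placeholder model and metric, not my actual code; the evaluation would be something like FID, lower is better):

    import torch

    # Placeholder generator and metric, for illustration only.
    G = torch.nn.Linear(64, 64)

    def evaluate(model):
        with torch.no_grad():
            return model(torch.randn(8, 64)).pow(2).mean().item()

    best_score = float("inf")
    for step in range(10_000):
        # ... one training step would go here ...
        if step % 1000 == 0:
            torch.save(G.state_dict(), f"ckpt_{step}.pt")  # periodic snapshot
            score = evaluate(G)
            if score < best_score:  # keep the best intermediate weights
                best_score = score
                torch.save(G.state_dict(), "best.pt")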



