
I'd also be really interested to learn how you generate synthetic data. Is it a 'simple' case of manually generating as much as possible and then using the statistics of this to bulk it out? Or are you using something more complex to augment it?


Models that can create synthetic data are called generative models [1]. Training that pits a generative model against a discriminative one (the kind you can't generate data from, which includes traditional neural network classifiers) is called adversarial learning [2]. It's worth noting that generative models can usually be used for classification on their own as well.

[1] https://en.wikipedia.org/wiki/Generative_model

[2] https://en.wikipedia.org/wiki/Adversarial_machine_learning
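To make the "generate and classify with the same model" point concrete, here is a minimal sketch, assuming the simplest possible generative model: one 1-D Gaussian per class, fitted to a small hand-labelled seed set. The class name, the seed values, and the labels are all made up for illustration; nobody in this thread described using this particular model.

```python
import math
import random
from statistics import mean, stdev


class GaussianGenerativeModel:
    """One 1-D Gaussian per class (an illustrative assumption, not a real pipeline)."""

    def fit(self, samples):
        # samples: dict mapping class label -> list of float measurements
        total = sum(len(values) for values in samples.values())
        self.params = {
            label: (mean(values), stdev(values), len(values) / total)
            for label, values in samples.items()
        }
        return self

    def sample(self, label, n, rng):
        # Generation: draw new points from the fitted class-conditional density.
        mu, sigma, _ = self.params[label]
        return [rng.gauss(mu, sigma) for _ in range(n)]

    def classify(self, x):
        # Classification via Bayes' rule: maximise log p(x | c) + log p(c).
        def log_joint(label):
            mu, sigma, prior = self.params[label]
            log_lik = (-0.5 * ((x - mu) / sigma) ** 2
                       - math.log(sigma * math.sqrt(2 * math.pi)))
            return log_lik + math.log(prior)

        return max(self.params, key=log_joint)


# Hypothetical seed data: a handful of manually labelled measurements.
seed = {"short": [1.0, 1.2, 0.9, 1.1], "long": [3.0, 3.3, 2.8, 3.1]}
model = GaussianGenerativeModel().fit(seed)

rng = random.Random(0)
synthetic = {label: model.sample(label, 1000, rng) for label in seed}
```

This is roughly the "use the statistics to bulk it out" approach from the question: the synthetic points are only as good as the fitted distribution, which is why richer generative models are interesting.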


One example of synthetic data generation was for our OCR project. We took corpora of word choices (Project Gutenberg, modern books, the UPC database for receipts, etc.), took several thousand fonts, and combined them with geometric transformations that mimic distortions like shadows, creases, etc. to bootstrap millions of fake, OCR-like scannable documents.
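A rough sketch of how a corpus, a font list, and random geometric distortions can be combined into labelled samples. Everything here is a stand-in: the word list, the font names, the distortion ranges, and the function names are hypothetical, only a small rotation plus shear is modelled (no shadows or creases), and the actual rendering of text to pixels is left out.

```python
import math
import random

# Hypothetical stand-ins: a real pipeline would load a text corpus and font files.
WORDS = ["receipt", "total", "chapter", "invoice"]
FONTS = ["FontA", "FontB", "FontC"]  # placeholder names, not real fonts


def random_affine(rng, max_rotation_deg=5.0, max_shear=0.2):
    """Return a 2x2 matrix combining a small rotation with a horizontal shear."""
    theta = math.radians(rng.uniform(-max_rotation_deg, max_rotation_deg))
    shear = rng.uniform(-max_shear, max_shear)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotation matrix multiplied by the shear matrix [[1, shear], [0, 1]].
    return ((cos_t, cos_t * shear - sin_t), (sin_t, sin_t * shear + cos_t))


def transform(matrix, point):
    """Apply a 2x2 matrix to an (x, y) point."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


def make_sample(rng, char_width=10.0, line_height=16.0):
    """Pick a word and font, then distort the word's bounding box."""
    word = rng.choice(WORDS)
    font = rng.choice(FONTS)
    matrix = random_affine(rng)
    width = char_width * len(word)
    corners = [(0.0, 0.0), (width, 0.0), (width, line_height), (0.0, line_height)]
    return {
        "label": word,  # ground truth is known because we chose the word
        "font": font,
        "quad": [transform(matrix, p) for p in corners],  # distorted box
    }


rng = random.Random(42)
dataset = [make_sample(rng) for _ in range(1000)]
```

The point of the design is that the label never has to be annotated by hand: each sample is generated from a known word, so the ground truth comes for free no matter how heavy the distortion.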

We aren't using GANs yet, but are definitely keeping an eye on them. Work like InfoGAN, which has the GAN learn a ground-truth-like label, is very promising, but GANs don't yet work at the image sizes we would need for this to be practical. I do think these problems will be solved in the next year or two, and GANs will become an integral part of synthetic data generation.


Ooh, neat! I've used GANs to generate synthetic temporal sequence data for training electrophysiologic (i.e., EEG, EMG) signal decoders. In fact, I wrote up the section of my dissertation on this topic today! In my experience it worked quite a bit better than other generative techniques (I've used convolutional variational autoencoders in the past for this and had so-so results). Looking forward to seeing what you guys do with this!



