
When you build a new model, there is a spectrum of how much you reuse the old model: 1. taking the weights, 2. training on the logits, 3. training on the old model's output, 4. training from scratch. We don't know how much advantage #3 gives. It might be the case that, with enough output from the old model, #3 is almost as useful as taking the weights.
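
To make the gap between #2 and #3 concrete, here is a rough sketch of the per-token training signal in each case. It assumes the usual distillation setup: #2 as a KL divergence against the old model's full next-token distribution, #3 as plain cross-entropy on a token sampled from the old model. The function names and toy setup are made up for illustration, not taken from any particular training stack.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(old_logits, new_logits):
    """Option 2 (assumed form): KL(old || new) over the whole vocabulary.

    The new model sees the old model's probability for every token.
    """
    p = softmax(old_logits)
    q = softmax(new_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def output_training_loss(sampled_token, new_logits):
    """Option 3 (assumed form): cross-entropy on one token the old model emitted.

    The new model sees only which token was sampled, not the distribution.
    """
    q = softmax(new_logits)
    return -math.log(q[sampled_token])
```

Per position, #2 hands over a whole distribution while #3 hands over a single token index, so #3 would presumably need many more samples to recover the same information. How many more is exactly the open question.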

