
The Open Source Definition is quite clear on its #2 requirement: `The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed.` https://opensource.org/osd


Arguably this would still apply to DeepSeek. While they didn't release a way to recreate the weights, it is perfectly valid and common to modify the neural network using only what was released; fine-tuning or RLHF, for example, does not require the original training data. Modifying the model starting from the weights certainly seems like the preferred form of modification to me (a rough sketch of what that looks like is below).
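To make that concrete, here's a rough sketch of what "modifying the model from the released weights alone" often looks like in practice: LoRA fine-tuning with the Hugging Face transformers/peft stack. The checkpoint name, dataset file, and hyperparameters here are my own placeholders, not anything from DeepSeek's release.

```python
# A hedged sketch of modifying a released model via LoRA fine-tuning,
# using only the published weights -- no access to the original training data.
# The checkpoint name and dataset file are placeholders (my assumptions),
# not anything DeepSeek actually shipped or recommends.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

model_name = "deepseek-ai/deepseek-llm-7b-base"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the released weights; they stay frozen, only small adapters are trained.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Any text dataset of your own choosing (hypothetical file name).
data = load_dataset("json", data_files="my_finetune_data.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4, bf16=True),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("finetuned-adapter")  # saves only the small adapter weights
```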

Another note is that this may be the more ethical option. I'm sure the training data contained lots of copyrighted content, and if my content were in there I would prefer it be released as opaque weights rather than published in a zip file for anyone to read for free.


It takes away the ability to know what the model does, though, which is often considered an important aspect too. Without the details of how the model was trained, there's no way to know whether intentional misbehavior was baked into the training. If they provided everything needed to train your own model, you could rule that out by training on data you chose yourself using the same methodology.

IMO it should be considered freeware, and only partially open. It's like releasing an open source program with a part of it delivered as a binary.


It's not that they want to keep the training content secret, it's the fact that they stole the training content, and who they stole it from, that they want to keep secret.



