

andy99 on Nov 30, 2023 | on: To safely deploy generative AI in health care, mod...

What would you do with the training data if you had it? I see absolutely no reason why the training data is needed to evaluate a model, or how any kind of guarantees could be made about the model if you did have the training data. With the weights and code it's perfectly possible to interrogate and evaluate it.

I suspect a lot of people asking for training data are mainly looking to complain about some aspect of it (bias, copyright, etc.) rather than actually thinking they can somehow use it to divine how the model will perform.

ssivark on Dec 1, 2023

One can never practically evaluate it on all possible inputs/prompts, so an understanding of the training data distribution is important to generate the right test queries and create guardrails for desired use cases.
