Happy to see this type of work that is truly open source and commercially usable... | Hacker News


dreaminvm on April 12, 2023 | on: Databricks Releases 15K Record Training Corpus for...

Happy to see this type of work that is truly open source and commercially usable. Is this the entire corpus or a subset? Do you intend to release any new iterations? I've been thinking of starting similar efforts at another BigCorp by hosting a UL2 or GPT-J instance.

pwendell on April 12, 2023

15k is the entire corpus we have right now. Hopefully others can join up in releasing additional samples that can be merged in over time.

We'll definitely keep iterating on Dolly and releasing everything openly.
