
Yeah. Most of what is valuable to me about GPT-4 is its reasoning ability, not fact recall or writing quality. Fact recall has been mostly solved by Google search cards for years, and writing quality is not the most important thing now that I'm no longer a freelance writer; GPT-3.5 and some of the good OS models like Koala produce okay writing quality anyway.

What nothing else can provide is something that will reason intelligently over the data you give it, at similar or better quality than paying for something like MTurk, for much cheaper and with nearly instant delivery. That reasoning ability comes from the model size and training data quality, and in real applications using CoT, LangChain, etc., a lot of it comes from the context length. 8k is better than anything else I've tried at real use cases, and I very much want to try 32k because it opens up a lot of space to do new things (e.g. dump in a textbook on the domain you want the model to reason about). I want even longer context lengths than that too, but we'll have to see how it develops. From what I understand, context length/block size scales pretty directly with the amount of compute and memory they're willing to devote during training. RWKV's architectural changes may shake that up a bit; we'll see when Stability releases it.
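For anyone curious what the "dump a reference text into the context and ask it to reason" pattern looks like in practice, here's a rough sketch assuming the OpenAI Python client; the model name "gpt-4-32k" and the textbook.txt file are just placeholders, not anything specific I'm actually running:

    # Minimal sketch: long-context reasoning over a supplied document.
    # Assumes the OpenAI Python client (openai>=1.0) and OPENAI_API_KEY in the env.
    from openai import OpenAI

    client = OpenAI()

    with open("textbook.txt") as f:
        textbook = f.read()  # whole document must fit in the model's context window

    response = client.chat.completions.create(
        model="gpt-4-32k",  # placeholder long-context model name
        messages=[
            {"role": "system",
             "content": "Answer using only the reference text. Think step by step."},
            {"role": "user",
             "content": f"Reference text:\n{textbook}\n\nQuestion: ..."},
        ],
    )
    print(response.choices[0].message.content)

The whole trick is that nothing gets fine-tuned: the domain knowledge rides along in the prompt, which is why the context length matters so much.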



