Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Is the model still based on training data from 2021? I'm curious to see what happens when it's unleashed on its own output.


I assume there's a certain danger in letting it consume data in real time. It wouldn't be hard to trick the web crawler into ingesting undesirable content, and people would quickly start asking it questions like "why is the metro down today?" or "do I need to worry about the hurricane that's forecast for tomorrow?" which it would struggle with. Not to mention how much AI generated data is now found across the internet.


It’s a fun test actually. There was a beta version of one of my libraries online before 2021, and when I ask ChatGPT how to use it, the answers are bad, but clearly it knows some correct things. I want to know if our current documentation of the full release is good enough to close the gap…


With a paid subscription to OpenAI you can fine-tune their models on additional data, so if you're a business trying to offer an AI based chat help or something this seems achievable.


I believe 2021 was the tipping point where most text content is now AI generated, so to avoid training your LLM with other LLM output they restrict the date to 2021.


>> most text content is now AI generated

do you have any sources to back it up or is it a gut feeling?


I have this question as well.

When will it be "up to date" and when will it learn from our questions in real time and add that to its model?


> learn from our questions in real time and add that to its model

That's a big no. It will turn really bad https://www.theverge.com/2016/3/24/11297050/tay-microsoft-ch...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: