
Wait for the first large scale LLM using source-aware training:

https://github.com/mukhal/intrinsic-source-citation

This is not something that can be LoRA fine-tuned onto a model after the pretraining step.
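To make the idea concrete, here is a minimal sketch of what source-aware pretraining data construction could look like. This is a hypothetical format invented for illustration (the tag names, `make_source_aware_examples` helper, and source IDs are assumptions, not the linked repo's actual scheme): each document is trained on with its source identifier in context, and an auxiliary citation example teaches the model to emit that identifier on request.

```python
# Hypothetical sketch of source-aware pretraining data construction.
# The <doc>/<cite> tags and SOURCE: convention are illustrative only.

def make_source_aware_examples(doc_text: str, source_id: str) -> list[str]:
    """Return pretraining strings that tie content to its source."""
    return [
        # 1. Ordinary next-token training on the document, tagged with
        #    its source so the association is learned in context.
        f"<doc src={source_id}>\n{doc_text}\n</doc>",
        # 2. A citation example: given a snippet, predict the source ID.
        f"<cite>\n{doc_text}\n</cite>\nSOURCE: {source_id}",
    ]

# Toy corpus of (text, source identifier) pairs.
corpus = [
    ("The mitochondria is the powerhouse of the cell.", "bio-textbook-001"),
    ("Attention is all you need.", "arxiv-1706.03762"),
]

training_set = []
for text, src in corpus:
    training_set.extend(make_source_aware_examples(text, src))

for example in training_set:
    print(example)
    print("---")
```

The key point the parent comment makes still holds under this sketch: the content-to-source association has to be baked in during pretraining itself, since a post-hoc adapter has no way to recover which document each memorized fact came from.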

What we need is a human-curated benchmark for the different types of source-aware training, to allow competition, plus an extra column in the most popular leaderboards (included in the Average column) to incentivize AI companies to train in a source-aware way. Of course, this would instantly invalidate the black-box veil LLM companies love to hide behind so as not to credit original authors and content creators; they prefer regulators to believe such a thing cannot be done.

In the meantime, such regulators are not thinking creatively and are clearly just looking for ways to tax AI companies, hiding behind copyright complications as an excuse to tax the flow of money wherever they smell it.

Source-aware training also has the potential to decentralize search!



Yeah. Treating these things as advanced, semantically aware search engines would actually be really cool.

But I find the anthropomorphization and "AGI" narrative really creepy and grifty. Such a waste that that's the direction it's going.




