..."(at least for now) you are in the drivers seat, and the LLM is just another tool to reach for."
Improvements in model performance seem to be approaching the peak rather than demonstrating exponential gains. Is the quote above where we land in the end?
I'm not sure why duplicates were ever considered an issue. For certain subjects (like JS) things evolved so quickly during the height of SO that even a year-old answer was outdated.
That, and search engines seemed to promote more recent content, so an old answer sank under the ocean of blog spam.
But the answer has not become incorrect. It is still correct for that question in that specific context. More likely, the 'canonicalization process' was overly coarse (for SEO?), inconsistent and confused.
Sounds like they optimised for a select 1% class of self-appointed gatekeepers rather than the broad user base. Classic mistake of nearly every defunct social site.
If I watch a movie, then draw a near-perfect likeness of the main character from my very good memory, put it on a t-shirt, and sell that t-shirt, that is grounds for a copyright claim if the source isn't yet in the public domain (not a guaranteed loss, but open to a lawsuit).
If I download all content from a website whose use policy states that all content is owned by that website and can't be resold, then allow my users to query that downloaded data and receive a detailed summary of all related content, and sell that product, perhaps that is a violation of the use policy.
None of this has been properly tested in the courts yet. Large payments have already been made to Reddit to avoid it, likely because Reddit has the means to fight in court. My little blog, though, is fair game because I can't afford to engage.
For sure, it's rich people playing 'rules for thee, but not for me'. What's interesting is that we'll discover on which side of the can-afford-to-enforce-its-copyright boundary the likes of the NYTimes fall.
It would be informative if both sides shared what their problem domain is when describing their experiences.
It's possible that the domain or the complexity of the problems is the deciding factor for success with AI-supported programming. Statements like 'you'll be left behind' or 'it's a skill issue' are as helpful as 'it fails miserably'.
Before leaving any job, or when updating my CV, I look at my sent folder (comms app) and completed tickets.
List everything, grab the high-level doc/ticket summary for each, and remove any business strategy; now you have a list of achievements that can jog the memory.
Will a business located in another jurisdiction be comfortable that records of all staff queries and prompts are being stored and are potentially discoverable by other parties? This is more than just a Google search; these prompts contain business strategy and IP (context uploads, for example).
Tidal forces on a moon in orbit around a large body generate heat, and this doesn't need the sun. Certain orbits generate more heat than others, and orbits aren't permanent, but technically it is possible post-sun to have a source of energy.
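(For a rough sense of scale, and not from the comment itself: the standard eccentricity-tide estimate for a synchronously rotating moon is

\dot{E}_{\text{tidal}} \approx \frac{21}{2}\,\frac{k_2}{Q}\,\frac{G M_p^2 R^5 n e^2}{a^6}

where M_p is the mass of the primary, R the moon's radius, n its orbital mean motion, e the eccentricity, a the semi-major axis, and k_2/Q the moon's tidal response. The e^2/a^6 dependence is why some orbits generate far more heat than others.)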
Pretty sure the people creating the best models understand how these things work. Plenty of papers explain the process, and with enough aptitude and drive you and I could create our own basic model from scratch.
There is a lot of art involved with the best models. We aren't dealing with determinism with regard to the corpus used to train the model (too much information to curate for accuracy), nor the LLM output (probabilistic by design), nor the prompt input (human language is dynamic and open to multiple interpretations), nor the human's subjective assessment of the output.
That there is a product that manages to produce great results despite these huge challenges is amazing, and this is the part that is not quantifiable - in the sense that the data scientists' decisions about temperature settings etc. are not derived from any fundamental property, but come from an inspired group of people whose qualitative instincts are aligned with producing great results.
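To make the temperature point concrete, here is a minimal sketch (the logits are made up, and this is only the sampling step, not a real model) showing how that one knob reshapes the next-token distribution; nothing about its "right" value falls out of theory, it's tuned by feel and evaluation:

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        # Scale logits by temperature, softmax, then sample one token id.
        if rng is None:
            rng = np.random.default_rng(0)
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs), probs

    logits = [2.0, 1.0, 0.2, -1.0]   # hypothetical scores for 4 candidate tokens
    for t in (0.2, 1.0, 2.0):
        _, p = sample_next_token(logits, temperature=t)
        print(t, np.round(p, 3))     # lower t -> sharper, higher t -> flatter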
Let me tell you something. You can go online, find a tutorial on how to make an LLM, and actually make one at home. The only thing stopping you from making an OpenAI-scale LLM is compute resources.
If you happen to make an LLM from scratch, you still won't know how it works, and neither do the people at OpenAI or Anthropic.
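For what it's worth, the "make one at home" core really is small. A sketch of the central building block, scaled dot-product self-attention with a causal mask (random untrained weights, a toy sequence, nothing resembling a production model):

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(x, d_k):
        # x: (seq_len, d_model) token embeddings; weights are random for illustration.
        d_model = x.shape[-1]
        W_q = rng.normal(size=(d_model, d_k))
        W_k = rng.normal(size=(d_model, d_k))
        W_v = rng.normal(size=(d_model, d_k))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        scores = Q @ K.T / np.sqrt(d_k)
        # Causal mask: each position may only attend to itself and earlier positions.
        mask = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores[mask] = -np.inf
        return softmax(scores) @ V

    tokens = rng.normal(size=(5, 16))       # 5 toy "tokens", 16-dim embeddings
    print(self_attention(tokens, d_k=8).shape)   # (5, 8)

A real LLM stacks many such layers, trains the weights on a huge corpus, and that is where the interpretability problem lives: the mechanism is simple to write down, but what the trained weights collectively encode is not.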