
Hate seems like quite a strong reaction.

My hunch is that quite soon LLMs will be totally interchangeable, due to the intense competition and the fact that people are basically training on the same base data distribution.

In the tasks I use LLMs for, switching one for another makes even less difference than I had predicted.

However, I'm not spending $30k+ per month, so I guess my opinion may be less informed.

What is your use case? Could these micro-optimizations you need to do now be the result of the technology still being quite immature?

I'm working with digital twins of musicians/celebrities, but I have also done more analytical stuff with LLMs.

My current side project involves working with the production company of a well-known German soap opera to help them write further episodes. The first thing we did was write a small evaluation system. Could be interesting to test with Unify.



We do content generation and need a high level of consistency and quality

Our prompts get very complex, and we have around 3,000 lines of code that do nothing but build prompts based on the user's options (using dozens of helpers in other files)

They aren't going to get more interchangeable because of the subjective nature of them

Give five humans the same task and even if that task is just a little bit complex you'll get wildly different results. And the more complex it gets the more different the results will become

It's the same with LLMs. Most of our prompt changes are more significant, but in one recent case it was as simple as changing the word "should" to "must" to get similar behavior between two different models.

One of them basically ignored things we said it "should" do and never performed the thing we wanted it to, whereas the other did it often, despite these being minor version differences of the same model
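
To make the "should" vs. "must" point concrete, here is a minimal sketch of what a per-model prompt builder might look like. Everything in it is hypothetical: the model names (`model-a`, `model-b`), the `Options` fields, and the instruction wording are placeholders for illustration, not the actual code or models discussed above.

```python
from dataclasses import dataclass

# Hypothetical mapping from model name to the directive verb that model
# responds to. The model names are placeholders, not real model IDs.
DIRECTIVE_VERB = {
    "model-a": "should",
    "model-b": "must",
}


@dataclass
class Options:
    """Illustrative user options; a real system would have many more."""
    tone: str
    max_words: int


def build_prompt(model: str, options: Options) -> str:
    """Build the instruction part of a prompt for the given model."""
    # Fall back to the stricter wording for models we have not tuned for.
    verb = DIRECTIVE_VERB.get(model, "must")
    lines = [
        f"The text {verb} be written in a {options.tone} tone.",
        f"The text {verb} not exceed {options.max_words} words.",
    ]
    return "\n".join(lines)
```

Calling `build_prompt("model-b", Options(tone="formal", max_words=200))` yields the same instructions as for `model-a`, only with "must" in place of "should". Keeping the wording in one lookup table means a new model version only needs a new table entry rather than edits scattered across the prompt-building code.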


thank you!

prompt engineering is a thing, and it's not a thing that you get on social media posts with emojis or multiple images.

is it a public-facing product?


If you do test it out, feel free to ping me with any questions!



