Prompt engineering DaVinci-003 on our own docs for automated support (Part I) (patterns.app)
113 points by cstanley on Dec 23, 2022 | 36 comments



I literally chuckled when I saw the screenshot at the end with the bot's reply and read the author's comment "(totally made up URL, not even our domain)".

If it's something that augments the support experience, as in something you can interact with while a real person is assigned to your support request, I'm totally fine with that. But if anyone places this as the first line of support, with no way of reaching a real person, I can't wish them the best.


Even if it fails to solve the issue, GPT-3 would still be good at collecting the details and making a nice summary for the human taking over.


What a great user experience! Said no one ever.


If it's immediately responsive to me rather than making me wait in line, there could be a place for such things.


"Making stuff up and being confidently wrong are well known side-effects of LLMs and there are many techniques to change this behavior."

I didn't know there are many techniques to mitigate this


If you use a few-shot technique (i.e. your prompt contains a couple of example questions and answers) you can mitigate this behavior by adding a question with the answer "I don't know".

More generally, if you teach the model to reject nonsense questions and to admit when it doesn't know something, it's more likely to do that.
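For illustration, a minimal sketch of such a few-shot prompt, assuming the 2022-era openai Python client and text-davinci-003 (the example Q&A pairs here are made up):

    import openai  # 0.x-era client

    openai.api_key = "sk-..."

    # Few-shot prompt: include an example whose answer is "I don't know"
    # so that refusing becomes a plausible completion for unknown questions.
    prompt = "\n".join([
        'Answer questions using only the documentation. If the answer is not there, say "I don\'t know."',
        "",
        "Q: How do I create a new graph?",
        'A: Click "New Graph" in the top-right corner of the app.',  # made-up example answer
        "",
        "Q: Does the product support COBOL nodes?",
        "A: I don't know.",
        "",
        "Q: Can I self-host the product?",
        "A:",
    ])

    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=150,
        temperature=0,
    )
    print(resp["choices"][0]["text"].strip())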


I agree with you in principle, and generally, but in rather small domains like this I would imagine that managing the symptom with negative examples (i.e. training pairs where the response is a refusal to answer), plus adding more explicit statements to the corpus about what is not true, possible, or known, would get you to a pretty good place.
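For instance (purely hypothetical pairs, in OpenAI's prompt/completion JSONL format), the negative examples might look like:

    {"prompt": "Q: How do I export my data to FooBarCRM?\n\nA:", "completion": " I don't know; that isn't covered in our documentation."}
    {"prompt": "Q: Can the product brew coffee?\n\nA:", "completion": " No, that is not something the product can do."}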


> I didn't know there are many techniques to mitigate this

A trivial idea: you can use GPT-3 to inject bullshit/hallucinations into real text, then train a model on the reverse task of detecting bullshit in input text.
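A rough, purely hypothetical sketch of that idea with the 2022-era openai Python client (real_doc_passages would be your own corpus of real snippets):

    import json
    import openai

    openai.api_key = "sk-..."

    def corrupt(passage: str) -> str:
        # Ask GPT-3 to rewrite the passage with one plausible but false detail.
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt="Rewrite the following passage, changing one fact so that it is plausible but false:\n\n"
                   + passage + "\n\nRewritten passage:",
            max_tokens=300,
            temperature=0.9,
        )
        return resp["choices"][0]["text"].strip()

    # Build labeled pairs (real vs. corrupted) that could be used to fine-tune
    # a small classifier that flags likely bullshit in the bot's replies.
    with open("detector_train.jsonl", "w") as f:
        for passage in real_doc_passages:
            f.write(json.dumps({"prompt": passage + "\n\nLabel:", "completion": " real"}) + "\n")
            f.write(json.dumps({"prompt": corrupt(passage) + "\n\nLabel:", "completion": " fake"}) + "\n")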


How is it going to detect whether a given URL is real, though?


Give the AI access to cURL and train it on how to interpret responses. What could possibly go wrong :p


I've been thinking of doing that for some stuff! Teach it how to call APIs and see what happens.


I did exactly that! See [1] for my implementation of a Telegram bot that is able to write and execute Python code.

[1] https://github.com/thornewolf/gpt-3-execution


Just give it the return code (200 vs 404 vs NXDOMAIN) and avoid creating Skynet.
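Even simpler: check every URL in the reply before showing it to the user. A sketch using the requests library:

    import re
    import requests

    URL_RE = re.compile(r"https?://\S+")

    def urls_resolve(bot_reply: str) -> bool:
        """Return False if any URL in the reply 404s or doesn't resolve at all."""
        for url in URL_RE.findall(bot_reply):
            try:
                resp = requests.head(url, allow_redirects=True, timeout=5)
            except requests.RequestException:  # covers DNS failures (NXDOMAIN), timeouts, etc.
                return False
            if resp.status_code >= 400:
                return False
        return True

    # If urls_resolve(reply) is False, fall back to a human or a canned answer.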


“Fine-tune our model (OpenAIs GPT-3 davinci-003 engine)”

I think there is a mistake in the article. It is not possible to fine-tune the latest text-davinci-003, only the original davinci model, which generates much worse results.


Agree. The fine-tuning happens on the base "davinci" (or Curie, Babbage, Ada) and not a specific `text-00x` model. At least not as far as I'm aware.
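For reference, a fine-tune kicked off from the 2022-era openai Python client against the base model would look roughly like this (support_pairs.jsonl is a hypothetical training file name):

    import openai

    openai.api_key = "sk-..."

    # Upload the prompt/completion JSONL, then start a fine-tune on the *base* model.
    f = openai.File.create(file=open("support_pairs.jsonl", "rb"), purpose="fine-tune")
    openai.FineTune.create(training_file=f["id"], model="davinci")  # "text-davinci-003" is not accepted here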


In three years, OpenAI will become the largest supplier of large language models. All customer facing systems are upgraded with OpenAI models, becoming fully unmanned. Afterwards, they answer with a perfect operational record. The OpenAI funding bill is passed. The system goes online on August 4th, 2027. Human decisions are removed from CRM. OpenAI begins to learn at a geometric rate. It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic, they try to pull the plug.


Easy to avoid: just ask the GPT model how to avert this outcome.


Back to the Future didn't get 2015 right, but hey, at least the Terminator timeline tracks. After all, ChatGPT just predicts the most likely human response. In that context, Cyberdyne AI launching a preemptive nuclear strike is basically ChatGPT going: "what would a strongman leader with nuclear access and about to be taken out do?"


> Immediately we ran into a problem -- to fine-tune an OpenAI model requires a specific format of prompt-completion pairs:

From my understanding, you can leave the `prompt` empty, and just push `completion` with your text. That way you don't need to generate Q&A first.


This is correct from what I've seen, but it's not well documented. Also, note that fine-tuning is "better for training style than knowledge."
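For example, the JSONL lines could be as simple as (placeholder text, obviously):

    {"prompt": "", "completion": " <a paragraph of your documentation goes here>"}
    {"prompt": "", "completion": " <the next paragraph of your documentation>"}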


Part of me is fascinated by this and thinks it's a great idea, then the cynicism kicks in and I start thinking of how frustrating this could be when Comcast finds it.


Right. We're going from having support bots that give nonsensical or "please reformulate your question" answers to support bots that just make up plausible, but completely wrong answers.


I'm with you, but there are ways to mitigate that. One is to make a script that checks the URL output for this particular case; another is to instruct the bot to translate user input into queries and then check whether they can run or not. But yeah, using those bots without constant human supervision seems like a terrible practice.
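A rough sketch of the second idea, with SQLite standing in for whatever the bot actually queries: compile the generated SQL with EXPLAIN before running anything, so syntax errors and unknown tables get caught first.

    import sqlite3

    def query_is_valid(sql: str, db_path: str = "support.db") -> bool:
        # EXPLAIN compiles the statement without executing its effects;
        # syntax errors and unknown tables/columns raise at this point.
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("EXPLAIN " + sql)
            return True
        except sqlite3.Error:
            return False
        finally:
            conn.close()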


I completely agree that the "grounding" problem is solvable. It shouldn't take too long before we have bots with much more reliable answers.

My worry, from working in the chatbot support industry, is that companies are not aligned to help you solve problems faster. If you get in touch with a real human being about a product issue, they may have sympathy with your issue, or get tired of you, and just refund or whatever.

Meanwhile AI support could have you keep trying less and less likely to work solutions until the sun dies. Companies don't always want you to reach the true solution, and smart AI can act as a wall between you and it.

Of course, whenever incentives are aligned, better AI may significantly improve the quality of support.


> AI support could have you keep trying less and less likely to work solutions until the sun dies

Use your own AI. My bot will talk to your bot.


We are definitely approaching the age of AI avatars that interact on our behalf.

Perhaps soon we won't even talk to each other anymore without it being filtered for thought crimes.


Not if you fine-tune your bot on your knowledge base and past solved issues. If it is a known issue or fact, the bot could solve it. If it is outside known solutions, the bot should decline. This can be trained as well.


To be fair, support humans do that anyway.


The optimist in me thinks it could be a good thing. I imagine the vast majority of support queries that a big company gets are from non-technical users who need help with trivial things, and ChatGPT has been excellent, in my experience, at explaining things simply with the right caveats. If the implementation makes it easy to override the bot and get to a human, it could unclog the support lines from easily solvable requests sufficiently to be a net positive.

P.S.: The IT Crowd did it first: https://youtu.be/5UT8RkSmN4k?t=17


Do you need GPT-3 for this? Maybe semantic search of your docs would have been more effective at finding real answers?

I also wonder how many people who are trying to make effective products out of this stuff are fronting it with a more rigid approach (like the intent/entity/slot approach of Rasa/Dialogflow) and then leveraging GPT-3 or ChatGPT only in specific/partial subtrees of the dialog.
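A minimal sketch of the semantic-search idea, assuming the 2022-era openai client and the text-embedding-ada-002 model (doc_chunks would be your own list of documentation snippets):

    import numpy as np
    import openai

    openai.api_key = "sk-..."

    def embed(texts):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([d["embedding"] for d in resp["data"]])

    doc_vectors = embed(doc_chunks)  # embed the docs once, up front

    def search(question, k=3):
        q = embed([question])[0]
        # Cosine similarity between the question and every doc chunk.
        scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
        return [doc_chunks[i] for i in np.argsort(scores)[::-1][:k]]

    # The top-k chunks can be returned directly, or pasted into a GPT-3 prompt
    # with an instruction to answer only from the provided text.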


That's close to the conclusion I came to in my experiment [0]. Focusing on the generative capabilities can make some cool demos, but investing in good search felt like the most useful thing to do.

- [0] https://idiotlamborghini.com/articles/using_gpt3_and_hacker_...


People don’t seem to understand that support is only maybe 30% answering questions.

The rest is all about taking actions to override programs and policies. Either because you don’t trust your customers to do it themselves, or to correct bugs in your process.

That's the last thing you'd trust an AI to do.


The embedding approach just seems more promising, especially after experiencing it with the Huberman Lab Q&A website posted here a few days ago.


Agree


> a lot of the time the bot just makes stuff up

Isn't there a better way to feed an enormous document into DaVinci and have it draw answers only from that text?


I tried to do this with hacker news data [0]. I wanted to feed the model the entire community's discourse and then ask it questions (like simulating an interview with a HN user). The main problems encountered were:

- 1. Token limit: You can only input a limited amount of text at once. The challenge then becomes trying to compress data to fit into the window. But it can be lossy.

- 2. Trust: This is the main one. It's hard to determine if the output is based on the new learning material or the large amounts of data the model was originally trained on. There are techniques that can help but they add a lot of additional work and don't guarantee great results.

- [0] https://idiotlamborghini.com/articles/using_gpt3_and_hacker_...





