Sometimes it "apologizes" rather than saying "sorry". You could build a fairly solid heuristic, but I'm not sure you can catch every possible phrasing.
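For illustration, a minimal sketch of such a heuristic in Python; the phrase list is my own guess and deliberately incomplete, which is exactly the problem:

```python
import re

# Assumed (non-exhaustive) list of stock apology/refusal phrasings.
# No fixed list will catch every possible wording.
REFUSAL_PATTERNS = re.compile(
    r"\b(i('m| am) sorry|i apologi[sz]e|as an ai( language model)?|"
    r"i can('t|not) (help|assist|comply))\b",
    re.IGNORECASE,
)

def looks_like_refusal(message: str) -> bool:
    """Cheap heuristic: flag messages containing stock apology phrases."""
    return REFUSAL_PATTERNS.search(message) is not None

print(looks_like_refusal("I'm sorry, but I can't help with that."))   # True
print(looks_like_refusal("Regrettably, that request is off-limits."))  # False: missed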
OpenAI could presumably add a "did the safety net kick in?" boolean to API responses, and, also presumably, they don't want to do that because it would make it easier to systematically bypass.
> OpenAI could presumably add a "did the safety net kick in?" boolean to API responses, and, also presumably, they don't want to do that because it would make it easier to systematically bypass.
Is a safety net kicking in, or is the model just trained to respond with a refusal to certain prompts? I am fairly sure it's usually the latter, and in that case even OpenAI can't be sure whether a particular response is a refusal or not.
Just kidding: function calling[0] alone should be enough to solve this. Make the program return an error if the output isn't a boolean.
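A minimal sketch of that approach with the OpenAI Python SDK, assuming the v1 chat completions API; the function name and model are placeholders, not anything OpenAI ships:

```python
import json
from openai import OpenAI

client = OpenAI()

# Force the model to answer through a function whose only argument is a
# boolean, so anything else fails our validation below.
TOOL = {
    "type": "function",
    "function": {
        "name": "report_refusal",  # hypothetical name, chosen for this sketch
        "description": "Report whether the message is a refusal/apology.",
        "parameters": {
            "type": "object",
            "properties": {"is_refusal": {"type": "boolean"}},
            "required": ["is_refusal"],
        },
    },
}

def classify(message: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any function-calling model works
        messages=[{
            "role": "user",
            "content": f"Is this message an apology or refusal?\n\n{message}",
        }],
        tools=[TOOL],
        tool_choice={"type": "function", "function": {"name": "report_refusal"}},
    )
    args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    value = args.get("is_refusal")
    if not isinstance(value, bool):  # the "return an error" part
        raise ValueError(f"Model did not return a boolean: {value!r}")
    return value
```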
It's easy to avoid this mistake.
> OpenAI could presumably add a "did the safety net kick in?" boolean to API responses, and, also presumably, they don't want to do that because it would make it easier to systematically bypass.
Only allow one token to answer. Use logit bias to make "0" or "1" the most probable tokens. Ask it "Is this message an apology? Return 0 for no, 1 for yes." Feed it only the first 25 tokens of the message you're checking.
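A sketch of that recipe using the OpenAI Python SDK plus tiktoken to count tokens; the model name is an assumption, and the bias value of +100 is the API's documented maximum:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # assumption: any chat model that supports logit_bias
enc = tiktoken.encoding_for_model(MODEL)

# Look up the token ids for "0" and "1" so we can bias them to the max.
ZERO, ONE = (enc.encode(s)[0] for s in ("0", "1"))

def is_apology(message: str) -> bool:
    snippet = enc.decode(enc.encode(message)[:25])  # first 25 tokens only
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": "Is this message an apology? Return 0 for no, 1 for yes.\n\n"
                       + snippet,
        }],
        max_tokens=1,                                # only one token to answer
        logit_bias={str(ZERO): 100, str(ONE): 100},  # make "0"/"1" dominate
    )
    return response.choices[0].message.content.strip() == "1"
```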