Interesting that the chatbots learned to show "fake" interest in an item, just to concede it later in the negotiation process.
But I think what is missing is the time component when negotiating with humans. A negotiation usually goes better for humans if it is quick rather than dragging on too long.
And more importantly, the chatbots never seemed to "walk away" from a deal. But in real life, you sometimes have to walk away to show the other party that you are not a pushover. It would be interesting to enhance the model so that the chatbots negotiate repeatedly with each other and "remember" how the other party behaves and how far you can push them to concede. Because some negotiations really are zero-sum games.
No, it's signaling that you're wasting time and need to move on.
Many situations cannot be resolved until you convince the other party that their business model or position or assumptions are wrong. Walking away is the only way to do that because it triggers escalation.
I had a situation recently where the account team couldn't get a term that we needed. We basically told them to go away and stopped 2-3 other negotiations. That let our counterparty get the resource he needed (the SVP of product X) and balance returned to the force.
Not really, there are some situations where the deal is too heavily skewed for it to be favorable to you. In human terms: "I'm not willing to pay this much for this." What an AI might actually consider bugs would be things like politeness or pity ("I can't refuse to buy from this small hungry child"), basically factors external to the actual process of negotiation that can still have a considerable impact on human agents and on how they feel afterwards. (For example, giving money to a beggar is a one-way act, but the donor will likely feel good about themselves despite not getting any material gain.) Similarly, people might purposefully adjust their tactics depending on the status of the person they are negotiating with, making more concessions with family and friends. If the AI just looks at the inherent value of the items, it'll miss things like that.
Walking away is a bug if you consider a single deal - it's always better to get something rather than nothing.
However, if you zoom out and consider the optimal deal-making strategy over multiple deals, then walking away can be a good strategy. For example, a used car salesman would rationally walk away from a deal if they believed it's likely they can sell a car later for a better price.
If you consider multiple deals then you can also consider the concept of your reputation. This is information that other parties may have about you when they enter a negotiation in the future. You may rationally wish to make a sacrifice on a present deal in order to alter your reputation, to improve your outcomes in future deals.
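A toy sketch of the used-car intuition (all numbers and the helper function below are made up for illustration): across repeated deals, walking away is rational when the expected value of a future sale beats the offer on the table.

    # Toy illustration: walk away when the expected value of a future sale
    # exceeds the current offer. Numbers are hypothetical.

    def expected_future_value(later_price: float, p_sell_later: float) -> float:
        """Expected value of holding out for a later buyer."""
        return later_price * p_sell_later

    current_offer = 4500   # what today's buyer will pay
    later_price = 5200     # asking price for a hypothetical future buyer
    p_sell_later = 0.9     # estimated chance that future buyer materialises

    ev_wait = expected_future_value(later_price, p_sell_later)  # 4680
    print("walk away" if ev_wait > current_offer else "take the deal", ev_wait)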
The thing is, our approaches to this are like those of the average chess or Go player; even the most seasoned dealmaking pro is only like a grandmaster in chess or Go. In short, a network that has run simulated scenarios for trillions of iterations will eventually hit upon dealmaking strategies where we aren't even sure why they work! We will just have computers negotiate with one another about everything, because their arguments will match or outperform any human's.
Imagine a chatbot that can chat up a girl online better than any human, whose jokes make any human seem dull by comparison, and whose wit runs at about 1 trillion jokes a second :-P
It'd be interesting if the agents developed their own language during the reinforcement learning stage that is unintelligible to humans but allows them to quickly navigate the negotiation. They use the model trained in a supervised way during the reinforcement learning stage to avoid this, but I'm curious to see what the agent learns when paired against another reinforcement learning agent.
Edit: Indeed, the paper says that not using the fixed agent trained on human negotiation leads to unintelligible language from the agents.
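For anyone curious what that anchoring trick looks like in general, here is a minimal, self-contained toy (not the paper's code; the vocabulary size, update schedule, and random reward are stand-ins): interleaving policy-gradient updates from self-play with supervised updates on human dialogue keeps the learned language close to human usage.

    # Toy illustration of interleaving supervised and RL updates so the agent's
    # "language" stays anchored to a human-trained distribution while it
    # optimises for negotiation reward. All sizes and data are stand-ins.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB = 16                                # toy token vocabulary
    policy = nn.Linear(VOCAB, VOCAB)          # next-token logits from a one-hot context
    optim = torch.optim.SGD(policy.parameters(), lr=0.01)

    def supervised_step(human_tokens):
        """Cross-entropy on human next-token targets keeps the language intelligible."""
        ctx = F.one_hot(human_tokens[:-1], VOCAB).float()
        loss = F.cross_entropy(policy(ctx), human_tokens[1:])
        optim.zero_grad()
        loss.backward()
        optim.step()

    def rl_step(context_token, reward):
        """REINFORCE on a sampled token, weighted by the negotiation reward."""
        ctx = F.one_hot(context_token, VOCAB).float()
        dist = torch.distributions.Categorical(logits=policy(ctx))
        loss = -reward * dist.log_prob(dist.sample())
        optim.zero_grad()
        loss.backward()
        optim.step()

    human_dialogue = torch.randint(0, VOCAB, (8,))        # stand-in for human data
    for step in range(100):
        rl_step(torch.tensor(0), float(torch.randn(())))  # stand-in self-play reward
        if step % 4 == 0:                                 # periodic supervised anchoring
            supervised_step(human_dialogue)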
I bet we could. The length of each utterance, the number of exchanges between agents, and the entropy of the symbols used in the utterances could give you some measure of efficiency.
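A rough sketch of what those measures might look like (the function and the example dialogue are made up for illustration, not from the paper's data):

    # Hypothetical "efficiency" measures for a dialogue: number of exchanges,
    # mean utterance length, and Shannon entropy of the tokens used.
    import math
    from collections import Counter

    def dialogue_stats(dialogue):
        """dialogue: list of utterances, each a list of token strings."""
        tokens = [t for utt in dialogue for t in utt]
        counts = Counter(tokens)
        total = len(tokens)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        return {
            "exchanges": len(dialogue),
            "mean_utterance_length": total / len(dialogue),
            "token_entropy_bits": entropy,
        }

    # Toy example dialogue
    print(dialogue_stats([
        ["i", "want", "the", "ball"],
        ["you", "can", "have", "the", "ball", "if", "i", "get", "both", "hats"],
        ["deal"],
    ]))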
The most interesting thing to me is not the negotiation tactics that the agents learn but the idea of coming up with a more easily quantifiable (and therefore differentiable) quality metric for dialogue tasks.
So in a very limited sense these agents have a theory of mind - they can infer the beliefs and goals of their opponents and act accordingly. Agents/objects can be in an exponential number of possible relative positions, but this system factorizes structure from function.
It would be interesting to see what happens when new untrained bots start negotiating with trained ones. Would there be one new language for each pair of bots, or one common language that every new bot has to learn?