How Can We Be Sure AI Will Behave? Perhaps by Watching It Argue with Itself.
Take, for instance, an AI system designed to defend against human or AI hackers. To prevent the system from doing anything harmful or unethical, it may be necessary to challenge it to explain the logic behind a particular action. That logic might be too complex for a person to comprehend, so the researchers suggest having a second AI debate the wisdom of the action with the first system, in natural language, while the person observes.