A Theory of Mind Approach as Test-Time Mitigation Against Emergent Adversarial Communication

Piazza, Nancirose; Behzadan, Vahid: A Theory of Mind Approach as Test-Time Mitigation Against Emergent Adversarial Communication. https://arxiv.org/abs/2302.07176, 2023.

Abstract

Multi-Agent Systems (MAS) is the study of multi-agent interactions in a shared environment. Communication for cooperation is a fundamental construct for sharing information in partially observable environments. Cooperative Multi-Agent Reinforcement Learning (CoMARL) is a learning framework where we learn agent policies either with cooperative mechanisms or policies that exhibit cooperative behavior. Explicitly, there are works on learning to communicate messages from CoMARL agents; however, non-cooperative agents, when capable of accessing a cooperative team's communication channel, have been shown to learn adversarial communication messages, sabotaging the cooperative team's performance, particularly when objectives depend on finite resources. To address this issue, we propose a technique which leverages local formulations of Theory-of-Mind (ToM) to distinguish exhibited cooperative behavior from non-cooperative behavior before accepting messages from any agent. We demonstrate the efficacy and feasibility of the proposed technique in empirical evaluations in a centralized training, decentralized execution (CTDE) CoMARL benchmark. Furthermore, while we propose our explicit ToM defense for test-time, we emphasize that ToM is a construct for designing a cognitive defense rather than being the objective of the defense.

BibTeX

@conference{Nanci-2023-02-14,
title = {A Theory of Mind Approach as Test-Time Mitigation Against Emergent Adversarial Communication},
author = {Piazza, Nancirose and Behzadan, Vahid},
url = {https://arxiv.org/abs/2302.07176},
doi = {10.48550/arXiv.2302.07176},
year = {2023},
date = {2023-02-14},
urldate = {2023-02-14},
abstract = {Multi-Agent Systems (MAS) is the study of multi-agent interactions in a shared environment. Communication for cooperation is a fundamental construct for sharing information in partially observable environments. Cooperative Multi-Agent Reinforcement Learning (CoMARL) is a learning framework where we learn agent policies either with cooperative mechanisms or policies that exhibit cooperative behavior. Explicitly, there are works on learning to communicate messages from CoMARL agents; however, non-cooperative agents, when capable of accessing a cooperative team's communication channel, have been shown to learn adversarial communication messages, sabotaging the cooperative team's performance, particularly when objectives depend on finite resources. To address this issue, we propose a technique which leverages local formulations of Theory-of-Mind (ToM) to distinguish exhibited cooperative behavior from non-cooperative behavior before accepting messages from any agent. We demonstrate the efficacy and feasibility of the proposed technique in empirical evaluations in a centralized training, decentralized execution (CTDE) CoMARL benchmark. Furthermore, while we propose our explicit ToM defense for test-time, we emphasize that ToM is a construct for designing a cognitive defense rather than being the objective of the defense.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}