Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks

Franzmeyer, Tim; McAleer, Stephen; Henriques, João F.; Foerster, Jakob N.; Torr, Philip H. S.; Bibi, Adel; de Witt, Christian Schroeder

arXiv:2207.10170 (cs)

[Submitted on 20 Jul 2022 (v1), last revised 6 May 2024 (this version, v5)]

Abstract: Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce ε-illusory attacks, a novel form of adversarial attack on sequential decision-makers that is both effective and of ε-bounded statistical detectability. We propose a novel dual ascent algorithm to learn such attacks end-to-end. Compared to existing attacks, we empirically find ε-illusory attacks to be significantly harder to detect with automated methods, and a small study with human participants (IRB approval under reference R84123/RE001) suggests they are similarly harder to detect for humans. Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses. The project website can be found at this https URL.
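As a rough illustration of the general idea behind dual ascent under a detectability budget, the following toy sketch may help. It is not the authors' algorithm: the Bernoulli setting, the function names `bernoulli_kl` and `dual_ascent_attack`, and all parameter values are assumptions made purely for illustration. A hypothetical adversary shifts a Bernoulli parameter q away from its nominal value p to raise its payoff (taken here to be q itself), while a Lagrange multiplier is adjusted until the KL divergence between attacked and nominal distributions sits at the budget ε.

```python
import math


def bernoulli_kl(q, p):
    """KL(Bern(q) || Bern(p)) in nats."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def dual_ascent_attack(p=0.5, eps=0.05, step=5.0, iters=200):
    """Toy constrained problem: maximize q subject to KL(q || p) <= eps.

    Hypothetical illustration only; p, eps, step and iters are assumed values.
    """
    lam = 1.0  # Lagrange multiplier for the KL constraint
    q = p
    for _ in range(iters):
        # Inner step: maximize q - lam * (KL(q || p) - eps) over q.
        # Setting the derivative to zero gives logit(q) = logit(p) + 1 / lam.
        logit_q = math.log(p / (1 - p)) + 1.0 / lam
        q = 1.0 / (1.0 + math.exp(-logit_q))
        q = min(max(q, 1e-12), 1 - 1e-12)  # keep the logs finite
        # Outer step: raise lam when the constraint is violated, lower it otherwise.
        violation = bernoulli_kl(q, p) - eps
        lam = max(1e-3, lam + step * violation)
    return q, lam


q, lam = dual_ascent_attack()
print(f"q={q:.4f}, lambda={lam:.4f}, KL={bernoulli_kl(q, 0.5):.6f}")
```

In this toy setting the multiplier settles where the KL divergence equals the budget, so the perturbed distribution is as far from nominal as the constraint permits. The paper's setting involves sequential decision-makers and learned attacks, which this sketch does not attempt to model.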

Comments:	ICLR 2024 Spotlight (top 5%)
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2207.10170 [cs.AI]
	(or arXiv:2207.10170v5 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2207.10170

Submission history

From: Tim Franzmeyer
[v1] Wed, 20 Jul 2022 19:49:09 UTC (600 KB)
[v2] Wed, 1 Feb 2023 16:00:59 UTC (1,828 KB)
[v3] Tue, 20 Jun 2023 17:11:12 UTC (1,934 KB)
[v4] Mon, 29 Apr 2024 16:59:57 UTC (1,664 KB)
[v5] Mon, 6 May 2024 06:53:31 UTC (1,664 KB)
