Theoretical evidence for adversarial robustness through randomization

Abstract

This paper investigates the theory of robustness against adversarial attacks. It focuses on the family of randomization techniques that consist in injecting noise in the network at inference time. These techniques have proven effective in many contexts, but lack theoretical arguments. We close this gap by presenting a theoretical analysis of these approaches, hence explaining why they perform well in practice. More precisely, we provide the first result relating the randomization rate to robustness to adversarial attacks. This result applies for the general family of exponential distributions, and thus extends and unifies the previous approaches. We support our theoretical claims with a set of experiments.

Publication
Conference on Neural Information Processing Systems
Avatar
Rafael Pinot
Junior Professor in Machine Learning