On the impact of randomization on robustness in machine learning

Abstract

This thesis investigates the theory of robust classification under adversarial perturbations (a.k.a. adversarial attacks). An adversarial attack is a small (imperceptible to humans) change of an input specifically designed to fool a machine learning model. The vulnerability of state-of-the-art classifiers to these attacks has genuine security implications, especially for deep neural networks used in AI-driven technologies (e.g., self-driving cars). Beyond security, it also shows how little we know about the worst-case behavior of models the industry uses daily. Accordingly, it has become increasingly important for the machine learning community to understand the nature of this failure mode and to mitigate these attacks. One can always build trivial classifiers whose decisions never change under adversarial manipulation (e.g., constant classifiers), but this is at odds with the standard accuracy of the model. This raises several questions; among them, we tackle the following: can we build a class of models that ensures both robustness to adversarial attacks and accuracy?

We first provide some intuition on the adversarial classification problem by adopting a game-theoretic point of view. We present the problem as an infinite zero-sum game in which classical results (e.g., the Nash and Sion theorems) do not apply. We then demonstrate the non-existence of a Nash equilibrium in this game when the classifier and the adversary both use deterministic strategies. This constitutes a negative answer to the above question in the deterministic regime. Nonetheless, the question remains open in the randomized regime. We tackle it by showing that randomized classifiers outperform deterministic ones in terms of robustness against realistic adversaries. This gives a clear argument for further studying randomized strategies as a defense against adversarial example attacks.

Consequently, we present an analysis of randomized classifiers (i.e., classifiers that output random variables) through the lens of statistical learning theory. To do so, we first define a new notion of robustness for randomized classifiers using probability metrics. This definition boils down to forcing the classifier to be locally Lipschitz. We then devise bounds on the generalization gap of any randomized classifier that satisfies this new notion of robustness. We also upper-bound the adversarial gap (i.e., the gap between the risk and the worst-case risk under attack) of these randomized classifiers. Finally, we highlight some links between our line of research and another emerging topic in machine learning, differential privacy. Both notions build on the same theoretical ground (namely, stability with respect to probability metrics), so results from one domain can be transferred to the other. Based on this idea, we use the differential privacy literature to design a simple noise injection method. The scheme allows us to build a class of robust randomized classifiers out of a deterministic hypothesis class, making our previous findings applicable to a wide range of machine learning models. Open questions and perspectives for future research conclude this work.
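To give a flavor of the kind of robustness notion mentioned above (the notation here is chosen for exposition and is not quoted verbatim from the manuscript): think of a randomized classifier h as mapping each input x to a probability distribution h(x) over labels, and require h to be locally Lipschitz with respect to a probability metric D (e.g., the total variation distance or a Renyi divergence):

\[
\|x - x'\| \le \epsilon \;\Longrightarrow\; D\big(h(x),\, h(x')\big) \le L\,\|x - x'\| .
\]

In words, any admissible adversarial perturbation of the input can only move the classifier's output distribution by a proportionally small amount, which is the local Lipschitz condition the abstract refers to.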

Publication
PhD Thesis in Computer Science, PSL University (Paris-Dauphine)
Rafael Pinot
Junior Professor in Machine Learning