Rafael Pinot

Junior Professor in Machine Learning

Sorbonne Université


I am a junior professor in the department of mathematics at Sorbonne Université (Paris 6). I hold a chair on the mathematical foundations of computer and data science within the LPSM research unit. My main line of research is in statistical machine learning, with a focus on the privacy and robustness of machine learning algorithms. Most of my work has a theoretical flavor, but I also like to participate in more applied projects, either to gain a deeper understanding of state-of-the-art methods or simply to better grasp the gap that can exist between the theoretical analysis and the empirical performance of machine learning algorithms.

From 2021 to 2023, I was a postdoctoral researcher at École Polytechnique Fédérale de Lausanne (EPFL), where I worked with Prof. Rachid Guerraoui and Prof. Anne-Marie Kermarrec within the Ecocloud Research Center. From 2017 to 2020, I completed my PhD in Computer Science at Université Paris Dauphine-PSL and the CEA LIST institute (Université Paris-Saclay), where I was advised by Prof. Jamal Atif, Dr. Florian Yger, and Dr. Cédric Gouy-Pailler.


Interests

  • Robustness in Machine Learning
  • Privacy-Preserving Data Analysis
  • Distributed Machine Learning
  • Theory of Deep Learning
  • Graph Theory


Education

  • PhD in Computer Science, 2020

    PSL University (Paris-Dauphine)

  • MSc in Applied Mathematics, 2017

    Sorbonne Université (Paris 6)

  • BSc in Applied Mathematics, 2016

    Université Paris 1 Panthéon-Sorbonne


(2020). On the impact of randomization on robustness in machine learning. PhD Thesis in Computer Science, PSL University (Paris-Dauphine).

PhD Thesis Slides

(2017). Minimum spanning tree release under differential privacy constraints. Master Thesis in Statistics, Sorbonne Université (Paris 6).

Master Thesis

Some Recent Publications

On the Inherent Anonymity of Gossiping

The notion of adversary is a staple of distributed computing. An adversary typically models hostile assumptions about the underlying distributed environment, e.g., a network that can drop messages, an operating system that can delay processes, or an attacker that can hack machines. So far, the goal of distributed computing researchers has mainly been to develop distributed algorithms that can face a given adversary, the abstraction characterizing worst-case scenarios. This paper initiates the study of the opposite approach: given a distributed algorithm, the adversary is the abstraction we seek to implement. More specifically, we consider the problem of controlling the spread of messages in a large-scale system, motivated in practice by limiting the dissemination of fake news or viruses. Essentially, we assume a general class of gossip protocols, called all-to-all gossip protocols, and devise a practical method to hinder the dissemination. We present the Universal Gossip Fighter (UGF). Just like classical adversaries in distributed computing, UGF can observe the status of a dissemination and decide to stop some processes or delay some messages. The originality of UGF lies in the fact that it is universal, i.e., it applies to any all-to-all gossip protocol. We show that any gossip protocol attacked by UGF ends up exhibiting a quadratic message complexity (in the total number of processes) if it achieves a sublinear dissemination time. We also show that if a gossip protocol aims to achieve a message complexity α times smaller than quadratic, then its time complexity grows exponentially in α. We convey the practical relevance of our theoretical findings by implementing UGF and conducting a set of empirical experiments that confirm some of our results.
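For intuition, the kind of protocol the paper considers, a basic all-to-all push gossip, can be simulated in a few lines. This is a generic toy simulation, not UGF itself; the function name and parameters are illustrative:

```python
import random

def push_gossip(n, fanout, rounds, seed=0):
    """Simulate basic push gossip: every informed process sends the
    rumor to `fanout` processes chosen uniformly at random each round.
    Returns the number of informed processes and the message count."""
    rng = random.Random(seed)
    informed = {0}  # process 0 starts with the rumor
    messages = 0
    for _ in range(rounds):
        new = set()
        for _p in informed:
            for _ in range(fanout):
                messages += 1
                new.add(rng.randrange(n))  # uniform random target
        informed |= new
        if len(informed) == n:
            break
    return len(informed), messages
```

With a constant fanout, dissemination time is logarithmic in n; an adversary that drops or delays a fraction of these messages is exactly what forces the message/time trade-off studied in the paper.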

On the Privacy-Robustness-Utility Trilemma in Distributed Learning

The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines’ data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data.
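As a toy illustration of the two ingredients such an algorithm combines, namely local noise injection for DP and robust aggregation at the server, here is a minimal sketch. The function name, the trimmed mean, and the noise calibration are illustrative choices of mine, not the paper's high-dimensional aggregation rule:

```python
import numpy as np

def private_robust_mean(local_means, f, sigma, rng):
    """Toy sketch: each machine perturbs its local mean with Gaussian
    noise (Gaussian mechanism for DP), then the server applies a
    coordinate-wise trimmed mean discarding the f smallest and f
    largest values per coordinate to resist adversarial machines."""
    noisy = local_means + rng.normal(0.0, sigma, size=local_means.shape)
    ordered = np.sort(noisy, axis=0)           # per-coordinate sort
    trimmed = ordered[f: len(local_means) - f] # drop f extremes on each side
    return trimmed.mean(axis=0)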

Robust Collaborative Learning with Linear Gradient Overhead

Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data, or malicious behavior. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing. While MoNNA is rather simple to implement, its analysis is more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of (α, λ)-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.
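A single machine's round of the scheme described above, Polyak's momentum followed by nearest-neighbor averaging, can be sketched as follows. This is a toy rendering under my own simplifications (synchronous round, Euclidean distances, illustrative function name and defaults), not the reference implementation linked above:

```python
import numpy as np

def monna_step(model, momentum, grad, peer_models, f, beta=0.9, lr=0.05):
    """One local round for a single machine (toy sketch):
    Polyak momentum on the local gradient, a gradient step, then
    nearest-neighbor averaging (NNA) keeping the n - f models
    closest to the machine's own updated model."""
    momentum = beta * momentum + (1 - beta) * grad  # Polyak's momentum
    model = model - lr * momentum                   # local update
    models = np.vstack([model, peer_models])        # own model + peers
    dist = np.linalg.norm(models - model, axis=1)   # distance to own model
    keep = np.argsort(dist)[: len(models) - f]      # n - f nearest models
    return models[keep].mean(axis=0), momentum      # NNA mixing
```

Keeping only the n - f nearest models bounds the influence of the (at most f) faulty machines, since a faulty model far from the honest cluster is simply never averaged in.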

Fixing by Mixing: A Recipe for Optimal Byzantine ML under Heterogeneity

Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem has received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine machine can hardly be distinguished from a non-Byzantine outlier. A few solutions have been proposed to tackle this issue, but these provide suboptimal probabilistic guarantees and fare poorly in practice. This paper closes the theoretical gap, achieving optimality and inducing good empirical results. In fact, we show how to automatically adapt existing solutions for (homogeneous) Byzantine ML to the heterogeneous setting through a powerful mechanism we call nearest neighbor mixing (NNM), which boosts any standard robust distributed gradient descent variant to yield optimal Byzantine resilience under heterogeneity. We obtain similar guarantees (in expectation) by plugging NNM into the distributed stochastic heavy ball method, a practical substitute for distributed gradient descent. We obtain empirical results that significantly outperform state-of-the-art Byzantine ML solutions.
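The NNM mechanism lends itself to a compact sketch: mix each worker's gradient with its nearest neighbors, then hand the mixed vectors to any standard robust aggregator. The code below is a toy version of that idea; the coordinate-wise median is my illustrative choice of aggregator, and the names are mine:

```python
import numpy as np

def nnm(gradients, f):
    """Nearest neighbor mixing (sketch): replace each worker's gradient
    by the average of its n - f nearest gradients (Euclidean distance,
    the gradient itself included), then feed the mixed vectors to a
    standard robust aggregator, here the coordinate-wise median."""
    g = np.asarray(gradients, dtype=float)
    n = len(g)
    d = np.linalg.norm(g[:, None, :] - g[None, :, :], axis=-1)  # pairwise distances
    mixed = np.stack([g[np.argsort(d[i])[: n - f]].mean(axis=0)
                      for i in range(n)])
    return np.median(mixed, axis=0)  # robust aggregation of the mixed vectors
```

The point of the pre-mixing step is variance reduction under heterogeneity: honest gradients get averaged toward the honest cluster before the aggregator ever sees them, so outliers among honest workers no longer look Byzantine.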

Towards Consistency in Adversarial Classification

In this paper, we study the problem of consistency in the context of adversarial examples. Specifically, we tackle the following question: can surrogate losses still be used as a proxy for minimizing the 0/1 loss in the presence of an adversary that alters the inputs at test time? Unlike the standard classification task, this question cannot be reduced to a point-wise minimization problem, and calibration need not be sufficient to ensure consistency. In this paper, we expose some pathological behaviors specific to the adversarial problem and show that no convex surrogate loss can be consistent or calibrated in this context. It is therefore necessary to design another class of surrogate functions that can be used to solve the adversarial consistency issue. As a first step towards designing such a class, we identify necessary and sufficient conditions for a surrogate loss to be calibrated in both the adversarial and standard settings. Finally, we give some directions for building a class of losses that could be consistent in the adversarial framework.
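In standard (if not necessarily the paper's) notation, the two risks being compared can be written as follows, where f is a real-valued score, φ a margin-based surrogate loss, and ε the adversary's perturbation budget:

```latex
% Adversarial 0/1 risk: the adversary picks the worst perturbation of norm at most \epsilon
R^{\mathrm{adv}}_{0/1}(f)
  = \mathbb{E}_{(X,Y)}\Big[\sup_{\|\delta\|\le\epsilon}
      \mathbf{1}\big\{\operatorname{sign} f(X+\delta)\neq Y\big\}\Big],
\qquad
% Adversarial surrogate risk for a margin loss \phi
R^{\mathrm{adv}}_{\phi}(f)
  = \mathbb{E}_{(X,Y)}\Big[\sup_{\|\delta\|\le\epsilon}
      \phi\big(Y\, f(X+\delta)\big)\Big].
```

Consistency asks whether driving the surrogate risk to its infimum forces the 0/1 risk to its infimum as well; the negative result above says no convex φ can have this property in the adversarial setting.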


Teaching

Université Paris Dauphine-PSL

  • Lecturer in Trustworthy Machine Learning - MBA C.S. (2019-2020)

  • Lecturer in Mathematics for Machine Learning - MSc C.S. (2019-2020)

  • Teaching assistant in Introduction to Machine Learning - MSc C.S. (2017-2020)

  • Teaching assistant in Introduction to Machine Learning - MBA B.I. (2017-2020)

Université Paris 1 Panthéon-Sorbonne

  • Teaching assistant in Calculus - BSc Mathematics (2017-2018)


Contact at Sorbonne Université

  • E-Mail address: {lastname}[at]lpsm.paris

Contact at EPFL (may no longer be active)

  • E-Mail address: {firstname}.{lastname}[at]epfl.ch