Nodes clustering in a graph under differential privacy constraints

Abstract

We investigate the problem of nodes clustering in a graph representation of a dataset under privacy constraints. Our contribution is twofold. First we formally define the concept of differential privacy for graphs and give an application setting where nodes clustering under privacy constraints allows for a secure analysis of the experiment. Then we propose a theoretically motivated method combining a sanitizing mechanism (such as Laplace or Gaussian mechanism) with a Minimum Spanning Tree (MST)-based clustering algorithm. It provides an accurate method for nodes clustering in a graph while keeping the sensitive information contained in the edges weights of the graph private. We provide some theoretical results on the robustness of the Kruskal minimum spanning tree construction for both of the sanitizing mechanisms. These results exhibit which conditions the graph’s weights should respect in order to consider that the nodes form well separated clusters. The method has been experimentally evaluated on simulated data, and preliminary results show the good behavior of the algorithm while identifying well separated clusters.

Publication
Paris-Saclay Junior Conference on Data Science and Engineering
Avatar
Rafael Pinot
Junior Professor in Machine Learning