Minimum spanning tree release under differential privacy constraints

Abstract

We investigate the problem of nodes clustering under privacy constraints when representing a dataset as a graph. Our contribution is threefold. First we formally define the concept of differential privacy for structured databases such as graphs, and give an alternative definition based on a new neighborhood notion between graphs. This definition is adapted to particular frameworks that can be met in various application fields such as genomics, world wide web, population survey, etc. A thorough theoretical analysis of our algorithm stressing the comparison between our bound and the state of the art theoretical bound is presented. To illustrate and support this result we also perform some experiments comparing the two methods based on simulated graphs. Finally, we propose a theoretically motivated method combining a sanitizing mechanism (such as Laplace or our new algorithm) with a Minimum Spanning Tree (MST)-based clustering algorithm. It provides an accurate method for nodes clustering in a graph while keeping the sensitive information contained in the edges weights of the private graph. We provide some theoretical results on the robustness of an almost minimum spanning tree construction for Laplace sanitizing mechanisms. These results exhibit which conditions the graph weights should respect in order to consider that the nodes form well separated clusters both for Laplace and our algorithm as sanitizing mechanism. The method has been experimentally evaluated on simulated data, and preliminary results show the good behavior of the algorithm while identifying well separated clusters.

Publication
Master Thesis in Statistics, Sorbonne Université (Paris 6)
Avatar
Rafael Pinot
Junior Professor in Machine Learning