At a high level, the job of the k-anonymity server is to keep track, for each "set" S, of how many distinct users have invoked Join on S over a given lookback period w (say, the last 7 days). If the set size exceeds a predetermined threshold k, the set is said to have k-anonymity status. The k-anonymity server continually updates the k-anonymity status of each set known to it, and publishes the status of each set via the Query endpoint.
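To make the quantity being tracked concrete, here is a minimal sketch of counting the distinct users who joined a set within a lookback window. The `JoinLog` class, its method names, and the half-open window `(start, end]` are assumptions made for illustration; the source does not describe how the server stores Join calls.

```python
from collections import defaultdict


class JoinLog:
    """Hypothetical in-memory record of Join calls, keyed by set."""

    def __init__(self):
        # set_id -> list of (user_id, time) pairs, one per Join call
        self._joins = defaultdict(list)

    def join(self, set_id, user_id, t):
        """Record that user_id invoked Join on set_id at time t."""
        self._joins[set_id].append((user_id, t))

    def count(self, set_id, start, end):
        """Distinct users with at least one Join in (start, end].

        Assumption: the window excludes `start` and includes `end`.
        """
        joins = self._joins.get(set_id, [])
        return len({user for user, ts in joins if start < ts <= end})
```

With a lookback period w and current time t, the set size compared against the threshold would then be `log.count(S, t - w, t)`. Repeated Join calls by the same user count once.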
In this document, we explain how the k-anonymity status of a set is updated and published, to limit the ability of a bad actor to learn information about individual user behavior by observing changes in the output of the Query endpoint.
We employ the following techniques to protect users' privacy:
- The status of each set is only updated at periodic intervals (such as once an hour) rather than continuously. This limits the ability to tie a specific set status change (from no to yes) to a specific Join action based on timing.
- We add a calibrated amount of noise to both the true set size and the k-anonymity threshold when evaluating the status of a set. This ensures differential privacy properties for the algorithm and masks the contribution of any single user's actions to the status of the set.
- We limit the frequency with which the status of a set can change. While it is important to support fast ramp-ups (a large number of users Join-ing a set should result in the set achieving k-anonymity quickly), we limit the speed at which k-anonymity status is lost.
In Differentially Private Algorithms for 𝑘-Anonymity Server we analyze the algorithm and formally bound the amount of information leakage.
We begin by describing the main parameters of the algorithm and presenting the algorithm itself at a high level. We then propose a list of initial parameter settings.
We will use the following parameters in the AboveThresholdWithPeriodicRestart algorithm:
Parameter | Definition |
---|---|
|
|
We use |
|
We use |
|
|
The following paragraphs describe the algorithm at a high level. For further details, see Differentially Private Algorithms for 𝑘-Anonymity Server.
Let
Parameters:
Inputs: Set
State:
Algorithm
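// algorithm restart
if t-1 = 0 mod w:
  state = false;
  // modify k with truncated Laplace noise mu (function of epsilon)
  generate k'=k+mu;
if state == true:
  A_t(S)=true;
else:
  generate nu; // truncated Laplace noise
  // the status is true when the noisy set size reaches the noisy threshold
  A_t(S)=(C(S,t-w,t)+nu >= k');
  // keep a positive status until the next restart
  state = A_t(S);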
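The pseudocode can be sketched in Python as follows. This is an illustrative reading, not the server's implementation. It assumes that the status is true when the noisy set size reaches the noisy threshold, that a positive status is held until the next restart, and that w is measured in update steps. The `scale` and `bound` noise parameters and the rejection-sampling form of truncation are hypothetical stand-ins, since the source only says that the noise is truncated Laplace noise that depends on epsilon.

```python
import random


def truncated_laplace(scale, bound, rng):
    """Sample Laplace(0, scale) conditioned on [-bound, bound].

    Assumption: truncation is done by rejection sampling.
    """
    while True:
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        x = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if abs(x) <= bound:
            return x


class AboveThresholdWithPeriodicRestart:
    def __init__(self, k, w, scale, bound, seed=None):
        self.k, self.w = k, w
        self.scale, self.bound = scale, bound  # hypothetical noise parameters
        self.rng = random.Random(seed)
        self.state = False
        self.noisy_k = None

    def _noise(self):
        return truncated_laplace(self.scale, self.bound, self.rng)

    def step(self, t, count):
        """Return A_t(S) for update step t (1-indexed), given count = C(S, t-w, t)."""
        # Algorithm restart: forget the held status and redraw the noisy threshold.
        if (t - 1) % self.w == 0 or self.noisy_k is None:
            self.state = False
            self.noisy_k = self.k + self._noise()  # k' = k + mu
        if not self.state:
            nu = self._noise()
            # Once true, the status is kept until the next restart.
            self.state = count + nu >= self.noisy_k
        return self.state
```

Because the status is re-evaluated only while `state` is false, a set can gain k-anonymity status at any update step but can lose it only at a restart, which matches the goal of fast ramp-ups and slow status loss.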
We propose the following parameter values to determine the k-anonymity status of a set.
Parameter | Value |
---|---|
Update period |
|
Lookback window |
|
Set size |
|
The differential privacy algorithm needs the
Parameter | Value |
---|---|
|
We verified that these parameters resulted in a limited amount of noise (as shown in our document Differentially Private Algorithms for 𝑘-Anonymity Server).