At a high level, the job of the k-anonymity server is to keep track, for each "set" S, of how many distinct users have invoked Join on S over a given lookback period w (say, the last 7 days). If the set size exceeds a predetermined threshold k, the set is said to have k-anonymity status. The k-anonymity server continually updates the k-anonymity status of each set known to it, and publishes the status of each set via the Query endpoint.
In this document, we explain how the k-anonymity status of a set is updated and published, to limit the ability of a bad actor to learn information about individual user behavior by observing changes in the output of the Query endpoint.
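As a rough illustration of the bookkeeping described above, the following minimal sketch counts the distinct users who invoked Join on a set within a lookback window. The class name `JoinLog`, its methods, and the use of integer time steps are assumptions made for illustration only; they do not describe the server's actual data model.

```python
from collections import defaultdict


class JoinLog:
    """Hypothetical in-memory record of Join calls, keyed by set identifier."""

    def __init__(self):
        # set_id -> {user_id: time step of that user's most recent Join}
        self._last_join = defaultdict(dict)

    def join(self, set_id, user_id, t):
        """Record that user_id invoked Join on set_id at time step t."""
        previous = self._last_join[set_id].get(user_id, t)
        self._last_join[set_id][user_id] = max(previous, t)

    def count(self, set_id, start, end):
        """Count users whose most recent Join on set_id lies in (start, end].

        This equals the number of distinct users who joined in the window as
        long as no Join later than `end` has been recorded, which holds when
        `end` is the current time step.
        """
        joins = self._last_join.get(set_id, {})
        return sum(1 for last in joins.values() if start < last <= end)
```

Keeping only the most recent Join per user is enough here because a repeated Join by the same user must not increase the set size.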
We employ the following techniques to protect users' privacy:
The status of each set is only updated at periodic intervals (such as once an hour) rather than continuously. This limits the ability to tie a specific set status change (from no to yes) to a specific action based on timing.
We add a calibrated amount of noise to both the true set size and the k-anonymity threshold when evaluating the status of a set. This ensures differential privacy properties for the algorithm and masks the contribution of any single user's actions to the status of the set.
We limit the frequency with which the status of a set can change. While it is important to support fast ramp-ups (a large number of users Join-ing a set should result in the set achieving k-anonymity quickly), we limit the speed at which k-anonymity status is lost.
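To make the noise step more concrete, here is a minimal sketch of one way to draw bounded Laplace noise with the Python standard library. It assumes that "truncated" means the Laplace distribution conditioned on a symmetric interval; the function name `truncated_laplace` and the parameters `scale` and `bound` are hypothetical, and how they would be calibrated from the privacy parameters is not specified here.

```python
import random


def truncated_laplace(scale, bound, rng=random):
    """Sample Laplace(0, scale) noise conditioned on lying in [-bound, bound].

    Uses rejection sampling. The difference of two independent exponential
    variates with mean `scale` follows a Laplace distribution with that scale.
    """
    if scale <= 0 or bound <= 0:
        raise ValueError("scale and bound must be positive")
    while True:
        sample = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if -bound <= sample <= bound:
            return sample
```

Rejection sampling keeps the sketch short; a production sampler would more likely invert the truncated distribution directly and use a cryptographically secure source of randomness.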
In Differentially Private Algorithms for 𝑘-Anonymity Server we analyze the algorithm and formally bound the amount of information leakage.
We begin by describing the main parameters of the algorithm and presenting the algorithm itself at a high level. We then propose a list of initial parameter settings.
We will use the following parameters in the AboveThresholdWithPeriodicRestart algorithm:
Parameter | Definition |
We use |
We use |
The following paragraphs describe the algorithm at a high level. For further details, see Differentially Private Algorithms for 𝑘-Anonymity Server.
Inputs: Set
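// algorithm restart
if t-1 = 0 mod w:
    state = false;
    // modify k with truncated Laplace noise mu (function of epsilon)
    generate k' = k + mu;
if state == true:
    // status already reached in this period: keep it until the next restart
    A_t(S) = true;
else:
    generate nu; // truncated Laplace noise
    // status is granted when the noisy set size reaches the noisy threshold
    A_t(S) = (C(S, t-w, t) + nu >= k');
    state = A_t(S);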
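The following Python sketch shows one possible reading of the pseudocode above. It rests on two assumptions that the text suggests but does not spell out: a set that reaches k-anonymity status keeps it until the next restart, and status is granted when the noisy count meets or exceeds the noisy threshold. The constructor arguments `count_fn`, `threshold_noise`, and `count_noise` are hypothetical hooks introduced for illustration, standing in for C(S, t-w, t), mu, and nu respectively.

```python
class AboveThresholdWithPeriodicRestart:
    """Sketch of the per-set status update; not the server's implementation."""

    def __init__(self, k, w, count_fn, threshold_noise, count_noise):
        self.k = k                              # k-anonymity threshold
        self.w = w                              # window length, in time steps
        self.count_fn = count_fn                # count_fn(start, end) -> C(S, start, end)
        self.threshold_noise = threshold_noise  # callable that draws mu
        self.count_noise = count_noise          # callable that draws nu
        self.state = False
        self.noisy_k = None

    def update(self, t):
        """Return A_t(S), the published status of the set at time step t."""
        # Restart: forget the previous status and redraw the noisy threshold.
        if self.noisy_k is None or (t - 1) % self.w == 0:
            self.state = False
            self.noisy_k = self.k + self.threshold_noise()
        # Assumption: once reached, the status is held until the next restart.
        if not self.state:
            noisy_count = self.count_fn(t - self.w, t) + self.count_noise()
            self.state = noisy_count >= self.noisy_k
        return self.state
```

Passing the noise sources in as callables keeps the control flow separate from the choice of noise distribution, and lets the logic be exercised deterministically with zero noise.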
We propose the following parameter values to determine k-anonymity status of a set.
Parameter | Value |
Update period |
Lookback window |
Set size |
The differential privacy algorithm needs the
Parameter | Value |
We verified that these parameters resulted in a limited amount of noise (as shown in our document Differentially Private Algorithms for 𝑘-Anonymity Server).