How differential privacy techniques in adtech protect consumers
Data is knowledge, and knowledge is power – but there has always been a fine line between knowing enough and knowing too much.
While marketers naturally want to know as much as possible about their potential audience, to give them the best chance of targeting people who are interested in their products or services, checks and balances within massive data-gathering operations such as Google have always existed so that actual one-on-one targeting of a single individual is incredibly hard to achieve.
A key element of this is analysing data in aggregate, meaning the data from one person who clicked an online ad, for example, is exposed together en masse with everyone else who did the same. What marketers have been largely unable to do – much as they would really like to – is separate out individual data so they can target or measure on a one-to-one basis. Consumer privacy reigns supreme, at least in theory.
The data privacy challenge
Yet, as this fascinating article showed, if you target a small enough demographic with a niche request, and keep refining the results, identifying individual results from a cohort is way, way simpler than it should be. Smaller groups – such as the 200 people who remain in Antarctica over the winter – mean fewer clicks, and fewer clicks, of course, mean smaller data sets. This data is still in aggregate – but as writer Patrick Berlinquette noted: “By segmenting genders and ages from this data, and then excluding some of these genders and ages from seeing the app ads, we can get a good idea, down to a much smaller subset of people, of the age, gender, etc., of who is using what app in Antarctica.”
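The exclusion trick described in that quote is sometimes called a differencing attack: two aggregate numbers are subtracted to isolate a small segment. A minimal sketch follows, assuming a simplified model in which an analyst can count app users while excluding one segment. The cohort is a made-up toy dataset and `count_app_users` is a hypothetical helper, not anything taken from the article or from a real ad platform.

```python
# Hypothetical toy cohort: (age_band, gender, uses_app). Not real data.
cohort = [
    ("25-34", "F", True),
    ("25-34", "M", False),
    ("35-44", "M", True),
    ("35-44", "M", False),
    ("45-54", "F", False),
]

def count_app_users(people, exclude=None):
    """Aggregate count of app users, optionally excluding one (age_band, gender) segment."""
    return sum(
        1
        for age_band, gender, uses_app in people
        if uses_app and (age_band, gender) != exclude
    )

total = count_app_users(cohort)
without_segment = count_app_users(cohort, exclude=("25-34", "F"))

# Both numbers are aggregates, yet their difference describes a single segment.
# In a cohort this small, that segment may be one person.
print(total - without_segment)
```

Neither query returns individual-level data, but the gap between them does, which is the problem the techniques below are meant to address.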
Data opens a window into consumers’ daily lives, habits and desires. So how do ethical data platforms maximise the legitimate ability and desire of brands to leverage their data sets to find out as much as possible about their customers, while maintaining individual privacy in all circumstances? The answer is differential privacy.
Differential privacy is designed to protect consumers by removing the ability to ‘sniper target’ one individual within a set of results by repeatedly refining and re-running queries to whittle down the number of people returned. Applying various differential privacy techniques and thresholds to statistical results ensures they can be effectively analysed and rich insights unlocked, without ever running the risk of allowing a single individual to be identified.
At InfoSum, we ensure consumer privacy by applying a number of privacy-preserving techniques to all analysis delivered by our Unified Data Platform.
Firstly, we add noise, obscuring the real results by fuzzing them up or down by one or two per cent.
For example, say a query is being run on a dataset to find out how many people live in Hampshire. Without noise being added (or any other differential privacy techniques being applied), someone can add people they suspect of living in Hampshire to the dataset one-by-one and keep re-running the query to see if the result changes. Adding a small level of purposeful inaccuracy through random noise, however, renders this impossible.
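The Hampshire example can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual implementation: the function name `add_noise`, the uniform distribution and the 2% default are all assumptions made for the example. Formal differential privacy mechanisms typically draw calibrated Laplace or Gaussian noise instead.

```python
import random

def add_noise(true_count, max_fraction=0.02, rng=None):
    """Perturb a count by a random fraction in [-max_fraction, +max_fraction].

    Assumption: uniform noise of up to 2%; the real noise distribution is not
    described in the text.
    """
    if rng is None:
        rng = random.Random()
    factor = 1.0 + rng.uniform(-max_fraction, max_fraction)
    return round(true_count * factor)

rng = random.Random(42)  # seeded only so the example is repeatable
before = add_noise(1081, rng=rng)  # query before a suspected resident is added
after = add_noise(1082, rng=rng)   # query after adding them
print(before, after)
```

With a true count near 1,081, the noise moves each result by up to roughly 20 either way, so a difference between `before` and `after` says very little about whether the one added person matched.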
And consumer protection becomes even stronger when the other two techniques, redaction and rounding, are subsequently applied.
Our platform will only return results when the number of individuals that match the defined criteria is above a set threshold, normally 100. So to carry on the previous example, if only 95 people from the dataset actually live in Hampshire, redaction will cause that number to show as 0 - any number under the threshold simply doesn’t exist for reporting purposes. This ensures that a group of individuals cannot be filtered in such a way that a single individual is exposed.
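As a rough sketch, redaction is a simple threshold check. The function name `redact` is hypothetical, and how a count of exactly 100 is treated is an assumption here (it is reported), since the description above only covers counts under the threshold.

```python
def redact(count, threshold=100):
    """Return 0 for any count under the threshold, otherwise the count itself.

    Assumption: a count exactly equal to the threshold is reported.
    """
    return 0 if count < threshold else count

print(redact(95))    # 0
print(redact(1081))  # 1081
```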
Finally, the platform rounds results down to a multiple of a pre-defined number, normally also 100. This ensures that tweaking or re-querying the results with small numbers of users will generally not cause them to change, or individuals to be identified. Using the previous example again, if 1,081 individuals actually meet the criteria of living in Hampshire, and the rounding is set to 100, this would be presented as 1,000. From the user's point of view, the underlying count would need to cross a multiple of 100 (rising to 1,100 or falling below 1,000) before the result would update.
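Rounding down can be sketched with integer division. The name `round_down` is illustrative only, and the sketch assumes non-negative whole-number counts.

```python
def round_down(count, step=100):
    """Round a non-negative count down to a multiple of `step`."""
    return (count // step) * step

print(round_down(1081))  # 1000

# Adding people one at a time leaves the reported figure unchanged
# until the true count reaches the next multiple of 100.
print({round_down(n) for n in range(1081, 1100)})  # {1000}
print(round_down(1100))  # 1100
```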
Applied together, these three techniques ride the fine line between privacy and accuracy: the changes are large enough to prevent individuals being re-identified, but small enough not to materially affect the accuracy of the aggregate data.
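To see how the three steps might fit together, here is a hedged end-to-end sketch. The function `private_count`, its parameter names, the uniform noise and the choice to apply redaction to the noisy count rather than the true one are all assumptions made for illustration. Only the order (noise, then redaction, then rounding) and the default values of 100 come from the description above.

```python
import random

def private_count(true_count, noise_fraction=0.02, threshold=100, step=100, rng=None):
    """Apply noise, then redaction, then rounding down to a raw count."""
    if rng is None:
        rng = random.Random()

    # 1. Noise: fuzz the count up or down by up to noise_fraction (assumed uniform).
    noisy = round(true_count * (1.0 + rng.uniform(-noise_fraction, noise_fraction)))

    # 2. Redaction: anything under the threshold is reported as 0.
    if noisy < threshold:
        return 0

    # 3. Rounding: report only a multiple of `step`.
    return (noisy // step) * step

print(private_count(95))    # always 0 with the defaults: 95 plus 2% noise stays under 100
print(private_count(1081))  # 1000 or 1100, depending on the noise drawn
```

For a count of 1,081 the reported figure can differ from the truth by at most 81 below or 19 above, which is the kind of trade-off between privacy and accuracy described here.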
And it’s not just us saying that - we have been externally evaluated to ensure that the quality of our results is not compromised by our championing of consumer privacy.
Standing up for consumers and ethically sound marketing is very important to our ethos, and differential privacy is the key tool that allows us to live those values - with consumers in mind.