If you recently got a memo (encrypted, one would hope) to review Privacy-Enhancing Technologies (PETs), you’re not alone. The Royal Society has a PET working group. IAB Tech Lab has a PET Working Group. The Annual Privacy Forum run by ENISA, the European Union Agency for Cybersecurity, has a PET program. The White House just launched a PET prize challenge. For real.
As a rule, we should be wary of stampedes, especially those with cute acronyms. But PETs are the exception that proves the rule. Designed to make marketing use cases (like targeting, personalization, or attribution) compliant with prevailing consumer privacy regulations, PETs have the potential to rebuild trust between parties in the advertising ecosystem that have grown dangerously apart: advertisers, publishers, and, most importantly, consumers. The end justifies the hype, in my opinion.
Think consumer trust isn’t a big issue? Check out this recent Last Week Tonight piece by John Oliver, in which he describes data brokers as “middlemen of surveillance capitalism” and walks through loopholes in the Fourth Amendment’s protection against unreasonable searches and seizures. Ouch.
Under pressure to comply with the law and do right by their customers, top brands and media companies are putting a stop to their traditional data sharing practices. While this could hamstring data-driven marketing as we’ve come to know it, new PET advances are making it possible for marketers to collaborate with partners without sharing first-party data. These developments have the potential to unlock new business opportunities, but only for those who do it properly.
Why the interest in PETs, all of a sudden?
Data privacy isn’t a new development. In the US, financial records have been protected under the Gramm-Leach-Bliley Act since 1999. Affiliate data derived from consumer reports has been strictly regulated by FACTA since 2003. Medical records have been shielded under HIPAA’s Security Rule since 2006. And Census officials have been sworn to uphold Title 13 and protect personal information since 1954, long before the Bureau started to use differential privacy.
But for most other industries, we can attribute the sudden surge in interest to three key triggers: the passage of GDPR in 2018; the death of digital identifiers (like third-party cookies and device IDs); and the explosion of data generated by dozens of new social, mobile, streaming, and CTV channels.
According to UNCTAD, the UN Conference on Trade and Development, there are now 137 countries around the world with some form of data privacy legislation. In the US, California led the way with CCPA and CPRA, followed by Virginia and Colorado last year, then Utah and Connecticut in early 2022, with many more states on the way. There’s no turning back.
What are PETs, exactly?
All the working groups I mentioned earlier are putting together their own PET taxonomy. Some are inspired by Daniel Solove’s 2006 Taxonomy of Privacy and categorize PETs based on their contribution to solving specific privacy risks (like surveillance, insecurity, or appropriation). Others, like the UK Centre for Data Ethics and Innovation, categorize PETs based on their level of complexity or market readiness (traditional vs. emerging). Yet others, like the Federal Reserve Bank of San Francisco, categorize PETs based on their primary mechanism: altering data, for example, or shielding it.
At InfoSum, we look at PETs through the prism of modern data clean rooms. A clean room, for us, is a decentralized environment where multiple parties can collaborate across their first-party data assets without exposing that raw data—and, crucially, without moving that data either. That perspective has led us to group PETs into the following four categories:
The first group includes techniques like homomorphic encryption and synthetic data. Companies don’t always have the in-house analytical resources to securely process large amounts of customer data. If they need to expose their data to an outside party for processing (in the cloud, for example), homomorphic encryption makes it possible for that party to run computations directly on the data in encrypted form. Synthetic data achieves similar protection by replacing the real values in the original dataset with artificially generated ones, without affecting the dataset’s statistical properties.
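To make the idea concrete, here is a minimal sketch of additively homomorphic encryption using a toy Paillier scheme. The primes and values below are illustrative only (real deployments use keys of 2,048 bits or more), but the core property holds: multiplying two ciphertexts produces an encryption of the sum of the plaintexts, so an outside party can aggregate values it cannot read.

```python
import math
import random

def keygen(p=61, q=53):
    """Toy Paillier keypair; real systems use ~2048-bit primes."""
    n = p * q
    g = n + 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With g = n + 1, L(g^lam mod n^2) = lam, so mu = lam^-1 mod n
    mu = pow(lam, -1, n)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)          # random blinding factor
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(priv, c):
    lam, mu, n = priv
    x = pow(c, lam, n * n)              # c^lam mod n^2
    return (x - 1) // n * mu % n        # L(x) * mu mod n

pub, priv = keygen()
c1, c2 = encrypt(pub, 12), encrypt(pub, 30)
c_sum = c1 * c2 % (pub[0] ** 2)         # multiplying ciphertexts adds plaintexts
print(decrypt(priv, c_sum))             # 42
```

The party doing the aggregation only ever sees `c1`, `c2`, and `c_sum`; without the private key, none of the individual values are recoverable.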
The second group includes well-known data masking techniques like anonymization (where identifying data is irreversibly scrubbed from the dataset so individuals cannot be reidentified) and pseudonymization (where identifiers are replaced with pseudonyms that can be reversed only through a separate mapping table). Pseudonymization in particular is drawing a lot of attention, both because it’s relatively easy to understand and because GDPR explicitly names it as a way to reduce the risk to data subjects. ENISA just released a report showing companies how to deploy it on healthcare data. Bloom filters are part of this PET category too, because they make it possible to estimate the overlap between multiple datasets without moving raw identifiers, and their compact data structure makes them computationally very efficient for large-scale data collaboration.
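As an illustration of the Bloom filter approach, here is a minimal sketch (the filter size, hash count, and email lists are made up for demonstration). One party hashes its identifiers into a bit array and shares only that array; the other party can then estimate the overlap with its own list without either side exchanging raw identifiers.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter using salted SHA-256 to derive bit positions."""
    def __init__(self, size=20_000, hashes=5):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

# Hypothetical audiences with a true overlap of 500 identifiers.
ours = {f"user{i}@example.com" for i in range(1000)}
theirs = {f"user{i}@example.com" for i in range(500, 1500)}

# The other party shares only its filter, never the raw identifiers.
bf = BloomFilter()
for email in theirs:
    bf.add(email)

overlap = sum(1 for email in ours if email in bf)
print(overlap)  # close to 500: true matches plus a small false-positive error
```

The tradeoff is tunable: a larger bit array lowers the false-positive rate, at the cost of a bigger structure to exchange.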
The third group covers techniques like cohorts and differential privacy, which shift the focus from the individual to a group of individuals with shared characteristics (like demographics, interests, or purchase intent). They involve a tradeoff between privacy and precision that can be difficult to quantify, but that hasn’t stopped them from making headlines: Google’s Topics API, IAB Tech Lab’s Audience Taxonomy, and the US Census Bureau’s adoption of differential privacy for its 2020 count are all high-profile projects fueling R&D in this space. With contextual advertising on the rise, these PET solutions will only continue to attract interest.
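The privacy-precision tradeoff is easiest to see in the simplest differentially private mechanism, the Laplace mechanism applied to a count. In the hypothetical sketch below (the audience and epsilon values are made up), a smaller epsilon means stronger privacy and noisier answers; a larger epsilon means the reverse.

```python
import random

def laplace_noise(scale):
    # The difference of two exponential draws is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_count(true_count, epsilon):
    """Laplace mechanism: a counting query has sensitivity 1, so adding
    Laplace noise with scale 1/epsilon satisfies epsilon-differential privacy."""
    return true_count + laplace_noise(1 / epsilon)

# Hypothetical segment: every 7th user out of 10,000 is "in-market".
audience = [user for user in range(10_000) if user % 7 == 0]

for eps in (0.1, 1.0, 10.0):
    # Lower epsilon = more noise = stronger privacy guarantee.
    print(eps, round(private_count(len(audience), eps)))
```

Any single individual joining or leaving the segment changes the true count by at most 1, which the noise is calibrated to hide; that is exactly the property that makes group-level reporting safe to release.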
The fourth group is where techniques like secure multi-party computation and federated learning belong. It’s an often-overlooked PET area, primarily because these techniques were originally developed for applications in cryptography and decentralized network topologies. But they’re absolutely essential in the context of data privacy, because they allow the analysis (or machine learning) to be broken up across multiple parties or edge devices and reconciled afterward, without sharing the input data or moving it around.
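The simplest building block of secure multi-party computation is additive secret sharing, sketched below with made-up revenue figures. Each party splits its private value into random shares that individually reveal nothing, distributes them, and only the aggregate is ever reconstructed.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split a value into n random additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three parties each hold a private revenue figure.
secrets = [120, 450, 300]
n = len(secrets)

# Each party splits its secret and sends one share to every party.
all_shares = [share(s, n) for s in secrets]

# Party j locally sums the shares it received, one from each party.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME
                for j in range(n)]

# Publishing only the partial sums reveals the total, never the inputs.
total = sum(partial_sums) % PRIME
print(total)  # 870
```

No single party ever sees another party’s input: each share is uniformly random on its own, and only the final combination recovers the agreed-upon aggregate.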
Don’t data clean rooms come with PETs by default?
To continue to collaborate with their partners while meeting their data privacy obligations, many companies are now adopting data clean rooms. This is great news, but only as long as those clean rooms are built on strong PET principles. It’s a common misconception that all data clean rooms come with best-in-class PET features right out of the box. They don’t. If a vendor tells you that its platform is a clean room simply because it has encryption built-in, run.
Here’s how my colleague Nick Halstead put it in a recent podcast: “The question I always ask is: How do you know who's the same person across two datasets? If you need to go through a separate ID matching process, then all of your concepts around decentralization, and around the way that you're doing the processing of the data, are flawed. It doesn’t matter if you’re using hashing, or edge processing, or federated learning. You can't call yourself a clean room, or privacy-safe, if you have to run a separate ID matching process that involves handing the data over.”
The sum is greater than its parts
In today’s complex marketing and advertising ecosystem, even the most basic use case requires tremendous scale, speed, computational power—and an eye for privacy considerations every step of the way. Every PET has its limitations, but those limitations can be overcome when PETs are deployed in combination.
For example, at InfoSum, we’ve built our data clean rooms using a principle we call non-movement of data. Features like private-set intersection (using techniques like Bloom filters to calculate the intersection between multiple datasets faster and more efficiently), decentralized edge processing, and differential privacy sit at the core of our solution. Combined with data bunkers (private cloud instances) and advanced permission controls, we’ve developed a data clean room that, we believe, ticks all the boxes for data-driven marketing and advertising applications at scale.
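One classic way to realize private-set intersection is a commutative, Diffie-Hellman-style exchange. The sketch below is a generic illustration of that idea, not InfoSum’s implementation, and uses a toy modulus and made-up email addresses (production systems use elliptic-curve groups). Because exponentiation commutes, shared identifiers collide after both keys are applied, while unmatched identifiers stay hidden.

```python
import hashlib
import random

P = 2**127 - 1  # toy prime modulus for illustration

def hash_to_group(identifier):
    return int(hashlib.sha256(identifier.encode()).hexdigest(), 16) % P

a = random.randrange(2, P - 1)  # advertiser's secret key
b = random.randrange(2, P - 1)  # publisher's secret key

advertiser = {"alice@example.com", "bob@example.com", "dan@example.com"}
publisher = {"bob@example.com", "carol@example.com", "dan@example.com"}

# Each side exponentiates its own hashed ids and exchanges the results.
adv_once = {pow(hash_to_group(x), a, P) for x in advertiser}
pub_once = {pow(hash_to_group(x), b, P) for x in publisher}

# Each side then applies its own key to the other's values; since
# (h^a)^b == (h^b)^a, common identifiers collide without being revealed.
adv_twice = {pow(v, b, P) for v in adv_once}
pub_twice = {pow(v, a, P) for v in pub_once}

print(len(adv_twice & pub_twice))  # 2 (bob and dan)
```

Neither side learns which of the other’s non-matching identifiers exist, only the size (or membership) of the intersection, which is exactly the property a clean room needs for overlap analysis.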
But that doesn’t mean that those tick boxes will be the same tomorrow. As privacy needs continue to evolve, our industry needs to work together and establish a baseline of standards for privacy not just in clean rooms, but across all marketing use cases. We also need to do a better job communicating PET benefits to marketers and C-suite decision-makers who may not have a degree in data science or privacy engineering, but are ultimately responsible for PET adoption in their organization.
A recent article in the New York Times illustrates the challenge ahead. It reports that, according to the 2020 Census, 14 people (13 adults and one child) live underwater in a 700-ft bend of the Chicago River. That type of story, while well-intentioned in its attempt to explain the benefits of PETs (in this case, differential privacy) to the general public, can also be confusing to the uninitiated. How can the Census be right as a whole, and yet so wrong in the details? It’s hard for people outside the industry to wrap their mind around privacy loss as a mathematical equation.
But if we want to encourage marketers to look at privacy not as an impediment to their business, but as an opportunity to expand their data collaboration efforts in new directions, we need to take up that challenge. The future of marketing depends on it. Let’s get to work.