Opinion

It’s time to drop the ‘data clean room’ label for more accurate terms

Illustration by Reagan Hicks / Shutterstock / The Current

How would you define a data clean room?

The guidelines set by the IAB Tech Lab in 2023 define a data clean room like this: a “secure collaboration environment which allows two or more participants to leverage data assets for specific, mutually agreed upon uses, while guaranteeing enforcement of strict data access limitations…”

It’s a helpful description, but the truth is that not all data clean rooms are created equal. The label has become almost meaningless, partly due to lazy or downright dishonest vendors co-opting the term, but also because it’s become a catch-all that glosses over crucial differences in how platforms actually handle privacy and performance.

While the name implies that it’s a safe space, just because something has been called a data clean room doesn’t necessarily guarantee the privacy or security of the entities using it — or of their customers.

I argue that we should stop using the term “data clean room” altogether.

The big problem with ‘data clean rooms’

Behind the theoretical four walls of any data clean room lie three things: a particular technical architecture, a set of capabilities and an approach to privacy. Let’s break these three aspects down a little.

From a technical perspective, several different but adjacent privacy-enhancing technologies (PETs) exist today, each with its own strengths and weaknesses. Each PET has a specific function, but only when several are used in combination do they provide the foundation for robust privacy protection. Without the right PETs layered in the right way, a data clean room is not fit for purpose.

Speaking of purpose, functionality matters. The notion of the data clean room is that it enables multiple parties to collaborate in a way that helps them reach their business goals. Many of the common use cases are marketing-focused, but if the systems aren’t user-friendly or require specific technical skills to use, then just how useful are they?

Then there’s the approach to privacy. If, when using a “data clean room,” any party has to centralize, commingle, share or in any other way expose their datasets to a partner or partners, then the collaboration is neither private nor secure.

Some solutions on the market call themselves “data clean rooms” yet fail on one or more of these principles. That is why we need to move on from the term.

What we should be talking about instead

This is about to get (more) technical, so bear with me.

Rather than data clean rooms, we should instead be talking about data collaboration platforms, private data networks and PETs. Many of the collaborations that are happening today are multiparty, and “data collaboration platform” is an accurate descriptor for an environment where brands, media owners, retailers and other entities can work together to extract maximum value from their first-party data.

“Private data network” takes this concept a step further, with companies now building their own private, bespoke, closed data ecosystems that enable parties to quickly plug in, collaborate and work toward collective goals without moving, sharing or exposing their data. Talking in these terms sets realistic expectations for participants and reduces the risk of misunderstanding, while making it harder for unscrupulous vendors to hide behind a loosely defined label.

We’ve already talked a little about PETs, but in order to understand how private data networks and data collaboration platforms work, we need to get a better grip on what the various PETs are, what they do and the advantages and limitations of each.

For example, pseudonymization replaces sensitive customer data with artificial identifiers (aka pseudonyms). It can be combined with another PET called differential privacy, which adds statistical “noise” to query results, making it virtually impossible to reverse engineer those results to reveal personal information.
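
For readers who want to see the mechanics, here is a minimal sketch of those two PETs in Python. It is an illustration under stated assumptions, not how any particular platform works: the function names (`pseudonymize`, `laplace_noise`, `noisy_count`) are hypothetical, the pseudonym is assumed to be a keyed hash (HMAC-SHA256), and the noise is assumed to come from the Laplace mechanism applied to a simple counting query.

```python
import hashlib
import hmac
import random


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (assumed scheme: HMAC-SHA256).

    The same input and key always give the same pseudonym, so records can
    still be matched, but the original value cannot be read back from it.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a count with Laplace noise added.

    A counting query has sensitivity 1 (one person changes the count by at
    most 1), so the noise scale is 1 / epsilon. Smaller epsilon means more
    noise and stronger privacy.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical usage: the key and the email address are made up.
key = b"example-secret-key"
pseudonym = pseudonymize("Alice@Example.com", key)
reported = noisy_count(true_count=100, epsilon=1.0, rng=random.Random())
```

The trade-off sits in `epsilon`: each reported figure is slightly wrong on purpose, and how wrong is a choice the collaborating parties have to make together.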

Decentralized data processing and secure multiparty computation are PETs that, when combined, ensure each party in a collaboration retains full control of its data, since that data is never moved, pooled or centralized. With federated learning, collaborators can then run distributed analyses, generating insights without ever sharing raw data.
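
To make secure multiparty computation less abstract, here is a toy sketch of one of its classic building blocks, additive secret sharing, used to total figures held by several parties. It simulates every party inside a single Python process, which a real deployment would not do, and the names (`MODULUS`, `split_into_shares`, `secure_sum`) are hypothetical choices for this illustration rather than any vendor's API.

```python
import secrets

# All arithmetic is done modulo a large prime (an assumed choice here).
MODULUS = 2**61 - 1


def split_into_shares(value: int, n_parties: int) -> list[int]:
    """Split a private value into n random-looking shares.

    The shares add up to the value modulo MODULUS, but any subset of
    fewer than n shares reveals nothing about it.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last_share = (value - sum(shares)) % MODULUS
    return shares + [last_share]


def secure_sum(private_values: list[int]) -> int:
    """Total the parties' values without any party seeing another's input."""
    n = len(private_values)
    # Each party i splits its own value and sends share j to party j.
    all_shares = [split_into_shares(value, n) for value in private_values]
    # Each party j adds up the shares it received and publishes only that.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Combining the published partial sums reveals only the overall total.
    return sum(partial_sums) % MODULUS


# Hypothetical usage: three parties with made-up figures.
total = secure_sum([120, 340, 75])
```

No party ever hands its raw figure to anyone else, yet the group still learns the combined total, which is the property the paragraph above describes.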

Let’s try to build understanding, not muddy the waters

With direct advertising partnerships between media owners and brands now increasingly common, as well as the rise of commerce media, collaboration is central to media strategies. Appreciating the nature of data collaboration platforms and private data networks — and, crucially, what distinguishes good ones from bad ones — is essential for brands, media owners and any other partner that wants to use them effectively.

The “data clean room” label has become dirty; let’s replace it with language that actually promotes understanding.


This op-ed represents the views and opinions of the author and not of The Current, a division of The Trade Desk, or The Trade Desk. The appearance of the op-ed on The Current does not constitute an endorsement by The Current or The Trade Desk.