Our technology uses identity-based matching to join and analyze any number of isolated datasets, from virtually any source.
Rather than being centralized, the datasets are held in isolated instances and connected in a distributed network. The query is sent to the instances instead of the data being moved.
The platform scales to connect and query any number of databases and produces the same analytical output as stitching the datasets together in one location, without the data privacy, trust and implementation barriers.
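To illustrate the idea of sending the query to the data, here is a minimal sketch in which each isolated instance computes a local aggregate and returns only a count. The names `Instance` and `run_distributed_count` are hypothetical and not part of the platform, and the example assumes the simple case where each record lives in exactly one instance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class Instance:
    """Hypothetical isolated instance: rows never leave this object."""
    name: str
    _rows: List[Row]

    def count_where(self, predicate: Callable[[Row], bool]) -> int:
        # Only the aggregate count is returned, never the rows themselves.
        return sum(1 for row in self._rows if predicate(row))


def run_distributed_count(instances: List[Instance],
                          predicate: Callable[[Row], bool]) -> int:
    # The query (predicate) travels to each instance; only counts come back.
    return sum(instance.count_where(predicate) for instance in instances)


retailer = Instance("retailer", [{"age": 34}, {"age": 61}])
publisher = Instance("publisher", [{"age": 45}, {"age": 29}, {"age": 52}])


def over_40(row: Row) -> bool:
    return row["age"] > 40


distributed = run_distributed_count([retailer, publisher], over_40)
print(distributed)
```

For a simple count like this one, summing the per-instance results gives the same answer as running the query over one combined table, which is the equivalence described above.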
The technology automates data transformations by mapping each dataset into a global schema. This enables users to normalize and anonymize multiple datasets, without adapting the source data.
The engine is built to translate different data formats. For example, a customer’s age could be represented as a date of birth, an exact number of years, or an age range, but InfoSum’s engine ensures they can be compared.
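The age example can be sketched as follows. This is an illustration only: the band boundaries in `AGE_BANDS`, the `"low-high"` string format, and the function names are assumptions made for the example, not the engine's actual global schema.

```python
from datetime import date
from typing import Tuple, Union

# Hypothetical global-schema age bands (inclusive lower and upper bounds).
AGE_BANDS = [(0, 17), (18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 120)]


def age_from_dob(dob: date, today: date) -> int:
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def band_for_age(age: int) -> Tuple[int, int]:
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return (low, high)
    raise ValueError(f"age out of range: {age}")


def normalize_age(value: Union[date, int, str], today: date) -> Tuple[int, int]:
    """Map a date of birth, an exact age, or a 'low-high' range to one band."""
    if isinstance(value, date):
        return band_for_age(age_from_dob(value, today))
    if isinstance(value, int):
        return band_for_age(value)
    # Assumed string form "low-high"; it must fall inside a single band.
    low, high = (int(part) for part in value.split("-"))
    band = band_for_age(low)
    if band != band_for_age(high):
        raise ValueError(f"range {value!r} spans several bands")
    return band


today = date(2024, 6, 1)
from_dob = normalize_age(date(1994, 9, 15), today)
from_years = normalize_age(29, today)
from_range = normalize_age("25-34", today)
print(from_dob, from_years, from_range)
```

Once all three representations land in the same band, they can be compared directly. The trade-off is that the coarsest format sets the precision of the comparison.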
Direct imports can be made from common databases, such as MySQL, and standard formats, such as CSV files. The engine can be configured to periodically update and normalize tens of millions of rows within minutes.
Our technology irreversibly anonymizes identifying data and then deletes the original data. The record can then be connected to other datasets using a non-reversible mathematical model.
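The text does not specify which mathematical model is used. As a stand-in, the sketch below uses a keyed one-way hash (HMAC-SHA256 from the Python standard library) to show how two records can be linked after the raw identifier has been discarded. The field names, the key, and the helper functions are hypothetical.

```python
import hashlib
import hmac


def normalize_email(email: str) -> str:
    # Canonicalize first so trivially different spellings produce the same key.
    return email.strip().lower()


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    # Keyed one-way hash; the raw identifier is not recoverable from the digest
    # by inverting the function.
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, secret_key: bytes) -> dict:
    # Copy every field except the raw identifier, then attach the derived key.
    out = {name: value for name, value in record.items() if name != "email"}
    out["email_key"] = pseudonymize(normalize_email(record["email"]), secret_key)
    return out


key = b"example-key-not-for-production"  # assumption: a key shared by both datasets
rec_a = anonymize_record({"email": " Ada@Example.com ", "segment": "sports"}, key)
rec_b = anonymize_record({"email": "ada@example.com", "plan": "premium"}, key)
print(rec_a["email_key"] == rec_b["email_key"])
```

Both records end up with the same `email_key`, so they can be joined even though neither still contains the email address.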
Data owners are able to utilize their existing customer identifiers (e.g. email address, name and email, loyalty number), alongside those owned by third parties.
During a query, these identities are mapped across the isolated datasets and the optimal match rate is determined, which is flexible based on the objective.
The identifier in each dataset does not need to match. This enables analytics across a range of data sources, and makes it easier for users to compare their first-party data with data owned by third parties.
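One simplified way to picture the matching step: each dataset exposes sets of anonymized keys per identifier type, and the query picks the shared identifier type that yields the largest overlap. This is a sketch under that assumption. The `best_match` function and the sample keys are hypothetical, and the real match logic may weigh other objectives.

```python
from typing import Dict, Set, Tuple

# Each dataset maps an identifier type to a set of anonymized keys.
Dataset = Dict[str, Set[str]]


def best_match(left: Dataset, right: Dataset) -> Tuple[str, int]:
    """Return the shared identifier type with the largest overlap, and its size."""
    shared = sorted(set(left) & set(right))
    if not shared:
        raise ValueError("no identifier type in common")
    overlaps = {key_type: len(left[key_type] & right[key_type]) for key_type in shared}
    # On ties, max() keeps the alphabetically first type because `shared` is sorted.
    best = max(shared, key=lambda key_type: overlaps[key_type])
    return best, overlaps[best]


first_party = {"email": {"e1", "e2", "e3"}, "loyalty": {"l1", "l2"}}
third_party = {"email": {"e2", "e3", "e9"}, "loyalty": {"l2"}, "phone": {"p1"}}

key_type, matched = best_match(first_party, third_party)
print(key_type, matched)
```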
InfoSum enables organizations to learn as much as possible from a multitude of datasets, without exposing anything about an individual.
It applies a range of privacy controls, based on standard data anonymization techniques and concepts from Differential Privacy, that allow insights to be gained from datasets while making it impossible to extract information about a single individual.
Obscures the source data by rounding the result up or down.
Adds a small amount of deterministic fuzziness to protect individual privacy.
Data owner defines the minimum bucket size returned in the query.
Prevents over collection of data and over use of the platform.
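The four controls above can be sketched together as a small post-processing step applied to a query result. Every parameter value here (`min_bucket`, `step`, `max_noise`, `max_queries`) and every function name is an assumption chosen for illustration. The hash-derived noise is one possible reading of "deterministic fuzziness", not a description of the platform's actual mechanism.

```python
import hashlib
from typing import Optional


def deterministic_noise(query_id: str, max_noise: int) -> int:
    # Derive the offset from a hash of the query, so repeating the same query
    # always yields the same noise and cannot be averaged away.
    digest = hashlib.sha256(query_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_noise + 1) - max_noise


def round_to(value: int, step: int) -> int:
    # Round to the nearest multiple of `step` (halves round up).
    return ((value + step // 2) // step) * step


def protect_count(true_count: int, query_id: str, min_bucket: int = 100,
                  step: int = 10, max_noise: int = 3) -> Optional[int]:
    # Minimum bucket size: results below the threshold are suppressed entirely.
    if true_count < min_bucket:
        return None
    noisy = true_count + deterministic_noise(query_id, max_noise)
    return round_to(noisy, step)


class RateLimiter:
    """Caps the number of queries a user may run (hypothetical fixed quota)."""

    def __init__(self, max_queries: int) -> None:
        self.remaining = max_queries

    def allow(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


limiter = RateLimiter(max_queries=2)
first = protect_count(1234, "age>40 AND segment=sports") if limiter.allow() else None
second = protect_count(1234, "age>40 AND segment=sports") if limiter.allow() else None
third_allowed = limiter.allow()
small = protect_count(42, "rare-segment")
```

Applying the threshold before noise and rounding means small groups are never reported at all, while larger counts are only ever returned in blurred form.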
Customer data is held in a secure decentralized network of cloud containers, known as Bunkers. Each Bunker is private to the specific dataset, maintaining data isolation and helping ensure regulatory compliance.
Bunkers are held on their own isolated subnets within a Virtual Private Cloud, and all communications with a Bunker are protected by TLS 1.2 (HTTPS).
Rigorous firewalling further restricts the network traffic entering or leaving a Bunker.