
Connect data without moving it

InfoSum’s core technology connects and analyzes physically separate, disparate datasets without exchanging any raw data.


Distributed computing

Creating a trusted environment through distributed computing

Our technology uses identity-based matching to join and analyze any number of isolated datasets, from virtually any source.

Rather than being centralized, the datasets are held in isolated instances and connected in a distributed network. The query is sent to the instances, rather than the data being moved.

The platform scales to connect and query any number of databases. It produces the same analytical output as stitching the datasets together in one location, without the data privacy, trust and implementation barriers that centralization brings.
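As a rough illustration of sending the query to the data, the sketch below runs a simple count on each isolated instance and merges only the aggregates. The `Instance` class and both function names are hypothetical and are not part of InfoSum's platform. The example covers a plain count only, not identity-based joins.

```python
from collections import Counter


class Instance:
    """Hypothetical isolated instance that holds one dataset locally."""

    def __init__(self, rows):
        self._rows = rows  # raw rows stay inside this object

    def run(self, column):
        # Execute the query locally and return only aggregate counts.
        return Counter(row[column] for row in self._rows)


def distributed_count(instances, column):
    # Send the same query to every instance and merge the aggregates.
    total = Counter()
    for instance in instances:
        total.update(instance.run(column))
    return dict(total)


def centralized_count(datasets, column):
    # Baseline for comparison: pool all rows in one place, then count.
    pooled = [row for rows in datasets for row in rows]
    return dict(Counter(row[column] for row in pooled))
```

For this kind of query, `distributed_count` returns the same result as `centralized_count`, even though only per-instance counts cross the boundary.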

Data normalization

Overcome variations in data formats through an AI-powered automated normalization process

Global schema

The technology automates data transformations by mapping each dataset into a global schema. This enables users to normalize and anonymize multiple datasets, without adapting the source data.

Flexible representations

The engine is built to translate different data formats. For example, a customer’s age could be represented as a date of birth, an exact number of years, or an age range, but InfoSum’s engine ensures they can be compared.
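The age example can be sketched as a small normalization function that maps all three representations onto one common set of age bands. The band boundaries and function names here are assumptions chosen for illustration, not InfoSum's global schema.

```python
from datetime import date

# Hypothetical common age bands (inclusive bounds).
AGE_BANDS = [(0, 17), (18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 150)]


def band_for_age(age):
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return (low, high)
    raise ValueError(f"age out of range: {age}")


def age_from_birth_date(born, today):
    # Subtract one year if the birthday has not yet occurred this year.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def normalize_age(value, today):
    """Map a date of birth, an exact age, or an age range to a common band."""
    if isinstance(value, date):
        return band_for_age(age_from_birth_date(value, today))
    if isinstance(value, int):
        return band_for_age(value)
    if isinstance(value, tuple):
        low, high = value
        band = band_for_age(low)
        if band_for_age(high) != band:
            raise ValueError("range spans more than one common band")
        return band
    raise TypeError(f"unsupported age representation: {value!r}")
```

Once every source value is expressed as a band, records from different datasets can be compared directly without changing the source data.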

Simple integrations

Direct imports can be made from common databases, such as MySQL, and standard formats, such as CSV files. The engine can be configured to periodically update and normalize tens of millions of rows within minutes.

Identity resolution

Connect existing identities without prior mapping to a single common identity

Our technology irreversibly anonymizes identifying data and then deletes the original. The anonymized record can then be connected to other datasets through a non-reversible mathematical model.
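The page does not say which mathematical model is used. As a stand-in, the sketch below replaces an email address with a keyed one-way hash (HMAC-SHA256) and discards the original value. The function names, the `email` field, and the idea of a shared secret key are all assumptions made for illustration.

```python
import hashlib
import hmac


def normalize_email(email):
    # Trim and lowercase so trivially different spellings hash identically.
    return email.strip().lower()


def anonymize(identifier, secret_key):
    # Keyed one-way hash: the token cannot be turned back into the identifier.
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def anonymize_records(records, secret_key):
    anonymized = []
    for record in records:
        token = anonymize(normalize_email(record["email"]), secret_key)
        # Keep the other attributes, but drop the original identifier.
        attributes = {k: v for k, v in record.items() if k != "email"}
        anonymized.append({"token": token, **attributes})
    return anonymized
```

In this sketch, two datasets produce matching tokens only if they use the same key and the same normalization.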

Data owners can use their existing customer identifiers (e.g. email address, name and email, loyalty number) alongside those owned by third parties.

During a query, these identities are mapped across the isolated datasets and the optimal match rate, which depends on the objective, is determined.

The datasets do not need to share the same identifier. This enables analytics across a range of data sources and makes it simpler for users to compare their first-party data with data owned by third parties.
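One way to picture matching across several identifier types is to compare the anonymized tokens that two datasets hold for each shared type and pick the type with the most overlap. This is only a sketch under that assumption. The page does not describe how the platform determines its match rate, and `best_match_key` is a hypothetical name.

```python
def best_match_key(dataset_a, dataset_b):
    """Return (key_type, match_count) for the shared key type with most matches.

    Each dataset maps a key type (e.g. "email") to a set of anonymized tokens.
    """
    shared_types = set(dataset_a) & set(dataset_b)
    best = (None, 0)
    for key_type in sorted(shared_types):
        matches = len(dataset_a[key_type] & dataset_b[key_type])
        if matches > best[1]:
            best = (key_type, matches)
    return best
```

Maximizing the match count is just one possible objective. A query that favors precision over reach might prefer a stricter key, such as name and email together.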

Differential Privacy concepts

Maximize query accuracy while preserving privacy

InfoSum enables organizations to learn as much as possible from a multitude of datasets, without exposing anything about an individual.

It applies a range of privacy controls based on standard data anonymization techniques and concepts of Differential Privacy. These allow insights to be gained from datasets while making it impossible to extract information about a single individual.


Rounding

Obscures the source data by rounding the result up or down.


Noise

Adds a small amount of deterministic fuzziness to protect individual privacy.

Redaction thresholds

The data owner defines the minimum bucket size that a query can return.

Rate limits

Prevents over-collection of data and overuse of the platform.
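The first three controls above can be sketched as a small pipeline applied to per-bucket counts: redact small buckets, add a repeatable offset, then round. All function names, default parameters, and the hash-based noise are assumptions for illustration. This is not a calibrated Differential Privacy mechanism, and rate limiting is left out.

```python
import hashlib


def redact_small_buckets(counts, min_bucket_size):
    # Drop any bucket smaller than the data owner's minimum size.
    return {k: v for k, v in counts.items() if v >= min_bucket_size}


def deterministic_noise(query_id, bucket, max_noise):
    # Derive a repeatable offset in [-max_noise, max_noise] from a hash,
    # so re-running the same query cannot average the noise away.
    digest = hashlib.sha256(f"{query_id}:{bucket}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_noise + 1) - max_noise


def round_to_multiple(value, step):
    # Round to the nearest multiple of `step`.
    return ((value + step // 2) // step) * step


def protect(counts, query_id, min_bucket_size=10, max_noise=2, step=5):
    kept = redact_small_buckets(counts, min_bucket_size)
    return {
        bucket: round_to_multiple(
            count + deterministic_noise(query_id, bucket, max_noise), step
        )
        for bucket, count in kept.items()
    }
```

With these defaults, each reported count is a multiple of 5 that differs from the true count by at most 4, and buckets with fewer than 10 members are never returned.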

Data security

Isolated data hosting via cloud containers

Customer data is held in a secure decentralized network of cloud containers, known as Bunkers. Each Bunker is private to the specific dataset, maintaining data isolation and helping ensure regulatory compliance.

Bunkers are held on their own isolated subnets within a Virtual Private Cloud, and all communications with a Bunker are protected by TLS 1.2 (HTTPS).

Rigorous firewalling further restricts the network traffic entering or leaving a Bunker.
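From a client's point of view, the TLS 1.2 requirement mentioned above could be enforced with Python's standard `ssl` module, as in the sketch below. The function name `make_client_context` is hypothetical, and nothing here reflects how Bunkers are actually configured.

```python
import ssl


def make_client_context():
    # Start from secure defaults: certificate and hostname verification on.
    context = ssl.create_default_context()
    # Refuse to negotiate anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A context built this way can be passed to standard-library HTTPS clients, for example through the `context` argument of `urllib.request.urlopen`.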