TeskaLabs · TurboCat.io

Features

Connectors

Files
TCP streams
SSL and TLS streams
RabbitMQ/AMQP
SQL (Oracle, MySQL, PostgreSQL), ODBC
MongoDB
HTTP client and server (pull & push)
REST, SOAP
ElasticSearch
InfluxDB

*Custom connectors are easy to build because the core technology is open source.

Data formats

JSON, XML, CSV, TXT
REST, SQL
Excel files

*Custom formats are easy to build because the technology is open source.

Cryptographic hardware

HSM compatibility
TPM compatibility
YubiKey
PKCS#11 / OpenSSL / OpenSC API

Key rotations

Yes

De-identification methods

Anonymization
Pseudonymization
Data masking
Encryption
…

See more on our blog.

Field-specific de-identification

Yes

Encryption algorithms

AES-256, AES-128
ECIES
ECDSA
RSA (4096 bit or equivalent)

Hash functions

SHA-512, SHA-384
BLAKE-512, BLAKE-384, BLAKE-256, BLAKE-224

User Interface

Optional web UI (via web browser)

Real-time processing

Yes

Modes of operation

Fully autonomous
Semi-automated
Manual

Support

SLA with 24/7 or 8/5 coverage.
We provide L2 and L3 support.

Deployment options

On-premise
Public cloud (AWS, Azure, Google Cloud, …)
Private cloud

Hardware requirements (typical)

2 CPU cores (x86-64), 4 GB RAM, 20 GB HDD

High Availability

Yes, supported.

Software requirements

OS: Linux (Red Hat, CentOS, Ubuntu)
Containers (optional): Docker, LXC, Kubernetes

Directory services

Active Directory
LDAP

De-identification methods

Pseudonymization

GDPR defines pseudonymization as the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information. By holding the de-identified data separately from the “additional information,” the GDPR permits data handlers to use personal data more liberally without fear of infringing on the rights of data subjects. This is because the data only becomes identifiable when both elements are held together.

According to the GDPR, pseudonymization may facilitate the processing of personal data beyond the original collection purposes.

Illustration of the pseudonymization process at work.
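
The idea of keeping the "additional information" apart from the data can be sketched with a keyed hash. This is a minimal illustration, not a description of how TurboCat.io implements pseudonymization: the key value, the `pseudonymize` function, and the record fields are all hypothetical. The secret key plays the role of the additional information. Whoever holds it can recompute a pseudonym for a known value and so re-link records, while the pseudonymized data set alone does not reveal the original value.

```python
import hashlib
import hmac

# Hypothetical secret key: this is the "additional information" and must be
# stored separately from the pseudonymized data set.
SECRET_KEY = b"example-key-kept-in-a-separate-system"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for `value` using HMAC-SHA-512."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha512).hexdigest()
    # Shortened for readability; a longer prefix lowers the collision risk.
    return digest[:16]


record = {"email": "alice@example.com", "order_total": 42}

# Only the identifying field is replaced; analytical fields stay usable.
pseudonymized = {**record, "email": pseudonymize(record["email"])}
```

The same input always maps to the same pseudonym, so records can still be joined and counted per person without exposing the email address.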

Anonymization

Anonymization is the irreversible removal of information that could lead to an individual being identified, either on the basis of the removed information or in combination with other information. This definition emphasizes that anonymized data must be stripped of any identifiable information, making it impossible to derive insights on a discrete individual, even by the party that is responsible for the anonymization.

When done properly, anonymization places the processing and storage of personal data outside the scope of the GDPR.

Illustration of the anonymization process at work.
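
As a rough sketch of the idea, the snippet below drops direct identifiers and generalizes an exact age into a 10-year band. The field names and the `anonymize` function are hypothetical, and the sketch is not the product's algorithm. It also does not prove that the result is anonymous: whether the remaining fields can identify someone in combination with other information has to be assessed for each data set.

```python
# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def anonymize(record: dict) -> dict:
    """Drop direct identifiers and generalize age into a 10-year band."""
    result = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in result:
        low = (result["age"] // 10) * 10
        result["age"] = f"{low}-{low + 9}"
    return result


row = {"name": "Alice Novak", "email": "alice@example.com", "age": 34, "city": "Prague"}
anonymized = anonymize(row)
```

No key or lookup table is kept, so the removed values cannot be restored from the output.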

Encryption

Encryption translates data into another form so that only people or systems with access to a secret key, formally called a decryption key, can read it. Under Article 32 of the GDPR, controllers are required to implement risk-based measures to protect data security. One such measure is the “encryption of personal data” that “renders the data unintelligible to any person who is not authorized to access it”.

Businesses can use encryption to meet the GDPR’s data security requirements.

Illustration of the encryption process at work.

Data Masking

Data masking, or suppression, is an extreme form of anonymization. It replaces information with pre-defined fixed text (or a black tape). Data masking is very simple to implement and very effective at removing sensitive data. On the other hand, any statistical or analytical value of the data is lost in the masking process.

Illustration of the suppression process at work. The sensitive information has been replaced by XXX.
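
Masking is simple enough to show in a few lines. The sketch below is illustrative only; `mask_fields`, the field names, and the sample record are hypothetical.

```python
MASK = "XXX"  # pre-defined fixed replacement text


def mask_fields(record: dict, sensitive: set) -> dict:
    """Replace the value of every sensitive field with the fixed mask."""
    return {k: (MASK if k in sensitive else v) for k, v in record.items()}


customer = {"name": "Alice Novak", "phone": "+420123456789", "plan": "premium"}
masked = mask_fields(customer, {"name", "phone"})
```

Every masked value becomes identical, which is why the masked fields can no longer be used for statistics or analysis.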

Use cases

De-identify data exports from the production database

It is typical for organizations to export production data, including sensitive personal information, for marketing purposes, to test new versions of apps, etc. Exported data contains readable information about users, such as their names, email addresses, phone numbers, home addresses, and so on.

Risk: An unauthorized person accesses an exported file that contains Personally Identifiable Information (PII), takes a copy of that file, and uploads it to the public internet or darknet. This results in a so-called data breach, and the company is liable under GDPR.

Protection: The proper application of anonymization, pseudonymization, and encryption prevents data leaks right at the source.

Breach example: In July 2017, the Czech e-commerce site MALL.cz suffered a data breach, after which the information of 735,000 unique accounts (including email addresses, names, phone numbers, and passwords) was posted online.
Source: haveibeenpwned.com

TurboCat.io performs de-identification during data exports and, therefore, minimizes the risk of sensitive data being leaked.
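
De-identification during an export can be pictured as a set of per-field rules applied to each row as it is written. The following is a sketch under assumptions, not TurboCat.io's configuration or API: the CSV layout, the `RULES` table, the key, and the `deidentify_export` function are all hypothetical.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical key, stored outside the exported file.
KEY = b"hypothetical-key-stored-outside-the-export"


def pseudonym(value: str) -> str:
    """Keyed hash so the same customer keeps the same pseudonym."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


# Field-specific rules; fields without a rule are copied unchanged.
RULES = {
    "name": lambda value: "XXX",   # data masking
    "email": pseudonym,            # pseudonymization
    "phone": lambda value: "XXX",  # data masking
}


def deidentify_export(source, target) -> None:
    """Copy a CSV stream, applying the rule for each field to every row."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow({k: RULES.get(k, lambda v: v)(v) for k, v in row.items()})


production_export = io.StringIO(
    "name,email,phone,plan\n"
    "Alice Novak,alice@example.com,+420123456789,premium\n"
)
safe_export = io.StringIO()
deidentify_export(production_export, safe_export)
```

Because the rules run while the export is produced, a readable copy of the sensitive fields never reaches the exported file.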

Encrypt sensitive data in archives and backups

Archives and backups are a frequent target of cyber attackers because they contain the same valuable data as the production database but are usually much less protected. For this reason, privacy protection regulation requires that a person's data also be deleted from all archives and backups. This is a very tedious and costly task to implement.

Risk: A cyber attacker steals files with the production database archives. The attacker then extracts all Sensitive Personal Information (SPI) from these archives and publishes the data on the internet or darknet. This results in a so-called data breach, and the company is liable under GDPR.

Protection: Applying encryption, anonymization, and pseudonymization to all archives and backups will prevent the attacker from extracting sensitive data from the stolen archive.

Breach example: In 2012, TD Bank misplaced computer backup tapes containing personally identifiable information for 267,000 customers. TD Bank later paid an $850,000 fine for this data breach.
Source: Bank Info Security

De-risking Big data

The adoption of big data in organizational processes brings efficiencies in cost, productivity, and innovation, but it does not come without flaws. One of these flaws is that the data sets that are created and stored can contain huge amounts of sensitive information. Controlling access to this information is very difficult.

Risk: The organization loses control of sensitive data and it gets into the hands of unauthorized people. These people can then copy and export the data outside of the organization. This results in a so-called data breach, and the company is liable under GDPR.

Protection: Routine anonymization, pseudonymization, and data encryption before data is loaded into big data systems.

Source: Three Big Data Threat Vectors, Oracle

TurboCat.io integrates with big data technologies (Hadoop, ElasticSearch) and transparently ensures the de-identification of sensitive data.

Safe sharing of data with third-party vendors

Nowadays, it is a widespread practice for organizations to share production data with third parties for further processing, analysis, cleaning, etc. This raises the risk of data leaks because the organization typically does not perform a security check of the data before it leaves the organization's IT infrastructure.

Risk: The third party does not provide sufficient protection for confidential data and a data leak occurs. This results in a so-called data breach, and the company is liable under GDPR.

Protection: Anonymization, pseudonymization, and encryption of data before sending it to a third party.

Breach example: Equifax leaked 143 million individuals’ personal information. Equifax blamed this breach on a flaw of third-party software it was using.
Source: CSO Online

TurboCat.io provides anonymization, pseudonymization, and encryption as well as integration capabilities in typical B2B scenarios.

Protection of sensitive personal information in the healthcare industry

Healthcare has been a long-term driver of research into SPI de-identification. The HIPAA Privacy Rule (in the United States of America) provides mechanisms for using and disclosing health data responsibly without the need for patient consent. These mechanisms center on two HIPAA de-identification standards: Safe Harbor and the Expert Determination Method. Safe Harbor relies on the removal of specific patient identifiers (e.g. name, phone number, email address), while the Expert Determination Method requires knowledge of and experience with generally accepted statistical and scientific principles and methods to render information not individually identifiable.
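
The Safe Harbor approach of removing listed identifiers can be sketched as follows. This is a simplified illustration and not a compliance tool: only three identifier fields are removed, the field names and the `remove_identifiers` function are hypothetical, and the reduction of dates to the year reflects the rule's treatment of dates as an assumption of this sketch. The full rule covers more identifier categories and conditions than are shown here.

```python
# Hypothetical field names for a small subset of the listed identifiers.
REMOVED_FIELDS = {"name", "phone", "email"}


def remove_identifiers(record: dict) -> dict:
    """Drop listed identifier fields and keep only the year of the birth date."""
    out = {k: v for k, v in record.items() if k not in REMOVED_FIELDS}
    if "birth_date" in out:
        # Assumes an ISO 8601 date string such as "1985-03-14".
        out["birth_year"] = out.pop("birth_date")[:4]
    return out


patient = {
    "name": "Alice Novak",
    "phone": "+420123456789",
    "email": "alice@example.com",
    "birth_date": "1985-03-14",
    "diagnosis": "J45",
}
deidentified = remove_identifiers(patient)
```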

Breach example: New York and Presbyterian Hospital data breach (damage: $3.3 million).
Source: HHS.gov

More breach examples: Healthcare IT News

GDPR

European Data Protection Law

Selected recitals

Recital 26

The principles of data protection should apply to any information concerning an identified or identifiable natural person.

Personal data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person.

To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.

To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.

The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.

This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.

Recital 28

The application of pseudonymization to personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations. The explicit introduction of ‘pseudonymization’ in this Regulation is not intended to preclude any other measures of data protection.

Recital 29

In order to create incentives to apply pseudonymization when processing personal data, measures of pseudonymization should, whilst allowing general analysis, be possible within the same controller when that controller has taken technical and organisational measures necessary to ensure, for the processing concerned, that this Regulation is implemented, and that additional information for attributing the personal data to a specific data subject is kept separately. The controller processing the personal data should indicate the authorised persons within the same controller.

Recital 75

A personal data breach may, if not addressed in an appropriate and timely manner, result in physical, material or non-material damage to natural persons such as loss of control over their personal data or limitation of their rights, discrimination, identity theft or fraud, financial loss, unauthorised reversal of pseudonymization, damage to reputation, loss of confidentiality of personal data protected by professional secrecy or any other significant economic or social disadvantage to the natural person concerned.

TurboCat.io

Data anonymization tool for GDPR

Prevent data leakage

GDPR compliant

Data privacy by design

Battle tested

High performance

Wide compatibility

Fast deployment

How it works

Let's do it

Product information

Features

FAQ

Does TurboCat.io store any data?

Are there training courses that are necessary to use the tool?

Can you de-identify data in-line between the application and its database (e.g. as an ODBC or JDBC proxy)?
