How does data de-identification work?

By Sara Kassabian / Published on October 16, 2018

Previously, we answered a commonly asked question: What constitutes protected health information (PHI)? This time, we take our series a step further and explain how de-identifying PHI lets your business work with health behavior data while taking on far less liability.

Let’s return to the original equation:

PHI = Personally Identifiable Information (PII) + health information.
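To make the equation concrete, here is a minimal Python sketch that splits one record into its PII part and its health-information part. The field names and the three-item `PII_FIELDS` set are hypothetical simplifications for illustration. A real de-identification pass has to account for many more identifier types than these.

```python
from __future__ import annotations

# Hypothetical identifier fields; real PHI can contain many more.
PII_FIELDS = {"name", "address", "phone"}


def split_record(record: dict) -> tuple[dict, dict]:
    """Split a PHI record into (PII, health information)."""
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    health = {k: v for k, v in record.items() if k not in PII_FIELDS}
    return pii, health


record = {
    "name": "Bruce Wayne",
    "address": "123 Example St",
    "age_band": "40-49",
    "diagnosis": "PTSD",
}
pii, health = split_record(record)
print(pii)     # {'name': 'Bruce Wayne', 'address': '123 Example St'}
print(health)  # {'age_band': '40-49', 'diagnosis': 'PTSD'}
```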

Most companies only need the health information component. The PII is extraneous. By de-identifying the data (i.e., removing the PII from the equation), the liability for businesses working with health data is dramatically reduced. Below is a hypothetical example that casts the roles of Covered Entity, Business Associate, and Data Subject in a de-identification scenario.

De-identifying PHI in Gotham City

In our example: ACME Research is the Business Associate; Gotham City Department of Health is the Covered Entity.

Now, let’s say the Gotham City Department of Health (Covered Entity) hired ACME Research (Business Associate) to conduct an epidemiological study about the prevalence of PTSD in the city.

Traditionally, ACME Research would receive full patient records, which are PHI, from the Gotham City Department of Health to begin its analysis. Because it is handling PHI, ACME Research must take on the full compliance requirements of HIPAA.

But the reality is that ACME does not need patients' full medical records, which include PII such as names and exact addresses, to conduct its research. ACME only needs non-identifying demographic and medical information to stratify the data in search of trends. In other words, ACME is taking on compliance and data security burdens unnecessarily.
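Stratifying requires only the non-identifying fields. The minimal sketch below computes the share of PTSD diagnoses per age band and district from records that carry no names or addresses at all. The field names, age bands, district labels and the records themselves are all hypothetical.

```python
from collections import Counter

# Hypothetical de-identified records: no names or street addresses,
# only coarse demographic fields plus the diagnosis.
records = [
    {"age_band": "30-39", "district": "North", "diagnosis": "PTSD"},
    {"age_band": "30-39", "district": "North", "diagnosis": "asthma"},
    {"age_band": "30-39", "district": "North", "diagnosis": "PTSD"},
    {"age_band": "40-49", "district": "South", "diagnosis": "PTSD"},
]


def ptsd_prevalence_by_stratum(rows):
    """Share of records with a PTSD diagnosis in each (age band, district) stratum."""
    totals = Counter((r["age_band"], r["district"]) for r in rows)
    cases = Counter(
        (r["age_band"], r["district"]) for r in rows if r["diagnosis"] == "PTSD"
    )
    return {stratum: cases[stratum] / n for stratum, n in totals.items()}


for stratum, share in ptsd_prevalence_by_stratum(records).items():
    print(stratum, round(share, 2))
# ('30-39', 'North') 0.67
# ('40-49', 'South') 1.0
```

In a real study, the input would be the de-identified extract from the Covered Entity. Nothing in the calculation requires knowing who any patient is.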

Holy pseudonymization, Batman!

By removing the PII from the health information, a process called pseudonymization (more broadly, de-identification; tokenization is one common technique), the researchers at ACME no longer have to worry about having PHI in their system. Properly de-identified data is no longer PHI, so it is not subject to the same protections under HIPAA.


Let’s look at an example patient record:

“Bruce Wayne is diagnosed with PTSD” = PHI

Next, we pseudonymize it.

“Patient J^g7xz(3hG9!?6x is diagnosed with PTSD” is not PHI because the PII has been stripped from the data. Now it is just health data that ACME Research can use for its study without having to build a HIPAA-compliant application to store it.
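Here is a minimal Python sketch of this substitution. It assumes the patient's name is already known and appears verbatim in the text. Real free-text records need far more robust identifier detection. The function name `pseudonymize` and the token format are hypothetical and do not describe how TrueVault's Tokenization Engine works internally.

```python
import secrets


def pseudonymize(text: str, name: str, tokens: dict) -> str:
    """Replace `name` in `text` with a random patient token.

    The token comes from a cryptographically secure random generator rather
    than from the name itself, so it reveals nothing about the patient. The
    name -> token mapping is kept in `tokens`, so the same patient gets the
    same token in every record.
    """
    if name not in tokens:
        tokens[name] = secrets.token_urlsafe(12)
    return text.replace(name, f"Patient {tokens[name]}")


tokens = {}
print(pseudonymize("Bruce Wayne is diagnosed with PTSD", "Bruce Wayne", tokens))
# Prints "Patient <random token> is diagnosed with PTSD"; the token differs per run.
```

The `tokens` mapping is exactly the link between a token and a real person. It must be protected as carefully as the original PHI.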

Store the PHI in TrueVault

Even after the health data is de-identified, the original identifying medical information should be stored somewhere secure and HIPAA-compliant so it can be re-identified later if needed.

This is where Business Associate Agreements (BAAs) come into play.

ACME Research may not have the resources to build its own HIPAA-compliant application to store the PHI that drives its business. Instead, the company can sign a BAA with a company like TrueVault and pursue its business goals without shouldering the full burden of HIPAA compliance itself.

To help ACME Research achieve its goal and limit the scope of HIPAA compliance for the company, TrueVault would use our Tokenization Engine to do the following:

  1. Tokenization Engine will collect the data, including the medical record for Bruce Wayne (“Bruce Wayne is diagnosed with PTSD”), from Gotham City Department of Health on behalf of ACME Research.
  2. Tokenization Engine will de-identify the data, so it now reads “Patient J^g7xz(3hG9!?6x is diagnosed with PTSD”. The PII is stored in our SecureVault or removed entirely, whichever ACME Research prefers.
  3. Tokenization Engine will then send the de-identified data to ACME.
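The three steps can be sketched in Python as follows. Everything here is hypothetical and simplified. `Vault` is an in-memory stand-in for a secure store, and `deidentify`, `PII_FIELDS` and `patient_token` are illustrative names. None of them is TrueVault's actual Tokenization Engine or SecureVault API. The optional `vault` argument mirrors the choice in step 2 between storing the PII and removing it entirely.

```python
import secrets

# Hypothetical identifier fields; a real deployment would cover many more.
PII_FIELDS = ("name", "address")


class Vault:
    """Stand-in for a secure, HIPAA-compliant store (here an in-memory dict)."""

    def __init__(self):
        self._store = {}

    def put(self, pii: dict) -> str:
        token = secrets.token_urlsafe(12)  # random, not derived from the PII
        self._store[token] = pii
        return token

    def get(self, token: str) -> dict:
        return self._store[token]


def deidentify(records, vault=None):
    """Remove PII fields from each record.

    With a vault, the PII is stored there and replaced by a token so the
    record can be re-identified later; with vault=None, it is discarded.
    """
    out = []
    for rec in records:
        pii = {k: rec[k] for k in PII_FIELDS if k in rec}
        clean = {k: v for k, v in rec.items() if k not in PII_FIELDS}
        if vault is not None:
            clean["patient_token"] = vault.put(pii)
        out.append(clean)
    return out


# Step 1: records collected from the Covered Entity (hypothetical data).
incoming = [{"name": "Bruce Wayne", "address": "123 Example St", "diagnosis": "PTSD"}]

# Step 2: de-identify, keeping the PII in the vault.
vault = Vault()
outgoing = deidentify(incoming, vault)

# Step 3: only `outgoing` would be sent to the Business Associate.
print(outgoing[0]["diagnosis"], "name" in outgoing[0])  # PTSD False

# Re-identification, if ever needed, happens on the vault side only.
print(vault.get(outgoing[0]["patient_token"])["name"])  # Bruce Wayne
```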

TrueVault assumes our clients’ risk by transferring and/or storing all PHI in SecureVault, our HIPAA-compliant data solution. Because the PHI never touches ACME Research's servers, the company's scope of compliance concerns is limited. Our team keeps up with the laws governing PHI to ensure that our clients always stay compliant with federal regulations.
