
PII vs PHI vs PCI: What They Are, How They Differ, and How to Protect All Three

Bilal Khan

April 2, 2026

PII, PHI, and PCI data overlap in ways that multiply compliance risk. Learn the differences, 2026 penalty structures, and how tokenization protects all three.

TL;DR

  • PII = any data identifying a person (broadest category)
  • PHI = PII linked to healthcare, governed by HIPAA
  • PCI = cardholder data, governed by PCI DSS 4.0
  • Tokenization protects all three from a single control layer

Personally identifiable information (PII), protected health information (PHI), and Payment Card Industry (PCI) data are the three categories of sensitive data that drive the majority of enterprise compliance obligations. 

  • PII is any data that can identify a person, e.g., names, Social Security numbers, email addresses, biometrics.

  • PHI is a regulated subset of PII: personally identifiable data linked to healthcare services and governed by HIPAA.

  • PCI data refers to cardholder data and sensitive authentication data protected under the PCI DSS 4.0 standard.

These categories overlap – e.g., a patient paying a hospital co-pay with a credit card creates a record that is simultaneously PII, PHI, and PCI data. 

The average healthcare data breach costs $7.42 million (IBM, 2025), HIPAA penalties reach $2.19 million per violation category (HHS, 2026), and PCI non-compliance fines escalate to $100,000 per month (PCI DSS Guide). 

Understanding the differences – and where they converge – is the first step toward building a unified data protection strategy.

What is PII (Personally Identifiable Information)?

Personally identifiable information is any data that can be used to identify or trace an individual, either alone or when combined with other information. 

The National Institute of Standards and Technology (NIST) SP 800-122 defines PII broadly, and that breadth makes it the foundational category for data classification across all regulatory frameworks.

PII falls into two groups. Direct identifiers (e.g., Social Security numbers, passport numbers, driver's license numbers, and biometric data) can identify someone on their own. 

Quasi-identifiers (e.g., ZIP code, date of birth, gender, or job title) cannot identify a person on their own but can re-identify someone when combined. Research by Latanya Sweeney at Carnegie Mellon found that 87% of the U.S. population can be uniquely identified by ZIP code, date of birth, and gender alone. Every data discovery program must account for both types.
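The re-identification risk described above can be estimated on your own data by counting how many records are unique on their quasi-identifier combination. The following is a minimal sketch, assuming hypothetical field names (`zip`, `dob`, `gender`) and made-up sample rows; it is not a full k-anonymity analysis.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"zip": "02139", "dob": "1985-03-14", "gender": "F"},
    {"zip": "02139", "dob": "1985-03-14", "gender": "F"},
    {"zip": "02139", "dob": "1990-07-02", "gender": "M"},
    {"zip": "10001", "dob": "1972-11-30", "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "gender")


def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Return the share of rows whose quasi-identifier combination is unique."""
    if not rows:
        return 0.0
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique_rows = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique_rows / len(rows)


print(unique_fraction(records))  # 0.5: two of the four rows are unique
```

A high fraction suggests that quasi-identifiers alone could single out individuals, even after direct identifiers are removed.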

There is also a critical distinction between sensitive and non-sensitive PII. Sensitive PII (e.g., Social Security numbers, financial account numbers, medical records, biometric identifiers) requires encryption or equivalent protection because exposure causes direct harm. 

Non-sensitive PII (e.g., name, email address, phone number) is often publicly available but still regulated under laws like GDPR and CCPA. 

The table below breaks out the key examples by category, and a strong data security management program treats both tiers with appropriate controls.

| Category | Examples | Identifiability |
| --- | --- | --- |
| Direct Identifiers | Social Security number, passport number, driver's license number, biometric data (fingerprints, retina scans) | Identifies an individual alone |
| Quasi-Identifiers | ZIP code, date of birth, gender, job title, ethnicity | Re-identifies when combined with other data points |
| Sensitive PII | Financial account numbers, medical record numbers, criminal history, immigration status | Requires encryption or tokenization; exposure causes direct harm |
| Non-Sensitive PII | Full name, email address, phone number, mailing address | Publicly obtainable but still regulated under GDPR, CCPA, and state privacy laws |

PII is governed by a patchwork of regulations. 

  • In the EU, GDPR defines "personal data" even more broadly than NIST defines PII, covering any information relating to an identified or identifiable person.

  • In California, the CCPA/CPRA applies to businesses collecting California residents' personal information.

  • Canada's PIPEDA covers PII in commercial contexts. 

Across all frameworks, the obligation is the same: identify it, classify it, and protect it with controls proportional to its sensitivity – and automated data discovery and classification is the only reliable method at enterprise scale.

What is PHI (Protected Health Information)?

Protected health information is personally identifiable information that is linked to a healthcare service, diagnosis, treatment, or payment record and is held by a HIPAA-covered entity or business associate. 

The HIPAA Privacy Rule (45 CFR § 164.501) defines PHI as individually identifiable health information relating to past, present, or future physical or mental health conditions, healthcare provision, or healthcare payment. 

The key distinction: all PHI contains PII, but PII only becomes PHI when linked to a healthcare context under HIPAA jurisdiction. A thorough data classification process must flag this overlap explicitly.

HIPAA's Safe Harbor de-identification method (45 CFR § 164.514) lists 18 specific identifiers that must be removed or de-identified to strip PHI status from a record. 

Those 18 HIPAA identifiers are: 

  • Names
  • Geographic data smaller than a state
  • Dates (except year) related to an individual
  • Phone numbers
  • Fax numbers
  • Email addresses
  • Social Security numbers
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate or license numbers
  • Vehicle identifiers and serial numbers
  • Device identifiers and serial numbers
  • Web URLs
  • IP addresses
  • Biometric identifiers
  • Full-face photographs
  • Any other unique identifying number, characteristic, or code

Removing all 18 is the threshold for de-identification under Safe Harbor – anything short of that, and the record remains PHI.
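As a rough illustration of the Safe Harbor idea, the sketch below drops a handful of identifier fields and reduces dates to the year. The field names and the `_date` suffix convention are assumptions made for this example, and the field list covers only some of the 18 categories, so it should not be read as a complete de-identification routine.

```python
# Hypothetical field names mapped to a few Safe Harbor identifier categories.
DROP_FIELDS = {"name", "phone", "fax", "email", "ssn", "mrn", "ip_address", "url"}


def safe_harbor_sketch(record):
    """Drop listed identifier fields and generalize ISO dates to the year."""
    out = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # remove the identifier entirely
        if field.endswith("_date"):
            # Keep only YYYY from a YYYY-MM-DD string, per "dates (except year)".
            out[field[:-5] + "_year"] = value[:4]
        else:
            out[field] = value
    return out


patient = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "admission_date": "2026-01-15",
    "state": "MA",
    "diagnosis_code": "E11.9",
}
print(safe_harbor_sketch(patient))
```

Geographic data, free-text notes, and the catch-all "any other unique identifier" category need separate handling that this sketch does not attempt.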

There is a further distinction that matters for your technical controls. 

Electronic protected health information (ePHI) is any PHI that is created, stored, transmitted, or received in electronic form. 

While HIPAA's Privacy Rule covers all PHI formats (e.g., paper, oral, and electronic), the HIPAA Security Rule applies specifically to ePHI and mandates technical safeguards, including encryption, access controls, audit logging, and integrity verification. 

If your organization stores health records digitally – and in 2026, virtually every covered entity does – the Security Rule's ePHI requirements are your operative compliance standard, and data access controls must be configured to enforce them.

What is PCI Data?

PCI data is cardholder data (CHD) and sensitive authentication data (SAD) as defined by the PCI Security Standards Council (PCI SSC).

Any entity that stores, processes, or transmits cardholder data (e.g., merchants, payment processors, acquirers, issuers, and service providers) must comply with the PCI Data Security Standard (PCI DSS).

Understanding the distinction between CHD and SAD is critical because the storage rules differ, and PCI tokenization strategies must account for both.

Cardholder data includes the Primary Account Number (PAN), cardholder name, expiration date, and service code. 

These elements can be stored post-authorization if they are protected in accordance with PCI DSS requirements, but the PAN is the defining element: if a record includes a PAN, PCI DSS applies.

Sensitive authentication data (e.g., full magnetic stripe data, CAV2/CVC2/CVV2/CID codes, and PINs or PIN blocks) must never be stored after authorization, regardless of the encryption method applied.

The table below maps these categories to their storage and protection rules.

| Data Element | Type | Storable Post-Auth? | Protection Required |
| --- | --- | --- | --- |
| Primary Account Number (PAN) | Cardholder Data | Yes, if protected | Render unreadable (tokenization, encryption, hashing, truncation) |
| Cardholder Name | Cardholder Data | Yes, if protected | Access controls, encryption |
| Expiration Date | Cardholder Data | Yes, if protected | Access controls |
| Service Code | Cardholder Data | Yes, if protected | Access controls |
| Full Magnetic Stripe Data | Sensitive Auth Data | Never | Must not be stored; tokenize before authorization |
| CAV2/CVC2/CVV2/CID | Sensitive Auth Data | Never | Must not be stored |
| PIN / PIN Block | Sensitive Auth Data | Never | Must not be stored |
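The storage rules above can be sketched as a filter applied before a payment record is persisted. This is an illustrative sketch only: the field names are hypothetical, and keeping the first six and last four digits of the PAN is an assumed truncation format that should be checked against current PCI SSC and card brand guidance.

```python
# Hypothetical field names for sensitive authentication data (SAD).
SENSITIVE_AUTH_FIELDS = {"track_data", "cvv", "pin", "pin_block"}


def prepare_for_storage(payment):
    """Drop SAD fields and truncate the PAN before the record is persisted."""
    stored = {k: v for k, v in payment.items() if k not in SENSITIVE_AUTH_FIELDS}
    pan = stored.pop("pan")
    # Assumed truncation format: first six and last four digits kept.
    stored["pan_truncated"] = pan[:6] + "*" * (len(pan) - 10) + pan[-4:]
    return stored


payment = {
    "pan": "4111111111111111",  # well-known test card number
    "cardholder_name": "Jane Doe",
    "expiration": "12/28",
    "cvv": "123",
}
print(prepare_for_storage(payment))
```

The input dictionary is left unchanged, so the full PAN and CVV remain available in memory for the authorization call and are simply never written to storage.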

PCI DSS 4.0 is now fully enforceable. 

As of March 31, 2025, the 51 future-dated requirements (out of 64 new requirements introduced in v4.0) that were previously listed as best practices became mandatory.

The 12 PCI DSS requirements, the standard's compliance backbone, are organized into six objectives:

  • Build and maintain a secure network
  • Protect cardholder data
  • Maintain a vulnerability management program
  • Implement strong access controls
  • Regularly monitor and test networks
  • Maintain an information security policy

For Level 1 merchants processing over six million transactions annually, compliance requires a full Report on Compliance (RoC) – and understanding your SAQ type is the first step for everyone else.

PII vs PHI vs PCI: Key Differences at a Glance

The three data types share a common thread (they all contain information about people), but they diverge in scope, regulatory authority, and the consequences of mishandling them.

Your data protection and data classification strategy depends on understanding these distinctions. 

The table below is the core reference for your data security and compliance teams.

| Dimension | PII | PHI | PCI Data |
| --- | --- | --- | --- |
| Definition | Any data identifying or tracing an individual | PII linked to healthcare, held by HIPAA entity | Cardholder data + sensitive auth data under PCI DSS |
| Governing Law | GDPR, CCPA/CPRA, state breach laws, PIPEDA | HIPAA (Privacy, Security, Breach Notification Rules) | PCI DSS 4.0 (industry standard, not government law) |
| Scope | Broadest: covers all sectors and contexts | Healthcare: covered entities + business associates | Payment: any entity storing/processing/transmitting CHD |
| Key Examples | SSN, name, email, biometrics, IP address | Medical records, lab results, insurance claims, prescriptions | PAN, cardholder name, expiration date, CVV, magnetic stripe |
| Breach Notification | State laws (all 50 states + DC), GDPR 72-hour rule | 60 days to HHS for breaches affecting 500+; annual log for smaller breaches | Card brand rules: immediate disclosure to acquirer, forensic investigation |
| Max Penalty | GDPR: €20M or 4% global revenue; CCPA: $7,500/violation | $2,190,294 per violation (Tier 4, 2026) | $100K/month + card brand penalties + breach costs |
| Protection Methods | Encryption, tokenization, masking, access controls | Same as PII + HIPAA-specific safeguards (ePHI controls) | Tokenization (preferred for scope reduction), encryption, truncation, hashing |
| Relationship | Umbrella category | Subset of PII (with HIPAA layer) | Subset of PII (with PCI DSS layer) |

The critical takeaway: PII is the umbrella. PHI and PCI data are specialized subsets that inherit all PII obligations and add their own regulatory layer on top. 

If you protect PII properly, you have a data protection foundation – but you still need the additional controls specific to HIPAA and PCI DSS, including data loss prevention (DLP) capabilities that detect and block unauthorized movement of each data type. 

The data security platforms guide breaks down how unified platforms address all three.

Where Do PII, PHI, and PCI Data Overlap?

In theory, PII, PHI, and PCI data sit in distinct regulatory buckets. In practice, your databases, CRMs, and data lakes store them together – often in the same record. 

That convergence is where most compliance failures originate, and it is why data discovery and classification must precede any protection strategy.

Consider healthcare billing. 

A patient checks into a hospital, provides their name, date of birth, and insurance ID (PII), receives a diagnosis and treatment plan (PHI), and pays their co-pay with a credit card (PCI data). 

That single transaction creates a record governed by HIPAA, PCI DSS, and state breach notification laws simultaneously. If that record is compromised, you face three parallel compliance response obligations – not one.
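The hospital co-pay scenario can be sketched as a lookup from the fields present in a record to the frameworks they trigger. The field names and the mapping are assumptions for illustration; real classification depends on context (for example, whether the holder is a HIPAA-covered entity), not on field names alone.

```python
# Hypothetical field-to-category mapping for the billing example.
FIELD_CATEGORIES = {
    "name": "PII",
    "date_of_birth": "PII",
    "insurance_id": "PII",
    "diagnosis": "PHI",
    "treatment_plan": "PHI",
    "pan": "PCI",
}

# Frameworks named in the text for each category.
FRAMEWORKS = {
    "PII": "state breach notification laws",
    "PHI": "HIPAA",
    "PCI": "PCI DSS",
}


def frameworks_for(record):
    """Return the sorted list of frameworks triggered by a record's fields."""
    categories = {FIELD_CATEGORIES[f] for f in record if f in FIELD_CATEGORIES}
    return sorted(FRAMEWORKS[c] for c in categories)


billing_record = {
    "name": "Jane Doe",
    "diagnosis": "E11.9",
    "pan": "4111111111111111",
}
print(frameworks_for(billing_record))
# ['HIPAA', 'PCI DSS', 'state breach notification laws']
```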

Insurance claims present the same convergence. 

A health insurer processing a claim handles member PII (name, SSN, address), diagnosis and procedure codes (PHI), and payment routing information that may include cardholder data when reimbursements go to patient credit cards. 

E-commerce adds another layer: an online pharmacy collecting customer PII and PCI data may also collect health-related purchase data that, depending on jurisdiction and business associate agreements, triggers PHI classification obligations.

The operational problem is that most enterprises treat these as separate compliance workstreams – i.e., separate tools, separate audits, separate budgets. 

According to IBM's 2025 Cost of a Data Breach Report, the global average breach cost reached $4.44 million, with organizations using fragmented security tooling paying significantly more than those with unified platforms. 

A single data security platform that classifies and protects all three data types through one control layer eliminates that fragmentation.

Regulatory Frameworks Governing Each Data Type

Every data type in this comparison carries distinct penalties, and the numbers have increased in 2026. Your compliance team needs current figures, not last year's, when calculating risk exposure and justifying data protection investments.

HIPAA (Governing PHI)

The U.S. Department of Health and Human Services (HHS) updated HIPAA penalty amounts effective January 28, 2026, applying a cost-of-living adjustment multiplier of 1.02598. 

The four-tier penalty structure, as reported by HIPAA Journal, now stands at: 

  • Tier 1 (lack of knowledge): $145 to $73,011 per violation
  • Tier 2 (reasonable cause): $1,461 to $73,011 per violation
  • Tier 3 (willful neglect, corrected within 30 days): $14,602 to $73,011 per violation
  • Tier 4 (willful neglect, not corrected): $73,011 to $2,190,294 per violation
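The tier figures above can be cross-checked against the 1.02598 multiplier. The sketch below assumes prior-year amounts of $141, $1,424, $14,232, $71,162, and $2,134,831; those inputs are not stated in this article and are included only to show the arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP

MULTIPLIER = Decimal("1.02598")  # cost-of-living multiplier cited above

# Assumed prior-year amounts (not given in the article).
PRIOR = {
    "tier1_min": 141,
    "tier2_min": 1424,
    "tier3_min": 14232,
    "tier1_to_3_max": 71162,
    "tier4_max": 2134831,
}


def adjust(amount):
    """Apply the multiplier and round to the nearest whole dollar."""
    scaled = Decimal(amount) * MULTIPLIER
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


adjusted = {name: adjust(value) for name, value in PRIOR.items()}
print(adjusted)
# {'tier1_min': 145, 'tier2_min': 1461, 'tier3_min': 14602,
#  'tier1_to_3_max': 73011, 'tier4_max': 2190294}
```

Under those assumed inputs, the results match the 2026 amounts listed above.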

Annual caps range from $25,000 (Tier 1) to $1.5 million (Tier 4). De-identifying PHI through tokenization reduces your regulatory surface area because de-identified records fall outside HIPAA's definition of PHI.

PCI DSS 4.0 (Governing PCI Data)

PCI DSS is an industry standard, not a government regulation, but the financial consequences are severe. 

According to PCI DSS Guide, acquiring banks impose escalating monthly fines: $5,000–$10,000 per month for the first three months, $25,000–$50,000 for months four through six, and up to $100,000 per month beyond six months. 
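To see how that escalation compounds, the sketch below totals the upper end of each monthly band. It assumes the maximum figure applies every month, which is a worst-case simplification; actual fines are set by the acquiring bank and card brands.

```python
def max_cumulative_fine(months):
    """Upper-bound cumulative fine in dollars after `months` of non-compliance."""
    total = 0
    for month in range(1, months + 1):
        if month <= 3:
            total += 10_000   # months 1-3: up to $10,000 per month
        elif month <= 6:
            total += 50_000   # months 4-6: up to $50,000 per month
        else:
            total += 100_000  # month 7 onward: up to $100,000 per month
    return total


print(max_cumulative_fine(6))   # 180000
print(max_cumulative_fine(12))  # 780000
```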

Post-breach costs add $50,000 to over $500,000 for forensic investigation, remediation, and card reissuance. 

With the 51 future-dated requirements mandatory since March 31, 2025, the single most effective step to reduce your PCI audit scope is tokenizing cardholder data so PANs never enter your environment.

GDPR (Governing PII in Relation to the EU)

Article 83 of the GDPR allows supervisory authorities to impose fines of up to €20 million or 4% of global annual revenue, whichever is higher, for the most serious infringements. 

GDPR defines "personal data" more broadly than most PII definitions, covering any information relating to an identified or identifiable natural person – including online identifiers, location data, and cookie IDs. 

Pseudonymization, including tokenization, is explicitly recognized under GDPR Article 4(5) as a risk-reduction measure, though it does not entirely exempt data from the GDPR's scope.

CCPA/CPRA (Governing PII in Relation to California)

The California Privacy Rights Act (CPRA), which amended the original CCPA, imposes penalties of $2,500 per unintentional violation and $7,500 per intentional violation. It also grants consumers a private right of action for data breaches: $100 to $750 per consumer per incident. 

For organizations handling California residents' data alongside cardholder data and health records, the compliance obligation stacks across all applicable frameworks.

GLBA (Governing NPI in Relation to the Financial Sector)

The Gramm-Leach-Bliley Act (GLBA) governs non-public personal information (NPI) held by financial institutions – a category that overlaps heavily with PII but carries its own penalties: up to $100,000 per violation for institutions and $10,000 per violation for individuals. 

NPI includes account numbers, transaction history, and credit data. 

Financial institutions subject to GLBA, CCPA, and PCI DSS simultaneously face the densest regulatory overlap in any sector, and a unified data security approach is the only way to manage it without multiplying audit costs.

| Regulation | Governs | Max Penalty Per Violation | Annual Cap | Enforcement Body |
| --- | --- | --- | --- | --- |
| HIPAA | PHI | $2,190,294 | $25,000 (Tier 1) to $1.5M (Tier 4) | HHS Office for Civil Rights |
| PCI DSS 4.0 | PCI Data | $100K/month | No cap; escalates | Card brands via acquiring banks |
| GDPR | PII (EU) | €20M or 4% revenue | No cap | National supervisory authorities |
| CCPA/CPRA | PII (California) | $7,500 | No cap | California Privacy Protection Agency |
| GLBA | NPI (Financial) | $100,000 | No cap | FTC, federal banking regulators |

How to Protect PII, PHI, and PCI Data

Protecting all three data types requires a layered data protection strategy that combines data classification, tokenization and encryption decisions, data masking, and data loss prevention controls.

The five steps below apply across PII, PHI, and PCI data, and each step builds on the one before it. A data security best practices framework starts here.

Step 1: Discover and Classify Your Sensitive Data

You cannot protect data you have not identified. 

Automated data discovery tools scan databases, file shares, cloud storage, SaaS applications, and mainframe environments to locate PII, PHI, and PCI data wherever it resides — including in systems your team may have forgotten about.

Classification engines then categorize each discovered record using pattern recognition, AI models, and custom rulesets. 

The output is a map: here is where your PII lives, here is where your PHI lives, here is where your PCI data lives, and here is where they overlap. 
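A small piece of that pattern recognition can be sketched with regular expressions plus a Luhn checksum, which filters out digit runs that merely look like card numbers. The patterns and labels here are simplified assumptions; production classifiers use many more patterns, context, and validation rules.

```python
import re

# Simplified patterns: 13-19 digit runs as PAN candidates, dashed U.S. SSNs.
PAN_CANDIDATE = re.compile(r"\b\d{13,19}\b")
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify(text):
    """Return the set of data categories detected in a text field."""
    labels = set()
    if any(luhn_valid(match) for match in PAN_CANDIDATE.findall(text)):
        labels.add("PCI")
    if SSN_PATTERN.search(text):
        labels.add("PII")
    return labels


print(classify("card 4111111111111111, ssn 123-45-6789"))
print(classify("order number 1234567890123456"))  # set(): fails the Luhn check
```

The first call reports both PCI and PII. Running a function like this over each column or document and recording the labels by location produces the kind of map described above.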

According to IBM's 2025 report, breaches involving shadow data — data that organizations did not know existed — cost 16% more than average breaches.

Most enterprises also have dark data — information stored in legacy systems, archived databases, or decommissioned applications that were never inventoried. 

A discovery tool that covers mainframe environments, cloud, and on-premises stores is the only way to achieve complete visibility.

Step 2: Apply Tokenization to Eliminate Compliance Scope

Tokenization replaces sensitive data elements (PANs, Social Security numbers, medical record numbers) with tokens that retain format but carry no exploitable value and cannot be reversed without access to the token vault.

The original data is stored in an isolated token vault, and the tokens flowing through your systems are meaningless to an attacker.
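The vault model can be sketched in a few lines. This is a toy, in-memory illustration and not how any particular product works: the `TokenVault` class, its method names, and the choice to preserve the last four digits are all assumptions, and a real vault would be an isolated, access-controlled service.

```python
import secrets


class TokenVault:
    """In-memory stand-in for an isolated token vault (illustration only)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return a random same-length digit token that keeps the last four digits."""
        if value in self._value_to_token:
            return self._value_to_token[value]  # same input, same token
        while True:
            body = "".join(secrets.choice("0123456789") for _ in range(len(value) - 4))
            token = body + value[-4:]
            # Retry on the rare chance the token equals the input or collides.
            if token != value and token not in self._token_to_value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Look up the original value; only code with vault access can do this."""
        return self._token_to_value[token]


vault = TokenVault()
pan = "4111111111111111"
token = vault.tokenize(pan)
print(token)                    # random 16-digit string ending in 1111
print(vault.detokenize(token))  # 4111111111111111
```

Because the token is drawn at random rather than computed from the PAN, there is no key that turns a token back into the original; recovery is only possible through the vault lookup.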

For PCI data, the scope implications are decisive. 

The PCI Security Standards Council treats an encrypted PAN as equivalent to cleartext for scoping purposes because encryption is reversible — if you hold the key, you can produce the original PAN. 

Tokenized PANs, by contrast, exit PCI DSS scope entirely when the tokenization system meets PCI SSC requirements. That scope reduction translates directly to simpler SAQ questionnaires, smaller audit surface, and lower compliance costs.

For PHI, tokenization enables HIPAA Safe Harbor de-identification. When the 18 HIPAA identifiers are replaced with tokens, the resulting record no longer qualifies as PHI under HIPAA, which reduces your regulatory obligations for that data.

For PII broadly, tokenized records reduce the impact of a breach to near zero because exposed tokens cannot be reversed to recover the original data. A single tokenization platform can protect PII, PHI, and PCI data simultaneously across the same infrastructure.

Step 3: Encrypt Data at Rest and in Transit

Encryption is the baseline protection layer across every framework. AES-256 for data at rest and TLS 1.2+ for data in transit are the minimum standards expected by HIPAA, PCI DSS, and GDPR.
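For the in-transit half of that baseline, a client can refuse anything older than TLS 1.2 using Python's standard `ssl` module. This is a minimal sketch of one client-side setting; AES-256 at rest is not shown because it requires a third-party cryptography library or platform-level encryption.

```python
import ssl

# Default context enables certificate verification and hostname checking.
context = ssl.create_default_context()

# Reject TLS 1.0 and 1.1; negotiate TLS 1.2 or newer only.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version)
```

The resulting context can be passed to clients that accept one, such as `urllib.request.urlopen(url, context=context)`.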

There is a critical nuance: encryption protects data from unauthorized access, but it does not reduce PCI DSS compliance scope. 

As the PCI SSC's tokenization guidance clarifies, encrypted cardholder data remains in scope because the encryption is reversible. 

For HIPAA, the Security Rule lists encryption as an "addressable" safeguard for ePHI — not technically mandatory, but the HHS Office for Civil Rights expects it, and failing to encrypt without a documented alternative explanation is a finding in most audits. 

The takeaway: encryption is necessary but not sufficient. The tokenization vs. encryption question is not either/or — pair tokenization with encryption and masking for maximum scope reduction and data protection.

Step 4: Enforce Access Controls and Monitoring

Data access control enforces who can see and interact with sensitive data. 

Role-based access control (RBAC) and least-privilege principles are required across all three frameworks: HIPAA's minimum necessary standard, PCI DSS Requirement 7 (restrict access by business need-to-know), and GDPR's data minimization principle.

Audit logging — who accessed what, when, and from where — is mandatory under HIPAA's Security Rule (audit controls), PCI DSS Requirement 10 (log and monitor all access to cardholder data), and GDPR's accountability principle. 

Real-time monitoring and anomaly detection are the operational layer that turns static logs into actionable threat intelligence — the foundation of any data loss prevention program. 

Organizations that deploy data masking alongside access controls add a further safeguard: even authorized users see only the data elements they need, with sensitive fields dynamically masked based on role and context.
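Role-based dynamic masking can be sketched as a policy lookup applied at read time. The roles, field names, and the `****` mask are hypothetical; in practice the policy would come from your access control system rather than a hard-coded dictionary.

```python
# Hypothetical role-to-field policy for illustration.
VISIBLE_FIELDS = {
    "billing": {"name", "pan_last4"},
    "clinician": {"name", "diagnosis"},
}


def mask_for_role(record, role):
    """Return a copy of the record with fields outside the role's policy masked."""
    allowed = VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {key: (value if key in allowed else "****") for key, value in record.items()}


record = {"name": "Jane Doe", "diagnosis": "E11.9", "pan_last4": "1111"}
print(mask_for_role(record, "billing"))
# {'name': 'Jane Doe', 'diagnosis': '****', 'pan_last4': '1111'}
print(mask_for_role(record, "clinician"))
# {'name': 'Jane Doe', 'diagnosis': 'E11.9', 'pan_last4': '****'}
```

Defaulting unknown roles to an empty allow-list keeps the sketch aligned with least privilege: access has to be granted explicitly.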

Step 5: Build and Test an Incident Response Plan

A breach response plan is required by HIPAA (notification within 60 days to HHS for breaches affecting 500+ individuals), PCI DSS (immediate notification to the acquiring bank and card brands with forensic investigation), and GDPR (72-hour notification to the supervisory authority). 
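Those parallel clocks can be laid out from a single discovery timestamp. The sketch below uses only the deadlines stated above and treats "immediate" as the discovery time itself; the dictionary labels are illustrative, and actual deadlines depend on the facts of the incident and on legal advice.

```python
from datetime import datetime, timedelta, timezone


def notification_deadlines(discovered_at):
    """Map each notification track to its latest notification time."""
    return {
        "PCI DSS: acquirer and card brands": discovered_at,  # "immediate"
        "GDPR: supervisory authority": discovered_at + timedelta(hours=72),
        "HIPAA: HHS (500+ individuals)": discovered_at + timedelta(days=60),
    }


discovered = datetime(2026, 4, 2, 9, 0, tzinfo=timezone.utc)
for track, deadline in notification_deadlines(discovered).items():
    print(f"{track}: {deadline.isoformat()}")
```

For a breach discovered on April 2, 2026 at 09:00 UTC, this gives April 5 for GDPR and June 1 for HIPAA, which shows how quickly the shortest clock runs out relative to the longest.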

Without a tested plan, your organization's response will be reactive, slow, and expensive — and the leading data breach risks compound when multiple regulatory clocks start simultaneously.

According to IBM's 2025 Cost of a Data Breach Report, organizations that regularly conduct tabletop exercises reduce their average breach cost by $232,000. 

Healthcare breaches take the longest to identify and contain — 279 days on average, five weeks longer than the global average. 

A documented, rehearsed incident response plan is the difference between a contained event and a catastrophic data breach that triggers all three regulatory response tracks simultaneously.

| Protection Step | PII | PHI | PCI Data |
| --- | --- | --- | --- |
| Discover & Classify | Required: map all repositories | Required: flag ePHI specifically | Required: locate all PANs |
| Tokenize | Reduces breach impact to near zero | Enables HIPAA Safe Harbor de-identification | Eliminates PCI DSS scope for tokenized PANs |
| Encrypt | AES-256 at rest, TLS 1.2+ in transit | HIPAA addressable safeguard for ePHI | Required but does NOT reduce scope |
| Access Control | Least privilege per regulation | Minimum necessary standard (HIPAA) | Requirement 7: restrict by need-to-know |
| Monitor & Audit | State breach law triggers | HIPAA audit controls | Requirement 10: log all CHD access |
| Incident Response | State notification (varies) | 60-day HHS notification | Immediate acquirer notification + forensics |

What Most Enterprises Miss: Cross-Regulation Tokenization

Most organizations approach PII, PHI, and PCI compliance as separate workstreams. 

They purchase separate tools, engage separate auditors, and maintain separate policy libraries for each regulatory domain. That fragmentation is the primary cost multiplier — and it is the pattern that a unified data security platform eliminates.

A single tokenization layer applied at the data level — across mainframes, cloud databases, SaaS applications, and hybrid environments — can satisfy PCI DSS scope reduction, HIPAA Safe Harbor de-identification, and GDPR pseudonymization requirements simultaneously. 

According to IBM's 2025 Cost of a Data Breach Report, the global average breach cost is $4.44 million, and healthcare leads all industries at $7.42 million. Organizations that consolidate their security tooling spend less per breach and detect breaches faster than those running fragmented, siloed tools.

Healthcare organizations processing payments are the clearest example. 

A single patient record containing PII (name, address, SSN), PHI (diagnosis codes, treatment history), and PCI data (payment card for co-pays) passes through a single tokenization engine rather than three separate compliance tools. 

The result: fewer tools, fewer audits, faster compliance cycles, lower total cost — and a data protection architecture that scales without tripling your vendor roster every time a new regulation takes effect.

How DataStealth Protects PII, PHI, and PCI Data

DataStealth is a data security platform purpose-built for enterprises managing overlapping sensitive data obligations across hybrid environments.

  • Automated data discovery and classification identifies PII, PHI, and PCI data across mainframes, cloud databases, SaaS applications, and hybrid environments — including the dark data your team does not know exists.

  • Format-preserving tokenization replaces sensitive data elements with non-reversible tokens, eliminating PCI DSS scope and enabling HIPAA Safe Harbor de-identification from a single control layer.

  • Agentless deployment protects mainframes and production databases without installing software on the host — reducing implementation risk and accelerating time-to-compliance.
  • Cross-regulation compliance from one platform satisfies PCI DSS, HIPAA, GDPR, and CCPA requirements through a single data security architecture — replacing the fragmented tooling that inflates audit costs and breach exposure.



About the Author:

Bilal Khan

Bilal is the Content Strategist at DataStealth. He's a recognized defence and security analyst who's researching the growing importance of cybersecurity and data protection in enterprise-sized organizations.