Personally identifiable information (PII) is any data that can be used to distinguish or trace a specific individual's identity – either on its own or when combined with other linked or linkable information.
The National Institute of Standards and Technology (NIST) SP 800-122 defines PII as including direct identifiers like names and Social Security numbers, as well as indirect identifiers like date of birth, ZIP code, and employment records.
PII data protection is mandated under GDPR, HIPAA, CCPA, FERPA, and GLBA – making personally identifiable information the most broadly regulated data type across global compliance frameworks.
When PII data is exposed, it enables identity theft, financial fraud, and regulatory penalties that can reach 4% of global annual revenue.
Protecting PII requires data discovery, data classification, and data-centric controls like tokenization that render the data valueless even if it is exfiltrated.
The following is a comprehensive guide to PII meaning, PII examples, regulatory frameworks, and enterprise protection methods.
PII stands for personally identifiable information. The term refers to any data that can identify, contact, or locate a specific individual.
The canonical definition comes from NIST Special Publication 800-122.
NIST defines personally identifiable information as "any information about an individual maintained by an agency, including: (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to a specific individual."
This two-part PII definition is important. The first category covers direct identifiers – data that points to a person without additional context. The second covers indirect identifiers – PII data that, when combined with other information, can reveal someone's identity.
The meaning of PII has expanded as digital data collection has grown. Information that once seemed harmless – an IP address, a device identifier, a browsing pattern – can now serve as PII when paired with other data points.
Understanding what qualifies as personally identifiable information is the first step in building a defensible data security program.
PII spans a wide range of data types. The critical distinction is between direct identifiers – data that can identify someone immediately – and indirect identifiers that require combination with other data.
Direct identifiers can single out an individual without additional context. These are the most sensitive forms of PII and carry the highest regulatory and breach risk.
Common direct identifiers include full legal name, Social Security number (SSN), passport number, driver's license number, biometric records (fingerprints, facial recognition, retinal scans), email address, phone number, and financial account numbers. Each of these points to a specific person and, if exposed, can enable identity theft or financial fraud.
Government-issued identification numbers carry the highest risk. A compromised SSN opens pathways to fraudulent credit applications, tax fraud, and account takeover – damage that can persist for years.
Indirect identifiers cannot identify someone on their own. However, when combined, they narrow the field rapidly.
Research by Latanya Sweeney at Harvard demonstrated that 87% of the US population can be uniquely identified using only three data points – gender, ZIP code, and date of birth. None of these is a direct identifier individually. Together, they function as PII.
Other indirect identifiers include IP address, device ID, employment information, education records, and geolocation data. In the context of data classification, indirect identifiers require the same governance as direct identifiers when they exist in combinations that enable re-identification.
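The re-identification risk Sweeney measured is easy to reproduce in miniature. The sketch below (illustrative toy data, not a real dataset) counts how many records in a table are uniquely pinned down by the combination of gender, ZIP code, and date of birth – none of which is a direct identifier on its own:

```python
from collections import Counter

# Toy records: no field here is a direct identifier by itself.
records = [
    {"gender": "F", "zip": "02138", "dob": "1955-07-14"},
    {"gender": "M", "zip": "02138", "dob": "1955-07-14"},
    {"gender": "F", "zip": "02138", "dob": "1955-07-14"},
    {"gender": "M", "zip": "60614", "dob": "1980-01-02"},
]

# Count how many records share each (gender, zip, dob) combination.
combos = Counter((r["gender"], r["zip"], r["dob"]) for r in records)

# A record is re-identifiable when its combination is unique in the dataset.
unique = [r for r in records if combos[(r["gender"], r["zip"], r["dob"])] == 1]
print(f"{len(unique)} of {len(records)} records are uniquely identifiable")
```

Scaled to real population data, this is exactly why quasi-identifier combinations must be governed as PII.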
It is equally important to understand what is not PII data. Data about a person's streaming habits, for instance, does not constitute personally identifiable information on its own – it would be difficult to identify someone based solely on what they watched.
PII data only refers to information that points to a particular individual, such as the data you provide when verifying your identity with a financial institution. This distinction matters for data classification – it determines which data stores require the strongest protection controls and which can operate under standard governance.
Not all personally identifiable information carries the same level of risk.
Sensitive PII refers to data that, if lost or disclosed, could cause substantial harm, embarrassment, or inconvenience to an individual. PII examples in this category include Social Security numbers, financial account numbers, biometric data, medical records, and government-issued IDs.
Sensitive PII often falls under stricter regulatory requirements – HIPAA for health data, PCI DSS for payment card data, GLBA for financial data – and typically requires encryption, tokenization, or strict access controls.
Non-sensitive PII includes data that is generally available in public records or that cannot directly identify someone – names, email addresses, phone numbers, ZIP codes, dates of birth. This data still requires protection because it becomes sensitive when combined with other data points.
| Dimension | Sensitive PII | Non-Sensitive PII |
|---|---|---|
| Examples | SSN, biometrics, financial accounts, medical records | Name, email, phone, ZIP code, date of birth |
| Risk if Exposed | Identity theft, financial fraud, severe harm | Profile building, phishing, social engineering |
| Regulatory Triggers | HIPAA, PCI DSS, GLBA, state breach notification laws | GDPR, CCPA (when linked to identifiable individual) |
| Protection Required | Encryption, tokenization, strict access controls | Standard access controls, monitoring, anonymization |
The distinction matters for data security best practices. Organizations must classify PII by sensitivity level to apply proportionate controls – over-protecting non-sensitive data wastes resources; under-protecting sensitive data creates breach liability.
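In practice, proportionate controls are usually expressed as a policy matrix that maps a field's sensitivity tier to its required safeguards. A minimal sketch, with hypothetical tier names and control labels:

```python
# Hypothetical control matrix: sensitivity tier -> required safeguards.
CONTROLS = {
    "sensitive": {"tokenization", "encryption_at_rest", "strict_access", "audit_logging"},
    "non_sensitive": {"standard_access", "monitoring"},
}

# Hypothetical field-to-tier classification for a customer record.
FIELD_TIER = {
    "ssn": "sensitive",
    "medical_record": "sensitive",
    "email": "non_sensitive",
    "zip": "non_sensitive",
}

def required_controls(field: str) -> set:
    """Return the safeguards a field's sensitivity tier demands."""
    return CONTROLS[FIELD_TIER[field]]

print(required_controls("ssn"))    # sensitive PII gets the strongest controls
print(required_controls("email"))  # non-sensitive PII gets standard governance
```

Encoding the mapping once and applying it uniformly is what prevents both over-protection of low-risk fields and under-protection of high-risk ones.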
The terms "PII" and "personal data" are often used interchangeably. They are not the same, and the distinction has real compliance implications.
PII is a US-centric concept rooted in NIST SP 800-122 and federal privacy law. The PII meaning refers specifically to data that can identify, contact, or trace a particular individual. The scope is relatively narrow – it requires a clear link to identity.
"Personal data" is the term used under the EU's General Data Protection Regulation (GDPR). GDPR defines personal data as "any information relating to an identified or identifiable natural person." This definition is broader than the PII definition. It includes personally identifiable information but extends to online identifiers, location data, cookie IDs, behavioural patterns, and any data that can be linked to a person through reasonable effort.
Given that distinction, cookie data and advertising identifiers qualify as "personal data" under GDPR but may not qualify as PII under US frameworks. Similarly, an IP address is personal data under GDPR but has historically been treated as indirect PII data in the US – requiring combination with other data points before it identifies an individual.
The practical implication is straightforward. All PII qualifies as personal data under GDPR. Not all personal data qualifies as PII under US frameworks.
Organizations operating globally must meet the broader GDPR standard, which means treating a wider range of data points as regulated information requiring data security management controls. For enterprises with PII data in both US and EU jurisdictions, aligning to the GDPR's broader "personal data" definition as the baseline simplifies PII compliance across frameworks.
PII, protected health information (PHI), and Payment Card Industry (PCI) data are three categories of sensitive data that drive the majority of enterprise compliance obligations. They overlap – and that overlap multiplies risk.
PHI is a regulated subset of PII. It consists of personally identifiable data linked to healthcare services and governed by HIPAA. A patient's name in a hospital record is PHI; the same name on a marketing newsletter is PII but not PHI.
PCI data refers to cardholder data and sensitive authentication data protected under PCI DSS. Credit card numbers, CVVs, and cardholder names fall into this category.
A patient paying a hospital co-pay with a credit card creates a record that is simultaneously PII, PHI, and PCI data. This overlap is precisely where compliance risk compounds.
| Dimension | PII | PHI | PCI Data |
|---|---|---|---|
| Definition | Data identifying an individual | PII linked to healthcare services | Cardholder and authentication data |
| Governing Regulation | NIST, GDPR, CCPA, state laws | HIPAA | PCI DSS |
| Key Examples | Name, SSN, email, biometrics | Medical records, prescriptions, insurance IDs | Card number, CVV, cardholder name |
| Maximum Penalty | GDPR: 4% global revenue; CCPA: $7,500/violation | $2.13M per violation category per year | Fines from card brands + potential loss of processing |
| Overlap | Broadest category | Subset of PII | May overlap with PII (cardholder name) |
For a deeper analysis of how these categories intersect and how to build a unified protection strategy across all three, see PII vs PHI vs PCI.
PII exposure creates three categories of damage – financial, regulatory, and reputational. Each compounds the others.
The financial impact is quantified. IBM's Cost of a Data Breach Report 2025 puts the global average data breach cost at $4.44 million, with the US average reaching $10.22 million. Data breaches involving personally identifiable information – names, Social Security numbers, financial records – drive the highest remediation costs because they trigger notification requirements, credit monitoring, and legal exposure.
Moreover, 72% of data breaches in 2025 involved data stored in cloud environments. PII data scattered across multi-cloud infrastructure, SaaS applications, and legacy databases creates an attack surface that expands with every copy and migration.
Regulatory penalties add a second layer. GDPR fines can reach 4% of global annual revenue or EUR 20 million, whichever is greater. HIPAA penalties reach $2.13 million per violation category per year. CCPA imposes $7,500 per intentional violation. These penalties are not theoretical – enforcement actions are increasing across all frameworks.
Reputational damage is the hardest to quantify and the slowest to recover from. Customers and partners lose trust when PII data is compromised. That erosion of trust translates into lost revenue, higher customer acquisition costs, and diminished credibility on data protection in procurement conversations.
The core problem is that most organizations do not know where all their PII resides. Data sprawl across multi-cloud environments, SaaS applications, and legacy systems creates blind spots. Shadow data – forgotten copies, unsanctioned stores, orphaned backups – contains PII examples of every type, from SSNs to biometric records, sitting outside security controls.
In this vein, a data breach involving PII is not merely an IT incident. It becomes a legal, financial, and operational event that can persist for years. You cannot protect personally identifiable information you have not found – which is why data discovery and data classification are the non-negotiable first steps in any PII protection program.
Personally identifiable information protection is governed by a patchwork of overlapping regulations. The specific requirements depend on jurisdiction, industry, and the type of PII data involved.
NIST SP 800-122 provides the foundational US federal definition and handling guidelines for personally identifiable information. It applies to federal agencies and their contractors but serves as the reference standard for the broader industry. NIST categorizes PII data by confidentiality impact level – low, moderate, or high – and prescribes corresponding safeguards for each tier.
The GDPR governs personal data (including PII) for EU residents. It mandates lawful basis for processing, data subject rights (access, erasure, portability), data protection impact assessments, and data breach notification within 72 hours. GDPR applies to any organization processing EU personal data, regardless of where the organization is based.
HIPAA governs PHI – the subset of personally identifiable information linked to healthcare. It requires administrative, physical, and technical safeguards under the Security Rule. The Privacy Rule restricts how PHI can be used and disclosed. HIPAA mandates data breach notification to affected individuals, HHS, and – for breaches affecting 500+ individuals – the media.
The CCPA/CPRA gives California residents the right to know what personal information is collected, to delete it, to correct it, and to opt out of its sale or sharing. It applies to businesses meeting specific revenue or data volume thresholds. The CPRA amendment strengthened CCPA by creating the California Privacy Protection Agency and adding the right to limit the use of sensitive personal information.
FERPA prohibits the release of personally identifiable information from education records without consent. It applies to all schools that receive federal funding. PII examples under FERPA include student names, addresses, Social Security numbers, dates of birth, and parent identification information.
The GLBA requires financial institutions to disclose their information-sharing practices and to protect customers' nonpublic personally identifiable information. The Safeguards Rule mandates a written information security plan to protect PII in financial services contexts.
The Privacy Act of 1974 governs PII held by US federal agencies. It establishes fair information practices for collection, maintenance, and dissemination of personally identifiable information, and gives individuals the right to access and amend their records.
Meeting PII compliance across multiple frameworks requires a unified approach to data security management – one that discovers, classifies, and protects personally identifiable information consistently regardless of which regulation applies.
Organizations that handle PII data across jurisdictions face the highest complexity, as the same data may be subject to GDPR, CCPA, and industry-specific frameworks simultaneously.
Protecting PII requires four interlocking capabilities. Discovery tells you where PII lives. Classification tells you what type it is. Protection renders it valueless if stolen. Monitoring catches changes before they become breaches.
The foundation of PII protection is knowing where personally identifiable information exists. This is harder than it sounds. PII data lives in obvious places – production databases, CRM systems, HR platforms – and in less obvious ones.
Automated data discovery scans databases, file shares, cloud storage, SaaS applications, and legacy systems to locate PII – including shadow data and dark data that your security team does not know about. Test environments, analytics sandboxes, log files, email archives, and third-party integrations frequently contain PII data that was never intended to persist there.
Once discovered, data classification categorizes each PII instance by type – direct identifier, indirect identifier – and sensitivity level – sensitive PII, non-sensitive PII. This data classification determines which regulatory frameworks apply and which protection controls are appropriate.
Moreover, discovery and data classification must run continuously. PII data does not sit still – it replicates, migrates, and sprawls across environments every time it is shared or copied. A one-time scan captures a snapshot; continuous discovery captures reality.
Organizations that lack automated PII discovery operate with an incomplete inventory. Incomplete inventories mean unprotected PII data in environments you do not govern – and that is where data breaches happen.
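At its simplest, automated discovery is pattern matching over data stores. The sketch below is a toy scanner, assuming two hypothetical regex patterns for common direct identifiers; production tools add validation, context scoring, and hundreds of data types:

```python
import re

# Hypothetical patterns for two common direct identifiers.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan(text: str) -> dict:
    """Return every PII match found in a block of text, keyed by type."""
    findings = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[name] = matches
    return findings

# PII frequently leaks into places it was never meant to persist, like logs.
log_line = "user jane.doe@example.com submitted SSN 123-45-6789"
print(scan(log_line))
```

Running a scanner like this continuously across databases, file shares, and log pipelines is what turns a one-time snapshot into a living inventory.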
Tokenization replaces PII with non-sensitive substitute values – tokens – that retain no mathematical relationship to the original data. If tokenized data is exfiltrated, attackers get worthless tokens. The actual PII never leaves the protected environment.
This is the critical distinction between tokenization and encryption. Encryption renders PII unreadable but reversible with a key – the sensitive data still exists in encrypted form. Tokenization removes the sensitive data entirely and replaces it with a non-sensitive placeholder. There is nothing to decrypt because the PII is not there.
For organizations handling PII at scale, tokenization also reduces compliance scope. Systems that store only tokens – not actual PII – can be de-scoped from PCI DSS and other regulatory audits, reducing cost and complexity.
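A minimal sketch of vault-based tokenization illustrates why exfiltrated tokens are worthless. The token is generated at random, independent of the input, so no key or algorithm can recover the original; only the vault holds the mapping. (The in-memory dict here stands in for a hardened, access-controlled vault.)

```python
import secrets

# In-memory stand-in for a secured token vault. Real deployments keep
# this mapping in a hardened, access-controlled store.
_vault = {}

def tokenize_ssn(ssn: str) -> str:
    """Replace an SSN with a random, format-preserving token.

    The token has no mathematical relationship to the input: there is
    no key to steal and nothing to decrypt outside the vault.
    """
    token = (f"{secrets.randbelow(1000):03d}-"
             f"{secrets.randbelow(100):02d}-"
             f"{secrets.randbelow(10000):04d}")
    _vault[token] = ssn  # only the vault can map a token back
    return token

def detokenize(token: str) -> str:
    """Recover the original value; callable only inside the vault boundary."""
    return _vault[token]

t = tokenize_ssn("123-45-6789")
print(t)  # same XXX-XX-XXXX shape, so downstream systems still validate it
```

Because the token preserves the original format, systems that merely pass the value along keep working while holding no sensitive data at all.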
Encryption protects PII at rest and in transit using cryptographic algorithms. It is a regulatory requirement under HIPAA, PCI DSS, and GDPR for many categories of sensitive data.
Data masking protects PII in non-production environments – testing, development, analytics – by replacing real values with realistic but fictional substitutes. This prevents PII exposure in environments that typically have weaker security controls.
Both encryption and masking are necessary components of a layered data protection strategy, but neither removes PII from the environment the way tokenization does.
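Masking for non-production use is often deterministic, so the same real value always maps to the same fictional substitute and test data stays internally consistent. A minimal sketch, assuming a hypothetical pool of fictional names:

```python
import hashlib

# Hypothetical pool of fictional substitutes for non-production use.
FAKE_NAMES = ["Alex Rivera", "Sam Chen", "Jordan Blake", "Casey Morgan"]

def mask_name(real_name: str) -> str:
    """Deterministically map a real name to a realistic fictional one.

    Hash-based selection keeps the mapping stable across runs (the same
    input always masks to the same substitute) without storing the real
    value anywhere in the test environment.
    """
    digest = hashlib.sha256(real_name.encode()).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]

print(mask_name("Jane Doe"))  # same fictional name on every run
```

Deterministic masking preserves referential integrity: a customer who appears in ten test tables gets the same fictional identity in all ten.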
Least-privilege access ensures that users and applications can only reach the PII they need for their specific function. Role-based access controls (RBAC), multi-factor authentication (MFA), and just-in-time privileged access limit the blast radius if credentials are compromised.
Given that non-human identities – service accounts, API keys, machine tokens – now outnumber human identities by significant margins in most enterprises, access controls for PII data must extend to application-layer access, not just user-layer access. An over-permissioned API with access to a production database containing PII examples across millions of records is a higher risk than a single employee's credentials.
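Least privilege at the application layer comes down to explicit grants checked on every access, for service accounts as much as for people. A minimal RBAC sketch with hypothetical roles and column-level grants:

```python
# Hypothetical role-to-permission map. Note the service account gets
# read access to one column, not the whole customers table.
ROLE_PERMISSIONS = {
    "support_agent":   {("customers", "email")},
    "billing_service": {("customers", "card_token")},
    "fraud_analyst":   {("customers", "email"), ("customers", "card_token")},
}

def can_read(role: str, table: str, column: str) -> bool:
    """Deny by default; allow only explicitly granted (table, column) pairs."""
    return (table, column) in ROLE_PERMISSIONS.get(role, set())

print(can_read("support_agent", "customers", "email"))       # granted
print(can_read("billing_service", "customers", "email"))     # denied
```

The deny-by-default check is the point: an unknown role or an ungranted column yields no access, which caps the blast radius of any single compromised credential.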
Continuous monitoring detects anomalous access patterns – unusual query volumes, off-hours access, bulk data exports – and triggers alerts before personally identifiable information is exfiltrated. A zero trust model treats every access request as potentially hostile, verifying identity, device posture, and context before granting access to PII data.
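One simple way to flag the "bulk data export" pattern is a statistical baseline check: compare the current query volume against the account's history and alert when it sits far outside the norm. A sketch using a z-score threshold (the threshold value is an illustrative assumption):

```python
from statistics import mean, stdev

def is_anomalous(history: list, current: int, threshold: float = 3.0) -> bool:
    """Flag a query volume far outside the historical baseline.

    A simple z-score check: anything more than `threshold` standard
    deviations above the mean of past volumes triggers an alert.
    """
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and (current - mu) / sigma > threshold

# Typical per-hour row counts for a service account, then a bulk export.
baseline = [120, 135, 110, 128, 140, 125, 132]
print(is_anomalous(baseline, 50_000))  # bulk export
print(is_anomalous(baseline, 130))     # normal volume
```

Production monitoring layers in time-of-day baselines, peer-group comparison, and identity context, but the underlying idea is the same: model normal access, then alert on deviation before data leaves.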
Thus, PII protection is not a single control. It is a layered architecture in which discovery identifies the data, data classification determines its sensitivity, tokenization and encryption protect it, access controls restrict who can access it, and monitoring detects violations in real time.
Personally identifiable information is the most broadly regulated and most frequently targeted data type in enterprise environments. The gap between where PII actually lives and where security teams think it lives is where breaches happen.
DataStealth closes that gap. The platform discovers and classifies PII, PHI, and PCI data across cloud, SaaS, on-premises, and legacy environments – without agents and without code changes. Agentless tokenization replaces sensitive PII with format-preserving tokens that are valueless to attackers, reducing breach impact and compliance scope simultaneously.
Real-time policy enforcement ensures PII protection in motion and at rest, delivering the data-centric security posture that GDPR, HIPAA, PCI DSS, and CCPA demand. Audit-ready compliance reporting is built in.
PII stands for personally identifiable information – any data that can identify, contact, or trace a specific individual, either on its own or when combined with other information. Common examples include names, Social Security numbers, email addresses, and biometric records. The NIST SP 800-122 standard provides the canonical US federal definition. Protecting PII is a baseline requirement under data security frameworks including GDPR, HIPAA, and CCPA.
A Social Security number is one of the most common examples of PII. Other examples include full legal name, passport number, driver's license number, email address, phone number, biometric data, and financial account numbers.
These are direct identifiers – each can identify a person without additional context. Indirect identifiers like ZIP code, date of birth, and IP address also qualify as PII when combined. Data classification tools distinguish between direct and indirect PII automatically.
Yes. A phone number is PII because it can be used to identify or contact a specific individual. It is classified as a direct identifier under NIST SP 800-122. Organizations that collect phone numbers must apply appropriate data protection controls and comply with applicable privacy regulations.
Yes. An email address – particularly one that contains a person's name (e.g., firstname.lastname@company.com) – is PII because it can directly identify an individual. Even generic email addresses can function as PII when linked to other data points. Tokenization can protect email addresses in production systems while preserving format for application compatibility.
PII is any data that identifies an individual. PHI is a regulated subset of PII – specifically, personally identifiable data linked to healthcare services and governed by HIPAA. All PHI contains PII, but not all PII is PHI. For a detailed comparison including PCI data, see PII vs PHI vs PCI.
In cyber security, PII refers to the category of data that adversaries target for identity theft, financial fraud, and social engineering. Protecting PII in cyber security requires data discovery, classification, access controls, encryption, tokenization, and continuous monitoring. PII is the primary data type involved in regulatory breach notifications and compliance enforcement actions.
PII protection requires a layered approach. Start with automated data discovery and classification to locate and label all PII. Apply tokenization or encryption to protect sensitive PII at rest and in transit. Enforce least-privilege access controls with MFA and RBAC.
Monitor for anomalous access patterns continuously. Implement data loss prevention to prevent unauthorized exfiltration.