Data-centric security protects the data itself – not the perimeter. Learn the five-process model, compare tokenization vs encryption, and see how it reduces PCI scope by 70–90%.

Data-centric security is an approach that applies protection directly to the data – through tokenization, masking, and encryption – rather than depending on perimeter defenses around the systems that store it.
The model operates across three data states: at rest, in motion, and in use.
Five processes define it: discover, classify, protect, monitor, and audit.
When a breach occurs under a data-centric model, the attacker reaches the database but finds nothing usable – tokens with no mathematical path back to the original values, or masked fields that reveal nothing.
Data-centric security is the enforcement mechanism within Zero Trust and the foundation of what Forrester defines as a Data Security Platform.
Data-centric security is an approach to cybersecurity that protects the data itself rather than the networks, servers, or applications that house it.
The focus shifts from infrastructure hardening – firewalls, endpoint agents, intrusion detection – to ensuring that sensitive information remains protected throughout its entire lifecycle: creation, storage, use, sharing, archival, and deletion.
Protection applies across three data states.
The approach emerged because perimeter security, while necessary, proved insufficient as the sole line of defense.
According to IBM's 2025 Cost of a Data Breach Report, 51% of data breaches were caused by malicious attacks and 26% by human error.
Perimeter controls address neither insider risk nor credential compromise – the two attack vectors that bypass the boundary entirely.
The U.S. Department of Defense recognized this when it positioned data as the central pillar of its Zero Trust Strategy, stating that all other security pillars exist to protect the data pillar.
Data-centric security takes that principle and makes it operational. Instead of building a stronger perimeter and hoping no one gets through, the model ensures that even when someone does get through, the data itself is worthless to them.
The simplest way to understand the data-centric model is to compare what happens during a breach under each approach. Perimeter security focuses on keeping attackers out.
Data-centric security assumes they will get in and neutralizes what they find.
Perimeter security is a prerequisite, not a solution. Firewalls, SIEM, endpoint protection – all necessary. But they share a common assumption: that preventing the breach prevents the loss.
IBM's 2025 report found that the average breach lifecycle was 241 days – a nine-year low, but still eight months of exposure. For 241 days, the data sits accessible. If it is cleartext, the damage compounds with every passing day. If it is tokenized, there is no damage to compound.
The data-centric model eliminates the window between breach and discovery as a risk factor because the data was already protected before the breach began.
Five processes define how data-centric security operates. Each builds on the one before it, and none works in isolation.
Discover: Find all sensitive data across the environment – databases, file shares, SaaS applications, cloud storage, mainframe systems, and shadow data stores.
Automated scanning across structured and unstructured sources is the foundation. You cannot protect what you do not know exists, and most enterprises underestimate how widely sensitive data has proliferated across replicated, test, and analytics environments.
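As a concrete illustration of what automated discovery does at the simplest level, the sketch below walks a directory tree and counts card-number-like matches per file. The regex, function name, and paths are illustrative assumptions for this article, not any product's implementation; real discovery engines also scan databases, SaaS APIs, and binary formats.

```python
import re
from pathlib import Path

# Candidate PAN pattern: 13-19 digits, optionally space/hyphen separated.
# Illustrative only -- real scanners validate candidates (e.g. Luhn) to cut false positives.
PAN_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def scan_tree(root: str) -> dict[str, int]:
    """Walk a directory tree and count PAN-like matches per readable text file."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip, but a real scanner would log it
        hits = len(PAN_RE.findall(text))
        if hits:
            findings[str(path)] = hits
    return findings
```

The output – a map from location to finding count – is what feeds the classification step: you cannot tag data you have not located.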
Classify: Tag data by sensitivity level, regulatory category (PII, PHI, PAN, intellectual property), and risk context. Classification drives policy – different data types require different protection methods.
Machine learning-based classification scales across petabytes; rule-based approaches alone cannot keep pace with the volume and velocity of modern data creation.
Classification also determines which compliance framework applies: a PAN triggers PCI DSS obligations, a PHI field triggers HIPAA, a personal data record triggers GDPR.
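The rule-based side of classification can be sketched as a small ordered rule set that maps a detected data class to the framework it triggers. The classes, patterns, and framework labels below are simplified illustrations; as noted above, production classifiers combine far richer rules with ML models to scale.

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn checksum used to validate candidate card numbers."""
    digits = [int(d) for d in number][::-1]
    total = sum(digits[::2]) + sum(sum(divmod(2 * d, 10)) for d in digits[1::2])
    return total % 10 == 0

# Illustrative rule set: each data class maps to the compliance framework it triggers.
RULES = [
    ("PAN", "PCI DSS", lambda v: v.isdigit() and 13 <= len(v) <= 19 and luhn_valid(v)),
    ("EMAIL", "GDPR", lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None),
    ("SSN", "HIPAA/PII", lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None),
]

def classify(value: str):
    """Return (data_class, framework) for the first matching rule."""
    for data_class, framework, rule in RULES:
        if rule(value):
            return data_class, framework
    return "UNCLASSIFIED", None
```

The tag is the hand-off to the protect step: a "PAN" tag routes the field to tokenization, an "EMAIL" tag in a test database routes it to masking.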
Protect: Apply controls directly to the data: tokenization replaces sensitive values with non-derivable substitutes, masking renders fields unreadable for unauthorized users, and encryption transforms data using cryptographic keys.
Protection is applied at the field level, not the system level. A database column containing PANs gets tokenized while non-sensitive columns remain unchanged.
This is where data-centric security diverges from every other security model – the data itself is transformed, not just the access controls around it.
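A minimal sketch of vaulted tokenization makes the "non-derivable" property concrete: the token is random, so there is no mathematical path back to the original value – the vault lookup is the only path. The class and storage here are illustrative; a production vault is an encrypted, access-controlled, audited service.

```python
import secrets

class TokenVault:
    """Minimal vaulted-tokenization sketch (illustrative, not a product API)."""

    def __init__(self):
        self._forward = {}   # original value -> token
        self._reverse = {}   # token -> original value

    def tokenize(self, value: str) -> str:
        if value in self._forward:           # referential integrity: same value, same token
            return self._forward[value]
        while True:
            # Random digits of the same length: format-preserving, non-derivable.
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token not in self._reverse and token != value:
                break
        self._forward[value] = token
        self._reverse[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._reverse[token]
```

Because the token preserves length and format, the PAN column's schema and the applications reading it are untouched – only the values change.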
Monitor: Continuously observe data access patterns, user behavior, and anomalous activity.
This feeds into Data Detection and Response (DDR) capabilities that detect deviations from established baselines – bulk downloads, access from unfamiliar locations, privilege escalation, or query patterns that do not match a user's historical behavior.
Monitoring is the runtime layer that catches threats the other processes cannot anticipate.
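The baseline-deviation idea behind DDR can be sketched in a few lines: compare today's activity against a user's historical distribution and flag large departures. The threshold and statistic below are illustrative assumptions; real DDR systems model many signals (location, time, query shape), not just volume.

```python
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int, sigma: float = 3.0) -> bool:
    """Flag activity far above a user's established baseline,
    e.g. a bulk download dwarfing their historical daily record counts."""
    if len(history) < 2:
        return False                            # not enough baseline to judge
    mu, sd = mean(history), stdev(history)
    return today > mu + sigma * max(sd, 1.0)    # floor sd to avoid zero-variance noise
```
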
Audit: Maintain evidence-ready logs of who accessed what data, when, through which channel, and for what purpose.
Map audit trails to compliance frameworks: PCI DSS access logging requirements, HIPAA access controls, GDPR data subject request fulfillment, and SOX integrity mandates.
Without audit capabilities, data-centric security cannot demonstrate compliance outcomes. The audit layer is the proof that the other four processes are working.
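An "evidence-ready" entry is simply a structured record that captures each of those dimensions and tags the framework clauses it satisfies. The field names and framework mappings below are illustrative, not taken from any standard's schema.

```python
import json
from datetime import datetime, timezone

def audit_record(user: str, resource: str, channel: str,
                 purpose: str, frameworks: list[str]) -> str:
    """Emit one audit entry: who accessed what, when, through which channel,
    for what purpose, and which compliance mappings it supports."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "resource": resource,
        "channel": channel,
        "purpose": purpose,
        "frameworks": frameworks,   # e.g. ["PCI DSS 10.2"] -- illustrative mapping
    })
```

Structured entries like this are what let an auditor query "every access to PHI in the last 90 days" instead of grepping free-text logs.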
Most treatments of data-centric security default to encryption and access controls.
That framing is incomplete. Six protection methods operate within a data-centric architecture, and they serve different purposes depending on data state, regulatory context, and use case.
The critical distinction is between methods that transform the data and methods that control access to it.
The choice of method depends on the data type, the regulatory framework, and the specific use case.
PCI DSS scope reduction requires tokenization – the PCI SSC Tokenization Guidelines explicitly allow token-only systems to be treated as out of scope when they cannot access the vault, keys, or detokenization service.
HIPAA de-identification can use tokenization or masking. GDPR pseudonymization accepts all three methods, but tokenization provides the strongest separation between the identifier and the data subject's record.
Most mature data-centric architectures use multiple methods simultaneously. A PAN column gets tokenized. An email field gets masked in non-production environments. Data in transit gets encrypted. The protection method matches the risk, not the other way around.
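That "method matches the risk" rule is, in practice, a policy table keyed on data class, environment, and data state. The sketch below shows the shape of such a policy engine; the keys, methods, and default-deny fallback are illustrative assumptions, not a specific product's policy language.

```python
# Illustrative policy table: the protection method is chosen per field,
# driven by data class, environment, and data state -- never system-wide.
POLICY = {
    ("PAN", "production", "at-rest"): "tokenize",
    ("EMAIL", "non-production", "at-rest"): "mask",
    ("ANY", "any", "in-motion"): "encrypt",
}

def select_method(data_class: str, environment: str, state: str) -> str:
    """Most specific rule wins; fall back to a state-wide rule, then default-deny."""
    for key in [(data_class, environment, state), ("ANY", "any", state)]:
        if key in POLICY:
            return POLICY[key]
    return "deny"   # default-deny when no policy matches
```
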
The threat environment is not worsening because attackers are more sophisticated, though they are. It is worsening because the attack surface is larger, data is more distributed, and perimeter controls are less relevant in hybrid and multi-cloud estates.
IBM's 2025 Cost of a Data Breach Report quantifies the damage. The global average breach cost was $4.44 million – and in the United States, it hit a record $10.22 million.
Breaches involving data distributed across multiple environments cost $5.05 million on average because sprawl multiplies the systems an attacker can reach.
Shadow AI has compounded the exposure.
Twenty percent of breaches in 2025 involved unauthorized AI tools, adding $670,000 to average breach costs. Among AI-related breaches, 97% occurred in organizations without proper access controls, and 63% had no AI governance policies at all.
Shadow data – the duplicated, orphaned copies of sensitive information scattered across analytics pipelines, test environments, and forgotten cloud buckets – creates the same problem at the storage layer.
Organizations using AI and automation extensively in their security operations saved $1.9 million and reduced their breach lifecycle by 80 days.
But the organizations that benefited most were those that had already reduced the value of the data an attacker could reach. If exfiltrated data is tokenized, the breach still happened – but the financial, regulatory, and reputational impact collapses.
Regulatory pressure is accelerating in parallel. PCI DSS 4.0, GDPR enforcement actions, HIPAA audit expansion, and CCPA/CPRA amendments are all converging toward data-level controls. Regulators are no longer satisfied with perimeter certifications.
They want evidence that sensitive data is protected at the field level, regardless of which system stores it or which network surrounds it.
Zero Trust is an architecture. Data-centric security is its central pillar.
The DoD Zero Trust Strategy identifies seven pillars: User, Device, Network/Environment, Application/Workload, Data, Visibility/Analytics, and Automation/Orchestration.
Every pillar exists to protect the data pillar. User verification, device posture, network segmentation – all of these are mechanisms designed to limit who reaches the data.
But Zero Trust without data-centric security is incomplete: it verifies users and devices, then grants access to cleartext data once verification passes.
Data-centric security closes that gap. It ensures that even verified, authorized users interact with tokenized or masked data unless their specific role and context require the original values.
A customer service agent sees the last four digits of a credit card – not the full PAN. A developer working in a test environment sees referentially intact but tokenized records – not production PHI. An analytics pipeline processes masked records that preserve statistical distributions without exposing individual identities.
The result is that a Zero Trust breach – where an attacker compromises a verified identity – still produces nothing of value. The identity is verified. The access is granted. But the data behind the access is already protected at the source.
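The role-and-context rule described above can be sketched as a reveal function that sits between the verified identity and the data: only the one role whose job requires the PAN ever sees it. The roles, vault stub, and function name are illustrative assumptions for this sketch.

```python
class DemoVault:
    """Stand-in for the token vault (illustrative)."""
    def __init__(self, mapping):
        self._m = mapping
    def detokenize(self, token):
        return self._m[token]

def reveal(token: str, vault: DemoVault, role: str) -> str:
    """Role-aware reveal: authentication alone never exposes cleartext."""
    if role == "payments-processor":        # the one role whose function needs the PAN
        return vault.detokenize(token)
    if role == "customer-service":          # last four digits only
        pan = vault.detokenize(token)
        return "*" * (len(pan) - 4) + pan[-4:]
    return token                            # developers, analysts: the token as-is
```

A compromised "developer" identity in this model retrieves only tokens – the breach succeeds against the perimeter and still yields nothing.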
The argument for data-centric security becomes concrete when mapped to specific compliance outcomes.
The PCI SSC Tokenization Guidelines state that systems storing only tokens – where tokens have no mathematical relationship to the PAN and cannot access the vault or detokenization service – can be treated as out of scope.
In practice, this means an organization that tokenizes cardholder data before it enters downstream systems can reduce its Cardholder Data Environment from hundreds of systems to a handful.
The practical outcome is measurable: SAQ-D to SAQ-A transitions, 70–90% audit scope reduction, and significant cost savings on annual assessments.
A national transportation enterprise demonstrated this when it used vaulted tokenization at the edge of its payment flow to maintain processor independence.
When its incumbent processor imposed a 400% transaction-fee hike, the company switched vendors with zero disruption, avoided break fees, and cut processing costs by 20% – because it owned the tokens, not the processor.
The HIPAA Safe Harbor method requires removal of 18 identifiers to achieve de-identification.
Tokenization satisfies this by replacing each identifier with a non-derivable token while preserving data utility for analytics, research, and test environments.
Non-production environments – developer sandboxes, QA databases, analytics pipelines – represent the largest unprotected surface for PHI in most healthcare organizations.
Data-centric protection eliminates this exposure without degrading the fidelity of the data these environments require.
GDPR Article 32 lists pseudonymization as an appropriate technical measure for data protection.
Tokenization implements pseudonymization by separating the identifier from the data subject's record – the token cannot be linked back to the individual without access to the vault.
For organizations managing cross-border data transfers, tokenized data carries reduced risk because the tokens themselves contain no personal data. The transfer moves tokens, not identities.
Most data-centric security literature assumes cloud-first architectures.
But enterprises in financial services, insurance, telecommunications, and government still run mission-critical workloads on IBM Z mainframes with DB2 databases.
These systems process billions of transactions annually and store decades of historical customer data – often in cleartext.
Legacy systems present a specific architectural challenge: they cannot accept agents, code changes, or modern API integrations.
The deployment methods that DSPM and DLP depend on – cloud APIs, endpoint agents, network proxies – are architecturally incompatible with mainframe environments.
DataStealth addresses this directly. The platform deploys inline, using native DB2 and TN3270 protocols, tokenizing data in-place without modifying schemas, application logic, or mainframe code.
A nationwide telecommunications company used this approach to secure vast volumes of historical subscriber data stored in cleartext on an IBM DB2 mainframe.
DataStealth deployed agentlessly, tokenized in-place, and created a secure bridge to share legacy data with modern downstream systems.
A global insurer faced a parallel challenge: protecting sensitive data in non-production environments – test databases, analytics pipelines, developer sandboxes – where real policyholder data was unnecessary but still present.
DataStealth deployed agentless, in-place tokenization that preserved data formats and referential integrity while replacing every sensitive value with a non-reversible token.
The insurer eliminated the breach risk across those environments without modifying a single application.
Data-centric security is a philosophy. A Data Security Platform (DSP) is the operational implementation.
Forrester defines a DSP through the Define, Dissect, and Defend model – which maps directly to the data-centric security framework. Define the data through discovery and classification.
Dissect data activity through monitoring and analytics. Defend data through protection controls: tokenization, masking, encryption.
The Forrester Wave Q1 2025 evaluated DSP vendors on 23 criteria, and Gartner's Market Guide for Data Security Platforms describes DSPs as platforms that "combine data discovery, policy definition, and policy enforcement across data silos."
A DSP is what turns data-centric security from a model into enforcement. It unifies the five processes – discover, classify, protect, monitor, audit – under a single policy engine, applied consistently across every environment the organization operates in.
DataStealth operates as a DSP.
It discovers and classifies sensitive data across mainframe, cloud, SaaS, and hybrid environments, then applies tokenization, masking, or encryption through a single policy engine – without agents, code changes, or application rewrites.
Deployment starts with a DNS change. The gap between DSPM visibility and actual data protection – the enforcement gap that most security architectures leave open – is what a DSP closes.
Bilal is the Content Strategist at DataStealth. He is a recognized defence and security analyst who researches the growing importance of cybersecurity and data protection in enterprise organizations.