Data-centric security protects the data itself – not the perimeter. Learn the five-process model, compare tokenization vs encryption, and see how it reduces PCI scope by 70–90%.

Data-centric security is an approach that applies protection directly to the data – through tokenization, masking, and encryption – rather than depending on perimeter defenses around the systems that store it.
The model operates across three data states: at rest, in motion, and in use.
Five processes define it: discover, classify, protect, monitor, and audit.
When a breach occurs under a data-centric model, the attacker reaches the database but finds nothing usable – tokens with no mathematical path back to the original values, or masked fields that reveal nothing.
Data-centric security is the enforcement mechanism within Zero Trust and the foundation of what Forrester defines as a Data Security Platform.
Data-centric security is an approach to cybersecurity that protects the data itself rather than the networks, servers, or applications that house it.
The focus shifts from infrastructure hardening – firewalls, endpoint agents, intrusion detection – to ensuring that sensitive information remains protected throughout its entire lifecycle: creation, storage, use, sharing, archival, and deletion.
Protection applies across three data states.
The approach emerged because perimeter security, while necessary, proved insufficient as the sole line of defense.
According to IBM's 2025 Cost of a Data Breach Report, 51% of data breaches were caused by malicious attacks and 26% by human error.
Perimeter controls address neither insider risk nor credential compromise – the two attack vectors that bypass the boundary entirely.
The U.S. Department of Defense recognized this when it positioned data as the central pillar of its Zero Trust Strategy, stating that all other security pillars exist to protect the data pillar.
Data-centric security takes that principle and makes it operational. Instead of building a stronger perimeter and hoping no one gets through, the model ensures that even when someone does get through, the data itself is worthless to them.
The simplest way to understand the data-centric model is to compare what happens during a breach under each approach. Perimeter security focuses on keeping attackers out.
Data-centric security assumes they will get in and neutralizes what they find.
Perimeter security is a prerequisite, not a solution. Firewalls, SIEM, endpoint protection – all necessary. But they share a common assumption: that preventing the breach prevents the loss.
IBM's 2025 report found that the average breach lifecycle was 241 days – a nine-year low, but still eight months of exposure. For 241 days, the data sits accessible. If it is cleartext, the damage compounds with every passing day. If it is tokenized, there is no damage to compound.
The data-centric model eliminates the window between breach and discovery as a risk factor because the data was already protected before the breach began.
Five processes define how data-centric security operates. Each builds on the one before it, and none works in isolation.
Discover: Find all sensitive data across the environment – databases, file shares, SaaS applications, cloud storage, mainframe systems, and shadow data stores.
Automated scanning across structured and unstructured sources is the foundation. You cannot protect what you do not know exists, and most enterprises underestimate how widely sensitive data has proliferated across replicated, test, and analytics environments.
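As a concrete illustration of what automated discovery does at the simplest level, the sketch below walks a directory tree and counts card-number-like matches per file. The regex, function name, and paths are illustrative assumptions for this article, not any product's implementation; real discovery engines also scan databases, SaaS APIs, and binary formats.

```python
import re
from pathlib import Path

# Candidate PAN pattern: 13-19 digits, optionally space/hyphen separated.
# Illustrative only -- real scanners validate candidates (e.g. Luhn) to cut false positives.
PAN_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def scan_tree(root: str) -> dict[str, int]:
    """Walk a directory tree and count PAN-like matches per readable text file."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip, but a real scanner would log it
        hits = len(PAN_RE.findall(text))
        if hits:
            findings[str(path)] = hits
    return findings
```

The output – a map from location to finding count – is what feeds the classification step: you cannot tag data you have not located.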
Classify: Tag data by sensitivity level, regulatory category (PII, PHI, PAN, intellectual property), and risk context. Classification drives policy – different data types require different protection methods.
Machine learning-based classification scales across petabytes; rule-based approaches alone cannot keep pace with the volume and velocity of modern data creation.
Classification also determines which compliance framework applies: a PAN triggers PCI DSS obligations, a PHI field triggers HIPAA, a personal data record triggers GDPR.
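The rule-based side of classification can be sketched as a small ordered rule set that maps a detected data class to the framework it triggers. The classes, patterns, and framework labels below are simplified illustrations; as noted above, production classifiers combine far richer rules with ML models to scale.

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn checksum used to validate candidate card numbers."""
    digits = [int(d) for d in number][::-1]
    total = sum(digits[::2]) + sum(sum(divmod(2 * d, 10)) for d in digits[1::2])
    return total % 10 == 0

# Illustrative rule set: each data class maps to the compliance framework it triggers.
RULES = [
    ("PAN", "PCI DSS", lambda v: v.isdigit() and 13 <= len(v) <= 19 and luhn_valid(v)),
    ("EMAIL", "GDPR", lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None),
    ("SSN", "HIPAA/PII", lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None),
]

def classify(value: str):
    """Return (data_class, framework) for the first matching rule."""
    for data_class, framework, rule in RULES:
        if rule(value):
            return data_class, framework
    return "UNCLASSIFIED", None
```

The tag is the hand-off to the protect step: a "PAN" tag routes the field to tokenization, an "EMAIL" tag in a test database routes it to masking.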
Protect: Apply controls directly to the data: tokenization replaces sensitive values with non-derivable substitutes, masking renders fields unreadable for unauthorized users, and encryption transforms data using cryptographic keys.
Protection is applied at the field level, not the system level. A database column containing PANs gets tokenized while non-sensitive columns remain unchanged.
This is where data-centric security diverges from every other security model – the data itself is transformed, not just the access controls around it.
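A minimal sketch of vaulted tokenization makes the "non-derivable" property concrete: the token is random, so there is no mathematical path back to the original value – the vault lookup is the only path. The class and storage here are illustrative; a production vault is an encrypted, access-controlled, audited service.

```python
import secrets

class TokenVault:
    """Minimal vaulted-tokenization sketch (illustrative, not a product API)."""

    def __init__(self):
        self._forward = {}   # original value -> token
        self._reverse = {}   # token -> original value

    def tokenize(self, value: str) -> str:
        if value in self._forward:           # referential integrity: same value, same token
            return self._forward[value]
        while True:
            # Random digits of the same length: format-preserving, non-derivable.
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token not in self._reverse and token != value:
                break
        self._forward[value] = token
        self._reverse[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._reverse[token]
```

Because the token preserves length and format, the PAN column's schema and the applications reading it are untouched – only the values change.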
Monitor: Continuously observe data access patterns, user behavior, and anomalous activity.
This feeds into Data Detection and Response (DDR) capabilities that detect deviations from established baselines – bulk downloads, access from unfamiliar locations, privilege escalation, or query patterns that do not match a user's historical behavior.
Monitoring is the runtime layer that catches threats the other processes cannot anticipate.
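The baseline-deviation idea behind DDR can be sketched in a few lines: compare today's activity against a user's historical distribution and flag large departures. The threshold and statistic below are illustrative assumptions; real DDR systems model many signals (location, time, query shape), not just volume.

```python
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int, sigma: float = 3.0) -> bool:
    """Flag activity far above a user's established baseline,
    e.g. a bulk download dwarfing their historical daily record counts."""
    if len(history) < 2:
        return False                            # not enough baseline to judge
    mu, sd = mean(history), stdev(history)
    return today > mu + sigma * max(sd, 1.0)    # floor sd to avoid zero-variance noise
```
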
Audit: Maintain evidence-ready logs of who accessed what data, when, through which channel, and for what purpose.
Map audit trails to compliance frameworks: PCI DSS access logging requirements, HIPAA access controls, GDPR data subject request fulfillment, and SOX integrity mandates.
Without audit capabilities, data-centric security cannot demonstrate compliance outcomes. The audit layer is the proof that the other four processes are working.
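An "evidence-ready" entry is simply a structured record that captures each of those dimensions and tags the framework clauses it satisfies. The field names and framework mappings below are illustrative, not taken from any standard's schema.

```python
import json
from datetime import datetime, timezone

def audit_record(user: str, resource: str, channel: str,
                 purpose: str, frameworks: list[str]) -> str:
    """Emit one audit entry: who accessed what, when, through which channel,
    for what purpose, and which compliance mappings it supports."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "resource": resource,
        "channel": channel,
        "purpose": purpose,
        "frameworks": frameworks,   # e.g. ["PCI DSS 10.2"] -- illustrative mapping
    })
```

Structured entries like this are what let an auditor query "every access to PHI in the last 90 days" instead of grepping free-text logs.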
Most treatments of data-centric security default to encryption and access controls.
That framing is incomplete. Six protection methods operate within a data-centric architecture, and they serve different purposes depending on data state, regulatory context, and use case.
The critical distinction is between methods that transform the data and methods that control access to it.
The choice of method depends on the data type, the regulatory framework, and the specific use case.
PCI DSS scope reduction requires tokenization – the PCI SSC Tokenization Guidelines explicitly allow token-only systems to be treated as out of scope when they cannot access the vault, keys, or detokenization service.
HIPAA de-identification can use tokenization or masking. GDPR pseudonymization accepts all three methods, but tokenization provides the strongest separation between the identifier and the data subject's record.
Most mature data-centric architectures use multiple methods simultaneously. A PAN column gets tokenized. An email field gets masked in non-production environments. Data in transit gets encrypted. The protection method matches the risk, not the other way around.
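That "method matches the risk" rule is, in practice, a policy table keyed on data class, environment, and data state. The sketch below shows the shape of such a policy engine; the keys, methods, and default-deny fallback are illustrative assumptions, not a specific product's policy language.

```python
# Illustrative policy table: the protection method is chosen per field,
# driven by data class, environment, and data state -- never system-wide.
POLICY = {
    ("PAN", "production", "at-rest"): "tokenize",
    ("EMAIL", "non-production", "at-rest"): "mask",
    ("ANY", "any", "in-motion"): "encrypt",
}

def select_method(data_class: str, environment: str, state: str) -> str:
    """Most specific rule wins; fall back to a state-wide rule, then default-deny."""
    for key in [(data_class, environment, state), ("ANY", "any", state)]:
        if key in POLICY:
            return POLICY[key]
    return "deny"   # default-deny when no policy matches
```
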
The threat environment is not worsening because attackers are more sophisticated, though they are. It is worsening because the attack surface is larger, data is more distributed, and perimeter controls are less relevant in hybrid and multi-cloud estates.
IBM's 2025 Cost of a Data Breach Report quantifies the damage. The global average breach cost was $4.44 million – and in the United States, it hit a record $10.22 million.
Breaches involving data distributed across multiple environments cost $5.05 million on average because sprawl multiplies the systems an attacker can reach.
Shadow AI has compounded the exposure.
Twenty percent of breaches in 2025 involved unauthorized AI tools, adding $670,000 to average breach costs. Among AI-related breaches, 97% occurred in organizations without proper access controls, and 63% had no AI governance policies at all.
Shadow data – the duplicated, orphaned copies of sensitive information scattered across analytics pipelines, test environments, and forgotten cloud buckets – creates the same problem at the storage layer.
Organizations using AI and automation extensively in their security operations saved $1.9 million and reduced their breach lifecycle by 80 days.
But the organizations that benefited most were those that had already reduced the value of the data an attacker could reach. If exfiltrated data is tokenized, the breach still happened – but the financial, regulatory, and reputational impact collapses.
Regulatory pressure is accelerating in parallel. PCI DSS 4.0, GDPR enforcement actions, HIPAA audit expansion, and CCPA/CPRA amendments are all converging toward data-level controls. Regulators are no longer satisfied with perimeter certifications.
They want evidence that sensitive data is protected at the field level, regardless of which system stores it or which network surrounds it.
Zero Trust is an architecture. Data-centric security is its central pillar.
The DoD Zero Trust Strategy identifies seven pillars: User, Device, Network/Environment, Application/Workload, Data, Visibility/Analytics, and Automation/Orchestration.
Every pillar exists to protect the data pillar. User verification, device posture, network segmentation – all of these are mechanisms designed to limit who reaches the data.
But Zero Trust without data-centric security is incomplete: it verifies users and devices, then grants access to cleartext data once verification passes.
Data-centric security closes that gap. It ensures that even verified, authorized users interact with tokenized or masked data unless their specific role and context require the original values.
A customer service agent sees the last four digits of a credit card – not the full PAN. A developer working in a test environment sees referentially intact but tokenized records – not production PHI. An analytics pipeline processes masked records that preserve statistical distributions without exposing individual identities.
The result is that a Zero Trust breach – where an attacker compromises a verified identity – still produces nothing of value. The identity is verified. The access is granted. But the data behind the access is already protected at the source.
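The role-and-context rule described above can be sketched as a reveal function that sits between the verified identity and the data: only the one role whose job requires the PAN ever sees it. The roles, vault stub, and function name are illustrative assumptions for this sketch.

```python
class DemoVault:
    """Stand-in for the token vault (illustrative)."""
    def __init__(self, mapping):
        self._m = mapping
    def detokenize(self, token):
        return self._m[token]

def reveal(token: str, vault: DemoVault, role: str) -> str:
    """Role-aware reveal: authentication alone never exposes cleartext."""
    if role == "payments-processor":        # the one role whose function needs the PAN
        return vault.detokenize(token)
    if role == "customer-service":          # last four digits only
        pan = vault.detokenize(token)
        return "*" * (len(pan) - 4) + pan[-4:]
    return token                            # developers, analysts: the token as-is
```

A compromised "developer" identity in this model retrieves only tokens – the breach succeeds against the perimeter and still yields nothing.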
The argument for data-centric security becomes concrete when mapped to specific compliance outcomes.
The PCI SSC Tokenization Guidelines state that systems storing only tokens – where tokens have no mathematical relationship to the PAN and cannot access the vault or detokenization service – can be treated as out of scope.
In practice, this means an organization that tokenizes cardholder data before it enters downstream systems can reduce its Cardholder Data Environment from hundreds of systems to a handful.
The practical outcome is measurable: SAQ-D to SAQ-A transitions, 70–90% audit scope reduction, and significant cost savings on annual assessments.
A national transportation enterprise demonstrated this when it used vaulted tokenization at the edge of its payment flow to maintain processor independence.
When its incumbent processor imposed a 400% transaction-fee hike, the company switched vendors with zero disruption, avoided break fees, and cut processing costs by 20% – because it owned the tokens, not the processor.
The HIPAA Safe Harbor method requires removal of 18 identifiers to achieve de-identification.
Tokenization satisfies this by replacing each identifier with a non-derivable token while preserving data utility for analytics, research, and test environments.
Non-production environments – developer sandboxes, QA databases, analytics pipelines – represent the largest unprotected surface for PHI in most healthcare organizations.
Data-centric protection eliminates this exposure without degrading the fidelity of the data these environments require.
GDPR Article 32 lists pseudonymization as an appropriate technical measure for data protection.
Tokenization implements pseudonymization by separating the identifier from the data subject's record – the token cannot be linked back to the individual without access to the vault.
For organizations managing cross-border data transfers, tokenized data carries reduced risk because the tokens themselves contain no personal data. The transfer moves tokens, not identities.
Most data-centric security literature assumes cloud-first architectures.
But enterprises in financial services, insurance, telecommunications, and government still run mission-critical workloads on IBM Z mainframes with DB2 databases.
These systems process billions of transactions annually and store decades of historical customer data – often in cleartext.
Legacy systems present a specific architectural challenge: they cannot accept agents, code changes, or modern API integrations.
The deployment methods that DSPM and DLP depend on – cloud APIs, endpoint agents, network proxies – are architecturally incompatible with mainframe environments.
DataStealth addresses this directly. The platform deploys inline, using native DB2 and TN3270 protocols, tokenizing data in-place without modifying schemas, application logic, or mainframe code.
A nationwide telecommunications company used this approach to secure vast volumes of historical subscriber data stored in cleartext on an IBM DB2 mainframe.
DataStealth deployed agentlessly, tokenized in-place, and created a secure bridge to share legacy data with modern downstream systems.
A global insurer faced a parallel challenge: protecting sensitive data in non-production environments – test databases, analytics pipelines, developer sandboxes – where real policyholder data was unnecessary but still present.
DataStealth deployed agentless, in-place tokenization that preserved data formats and referential integrity while replacing every sensitive value with a non-reversible token.
The insurer eliminated the breach risk across those environments without modifying a single application.
Data-centric security is a philosophy. A Data Security Platform (DSP) is the operational implementation.
Forrester defines a DSP through the Define, Dissect, and Defend model – which maps directly to the data-centric security framework. Define the data through discovery and classification.
Dissect data activity through monitoring and analytics. Defend data through protection controls: tokenization, masking, encryption.
The Forrester Wave Q1 2025 evaluated DSP vendors on 23 criteria, and Gartner's Market Guide for Data Security Platforms describes DSPs as platforms that "combine data discovery, policy definition, and policy enforcement across data silos."
A DSP is what turns data-centric security from a model into enforcement. It unifies the five processes – discover, classify, protect, monitor, audit – under a single policy engine, applied consistently across every environment the organization operates in.
DataStealth operates as a DSP.
It discovers and classifies sensitive data across mainframe, cloud, SaaS, and hybrid environments, then applies tokenization, masking, or encryption through a single policy engine – without agents, code changes, or application rewrites.
Deployment starts with a DNS change. The gap between DSPM visibility and actual data protection – the enforcement gap that most security architectures leave open – is what a DSP closes.
Bilal is the Content Strategist at DataStealth. He is a recognized defence and security analyst who researches the growing importance of cybersecurity and data protection in enterprise organizations.