Data Security
May 7, 2026

What Is a Data Breach?

Summary
A data breach is a security incident in which unauthorized parties access sensitive or confidential information — including personal data (Social Security numbers, bank account numbers, healthcare records) and corporate data (customer records, intellectual property, financial information). Breaches are caused by external attacks (phishing, ransomware, credential theft), insider threats, and human error. The global average cost is USD $4.44 million (IBM 2025), with United States breaches averaging $10.22 million. The average time to identify and contain a breach is 241 days. Organizations that apply data-level protections — tokenization, dynamic data masking, and encryption — mitigate breach impact by rendering stolen data worthless to attackers.


Types of Data Breaches

Data breaches fall into distinct categories based on their origin and intent. Understanding the taxonomy is essential for data breach prevention — it helps organizations design data security and data protection controls that address each vector.

External Breaches

External breaches originate outside the organization. Common methods include phishing and social engineering, ransomware, malware, credential theft via brute-force attacks, exploitation of system vulnerabilities (including zero-day exploits), and supply chain attacks where attackers compromise a vendor's software to reach downstream customers. The SolarWinds attack in 2020 demonstrated that a single compromised vendor can expose entire government agencies.

Internal Breaches

Internal breaches originate from within the organization. Malicious insiders — employees, contractors, or partners — may intentionally steal or leak personal data for financial gain, revenge, or competitive advantage. Accidental exposure is more common: misconfigured cloud storage, unsecured databases, or lost devices can expose sensitive data without any cyberattack involvement. The IBM Cost of a Data Breach 2025 report found that human error accounts for 26% of all data breaches, while IT failures account for 23%. Both categories represent data security failures that effective data breach prevention programs must address.

Data Breach vs Data Leak vs Data Loss

A data breach involves unauthorized access — someone who should not have the data gains access to it. A data leak is the unintentional exposure of data through misconfiguration or error, such as an unsecured Amazon S3 bucket. A leak creates the opportunity for a breach, but not every leak is exploited. Data loss is the destruction or corruption of data (hardware failure, accidental deletion) — the data is gone, not compromised. The distinction matters because each triggers different response and notification obligations.

| Type | Origin | Examples | Intent |
| --- | --- | --- | --- |
| External (targeted) | Outside the organization | Phishing, ransomware, credential theft, vulnerability exploits | Intentional |
| External (supply chain) | Third-party vendor/partner | Compromised software updates, backdoored platforms | Intentional |
| Internal (malicious) | Authorized insiders | Employee selling data, disgruntled admin exfiltrating records | Intentional |
| Internal (accidental) | Authorized insiders | Misconfigured cloud storage, email to the wrong recipient, lost device | Unintentional |
| Data leak | System misconfiguration | Unsecured database, public S3 bucket, exposed API endpoint | Unintentional |

How Do Data Breaches Happen?

Most intentional data breaches follow a three-phase pattern. 

  • First, the attacker conducts reconnaissance — identifying the target and probing for data security weaknesses, whether technical vulnerabilities or employees susceptible to social engineering.

  • Second, the attacker launches the initial cyberattack — a phishing email, a vulnerability exploit, a stolen credential used to log in.

  • Third, the attacker compromises the data — locating sensitive personal data and exfiltrating, encrypting, or destroying it.

The attack vectors are well-documented. The IBM Cost of a Data Breach 2025 report provides specific data on the most common methods attackers use to access sensitive data.

| Attack Vector | % of Breaches | Average Cost | Avg Days to Identify |
| --- | --- | --- | --- |
| Phishing | 16% | $4.62M | |
| Stolen / compromised credentials | 10% | | 186 days |
| Ransomware | | $5.08M | |
| Human error | 26% | | |
| IT failures | 23% | | |

(Source: IBM Cost of a Data Breach Report 2025)

Phishing remains the most common attack vector, accounting for 16% of breaches at an average cost of $4.62 million. 

Stolen or compromised credentials account for 10% and take up to 186 days to identify — the longest detection time of any vector. 

Ransomware costs an average of $5.08 million per incident, and that figure excludes ransom payments, which can reach tens of millions. Data access controls that restrict credential scope and detect anomalous usage patterns are the primary defence against credential-based attacks.

Beyond targeted attacks, system vulnerabilities and supply chain compromises provide entry points when software is unpatched or vendor security is weak.

Attackers exploit known vulnerabilities before organizations apply patches — the Equifax breach in 2017 was caused by a single unpatched web application flaw.

Data discovery and classification are essential because attackers who breach the perimeter will seek out the highest-value data first.

What Does a Data Breach Cost?

The financial impact of a data breach extends far beyond the immediate incident. 

Data breach costs include lost business, detection efforts, post-breach response, and regulatory notification — and the total continues to rise in most regions. 

The IBM Cost of a Data Breach 2025 report — based on 600 organizations breached between March 2024 and February 2025 — provides the most detailed data security cost breakdown available for organizations protecting sensitive data.

| Cost Component | Average Cost |
| --- | --- |
| Detection and escalation | $1.47M |
| Lost business | $1.38M |
| Post-breach response (fines, settlements, legal, credit monitoring) | $1.20M |
| Notification | $390K |
| Total (global average) | $4.44M |

(Source: IBM Cost of a Data Breach Report 2025)

The global average cost fell 9% from $4.88 million in 2024 to $4.44 million in 2025, driven by faster detection through AI-powered security tools. 
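As a quick sanity check on the figures above, the following Python sketch sums the four cost components and recomputes the year-over-year change. The variable and function names are illustrative only; the dollar amounts are the ones quoted from the IBM report.

```python
# Figures quoted above from the IBM Cost of a Data Breach Report 2025 (USD millions).
COST_COMPONENTS_M = {
    "detection_and_escalation": 1.47,
    "lost_business": 1.38,
    "post_breach_response": 1.20,
    "notification": 0.39,
}


def percent_change(old: float, new: float) -> float:
    """Return the percentage change from old to new (negative means a decrease)."""
    return (new - old) / old * 100


# The four components should add up to the reported global average.
total_m = round(sum(COST_COMPONENTS_M.values()), 2)

# Change in the global average from 2024 ($4.88M) to 2025 ($4.44M).
global_change = round(percent_change(4.88, 4.44), 1)

print(f"Sum of components: ${total_m}M")
print(f"Global change 2024 -> 2025: {global_change}%")
```

The components add up to the reported $4.44 million total, and the change works out to a decline of roughly 9%, matching the figure quoted above.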

The United States moved in the opposite direction: the average US breach reached $10.22 million, up 9% year-over-year, driven by regulatory penalties and longer detection times. 

Healthcare recorded the highest average cost for the 15th consecutive year at $7.42 million, reflecting the sensitivity of protected health information and the strictness of Health Insurance Portability and Accountability Act (HIPAA) enforcement.

Organizations using artificial intelligence (AI) and automation extensively in their security operations resolve breaches 80 days faster and save $1.9 million per breach on average. 

Shadow AI — unauthorized generative AI tools used without IT oversight — was involved in 20% of breaches, adding $670,000 to average costs. Breaches involving data distributed across multiple environments (cloud, on-premise, SaaS) cost $5.05 million on average.

Customer trust compounds the financial damage. Research shows that 79% of consumers say data protection underlies their trust in a company, and more than 80% would stop doing business after a breach. The reputational cost is often the hardest to recover from. Data security platforms that prevent cleartext exposure can reduce this risk at the source.

Data Breach Notification Laws

When a data breach occurs, organizations face legally mandated data breach notification timelines that vary by jurisdiction and data type. 

The trend across all jurisdictions is consistent: shorter notification windows, stricter penalties, and broader scope. 

Your data security and data protection program must include pre-built data breach notification workflows tied to each regulatory framework that governs the personal data you hold.

| Regulation | Notification Deadline | Who Must Be Notified | Maximum Penalty |
| --- | --- | --- | --- |
| General Data Protection Regulation (GDPR) | 72 hours | Supervisory authority + individuals (if high risk) | €20M or 4% global revenue |
| Health Insurance Portability and Accountability Act (HIPAA) | 60 days | Department of Health and Human Services (HHS), individuals, media (500+ affected) | $2.13M per violation category/year |
| California Consumer Privacy Act / California Privacy Rights Act (CCPA/CPRA) | "Expeditiously" | Affected consumers + California AG (500+ affected) | $100–$750/consumer statutory damages |
| Cyber Incident Reporting for Critical Infrastructure Act (CIRCIA) | 72 hours | Department of Homeland Security (DHS) / Cybersecurity and Infrastructure Security Agency (CISA) | Enforcement TBD |
| Personal Information Protection and Electronic Documents Act (PIPEDA) | "As soon as feasible" | Privacy Commissioner + affected individuals | CAD $100K per violation |

The EU's GDPR sets the global standard with its 72-hour data breach notification requirement. 

In the United States, all 50 states have their own data breach notification laws with varying timelines and definitions of "personal data." 

HIPAA requires covered entities to notify the US Department of Health and Human Services (HHS), affected individuals, and — for breaches affecting 500 or more people — prominent media outlets within 60 days.
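To make the fixed deadlines concrete, here is a minimal Python sketch that turns a discovery timestamp into a latest notification time per regime. It assumes the clock starts at the moment of discovery, which is a simplification: the actual trigger differs by regulation. The `NOTIFICATION_WINDOWS` mapping and the function name are hypothetical, and regimes with qualitative deadlines ("expeditiously", "as soon as feasible") are left out because they cannot be reduced to a fixed window.

```python
from datetime import datetime, timedelta, timezone

# Fixed windows quoted in the table above. Treating discovery time as the
# start of the clock is an assumption made for this sketch.
NOTIFICATION_WINDOWS = {
    "GDPR": timedelta(hours=72),
    "HIPAA": timedelta(days=60),
    "CIRCIA": timedelta(hours=72),
}


def notification_deadlines(
    discovered_at: datetime, regimes: list[str]
) -> dict[str, datetime]:
    """Map each applicable regime to its latest notification time."""
    return {r: discovered_at + NOTIFICATION_WINDOWS[r] for r in regimes}


# Example: a breach discovered on 7 May 2026 at 09:00 UTC.
discovered = datetime(2026, 5, 7, 9, 0, tzinfo=timezone.utc)
deadlines = notification_deadlines(discovered, ["GDPR", "HIPAA"])

for regime, due in sorted(deadlines.items(), key=lambda item: item[1]):
    print(f"{regime}: notify by {due.isoformat()}")
```

Sorting by deadline puts the tightest obligation first, which is the order an incident response team would typically work through.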

Enforcement is accelerating. The FTC fined Epic Games USD $275 million for Children's Online Privacy Protection Act (COPPA) violations in 2022. 

Cumulative GDPR fines have exceeded €4 billion since 2018. PCI Data Security Standard (PCI DSS) non-compliance can trigger fines of up to $100,000 per month, with tokenization being the standard approach to reducing scope and exposure.

Notable Data Breaches

The largest data breaches in history share a common pattern: attackers exploited a known, preventable vector, and the personal data they accessed was stored in cleartext — unprotected at the data security level.

Yahoo (2013). Hackers exploited a weakness in Yahoo's cookie system to access the names, birthdates, email addresses, and passwords of all 3 billion users. The breach was disclosed in 2016 during Verizon acquisition talks, reducing the purchase price by $350 million; Yahoo confirmed in 2017 that all 3 billion accounts were affected. Had the sensitive personal data been tokenized, the stolen records would have been worthless.

Equifax (2017). An unpatched web application vulnerability gave attackers access to the personal data of more than 143 million Americans, including Social Security numbers, driver's licence numbers, and credit card numbers. The breach cost $1.4 billion in settlements and fines. A single missing patch was the entry point; the absence of data-level protection was the amplifier.

SolarWinds (2020). Russian threat actors compromised the Orion network monitoring platform and distributed malware to SolarWinds customers, including the US Treasury, Justice, and State Departments. This supply chain attack demonstrated that even well-defended organizations are vulnerable through their vendors.

Colonial Pipeline (2021). Ransomware forced the shutdown of the pipeline supplying 45% of the US East Coast's fuel. The entry point: a single employee password found on the dark web. The company paid a $4.4 million ransom in cryptocurrency.

23andMe (2023). Hackers stole 6.9 million user records — including genetic data and family trees — through credential stuffing, a technique that exploits password reuse across platforms. The breach highlighted that even non-financial personal data carries significant privacy and security risk.

What most people miss: in every case above, perimeter defences failed — and the data itself was exposed in cleartext. Organizations that apply tokenization or dynamic masking to sensitive fields before storage ensure that even a successful breach yields nothing an attacker can use.

Data Breach Prevention Best Practices

1. Discover and Classify Your Sensitive Data

Data breach prevention starts with visibility. 

You cannot protect personal data that you do not know exists. Deploy automated data discovery tools that scan on-premise, cloud, SaaS, and legacy environments. 

Classify data by sensitivity level — PII, PHI, PCI, financial, intellectual property — and align classifications to regulatory requirements. 

Address dark data: unstructured, ungoverned datasets in forgotten backups, email archives, and legacy databases that create unmonitored data security gaps.
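A minimal sketch of what automated discovery and classification can look like, assuming a simple pattern-based scanner: it labels text as PII when it contains a US Social Security number pattern and as PCI when it contains a digit sequence that passes the Luhn checksum used by payment cards. The function names and the two labels are illustrative, and commercial discovery tools use far richer detection than two regular expressions.

```python
import re

# Illustrative patterns only; real scanners combine many more detectors.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def classify_text(text: str) -> set[str]:
    """Return the sensitivity labels detected in a piece of text."""
    labels = set()
    if SSN_RE.search(text):
        labels.add("PII")
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        labels.add("PCI")
    return labels


print(classify_text("Customer SSN 123-45-6789, card 4111 1111 1111 1111"))
```

The Luhn check filters out long digit strings such as order numbers that merely look like card numbers, which keeps false positives down when scanning unstructured data.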

2. Apply Data-Level Protection: Tokenization, Masking, and Encryption

Perimeter security controls who gets in. Data-level protection controls what happens when those controls fail.

  • Tokenization replaces sensitive data with surrogates that cannot be reversed mathematically. Stolen tokens are worthless: there is no mathematical relationship between the token and the original value, and no key to reverse it.

  • Dynamic data masking reveals only the data elements each user's role requires.

  • Encryption protects data in transit (Transport Layer Security (TLS) 1.3) and at rest (Advanced Encryption Standard (AES)-256).

A critical distinction: encryption does not reduce compliance scope under most frameworks because encrypted data is reversible with the key. 

Tokenization eliminates the sensitive data from systems entirely, which is why it is the preferred method for PCI DSS scope reduction and data breach mitigation.
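The difference between tokenization and masking can be sketched in a few lines of Python. This example assumes a vault-based design, one common approach in which the only route back to the original value is a lookup in a separately secured vault; the article does not state which design any particular product uses. `TokenVault`, `mask_ssn`, and the `tok_` prefix are hypothetical names, and the in-memory dictionaries stand in for a hardened, access-controlled store.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a hardened token vault (illustration only)."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so one value always maps to one surrogate.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it has no mathematical link to the value.
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Possible only for systems that can reach the vault itself.
        return self._token_to_value[token]


def mask_ssn(ssn: str) -> str:
    """Masking: reveal only the last four digits of an SSN."""
    return "***-**-" + ssn[-4:]


vault = TokenVault()
token = vault.tokenize("123-45-6789")
print(token)                    # random surrogate stored downstream
print(mask_ssn("123-45-6789"))  # partial view for a limited role
```

A database that stores only `token` contains nothing an attacker can reverse without also compromising the vault, whereas an encrypted column can be recovered by anyone who obtains the key.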

3. Enforce Zero Trust and Least-Privilege Access

Never assume trust. Verify every access request regardless of origin. Implement role-based access control (RBAC) and attribute-based access control (ABAC) with dynamic policy enforcement at the data layer — not just the network layer. 

Use multi-factor authentication (MFA) for all access to systems containing sensitive data. Zero trust principles applied at the data level mean that even authenticated users see only de-identified data unless their role, context, and attributes explicitly authorize cleartext access.
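The data-level zero trust rule described above can be sketched as a default-deny check: cleartext is returned only when role, purpose, and MFA status all line up, and every other request receives the de-identified value. The roles, purposes, and the `CLEARTEXT_POLICY` set below are hypothetical placeholders, not a real policy model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    role: str           # RBAC input: who is asking
    mfa_verified: bool  # context: was MFA completed for this session
    purpose: str        # ABAC input: why the data is requested


# Hypothetical policy: (role, purpose) pairs allowed to see cleartext.
CLEARTEXT_POLICY = {
    ("fraud_analyst", "investigation"),
    ("billing_admin", "refund"),
}


def view_field(value: str, token: str, request: AccessRequest) -> str:
    """Return cleartext only when explicitly authorized; otherwise the token."""
    allowed = (
        request.mfa_verified
        and (request.role, request.purpose) in CLEARTEXT_POLICY
    )
    return value if allowed else token


analyst = AccessRequest("fraud_analyst", True, "investigation")
marketer = AccessRequest("marketing", True, "campaign")
print(view_field("123-45-6789", "tok_ab12", analyst))
print(view_field("123-45-6789", "tok_ab12", marketer))
```

Because the fallback is the token rather than an error, an authenticated user without explicit authorization can still work with the record without ever seeing the sensitive value.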

4. Develop and Test an Incident Response Plan

The IBM 2025 report found that incident response (IR) planning and testing was the third most popular area of security investment, cited by 35% of respondents. 

Define roles, escalation paths, communication procedures, and containment steps before an incident occurs. Include regulatory notification workflows tied to specific deadlines (72 hours for GDPR, 60 days for HIPAA). 

Test with tabletop exercises and breach simulations — an untested plan is not a plan. Integrate IR workflows into your data security management program.

5. Deploy AI-Powered Detection and Automation

Organizations using AI extensively resolve breaches 80 days faster and save $1.9 million per breach. 

Integrate AI into Security Information and Event Management (SIEM), data loss prevention (DLP), and monitoring tools for real-time anomaly detection. Automate response workflows to contain threats during the detection-to-remediation window — the period where most damage occurs. Real-time monitoring transforms static security into active defence.
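Real-time anomaly detection can be illustrated with a deliberately simple statistical baseline. The sketch below is not AI and is not how any particular SIEM works; it only shows the underlying idea of comparing current activity against a historical norm. The function name, the threshold of three standard deviations, and the sample counts are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Flag `current` if it lies more than `threshold` standard deviations
    above the historical mean."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any increase is unusual.
        return current > mu
    return (current - mu) / sigma > threshold


# Hypothetical daily counts of sensitive records read by one account.
daily_record_reads = [102, 98, 110, 95, 105, 101, 99]

print(is_anomalous(daily_record_reads, 104))   # ordinary day
print(is_anomalous(daily_record_reads, 5000))  # possible exfiltration
```

In practice the flagged event would feed an automated response, such as suspending the session, so that containment begins inside the detection-to-remediation window.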

6. Train Employees to Recognize and Report Threats

Phishing (16%) and human error (26%) together account for 42% of data breaches. Conduct regular data security awareness training and phishing simulations, and establish clear reporting procedures.

Train employees on proper handling of personal data and sensitive data: accidental exposure through misconfiguration, email errors, and unsecured storage is the most preventable breach risk an organization faces. Data breach prevention depends on every employee understanding their role.

7. Patch and Update Relentlessly

Unpatched vulnerabilities are a recurring entry point in major breaches: the Equifax breach (2017) began with a single missing patch. Automate patch management wherever possible.

Prioritize patches for internet-facing systems and any system handling sensitive data. Vulnerability scanning should be continuous, not quarterly. 

Data protection platforms that operate at the network layer add a safety net: even if a vulnerability is exploited, the data an attacker reaches is tokenized.
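One way to express the prioritization rule above in code, as a sketch under assumed inputs: findings on systems that are internet-facing or handle sensitive data rank ahead of the rest, with scanner severity as the tiebreaker. The `Finding` fields and the scoring scheme are hypothetical and would need tuning against a real vulnerability management process.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    host: str
    cvss: float                   # severity score from the scanner, 0.0 to 10.0
    internet_facing: bool
    handles_sensitive_data: bool


def patch_order(findings: list[Finding]) -> list[Finding]:
    """Rank findings by exposure (number of risk flags set), then by severity."""
    return sorted(
        findings,
        key=lambda f: (
            int(f.internet_facing) + int(f.handles_sensitive_data),
            f.cvss,
        ),
        reverse=True,
    )


findings = [
    Finding("intranet-wiki", 9.8, False, False),
    Finding("web-portal", 7.5, True, True),
    Finding("hr-db", 8.1, False, True),
]

for f in patch_order(findings):
    print(f.host, f.cvss)
```

Note that the isolated wiki is patched last despite having the highest severity score, which reflects the guidance to weigh exposure and data sensitivity ahead of raw severity.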

8. Manage Third-Party and Supply Chain Risk

According to the Verizon Data Breach Investigations Report (DBIR), 30% of data breaches involve a third party. 

A single vendor cyberattack can expose your personal data even when your own data security controls are strong. Assess vendor security during procurement. Enforce contractual data protection obligations. 

Apply tokenization to data shared with third parties so they never handle cleartext PII, PHI, or PCI data. If a vendor is breached, the data they hold consists of surrogates, not exploitable personal data.

Responding to a Data Breach

When a breach occurs, speed and structure determine the outcome. Every hour of delay increases data breach costs and exposure. A tested data security response process is the difference between containment and catastrophe.

  1. Contain. Isolate affected systems. Revoke compromised credentials. Preserve forensic evidence. The goal is to stop the bleeding without destroying the data you need to investigate. Zero trust segmentation limits lateral movement and reduces the blast radius.

  2. Assess. Determine what data was accessed, how the breach occurred, and the scope of impact. Identify which regulatory frameworks apply based on the data types compromised — PII, PHI, and PCI data each trigger different notification requirements.

  3. Notify. Report to regulators within required timelines. Notify affected individuals. Engage legal counsel. Under GDPR, you have 72 hours. Under HIPAA, 60 days. Under state breach notification laws, timelines vary. Missing a notification deadline compounds the financial and reputational damage.

  4. Remediate. Patch the vulnerability that was exploited. Reset all potentially compromised credentials. Strengthen access controls based on the attack vector identified.

  5. Recover. Restore systems from clean backups. Monitor for persistent access, backdoors, or reinfection.

  6. Review. Conduct a post-incident analysis. Update your IR plan, security controls, and employee training based on findings. Every breach is a data source for improving your next response.


Reducing Breach Impact at the Data Layer

Traditional data breach prevention focuses on keeping attackers out. That approach is necessary but insufficient — the IBM 2025 data confirms that data breaches continue to occur even in organizations with strong perimeter defences. Sensitive data that remains in cleartext behind perimeter controls is one cyberattack away from exposure.

Modern data security and breach resilience focuses on making the data itself worthless to attackers. Data security platforms apply tokenization, dynamic data masking, and encryption inline — before sensitive personal data reaches downstream systems, third parties, or AI pipelines.

DataStealth enforces field-level data protection at the network layer, without code changes, API integrations, or agent installations. It protects sensitive data across legacy, on-premise, cloud, SaaS, and AI environments. Even if attackers breach your systems, they find only surrogates — not exploitable data.


Frequently Asked Questions: Data Breaches