Emails, documents, smartphone sensors, machine learning: Everything relies on unstructured data. Why is it so difficult to protect?

The challenges of securing unstructured data are considerable. From the increasing sophistication of cyberattacks to the growing number of data privacy regulations, CISOs today are faced with a highly complex landscape.

Unstructured data — data that is not or cannot be organized into a typical relational database — is particularly difficult to safeguard. It doesn’t adhere to conventional data models, and many companies have unstructured datasets on the scale of tens or hundreds of billions of items to protect. The challenges are manifold.

First, the average enterprise stores massive amounts of unstructured data. Approximately 97 zettabytes of data were created worldwide in 2022, and 80-90% of all data is unstructured. By 2025, the world is expected to hold 175 zettabytes of data, most of it unstructured.

Second, unstructured data is by its very nature difficult to categorize and organize. I’ve personally spoken with many companies that don’t know exactly what their unstructured datasets contain. Similarly, a report by the Institute of Directors (IoD) and Barclays reveals that over 40% of companies don’t even know where all their critical data is stored.

Third, unstructured data is the linchpin of application security and availability. The functionality of all applications, from email tools to databases and VMs, depends on the health of the underlying file systems. When the unstructured data in those file systems is lost, corrupted, or deleted in a cyberattack, the results can include crashed applications, broken databases, SLA violations, and interruptions to business continuity.

Going beyond RBAC and single points of failure

The current standard for securing unstructured data is to rely on role-based access control (RBAC). While RBAC is absolutely necessary and does offer protection against users without the proper credentials, it’s also a limited approach. RBAC can’t tell a legitimate user from an attacker who holds that user’s credentials, so you’re always one credential abuse away from being in trouble.
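
To make that limitation concrete, here is a minimal sketch of an RBAC check. The role names, the permission table, and the `Credential` type are all hypothetical and chosen for illustration; real systems are more elaborate, but the core decision is typically made on the role attached to a credential rather than on who is actually presenting it.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table; names are illustrative only.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "storage_admin": {"read", "write", "delete"},
}


@dataclass(frozen=True)
class Credential:
    user: str
    role: str


def is_allowed(cred: Credential, action: str) -> bool:
    """Return True if the credential's role grants the requested action."""
    return action in ROLE_PERMISSIONS.get(cred.role, set())


# A legitimate admin and an attacker replaying the same stolen credential
# are indistinguishable to this check: both present the same role.
admin = Credential(user="alice", role="storage_admin")
stolen = Credential(user="alice", role="storage_admin")

print(is_allowed(Credential("bob", "analyst"), "delete"))  # False
print(is_allowed(admin, "delete"))                         # True
print(is_allowed(stolen, "delete"))                        # True
```

The check correctly blocks the analyst, but it grants the stolen admin credential everything the real admin can do, which is why RBAC alone leaves a single point of failure.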

Unstructured data is typically left under-protected beyond RBAC. The systems that host it have a single point of failure: privileged credentials. This is true for all data repositories, and it’s true for any third parties like cloud storage providers that retain privileged access.

The result is that it’s nearly impossible to prevent data exfiltration when every system has a root password. Likewise, it’s nearly impossible to prevent data tampering, a risk to valuable AI/ML models and training data in particular, when a single credential incident is all that it takes for an attacker to gain access. At-rest encryption offers a layer of security for data stored in the cloud, but it is not a realistic mitigation strategy to protect against risks to availability, data exfiltration, or breaches caused by human error, ransomware, and other cyberthreats.

Top risks to unstructured data security

As CISOs, our job is to assess and mitigate risk to our data assets. For unstructured data, those risks come in many forms, and our response needs to be balanced against all the different kinds of risk.

Risks to availability. Poor data quality and incompatibility can both cause problems with availability, but far more common threats are server crashes, network failures, and cloud provider outages. Cloud storage SLAs may specify a small refund in the case of downtime, but that refund won’t defray the significant, multimillion-dollar costs of downtime, nor will it bring your service back online. Data availability is further complicated by the complexity of hybrid- and multi-cloud environments.

Operational risks. A recent study by a Stanford University professor found that 88% of breach incidents are caused by employee mistakes, while a similar study by IBM Security put the figure at a shocking 95%. That’s because complex operational environments with multiple tools across multiple disparate cloud providers have become commonplace, and increased complexity makes mistakes more likely. As a result, we’ve seen a significant number of large breaches involving human error, most commonly when an employee accidentally leaves a storage bucket open to public access while migrating or sharing data. From there, it’s only a short step to a costly data breach.
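
One way to catch the open-bucket mistake early is a routine audit of bucket settings. The sketch below assumes a hypothetical inventory format (a list of records with a `public_read` flag); in practice those records would come from each cloud provider’s own API or configuration export, which is not shown here.

```python
# Hypothetical inventory records; a real audit would pull these from each
# cloud provider's own API or configuration export.
BUCKETS = [
    {"name": "hr-archive", "provider": "cloud-a", "public_read": False},
    {"name": "migration-tmp", "provider": "cloud-b", "public_read": True},
    {"name": "ml-training", "provider": "cloud-a", "public_read": False},
]


def find_public_buckets(buckets):
    """Return the names of buckets whose records allow public read access."""
    return [b["name"] for b in buckets if b.get("public_read", False)]


print(find_public_buckets(BUCKETS))  # ['migration-tmp']
```

Note that this sketch treats a missing `public_read` flag as not public. A stricter audit might flag incomplete records for manual review instead.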

Ransomware and data exfiltration risks. Another significant risk to unstructured data is ransomware. 2023 is on track to be a record-breaking year for ransomware, with a 49% YoY increase in the first six months. The rise of ransomware threatens availability in on-prem and cloud environments, and it also affects data privacy. A significant number of ransomware attacks today are double extortion attacks: attackers exfiltrate any unsecured data they can reach during the incident, then threaten to publish or leak it.

Compliance risks. We’ve written extensively about data privacy and compliance over the last year. The most important takeaway is that data protection laws are on the rise, and the penalties for noncompliance are staggering. Many regulations now require certain technical safeguards for consumer personal data, from the European Data Protection Board’s recommendations for cross-border data transfers to the German Federal Financial Supervisory Authority’s requirements for data backups. Minimizing legal liability and financial risk in the face of these many requirements is an ongoing challenge.

Mitigating risk and strengthening security for unstructured data

Unstructured data is the backbone of your organization. It’s also a vulnerable point. The vast majority of cyberattacks are carried out on unstructured data, so this kind of data must be the primary focus of protection.

Cybersecurity experts offer a wide array of individual recommendations to protect unstructured data. Your company might consider encryption by default, privacy by design, the principle of least privilege, and data minimization.

While all these approaches are essential, it’s also useful to take a holistic view of the risks to unstructured data. The ideal technologies will address not just one category of risk but many. Specifically, I recommend choosing a security solution with the following features to safeguard unstructured data:

  • Robust data resilience, to mitigate external risks to availability.
  • Advanced data privacy, to mitigate issues with data exfiltration, legal liability, and regulatory compliance.
  • Self-healing, to mitigate data loss from human error and from ransomware and malware attacks.
  • Infrastructure-agnostic platform, to accommodate the complexity of hybrid- and multi-cloud architectures.
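
As a rough illustration of what self-healing can mean in practice, the sketch below compares each stored copy of an object against a known-good SHA-256 digest and overwrites any copy that no longer matches. This is a generic integrity-check-and-restore pattern under stated assumptions, not a description of any vendor’s implementation; the `heal` function, the replica layout, and the site names are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def heal(replicas: dict, expected_digest: str) -> list:
    """Overwrite replicas whose digest differs from the expected one.

    Returns the list of repaired locations. Raises ValueError if no
    replica matches the expected digest, since nothing can be restored.
    """
    healthy = [d for d in replicas.values() if sha256(d) == expected_digest]
    if not healthy:
        raise ValueError("no healthy replica available")
    good = healthy[0]
    repaired = []
    for location, data in list(replicas.items()):
        if sha256(data) != expected_digest:
            replicas[location] = good  # restore from a verified copy
            repaired.append(location)
    return repaired


original = b"quarterly-report-v1"
digest = sha256(original)  # assumed to be recorded at write time
replicas = {
    "site-a": original,
    "site-b": b"encrypted-by-ransomware",  # simulated tampering
    "site-c": original,
}

print(heal(replicas, digest))  # ['site-b']
```

The sketch assumes the expected digest is stored somewhere an attacker cannot modify; if the digest itself can be tampered with, the check offers no protection.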

The ShardSecure platform offers an award-winning approach to mitigate risks and protect unstructured data, wherever it resides. To learn more about how ShardSecure can support a robust cybersecurity strategy with advanced data security, privacy, and resilience, take a look at our white paper or check out our resources page.