FAQs: Data resilience
Q: How are you defining “data resilience”?
The term data resilience generally refers to how well an organization’s data — and by extension its critical operations — can withstand major disruptions. It is what allows companies to maintain their critical operations during unexpected events like ransomware attacks, malware attacks, server crashes, physical storage device failures, cloud provider and data center outages, misconfigurations, and more.
Q: What are the consequences of weak data resilience?
Financial loss from interruptions to business continuity is the main risk of weak data resilience. In some cases, losing access to data and interrupting business operations may also lead to legal penalties and loss of customer trust. For instance, some consider a ransomware attack to be a data breach, rather than a security event, because authorized access to protected data has been prevented. This, of course, has a direct impact on an organization’s compliance posture.
Q: Does microsharding help improve data resilience?
Yes. (It would be foolish of us to create an FAQ on something we don’t do.) Microsharded data is “self-healing”, and we facilitate this at both the macro storage level and the micro data level:
- At the storage level, think of what we do as RAID for data in the cloud. This means that you can lose access to one or more storage locations, and your operations continue without interruption.
- At the data level, we are able to reconstruct microsharded data that has been tampered with (including data encrypted by ransomware) or deleted by an unauthorized user. Again, your operations continue without interruption.
Q: How does “self-healing data” work?
Let’s look at this from the storage level and the data level, starting with the data.
Data-level resilience
Microsharding includes multiple data integrity checks, both as data is shredded and as it’s reconstructed. If we detect a discrepancy at any point as we reassemble the data, we reconstruct the affected portion of the microsharded data and return it to the state in which it was saved. This covers unauthorized deletions as well as data tampering.
We do this automatically and in real time.
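To make the check-and-repair idea concrete, here is a minimal sketch in Python. It is not ShardSecure's implementation: the SHA-256 digests, the fixed piece size, and the names shred, reassemble, and rebuild are all assumptions chosen for illustration. The rebuild callback stands in for whatever redundancy supplies a clean copy of a damaged piece.

```python
import hashlib
from typing import Callable, List, Optional, Tuple


def shred(data: bytes, size: int) -> Tuple[List[bytes], List[str]]:
    """Split data into fixed-size pieces and record a digest for each piece."""
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    digests = [hashlib.sha256(p).hexdigest() for p in pieces]
    return pieces, digests


def reassemble(
    pieces: List[Optional[bytes]],
    digests: List[str],
    rebuild: Callable[[int], bytes],  # hypothetical hook: returns a clean copy of piece i
) -> bytes:
    """Verify every piece on the way back; repair any that is missing or altered."""
    out = []
    for i, (piece, expected) in enumerate(zip(pieces, digests)):
        if piece is None or hashlib.sha256(piece).hexdigest() != expected:
            piece = rebuild(i)  # deleted (None) or tampered piece
            if hashlib.sha256(piece).hexdigest() != expected:
                raise ValueError(f"piece {i} could not be repaired")
        out.append(piece)
    return b"".join(out)
```

In this sketch a deleted piece and a tampered piece take the same path: both fail verification, and both are replaced before the data is returned to the caller.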
Storage-level resilience
We offer RAID-5-like and RAID-50-like redundancy for your cloud storage locations. If you’ve read our FAQ on microsharding, you’ll recall that the last step in microsharding is to distribute the data across multiple storage locations.
In the RAID-5-like configuration, we use a level of erasure coding similar to the use of parity in a RAID-5 array. That means you can lose access to any one storage location and operations will continue without interruption. If you lose access to more than one location at the same time, though, you will experience an outage.
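The single-loss tolerance described above can be illustrated with plain XOR parity, the scheme conventional RAID-5 uses. This is a sketch under stated assumptions: the actual erasure code is not specified here, the location names (loc0, loc1, parity) are hypothetical, and parity is kept on one location for simplicity rather than rotated.

```python
from functools import reduce
from typing import Dict, List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def write_stripe(chunks: List[bytes]) -> Dict[str, bytes]:
    """Place equal-sized chunks on data locations and their XOR on a parity location."""
    stripe = {f"loc{i}": chunk for i, chunk in enumerate(chunks)}
    stripe["parity"] = reduce(xor_bytes, chunks)
    return stripe


def read_stripe(stripe: Dict[str, Optional[bytes]]) -> bytes:
    """Read the data back; None marks a location that cannot be reached."""
    missing = [name for name, value in stripe.items() if value is None]
    if len(missing) > 1:
        raise RuntimeError("more than one location unavailable: outage")
    if missing and missing[0] != "parity":
        # XOR of every surviving chunk (parity included) equals the missing chunk.
        survivors = [v for v in stripe.values() if v is not None]
        stripe = {**stripe, missing[0]: reduce(xor_bytes, survivors)}
    return b"".join(v for name, v in stripe.items() if name != "parity")
```

Losing the parity location alone costs nothing at read time, because the data chunks are all still present. Losing any two locations leaves too little information to rebuild, which is the outage case.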
For capacity planning, the equivalent of one location’s capacity will be allocated to parity. So, if you plan to use four 50TB storage locations for your microsharded data, you will have approximately 150TB available for the data and approximately 50TB for parity.
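The capacity arithmetic generalizes to any number of equally sized locations. The helper below is a planning sketch only: the function name is hypothetical, and the three-location minimum is borrowed from conventional RAID-5 rather than from any stated product limit.

```python
def raid5_like_capacity(locations: int, size_tb: float) -> dict:
    """Approximate usable and parity capacity when one location's worth goes to parity."""
    if locations < 3:
        # Assumption: same minimum as a conventional RAID-5 array.
        raise ValueError("expected at least three storage locations")
    return {
        "raw_tb": locations * size_tb,
        "usable_tb": (locations - 1) * size_tb,
        "parity_tb": size_tb,
    }


# Four 50TB locations: about 150TB for data and about 50TB for parity.
print(raid5_like_capacity(4, 50))
```

Adding locations improves the ratio of usable to raw capacity (three out of four here, seven out of eight with eight locations), but the tolerance stays at one unavailable location.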
Our RAID-50-like configuration consists of a mirrored RAID-5-like array. (Strictly speaking, conventional RAID 50 stripes data across RAID-5 arrays rather than mirroring them, which is why we say “like”.) The advantage is that you are able to sustain the loss of access to multiple storage locations without impacting your operations, but you’ll need to use significantly more storage. Using the example in the paragraph above, you will need to allocate eight 50TB locations instead of four. The equivalent of three locations holds the data, another three mirror that data, one holds parity, and one mirrors the parity.
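The mirrored layout can be sketched the same way. Everything named here is an assumption for illustration: the location labels a0..a3 and b0..b3, and the simplified loss model in which a position is lost only when both of its copies are unreachable, with single parity then covering one lost position. The product's real tolerance rules may differ.

```python
from typing import Set


def mirrored_capacity(locations_per_side: int, size_tb: float) -> dict:
    """Capacity of a RAID-5-like array that is mirrored in full."""
    raw = 2 * locations_per_side * size_tb
    usable = (locations_per_side - 1) * size_tb
    return {
        "locations": 2 * locations_per_side,
        "raw_tb": raw,
        "usable_tb": usable,
        "overhead_tb": raw - usable,
    }


def survives(locations_per_side: int, down: Set[str]) -> bool:
    """True if data stays readable under the simplified loss model described above."""
    lost_positions = sum(
        1
        for i in range(locations_per_side)
        if f"a{i}" in down and f"b{i}" in down  # both copies of position i are gone
    )
    return lost_positions <= 1  # single parity covers one lost position


# Eight 50TB locations: 400TB raw for about 150TB of data.
print(mirrored_capacity(4, 50))
```

Under this model, an entire side can be unreachable without an outage, which is the kind of multi-location loss a single RAID-5-like array cannot absorb.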
Instance-level resilience
It is also possible to synchronize two or more instances of ShardSecure for additional resilience. This way, if one complete instance is unavailable, traffic fails over to another instance.
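From a client's point of view, failover between synchronized instances amounts to trying the next instance when one cannot be reached. The sketch below is purely illustrative: it models each instance as a hypothetical fetch callable and treats ConnectionError as the sign that an instance is unavailable. In a real deployment, failover is more likely to be handled by a load balancer or DNS than by application code.

```python
from typing import Callable, Sequence


def read_with_failover(instances: Sequence[Callable[[str], bytes]], key: str) -> bytes:
    """Try each instance in order and return the first successful read."""
    for fetch in instances:
        try:
            return fetch(key)
        except ConnectionError:
            continue  # this instance is unavailable; try the next one
    raise ConnectionError(f"all {len(instances)} instances unavailable")
```

Because the instances are synchronized, any of them can serve the read, so the order of the list only expresses a preference (for example, the nearest instance first).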
Q: What is the right level of resilience I should configure?
You can imagine that you could go pretty nuts with this whole thing, but what do you really need? That will depend on the value of the data and the value of the operations using it. Keep in mind that these settings are per-instance, so you can have mission-critical apps using one instance set at ludicrous resilience, while using a less-ludicrous instance for other data.
Q: What about resilience for ShardSecure itself?
Each instance of our application is a virtual cluster, so there are multiple nodes. The minimum recommended number of nodes is three and the maximum that we recommend is eleven. (Yes, our clusters go to 11!) Clustering, of course, means higher availability of the application.
Two or more instances may be synchronized for high availability and failover. We recommend that instances be deployed such that the likelihood of a complete outage is low. For example, with two instances, we recommend that one instance be on-prem and the other in the cloud. There are many permutations as more instances are added. Again, the right fit for your organization will depend on a number of factors, but the bottom line is that you have options.