FAQ: What is microsharding?
Q: What is microsharding?
Good question. Microshard™ technology (“microsharding”) is a patented cloud data protection technology that we created to help ensure data in hybrid-cloud and multi-cloud environments stays secure and available, even if it is compromised or an outage occurs. Most importantly, it was created to keep control of the data in the hands of the data owner, no matter where it’s stored. We only ever use your storage.
Q: How does microsharding work?
We use a three-step process we refer to as Shred. Mix. Distribute.
1. Shred
We compress and then digitally shred data into small pieces called microshards. The size of the microshards is user-configurable and may be as small as four bytes. To put this into context, if a text file were shredded into four-byte microshards, each microshard could contain only between one and four characters, depending on the encoding of the file.
2. Mix
Microshards are then mixed into multiple logical containers along with a user-configurable amount of decoy data (formerly called "poison" data). Decoy data are “fake” microshards that are mixed with legitimate microshards to add complexity to attempts at unauthorized reassembly of Microshard data.
Additionally, filenames, file extensions, metadata, and anything else that could be used to establish relationships between microshards and Microshard containers are removed.
The names of the Microshard containers are random, alpha-numeric strings and the number of Microshard containers is equal to the number of storage locations to which the data will be distributed in the next step.
3. Distribute
Finally, the Microshard containers are distributed to multiple, customer-owned storage locations. These can be in hybrid-cloud and multi-cloud environments. We also support multi-region environments with a single cloud provider.
We reverse the process to reassemble the Microshard data.
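For readers who think in code, the three steps above can be sketched as a toy Python example. This is an illustration only and not ShardSecure's implementation: the function names, the use of zlib, and especially the manifest that records where each real microshard landed are assumptions made so the example can reassemble its own output.

```python
import random
import secrets
import zlib


def shred(data: bytes, shard_size: int = 4) -> list[bytes]:
    """Compress the data, then cut it into shard_size-byte pieces."""
    packed = zlib.compress(data)
    return [packed[i:i + shard_size] for i in range(0, len(packed), shard_size)]


def mix(shards: list[bytes], n_containers: int, n_decoys: int = 0):
    """Scatter real shards and random decoys across randomly named containers.

    Returns (containers, manifest). manifest[i] is the (container name,
    position) of real shard i; decoys never appear in it. The manifest is
    a hypothetical bookkeeping structure for this sketch.
    """
    rng = random.SystemRandom()
    names = [secrets.token_hex(8) for _ in range(n_containers)]
    shard_size = len(shards[0]) if shards else 4
    items = list(enumerate(shards))
    items += [(None, secrets.token_bytes(shard_size)) for _ in range(n_decoys)]
    rng.shuffle(items)

    containers = {name: [] for name in names}
    manifest = [None] * len(shards)
    for index, piece in items:
        name = rng.choice(names)
        if index is not None:
            manifest[index] = (name, len(containers[name]))
        containers[name].append(piece)
    return containers, manifest


def distribute(containers: dict, locations: list[str]) -> dict:
    """Place one container in each storage location."""
    if len(containers) != len(locations):
        raise ValueError("need exactly one container per storage location")
    return {
        location: {name: pieces}
        for location, (name, pieces) in zip(locations, containers.items())
    }


def reassemble(stores: dict, manifest: list) -> bytes:
    """Reverse the process: collect the real shards in order and decompress."""
    by_name = {
        name: pieces
        for store in stores.values()
        for name, pieces in store.items()
    }
    packed = b"".join(by_name[name][position] for name, position in manifest)
    return zlib.decompress(packed)
```

Calling shred, mix, and distribute in that order and then passing the result to reassemble along with the manifest returns the original bytes. Without the manifest, the containers hold only an unordered blend of real and decoy pieces.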
Q: What is self-healing data?
Self-healing is a feature of Microshard data. Simply put, it’s like RAID-5 for your data, but that is an oversimplification. We perform multiple data integrity checks during the microsharding and reassembly processes. If an integrity check finds a discrepancy, we reconstruct the affected Microshard data in real time to its unaffected state so that business operations are not impacted.
The benefits of self-healing data are:
- Data resilience: If microsharded data has been deleted or tampered with, including encrypted by ransomware, we reconstruct the affected Microshard data in real-time to its unaffected state.
- Data availability: If you lose connectivity to a storage location, we reconstruct the missing microsharded data in real-time.
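The RAID-5 comparison can be made concrete with a small Python sketch of single XOR parity plus checksums. It shows only the general idea of detecting a bad or missing block and rebuilding it from the survivors; the actual self-healing mechanism is not described in this FAQ, and the function names here are hypothetical.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("blocks must all be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def protect(blocks: list[bytes]):
    """Return (parity block, per-block SHA-256 digests)."""
    return xor_blocks(blocks), [hashlib.sha256(b).digest() for b in blocks]


def heal(blocks: list, parity: bytes, digests: list[bytes]) -> list[bytes]:
    """Rebuild one missing (None) or tampered block from the others."""
    bad = [
        i for i, block in enumerate(blocks)
        if block is None or hashlib.sha256(block).digest() != digests[i]
    ]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("single parity can repair only one block")
    good = [block for i, block in enumerate(blocks) if i != bad[0]]
    repaired = list(blocks)
    repaired[bad[0]] = xor_blocks(good + [parity])
    return repaired
```

With blocks AAAA, BBBB, and CCCC, heal restores the original set whether the second block is missing (an unreachable location) or the third has been overwritten (tampering). Single parity tolerates one bad block at a time, which is one reason the RAID-5 analogy is an oversimplification.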
Additionally, alternate storage locations may be assigned. If there are X data integrity check failures in Y timeframe, the data from the primary location is moved to the alternate location. The purpose of doing this is to move your microsharded data to a “clean” location should the primary be affected by malware or if the primary is unavailable due to an outage.
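The "X failures in Y timeframe" rule can be pictured as a sliding-window counter. The Python sketch below is one plausible reading of that rule, assuming that reaching X failures inside the window triggers the move; the class name and parameters are hypothetical and do not reflect the product's configuration.

```python
from collections import deque


class FailoverMonitor:
    """Track integrity-check failures for one primary storage location."""

    def __init__(self, max_failures: int, window_seconds: float):
        self.max_failures = max_failures      # "X"
        self.window_seconds = window_seconds  # "Y"
        self._failures = deque()

    def record_failure(self, now: float) -> bool:
        """Record a failure at time `now` (seconds).

        Returns True when the failures inside the window reach the
        threshold, meaning data should move to the alternate location.
        """
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures
```

With a threshold of three failures in 60 seconds, failures at 0, 10, and 20 seconds trigger a move on the third call, while failures at 0, 50, and 100 seconds do not.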
Q: Do you introduce a single point of failure?
No. Our solution has been designed for high-availability and fail-over:
- Each instance of ShardSecure is a virtual cluster with multiple nodes. Each node in the cluster is capable of running the application should all other nodes in the cluster go down.
- Multiple ShardSecure clusters may be synchronized to enable automatic fail-over between sites, regions, cloud providers, and between an on-premises instance and a cloud-based instance.
Q: Is your solution hardware- or software-based?
Each instance of our solution is a virtual cluster and is completely software-based. Each node in a cluster is a virtual machine.
Q: Is this a SaaS solution and where are you storing my data?
No, we are not a SaaS solution. We are a software-based solution that you can deploy on-prem and/or in the cloud and we only ever use your storage. Think of us as a BYOS (Bring Your Own Storage) solution. The important thing to note is that you are in control of your data, where it's stored, and who has access. We are a storage abstraction layer between your applications and your storage.
Q: Is ShardSecure a cloud-based or on-prem solution?
Both. You may deploy ShardSecure virtual clusters in the cloud and on-prem.
Q: How many nodes are there in a cluster?
We recommend a minimum of three and a maximum of nine.
Q: How does microsharding impact application performance?
On the whole, it doesn’t. The microsharding process reads and writes in parallel, so the limiting factor is often the network speed between the storage locations and the ShardSecure virtual cluster. Other factors that may impact performance of high-transaction applications are the size of the microshards and the use and size of the decoy data. Both are user-configurable.
Q: Can different types of data be microsharded differently?
Yes. The solution includes a policy engine that allows you to set different parameters based upon file type and other properties.
Q: What modifications do I need to make to my applications to use your solution?
Not too many. On the front end, our solution presents itself as cloud storage through an S3-compatible API and as network storage through a locally-installed iSCSI module.
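For an application that already talks to S3, the change is typically limited to the endpoint it points at. The Python sketch below uses boto3, the AWS SDK for Python, as an example client. The endpoint URL, bucket name, and credentials are placeholders, and the helper functions are hypothetical conveniences for this example.

```python
def s3_client_settings(endpoint_url: str, access_key: str, secret_key: str) -> dict:
    """Build keyword arguments for boto3.client().

    Only endpoint_url differs from a stock S3 setup: it points at the
    S3-compatible front end instead of AWS.
    """
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_client(settings: dict):
    """Create the client (requires the third-party boto3 package)."""
    import boto3
    return boto3.client(**settings)


# Example usage (placeholder endpoint and bucket):
#   settings = s3_client_settings(
#       "https://shardsecure.example.internal", "ACCESS_KEY", "SECRET_KEY")
#   client = make_client(settings)
#   client.put_object(Bucket="my-bucket", Key="report.pdf", Body=b"...")
#   body = client.get_object(Bucket="my-bucket", Key="report.pdf")["Body"].read()
```

The application code that calls put_object and get_object stays the same; only the client construction changes.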
Q: How do I migrate data from an existing application to your solution?
The process is simply to copy data from your original storage location to your new storage location, which is the target ShardSecure instance. Because our front end is an S3-compatible API, this is very similar to moving data from your existing storage location to S3 storage.
Q: How do I decommission ShardSecure?
While we hope that is unnecessary, decommissioning is the reverse process of deploying your data to ShardSecure. Again, with our API on the front end, all of your data could be copied or moved via the API, which appears as S3 storage. The microsharding engine will reassemble the data automatically, so there are no additional steps for you to ensure your data is reassembled.
Q: Can I move my data between different storage locations or cloud providers?
Yes. You can move your microsharded data to wherever you like. It’s your data and your storage. Data moves do not require application downtime and do not affect performance.
Q: Which cloud providers do you support for storage?
We support all major cloud providers for storage and our software is supported on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
Q: If an attacker had access to all of my microsharded data, could they put it back together?
Anyone who has been in InfoSec for long enough knows that there are no absolutes. But our answer is as close to "no" as you can get without it being an absolute "no." The thing about microsharding is that breaking it is not purely a math problem, as it is with, say, encryption. There is a lot that an attacker needs to know first.
Let's try some back-of-the-napkin math where we have a 1 MB plaintext file that will be shredded into 4-byte microshards and distributed to five storage locations. We'll forgo adding decoy data for this exercise. That means we'd have just over 262,000 microshards. Depending on the encoding used for the file, one microshard can contain 1-4 characters. These microshards will be mixed across multiple logical containers, which are just files containing the mixed microshards. Each container is given a random alphanumeric string as a name.
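The arithmetic in this exercise is easy to check. The short Python sketch below reads 1 MB as 2**20 bytes and, like the exercise, ignores the compression step; the function names are hypothetical.

```python
def microshard_count(file_bytes: int, shard_bytes: int) -> int:
    """Number of pieces when a file is cut into shard_bytes-sized chunks."""
    return -(-file_bytes // shard_bytes)  # ceiling division


def whole_chars_in_first_shard(text: str, encoding: str, shard_bytes: int = 4) -> int:
    """Count the complete characters that fit in the first microshard."""
    first = text.encode(encoding)[:shard_bytes]
    return len(first.decode(encoding, errors="ignore"))


print(microshard_count(1024 * 1024, 4))  # 262144

for encoding in ("ascii", "utf-16-le", "utf-32-le"):
    print(encoding, whole_chars_in_first_shard("microshard", encoding))
# ascii 4
# utf-16-le 2
# utf-32-le 1
```

So a 4-byte microshard holds four characters of ASCII text but only a single character of UTF-32 text, which is where the 1-4 range comes from.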
So, let's say an attacker gains access to one of the storage locations. What would that attacker have to do to get all of the microsharded data back together?
They'd have to:
- Know how many other storage locations there are.
- Know where all of the other storage locations are.
- Compromise each of those locations.
- Know which containers from each storage location correspond to every other container to represent our original file. How many files do you have in a single storage location? 100s? 1,000s? 10,000s? Whatever that number is, that's how many containers would be in each of the storage locations.
- Know the size of the microshards. The microshard size is configurable and may vary by file type according to policies in our policy engine.
- Know whether or not any decoy data was used and what proportion of the microshards are decoy data, if any.
Let's move this along, lest this exercise become tedious.
Now, let's say they somehow manage to determine which containers go together for our file; they figured out the microshard size; and they know there is no decoy data. There are still over 262,000 mixed-up microshards that can hold as few as one character. The FAQ author won't pretend to be a mathematician, but because order matters when putting a file back together, the relevant count is permutations rather than combinations, and the number of possible orderings of more than 262,000 pieces is far beyond trillions.
The net is that reassembly is very, very difficult, verging on impossible.
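To put a rough number on the final step of the exercise above: n distinct pieces can be arranged in n! (n factorial) different orders. The Python sketch below estimates how large that is for 262,144 microshards by computing the base-10 logarithm of the factorial. Treat it as an upper bound on the number of orderings rather than a measure of attacker effort, since identical microshards reduce the count and a real attacker may not need to try every ordering.

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) using the log-gamma function, since n! itself is huge."""
    return math.lgamma(n + 1) / math.log(10)


shards = 262_144
exponent = log10_factorial(shards)
print(f"{shards}! is about 10^{exponent:,.0f}")
```

The exponent comes out at roughly 1.3 million, so the count of orderings is a number with about 1.3 million digits. A trillion, by comparison, has 13 digits.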