Microsharding data was an idea born out of frustration.
I was the CTO at TD Ameritrade, a large online stockbroker, where I was responsible for security and fraud. Like many large financial institutions, we relied heavily on encryption to protect our data.
This was especially true as we used more cloud services and storage. Shared environments provided more opportunities for unauthorized users to access sensitive data, which presented three key issues:
- No single security control was perfect all the time. With a defense-in-depth approach, we layered controls so that if one failed, another would protect us — but we still saw the potential for single points of failure with key management.
- There was always the potential for people to forget to apply controls like encryption. Teams could fail to reassess datasets that started out non-sensitive but later had confidential material added to them. Cloud buckets could be misconfigured and left open.
- As attackers acquired faster computers and better approaches to unscramble encrypted data, we had to keep updating our algorithms and keys to make encryption stronger. Worse yet, we had to go back and find our old data to re-encrypt. We were on a treadmill with no way to get off; expecting that adversaries wouldn’t get even faster computers in the future was a bet against Moore’s Law. No one has ever won that bet.
Our patented Microshard™ technology arose from three key realizations.
The first big realization was that encryption didn't keep people from possessing your sensitive data.
Once someone had your encrypted data, by definition they had your data. It was just a matter of time, motivation, and compute power before they could unscramble it.
With that understanding, we changed the question from "how do we encrypt our sensitive data more strategically?" to "how do we keep attackers from having our sensitive data in the first place?"
The second big realization came from the vastness of online data.
Without a search engine pointing to content, it's very difficult to find things on the internet. We think about it in terms of dispersion: Once you pour a glass of water into a swimming pool, you can't reassemble that same glass of water. The molecules are all still there, but finding them without pointers is absurdly hard. Pour fractions of the glass into several different pools, and you make it even more difficult.
We saw that we could break our data into "molecules" and dump them into "pools" of data in order to make reassembly virtually impossible. If we dumped randomly selected data molecules into random pools across the country, an attacker wouldn’t even know where to look. They might gain access to one or more pools, but no pool would contain enough molecules to reassemble the original glass of water in a meaningful way.
The third and final realization was that breaking data into chunks and distributing it is conceptually similar to RAID in data storage.
With RAID, datasets are broken into blocks, made redundant through mirroring or parity, and written across multiple disks to provide protection against drive failures and improve performance.
What was new was applying the idea of RAID to data security. That required making the shards too small to hold meaningful data (microsharding) and then distributing them to remote destinations (our random swimming pools).
Done correctly, we could completely devalue and desensitize data, eliminating security and compliance concerns, and still maintain the performance and resiliency benefits of RAID. Each remote destination would hold only a fraction of the tiny fragments that make up a data file. We would then mix those fragments with completely unrelated data fragments from other sources to effectively poison the well. And, if we used policy routing to the destinations, we could ensure that data took diverse network paths to protect it in transit as well as at rest.
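To make the idea concrete, here is a minimal Python sketch of sharding and scattering. It is an illustration only, not ShardSecure's implementation: the 4-byte shard size, the three in-memory "pools", and the function names `microshard` and `scatter` are all assumptions made for the example.

```python
from __future__ import annotations

import random

SHARD_SIZE = 4   # bytes per microshard (assumed value, chosen for illustration)
POOL_COUNT = 3   # number of storage destinations (assumed value)


def microshard(data: bytes, size: int = SHARD_SIZE) -> list[bytes]:
    """Cut data into fragments of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def scatter(sources: dict[str, bytes], pool_count: int = POOL_COUNT, rng=None) -> list[list[bytes]]:
    """Send every fragment of every source to a randomly chosen pool.

    Fragments from unrelated sources land in the same pools, and each
    pool is shuffled so that neither origin nor order is preserved.
    """
    # SystemRandom draws from the OS entropy source; a seeded
    # random.Random can be passed in for repeatable experiments.
    rng = rng or random.SystemRandom()
    pools: list[list[bytes]] = [[] for _ in range(pool_count)]
    for data in sources.values():
        for shard in microshard(data):
            rng.choice(pools).append(shard)
    for pool in pools:
        rng.shuffle(pool)
    return pools
```

Calling `scatter({"customer": b"...", "logs": b"..."})` returns pools in which fragments of the two sources are interleaved. Nothing in this sketch records where each fragment went, so reassembly is not possible from the pools alone, which is why pointer management matters.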
ShardSecure's solution tied it all together.
That didn't mean it would be easy. We would have to manage billions of pointers to fragments so we could reassemble data on demand. We would also have to make this solution easy for organizations to deploy and use within their existing architectures.
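The pointer problem can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the product's design: pools are plain dictionaries standing in for remote storage locations, and the names `store` and `reassemble`, the 4-byte shard size, and the random hex keys are all hypothetical.

```python
from __future__ import annotations

import secrets


def store(data: bytes, pools: list[dict[str, bytes]], shard_size: int = 4) -> list[tuple[int, str]]:
    """Write each fragment to a random pool under a random key.

    Returns the ordered pointer list (pool index, key). This list is
    the only record of how to rebuild the data.
    """
    pointers: list[tuple[int, str]] = []
    for i in range(0, len(data), shard_size):
        pool_index = secrets.randbelow(len(pools))  # pick a destination
        key = secrets.token_hex(8)                  # opaque, unguessable name
        pools[pool_index][key] = data[i:i + shard_size]
        pointers.append((pool_index, key))
    return pointers


def reassemble(pointers: list[tuple[int, str]], pools: list[dict[str, bytes]]) -> bytes:
    """Follow the pointers in order and join the fragments."""
    return b"".join(pools[pool_index][key] for pool_index, key in pointers)
```

Even in this toy version, a 25-byte record produces seven pointers, which hints at how quickly the pointer count grows for real datasets and why the pointer list itself has to be stored and protected carefully.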
Here, we were again helped by the storage industry, which was already making use of the concept of a virtual storage appliance. What if our microsharding and reassembly appliance just looked like a disk to users and applications? No changes would be needed to introduce the capability; we could just start writing to a new storage location.
From these realizations, ShardSecure was born. We created an engine based on microsharding technology that looked like a disk or cloud storage bucket. This platform enabled rapid cloud adoption without the usual security, compliance, resiliency, and performance challenges. And it did all this without encryption — though that could still be layered in for defense in depth.
Frustration solved.