I was the CTO at TD Ameritrade, a large online stockbroker, where I was responsible for security and fraud. Like many large financial institutions, we relied heavily on encryption to protect our data.
This was especially true as we used more cloud services and storage. Shared environments provided more opportunities for unauthorized users to access sensitive data, which presented three key issues:
Once someone had your encrypted data, by definition they had your data. It was just a matter of time, motivation, and compute power before they could unscramble it.
With that understanding, we changed the question from "how do we encrypt our sensitive data more strategically?" to "how do we keep attackers from having our sensitive data in the first place?"
Without a search engine pointing to content, it's very difficult to find things on the internet. We think about it in terms of dispersion: Once you dump a glass of water in a swimming pool, you can't reassemble that same glass of water. The molecules are all still there — but finding them without pointers is absurdly hard. Pour fractions of the glass of water into several different pools, and you make it even more difficult.
We saw that we could break our data into "molecules" and dump them into "pools" of data in order to make reassembly virtually impossible. If we dumped randomly selected data molecules into random pools across the country, an attacker wouldn’t even know where to look. They might gain access to one or more pools, but no pool would contain enough molecules to reassemble the original glass of water in a meaningful way.
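The technique itself came from the storage industry. With RAID (redundant array of independent disks), data is broken into shards and written across multiple disks, with mirroring or parity for redundancy, to protect against drive failures and improve performance.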
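To make the RAID idea concrete, here is a minimal sketch of one common layout: striping with a single dedicated XOR parity disk (RAID 4 style). The stripe unit size, function names, and sample data are illustrative assumptions. The point is that losing any one disk is survivable, because each missing unit is the XOR of the surviving units in its stripe.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def stripe_with_parity(data: bytes, n_data: int, unit: int = 4) -> list[list[bytes]]:
    """Stripe data across n_data disks plus one dedicated XOR parity disk."""
    stripe = n_data * unit
    padded = data + b"\x00" * (-len(data) % stripe)  # pad to whole stripes
    disks: list[list[bytes]] = [[] for _ in range(n_data + 1)]
    for off in range(0, len(padded), stripe):
        units = [padded[off + i * unit: off + (i + 1) * unit] for i in range(n_data)]
        for i, u in enumerate(units):
            disks[i].append(u)
        disks[n_data].append(xor_blocks(units))  # parity unit for this stripe
    return disks


def rebuild(disks: list[list[bytes]], failed: int) -> list[bytes]:
    """Recover a failed disk: each of its units is the XOR of the survivors' units."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    return [xor_blocks([d[s] for d in survivors]) for s in range(len(survivors[0]))]


def read_data(disks: list[list[bytes]], n_data: int, length: int) -> bytes:
    """Read stripes in order from the data disks and drop the padding."""
    n_stripes = len(disks[0])
    joined = b"".join(disks[i][s] for s in range(n_stripes) for i in range(n_data))
    return joined[:length]


data = b"RAID stripes data across disks"
disks = stripe_with_parity(data, n_data=3)
assert rebuild(disks, failed=1) == disks[1]  # a lost data disk is recoverable
assert read_data(disks, 3, len(data)) == data
```

Note that every disk here still holds readable runs of the original bytes; RAID was designed for resiliency, not confidentiality.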
What was new was applying the idea of RAID to data security. That required making the shards too small to hold meaningful data (microsharding) and then distributing them to remote destinations (our random swimming pools).
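A minimal sketch of the two steps just described, microsharding and then scattering shards to random destinations, might look like this. The 4-byte shard size, the pool names, and both function names are hypothetical choices for illustration, not ShardSecure's actual parameters or implementation.

```python
import secrets

SHARD_SIZE = 4  # hypothetical: bytes per microshard, too small to hold a meaningful value


def microshard(data: bytes, size: int = SHARD_SIZE) -> list[bytes]:
    """Cut data into fixed-size microshards (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def scatter(shards: list[bytes], destinations: list[str]) -> dict[str, list[bytes]]:
    """Send each shard to a destination chosen with a cryptographically secure RNG."""
    pools: dict[str, list[bytes]] = {d: [] for d in destinations}
    for shard in shards:
        pools[secrets.choice(destinations)].append(shard)
    return pools


destinations = ["pool-east", "pool-central", "pool-west"]  # hypothetical pool names
pools = scatter(microshard(b"acct=0012345678;bal=1024.50"), destinations)
for name, shards in pools.items():
    print(name, shards)
```

Each pool ends up with a random subset of 4-byte fragments. This sketch deliberately does not record where each shard went; keeping track of that is the pointer problem the next steps have to solve.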
Done correctly, this approach could completely devalue and desensitize data, eliminating security and compliance concerns, while keeping the performance and resiliency benefits of RAID. Each remote destination would hold only a small subset of a file's tiny fragments. We would then mix those fragments with unrelated fragments from other sources to effectively poison the well. And by using policy routing to the destinations, we could make data take diverse network paths, protecting it in transit as well as at rest.
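The "poison the well" idea can be sketched as a pool that stores fragments from many sources under random, opaque IDs. The `Pool` class and its methods are hypothetical, and policy routing is not modeled here. Because many object stores list keys in sorted order, random keys also mean that a listing reveals nothing about write order or which fragments belong together.

```python
import secrets


class Pool:
    """Hypothetical remote destination that stores fragments under random, opaque IDs.

    It records nothing about which file a fragment came from or where it belongs.
    """

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def put(self, fragment: bytes) -> str:
        frag_id = secrets.token_hex(16)  # random ID carries no origin or ordering info
        self._store[frag_id] = fragment
        return frag_id

    def get(self, frag_id: str) -> bytes:
        return self._store[frag_id]

    def attacker_view(self) -> list[bytes]:
        """What someone who breaches this pool sees: fragments listed by sorted key,
        which, because the keys are random, is an order unrelated to any file."""
        return [self._store[k] for k in sorted(self._store)]


pool = Pool()
ids_a = [pool.put(f) for f in (b"k9Xq", b"7Lm2")]            # some of file A's shards
ids_b = [pool.put(f) for f in (b"Jan ", b"rpt,", b"0.13")]  # unrelated file B
print(pool.attacker_view())
```

Only the writer keeps `ids_a` and `ids_b`; without them, the pool is a bag of tiny fragments from unrelated sources.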
That didn't mean it would be easy. We would have to manage billions of pointers to fragments so we could reassemble data on demand. We would also have to make this solution easy for organizations to deploy and use within their existing architectures.
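A quick back-of-the-envelope calculation shows why "billions of pointers" is not an exaggeration. The numbers below are assumptions chosen for illustration: a 10 TB dataset, several candidate shard sizes, and a hypothetical 24-byte pointer (for example, an 8-byte destination reference plus a 16-byte fragment ID).

```python
def pointer_count(total_bytes: int, shard_size: int) -> int:
    """One pointer per microshard; the last shard may be partial (ceiling division)."""
    return (total_bytes + shard_size - 1) // shard_size


def manifest_bytes(total_bytes: int, shard_size: int, pointer_size: int) -> int:
    """Storage needed just for the pointers, at a fixed size per pointer."""
    return pointer_count(total_bytes, shard_size) * pointer_size


TB = 10**12
DATASET = 10 * TB   # assumed dataset size
POINTER_SIZE = 24   # assumed bytes per pointer

for shard_size in (4096, 512, 64):  # assumed shard sizes
    n = pointer_count(DATASET, shard_size)
    gb = manifest_bytes(DATASET, shard_size, POINTER_SIZE) / 10**9
    print(f"{shard_size:>5}-byte shards: {n:,} pointers, ~{gb:,.0f} GB of pointers")
```

Smaller shards leak less per fragment but multiply the pointer count, so shard size is a tradeoff between desensitization and the cost of managing the pointers.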
Here, we were again helped by the storage industry, which was already making use of the concept of a virtual storage appliance. What if our microsharding and reassembly appliance just looked like a disk to users and applications? No changes would be needed to introduce the capability; we could just start writing to a new storage location.
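The "looks like a disk" idea can be sketched as a facade: callers use a plain key/value interface, while each object is microsharded across backend pools and reassembled from an ordered list of pointers. `ShardedBucket`, `put_object`, `get_object`, and `delete_object` are hypothetical names for this sketch, not the API of any real product or cloud SDK, and the in-memory dicts stand in for remote destinations.

```python
import secrets


class ShardedBucket:
    """Hypothetical facade: a plain key/value bucket on the outside, with each
    object microsharded across several backend pools on the inside."""

    def __init__(self, n_pools: int = 3, shard_size: int = 4) -> None:
        self._pools: list[dict[str, bytes]] = [{} for _ in range(n_pools)]
        self._manifests: dict[str, list[tuple[int, str]]] = {}  # key -> ordered pointers
        self._shard_size = shard_size

    def put_object(self, key: str, data: bytes) -> None:
        self.delete_object(key)  # drop fragments from any previous version
        pointers = []
        for i in range(0, len(data), self._shard_size):
            p = secrets.randbelow(len(self._pools))  # random destination pool
            frag_id = secrets.token_hex(16)          # opaque fragment ID
            self._pools[p][frag_id] = data[i:i + self._shard_size]
            pointers.append((p, frag_id))            # one pointer per shard, in order
        self._manifests[key] = pointers

    def get_object(self, key: str) -> bytes:
        # Follow the pointers in order to reassemble the object on demand.
        return b"".join(self._pools[p][f] for p, f in self._manifests[key])

    def delete_object(self, key: str) -> None:
        for p, frag_id in self._manifests.pop(key, []):
            del self._pools[p][frag_id]


bucket = ShardedBucket()
bucket.put_object("reports/q3.csv", b"id,amount\n1,250\n")
assert bucket.get_object("reports/q3.csv") == b"id,amount\n1,250\n"
```

From the application's side, nothing about the read and write calls reveals the sharding, which is what lets such a layer drop in as just another storage location.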
From these realizations, ShardSecure was born. We created an engine based on microsharding technology that looked like a disk or cloud storage bucket. This platform enabled rapid cloud adoption without the usual security, compliance, resiliency, and performance challenges. And it did all this without encryption — though that could still be layered in for defense in depth.
Frustration solved.