Amazon’s recent outage (details here) shows why a centralized storage model won’t cut it in the long-term vision of the cloud. Amazon customers get the economies of scale the cloud promises, but they have no control over where their data resides in the cloud, which leaves it exposed to centralized failures.
As a result, cloud storage will need to be designed around one key principle: decentralization. So what does it mean to decentralize storage?
When thinking about decentralization, the Internet is the first place to look, since it was designed to be fault tolerant. Storage will need to learn some lessons from the network and behave similarly in order to be successful. TCP/IP decentralizes packet routing, which lets packets take different paths through the network, so traffic can route around a failed link or node instead of depending on any single one.
To borrow from the network and to prevent bottlenecks of scale, the data itself will need to take on a “packetization” architecture. This means that besides using packets to traverse the network, the data itself should be packetized and stored as packets.
Rather than saving data to centralized servers in a single data center that’s at risk of local outages, data will need to be virtualized into packets and stored across multiple servers in multiple data centers. When the data is requested back, the system only has to gather enough packets to reconstruct it. An outage in the US Eastern Region data center? No problem.
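To make "gather enough packets to reconstruct the data" concrete, here is a minimal sketch in Python. It uses plain XOR parity, which tolerates the loss of only one packet, as a stand-in for the stronger dispersal coding a real system would use. The function names (`split_with_parity`, `reconstruct`) are hypothetical and chosen for illustration; this is not any particular product's algorithm.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size packets plus one XOR parity packet."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # zero-pad so all packets match in size
    packets = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, packets)
    return packets + [parity]


def reconstruct(packets: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one packet is missing (None)."""
    missing = [i for i, p in enumerate(packets) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost packet")
    packets = list(packets)
    if missing:
        present = [p for p in packets if p is not None]
        # XOR of all surviving packets equals the lost one.
        packets[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity packet and the padding.
    return b"".join(packets[:-1])[:length]


data = b"store me across data centers"
packets = split_with_parity(data, k=4)  # 4 data packets + 1 parity packet
packets[2] = None                       # pretend one data center is down
restored = reconstruct(packets, len(data))
```

With each of the five packets stored in a different location, any single location can fail and the data is still readable. Tolerating more simultaneous failures requires a code with more than one redundant packet.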
But wait, you must be screaming about the latency… First off, networks are still getting faster, so by the time the cloud is truly adopted by the masses, the network isn’t going to be the bottleneck. Second, for the early adopters out there, you could configure such a system to keep enough packets at local data centers to address latency concerns while still spreading some packets out. That way, if the local data center is down, you can still access data seamlessly.
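One way to picture that configuration is sketched below, under stated assumptions: any `k` of the `n` stored packets are enough to rebuild the data, `k` packets are kept at the local site for fast reads, and the rest are spread round-robin over remote sites. The site names and both function names are hypothetical.

```python
from itertools import cycle
from typing import List


def place_packets(n: int, k: int, local_site: str, remote_sites: List[str]) -> List[str]:
    """Return the site for each of n packets: k local, the rest round-robin remote."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if n > k and not remote_sites:
        raise ValueError("remote sites are required when n > k")
    placement = [local_site] * k
    remotes = cycle(remote_sites)
    placement += [next(remotes) for _ in range(n - k)]
    return placement


def survives_outage(placement: List[str], k: int, failed_site: str) -> bool:
    """True if at least k packets remain reachable when failed_site is down."""
    return sum(1 for site in placement if site != failed_site) >= k


# Assumed example: 8 packets, any 4 rebuild the data.
placement = place_packets(8, 4, "us-east", ["us-west", "eu-west"])
local_down_ok = survives_outage(placement, 4, "us-east")

# With only 6 packets, the 2 remote ones cannot cover a local outage.
thin = place_packets(6, 4, "us-east", ["us-west", "eu-west"])
thin_local_down_ok = survives_outage(thin, 4, "us-east")
```

The sketch makes the trade-off visible: reads stay local in the normal case, but surviving a local outage requires at least `k` packets outside the local site, so `n` has to be at least `2 * k` in this layout.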
Once we all embrace a decentralized storage architecture, a whole slew of design questions comes into play. For example, how does the storage system optimize reading and writing packets when it knows that not all of them may be available? How can such a decentralized system be used for content distribution to the masses?
This naturally leads to Cleversafe, where we are already packetizing data into something we call slices and already thinking about how a decentralized world of storage will work.
So how could the Amazon outage have been avoided, and what does the cloud storage model need to evolve into in order to be stable? A decentralized platform.
