Why you never put all your eggs in one basket
I’m about as big a proponent of the cloud as you can get. I think almost every company could find a way to use cloud services to cut its costs. But, naturally, most people don’t use the cloud as well as they could. The recent Amazon Web Services outage showed as much: it took down large sites like Reddit and Netflix, which apparently run almost entirely in the cloud.
If you’ve played with the cloud, you may be wondering how that’s possible. After all, the point of the cloud is to have a decentralized system in place so that it can’t go down. If a server goes down, another one is instantly ready to replace it. Amazon has different zones so you can create server instances closer to your users and hopefully get better performance. And if, as happened recently, a data center that represents one of the zones goes down? Well, just spin up new instances of the lost servers in another zone and continue on your merry way!
By the looks of this article, however, that hasn’t happened. Companies, it seems, put a little too much faith in the reliability of Amazon and other cloud providers. Part of that is Amazon’s fault for playing itself up as the Titanic, the unsinkable ship. But the rest of the blame falls squarely on companies that don’t plan for failure.
“The best-laid plans of mice and men often go awry” comes to mind right now. Failures do happen. Software becomes corrupted, hardware fails, lightning strikes take out a data center, and chaos will ensue if you aren’t ready for it.
Let’s look at the recent Amazon EC2 zone outage as an example. If I were running a large website, I’d first of all keep mirror images of my server setup in several zones, from Europe to America, so that everyone gets a good connection to my site. This would also help when a zone goes down: if I were watching, or had a clever script watching for me, the load could be migrated to other zones. For instance, if US-East goes down, as in the current example, I’d migrate traffic to US-West until the problem is resolved. Sure, the site may be a little slower than usual for people on the East Coast, but it is still up.
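To make the "clever script" idea a bit more concrete, here is a minimal sketch of the decision it would have to make: given the zones in order of preference and some way of checking each one, pick the first zone that is still healthy. Everything in it is an assumption for illustration. The zone labels, the pick_serving_zone function, and the hard-coded health snapshot are made up, and a real script would probe an actual health endpoint and then update DNS or a load balancer instead of printing a name.

```python
from typing import Callable, Optional, Sequence


def pick_serving_zone(
    preferred: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy zone in preference order, or None if all are down."""
    for zone in preferred:
        if is_healthy(zone):
            return zone
    return None


# Hypothetical snapshot of zone health; a real script would probe each zone.
health = {"us-east": False, "us-west": True, "eu-west": True}

# Zones listed in the order we would like to serve East Coast traffic from.
target = pick_serving_zone(
    ["us-east", "us-west", "eu-west"],
    lambda zone: health.get(zone, False),  # unknown zones count as down
)
print(target)  # prints "us-west"
```

Keeping the health check as a plain function passed in from outside means the same logic works whether the check is a ping, an HTTP request, or a status page lookup. It also makes the None case explicit, so the script can fall back to a second provider when every zone is down.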
Cloud technology lets you not only scale your site up and down as demand spikes, but also expand geographically. Every major cloud provider has multiple data centers. Take advantage of them, even if only so they are there in an emergency. And if worst comes to worst, have a backup cloud provider ready to step in if someone at Amazon messes up an upgrade.