There’s no such thing as a truly infallible system. Every platform, no matter how extensive or powerful, has points of failure. The Amazon Web Services outages we’ve seen over the years are proof enough of that – evidence that even the cloud can be brought down from time to time.
“Outages are a thing that happens, whether your computing is happening in your office, in colocation, or in ‘the cloud,’ which is just a shorthand term for someone else’s computer,” writes Forbes contributor Justin Warren. “To think that putting applications ‘in the cloud’ magically makes everything better is naive at best.”
That’s something of a bitter pill to swallow, isn’t it? Even if AWS fails infrequently, a single failure can still be downright devastating for your organization. It’s in your best interests to ensure that you’re prepared for such failures.
The keyword here is redundancy. As Netflix explained after an AWS outage in 2011, the key is to design your systems for failure from the start. What they mean by that is simple – layer redundancy into your app while it’s still in development. Keep services stateless so that failover is smooth, store your data across multiple zones, and code for graceful degradation: give each component of your app an aggressive timeout, and make sure each component is redundant.
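To make the "aggressive timeout" idea concrete, here is a minimal Python sketch of graceful degradation. It is an illustration under stated assumptions, not Netflix's implementation: the helper `call_with_fallback` and the two example functions are hypothetical names invented for this post, and the timeout value is arbitrary.

```python
import concurrent.futures
import time

# Shared worker pool; a real service would size this for its workload.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary, fallback, timeout_s):
    """Run primary(); if it raises or takes longer than timeout_s, use fallback()."""
    future = _POOL.submit(primary)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only helps if the call has not started running yet;
        # a call already in progress keeps running in the background.
        future.cancel()
        return fallback()
    except Exception:
        # The dependency failed outright, so degrade instead of erroring.
        return fallback()


# Hypothetical dependency that is too slow to wait for.
def slow_recommendations():
    time.sleep(0.5)
    return ["personalised-1", "personalised-2"]


# Hypothetical cheap, always-available substitute.
def popular_titles():
    return ["popular-1", "popular-2"]


result = call_with_fallback(slow_recommendations, popular_titles, timeout_s=0.05)
print(result)
```

The caller gets a slightly worse answer quickly rather than a perfect answer late or no answer at all, which is the trade graceful degradation makes.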
More important than all of that, you should never rely exclusively on a single service to keep yourself online. Always have a “Plan B” waiting in the wings. Australian real-estate giant REA is a good example of this in practice.
In June, when an AWS outage sent some of the country’s largest web properties crashing down, REA managed to weather the storm with only “a broken ad server, one offline app, a wobbly Android application and slightly slower response times for some services,” reads a post on IT News Australia. REA deploys across multiple availability zones by default, and its most critical systems are designed to run across multiple regions. Its IT team also operates independent copies of each system, each of which interacts with a master.
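The general pattern of falling back to a copy in another region can be sketched in a few lines of Python. This is not REA's code, and the details of its setup are not public in the post quoted above; the region names, `fetch_with_failover`, and `fake_fetch` are all hypothetical and stand in for whatever client your application really uses.

```python
class AllRegionsFailed(Exception):
    """Raised when no regional copy could serve the request."""


def fetch_with_failover(regions, fetch):
    """Try each region in order; return (region, result) from the first success."""
    errors = {}
    for region in regions:
        try:
            return region, fetch(region)
        except Exception as exc:  # a real service would catch narrower errors
            errors[region] = exc
    raise AllRegionsFailed(errors)


# Hypothetical stand-in for a network call: the first region is down.
def fake_fetch(region):
    if region == "region-a":
        raise ConnectionError("region-a is unavailable")
    return f"listing served from {region}"


served_by, body = fetch_with_failover(["region-a", "region-b"], fake_fetch)
print(served_by, body)
```

In practice this kind of routing is often handled by DNS or a load balancer rather than application code, but the logic is the same: no single region is allowed to be the only path to an answer.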
The lesson you should take away from all of this is that you can no longer take the ‘ostrich approach’ to business continuity. You cannot reasonably expect your cloud provider to handle all the details of failover and disaster recovery without any intervention from you. Instead, you need to take business continuity into account before you even think of adopting the cloud – whether you’re using PaaS, SaaS, IaaS, or any other cloud service model.
Again, I’d like to be clear that AWS outages are rare. This is a multinational company that has built much of its business on reliability. At the same time, like any system, it can fail.
Being aware of that is the first step to mitigating it.