Amazon EC2 outages - lessons not learned

Really Simple Systems' John Paterson stresses firms should never have one single point of failure

Recently Amazon's hosting service, EC2, went down for several hours, taking customers such as Airbnb, Instagram, Netflix and Vine offline, along with a myriad of lesser-known companies.

The lesson that cloud providers still don't seem to have learned from the Amazon outage is that you should never have a single point of failure, and trusting 100 per cent of your hosting to one supplier, no matter how big it is, is a single point of failure.

Every datacentre and hosting provider goes down about once a year; there are just so many links in the chain to keep unbroken 24/7/365.
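The point about links in the chain can be put into rough numbers. The sketch below is only an illustration: it assumes the links fail independently of one another, and the figure of ten links at 99.99 per cent each is invented for the example, not taken from any provider.

```python
import math

HOURS_PER_YEAR = 24 * 365


def chain_availability(link_availabilities):
    """Availability of a service that needs every link up at the same time.

    Assumes the links fail independently, which is a simplification.
    """
    return math.prod(link_availabilities)


def expected_downtime_hours(availability):
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Illustrative figures only: ten links, each up 99.99% of the time.
links = [0.9999] * 10
overall = chain_availability(links)
print(f"overall availability: {overall:.4%}")
print(f"expected downtime: {expected_downtime_hours(overall):.1f} hours/year")
```

Under those assumptions a single link would be down for less than an hour a year, but the chain of ten is down for just under nine.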

If cloud providers are serious about uptime they need to spread the load across two or more datacentres or hosting providers that are geographically separate, each with its own power and internet connections, and have a facility to switch the service over when one system fails.
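The "facility to switch the service" can be as simple in principle as a priority list plus a health check. What follows is a minimal sketch rather than a production design: the site names are hypothetical, and the health check is passed in as a function so that the example runs without touching a network.

```python
from typing import Callable, Optional, Sequence


def pick_active_site(
    sites: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Hypothetical site names, listed in order of preference.
SITES = ["primary-london", "secondary-frankfurt"]


def demo_check(down):
    """Build a stand-in health check; a real one would probe each site."""
    return lambda site: site not in down


print(pick_active_site(SITES, demo_check(set())))
print(pick_active_site(SITES, demo_check({"primary-london"})))
```

In practice the hard parts are the ones this sketch leaves out: keeping data in step between the sites and redirecting traffic quickly once the choice is made.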

But that is tricky to do with services like EC2. It is technically possible to have a failover system for EC2 in place, but it would be costly to have a system powerful enough to handle all your load just twiddling its thumbs for 364 days of the year.

Load balancing across that system and EC2 would be complicated, to say the least. You really need two hosting providers of equal size and with identical architecture, with the load spread across both and each having the capacity to handle the whole load if needed in an emergency.
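The capacity rule in that arrangement, two providers sharing the load with either able to carry all of it, can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the provider names, the capacity figures and the unit of "load" are all invented for illustration.

```python
def split_load(total_load, capacities, healthy):
    """Share total_load equally among the healthy providers.

    capacities: dict mapping provider name to the most load it can serve
    healthy:    set of provider names that are currently up
    Raises RuntimeError if the surviving providers cannot carry the load.
    """
    live = [name for name in capacities if name in healthy]
    if not live:
        raise RuntimeError("no healthy provider available")
    share = total_load / len(live)
    for name in live:
        if share > capacities[name]:
            raise RuntimeError(f"{name} cannot carry {share} units of load")
    return {name: (share if name in healthy else 0.0) for name in capacities}


# Hypothetical figures: each provider is sized to take the full load alone.
capacities = {"provider_a": 1000, "provider_b": 1000}

print(split_load(800, capacities, {"provider_a", "provider_b"}))
print(split_load(800, capacities, {"provider_b"}))
```

If each provider were sized for only half the load, the second call would raise an error, which is exactly the situation the emergency capacity is there to prevent.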

If cloud providers are serious about customer satisfaction, they need to change their mindset so that downtime is not acceptable, and build their architecture accordingly.