(The Hosting News) – The recent cloud downtime, which mostly affected Amazon Web Services, caused quite a stir online last week. Now the company has released a report detailing the disruption to its cloud services based in Dublin, Ireland.
Commenting on the origin of the problem, AWS states, “The service disruption began at 10:41 AM PDT on August 7th when our utility provider suffered a failure of a 110kV 10 megawatt transformer. This failure resulted in a total loss of electricity supply to all of their customers connected to this transformer, including a significant portion of the affected AWS Availability Zone.”
Services that appeared affected, according to the AWS health dashboard at the time of the problem, included Elastic Compute Cloud (EC2), a web-scale computing service, and CloudWatch, an app monitoring service.
While Amazon Web Services’ utility provider initially thought that the outage was the result of a lightning strike, AWS is reporting that the provider no longer thinks that’s the case. The outage was prolonged when AWS’s backup generators failed to activate properly following the initial power loss. AWS states, “With no utility power, and backup generators for a large portion of this Availability Zone disabled, there was insufficient power for all of the servers in the Availability Zone to continue operating.” Further delays resulted when AWS ran out of the spare capacity needed to re-launch EBS volumes.
When things go bad, there’s usually something to learn. This case is no exception: in the report, AWS outlined new plans to prevent future problems of the same type. AWS states, “To further prevent the loss of power, we will add redundancy and more isolation for our PLCs so they are insulated from other failures. Specifically, in addition to correcting the isolation of the primary PLC, a cold, environmentally isolated backup PLC is being worked with our vendors. We will deploy this as rapidly as possible.”
Other steps include adding more load balancing, a better alarm system, and “the capability to recover volumes directly on the EBS servers upon restoration of power, without having to move the data off of those servers.”
Despite all the problems resulting from the outage, there’s some good news for those affected. Amazon is providing customers with credit for ten days. It’s also worth noting that Microsoft’s BPOS faced downtime in the same area and at the same time as AWS did.
This isn’t the first major cloud outage to affect Amazon Web Services. In April of this year, AWS faced downtime after an upgrade to the cloud service’s system went wrong. The problem apparently resulted from an incorrect traffic shift. To view the report by AWS, visit: http://aws.amazon.com/message/2329B7/