Amazon claims that Tuesday’s shutdown of more than one million websites that use the company’s web-hosting service was caused by an employee “typo.”
“At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process,” explained Amazon Web Services in a post on its website. “Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.”
“We are making several changes as a result of this operational event. While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly,” the company continued. “We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level. This will prevent an incorrect input from triggering a similar event in the future. We are also auditing our other operational tools to ensure we have similar safety checks.”
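The two safeguards Amazon describes, slower removal and a minimum-capacity floor, can be illustrated with a minimal sketch. This is not Amazon's tooling: the `Subsystem` class, the `remove_capacity` function, and the `max_per_step` parameter are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subsystem:
    """Hypothetical model of a subsystem and its capacity floor."""
    name: str
    active_servers: int
    min_required: int


class CapacityError(Exception):
    """Raised when a removal would take a subsystem below its minimum."""


def remove_capacity(subsystem: Subsystem, count: int, max_per_step: int = 2) -> list[int]:
    """Remove `count` servers in small batches; return the batch sizes used.

    The whole request is rejected up front if it would leave the subsystem
    under `min_required`, so a mistyped count removes nothing.
    """
    if count < 0 or max_per_step < 1:
        raise ValueError("count must be >= 0 and max_per_step must be >= 1")
    if subsystem.active_servers - count < subsystem.min_required:
        raise CapacityError(
            f"removing {count} servers would leave {subsystem.name} "
            f"below its minimum of {subsystem.min_required}"
        )
    batches = []
    remaining = count
    while remaining > 0:
        # Rate limit: never remove more than max_per_step servers at once.
        batch = min(remaining, max_per_step)
        subsystem.active_servers -= batch
        remaining -= batch
        batches.append(batch)
    return batches
```

In this sketch, a request to remove three servers from a ten-server subsystem with a floor of six would proceed in batches of two and one, while a follow-up request to remove five more would be refused outright and leave capacity unchanged.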
Amazon closed its post by apologizing to those affected by the outage.
“Finally, we want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses,” Amazon concluded. “We will do everything we can to learn from this event and use it to improve our availability even further.”
The outage on Tuesday affected several large sites, including Quora, Trello, Wix, Snap, and Alexa.
Some government agencies were also hit by the unexpected shutdown, as was Isitdownrightnow.com, a service used to check whether other sites are down.