The Register has been reporting on an outage at a major London data centre. The Telecity outage has affected services from a range of VoIP firms, as well as Amazon and its Direct Connect service. The Register reports that ‘both primary and backup power supplies went down, potentially affecting thousands of customers’. This is despite Telecity claiming to provide ‘Infrastructure you can rely on. Resilient and always available’.
This outage demonstrates that we cannot be complacent about the resilience of back-up provision. We have previously blogged about power failures at the Royal Free Hospital and Peterborough District Hospital. In those posts we highlighted that losing mains power and the failure of backup generators to work as planned are each very unlikely events, so the likelihood of both occurring together must surely be minuscule. The point remains that, for planning purposes, what matters is not the likelihood of any specific event, each of which we can dismiss as highly improbable, but the likelihood of something unexpected happening. We cannot predict precisely what will happen, or when, but it is safe to assume that we will continue to be surprised by ‘unlikely’ events in the future.
Things can and do go wrong, but there is a limit to the degree of protection that can be installed, particularly when the costs are weighed against the potential losses. Power loss incidents demonstrate the fallibility of back-up power, but they also demonstrate the need to plan and train effectively so that alternative working procedures, effective incident management and rapid communications can be put into action.
Written by Helen Molyneux