Incident report: hosting outages, 21 May 2012

On Monday, May 21, 2012, our hosting platform was hit by two separate incidents that were unrelated to each other.

At 09:19 local time (3:19 AM EDT), a problem occurred in one of the primary database servers during a routine task, the import of a database. As a result, this database server no longer responded as quickly as it should, which in turn caused problems for the so-called ‘delivery devices’: they could no longer communicate with the database server and started to crash. Because the primary database server was still functioning well at the hardware level, our automated monitoring did not notice the problem at first, but once the delivery devices ran into trouble, the technicians on duty were alerted. They manually initiated a “failover” to the secondary database server, which corrected the problem. After the primary database server was reset, normal operation was restored. This incident lasted about 4 minutes in total and ended at 09:23 local time (3:23 AM EDT).

At 23:42 local time (5:42 PM EDT), a problem occurred in one of the so-called ‘routers’, hardware devices in our data center. This resulted in a total outage of all network traffic inside the data center. The automated failover to the backup router did not take place, so a technician on site manually initiated the failover procedure at 23:55 local time. This outage lasted about 14 minutes in total and ended at 23:56 local time (5:56 PM EDT). We are investigating why the problem in the router was not noticed by the monitoring software.

We apologize for the inconvenience these two outages have caused, especially since they happened within such a short space of time.