

YSM Websites and Systems Were Down for Two Hours Today

November 05, 2017
by Justin Fansler

At approximately 2:45pm today, Microsoft's cloud service in the North Central Region suffered a critical outage that impacted the School of Medicine's websites and systems until approximately 4:45pm.

We originally chose this region for its relative safety from natural disasters. Over the summer, however, our team also began implementing a plan to duplicate our data infrastructure in other regions around the world.

We began setting up redundant infrastructure in another United States region as well as in Europe. Our priority was a reliable fallback that would prevent our public-facing websites from going down, with secondary priority given to the backend systems used to edit those websites.

This afternoon, when Microsoft's cloud service (Azure) began experiencing outages in the North Central Region, traffic to our websites should have been rerouted first to the secondary data center in the United States and then to Europe. However, the problem went beyond Microsoft. It stemmed from an underlying Internet provider whose failure affected Internet service in the U.S. and other parts of the world, causing widespread outages for sites and providers such as Snapchat, Facebook, and Comcast Internet. As a result, the fallback between regions unexpectedly failed.

We are working with Microsoft engineers to better understand the issue and how to prevent it from happening again. In the meantime, our team is assessing other options to prevent future outages.

Submitted by Justin Fansler on November 07, 2017