d
Mar 06, 2012

Why would "leap day" take down Windows Azure?

As many of us know, Azure had a significant service outage on Feb. 29th. Haven't we heard this tune before? I seem to recall a little something about 12/31/99 and Y2K. I would assume Microsoft was using a boolean calendar for its software. That's what I don't get: how would it matter whether it was a leap year or not if they were using a boolean calendar?

d
03/07/2012
It is somewhat baffling that, after all the Y2K hysteria, we would have a calendar-related failure 12 years later. Apparently, this time it was caused by SSL certificates that were valid for a year, and "a year" meant 365 days. Unfortunately for some Azure customers, a leap year has 366 days, not 365. Oops. It will be interesting to see whether something similar happens four years from now, or whether the lesson was learned.
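To make the "one year" problem concrete, here is a minimal Python sketch of how validity-date arithmetic can go wrong around leap day. It is purely illustrative: the function names are made up for this post, and nothing here is taken from Azure's actual code, which had not been published at the time of writing.

```python
from datetime import date, timedelta


def fixed_365_days(start):
    """Hypothetical: treat 'one year' as exactly 365 days."""
    return start + timedelta(days=365)


def naive_same_day_next_year(start):
    """Hypothetical: bump the year and keep month/day unchanged.

    Raises ValueError when start is Feb 29, because the following
    year has no Feb 29.
    """
    return date(start.year + 1, start.month, start.day)


def safe_one_year(start):
    """Hypothetical fix: fall back to Feb 28 when Feb 29 does not exist."""
    try:
        return date(start.year + 1, start.month, start.day)
    except ValueError:
        return date(start.year + 1, 2, 28)


if __name__ == "__main__":
    # In a leap year, 365 days from Jan 1 is still inside the same year.
    print(fixed_365_days(date(2012, 1, 1)))

    leap_day = date(2012, 2, 29)
    try:
        print(naive_same_day_next_year(leap_day))
    except ValueError as err:
        print("naive year bump failed:", err)

    print(safe_one_year(leap_day))
```

Either shortcut works fine for years on end and only misbehaves in a leap year, which is exactly why this kind of bug survives testing.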
03/06/2012
Apparently the leap year date bug prevented the systems from knowing the correct time. See this article for details.

Yes, Microsoft Azure Was Downed By Leap-Year Bug
http://www.wired.com/wiredenterprise/2012/03/azure-leap-year-bug/

"Microsoft has confirmed that Wednesday’s Windows Azure outage that left some customers in the dark for more than 12 hours was the result of a software bug triggered by the Feb. 29 leap-year date that prevented systems from calculating the correct time.

In a post, Azure lead engineer Bill Laing said his team was able to put a fix in place that restored service to most customers around 3 a.m. PST on Wednesday, a little more than nine hours after it became aware of the issue. In a follow-up bulletin, he promised to provide a fuller post-mortem on the root cause soon. Point-of-sale terminals in New Zealand supermarkets were also reportedly bitten by leap-year bugs."