What follows is what I call an “after action report.” This blog post summarizes the work done over the last week in a more narrative form than our usual technical documentation employs.
Greetings! My name is Matthew Stublefield and I am the Lab Support Administrator here at Missouri State University. My primary responsibility is to oversee the staff and facilities of the Computer Services Open-Access Computer Labs in Cheek, Glass, and the second floor of Meyer Library, and I also provide support for other academic spaces on the Springfield campus. In addition, I run our Windows Server Update Services (WSUS), which provides Microsoft Updates for campus computers, and I wanted to share some of the work we’ve been doing over the last week to diagnose and remediate a Microsoft Update that negatively affected Microsoft Outlook 2013 for staff and faculty.
On Saturday, November 16th, I received an email that some faculty and staff users of Microsoft Outlook 2013 were having trouble accessing some features of their email. Specifically, they were receiving errors when they attempted to set an automatic out-of-office reply, or check free/busy information for a person when creating an appointment, or view a shared calendar. The errors were either “server unavailable” or “no connection,” but if you used an older version of Microsoft Outlook, like Outlook 2010, or performed any of these functions through the web at http://bearmail.missouristate.edu, then everything worked fine.
Microsoft released updates to Microsoft Outlook on Tuesday, November 12th, so it seemed likely that one of those was the culprit. At Missouri State University, we use WSUS to approve Microsoft Updates to a limited set of machines for testing before we roll them out to the rest of the campus, and then we use a tiered distribution model to help prevent problems from becoming too widespread. We prioritize security and critical updates, testing and approving those within the first week after Patch Tuesday (which in this case was November 12th), and then begin work on general Microsoft Office and Microsoft Windows updates the following week with a slower approval cycle. This way, if a problem doesn’t come up in testing, hopefully it will show up in one of the earlier phases of deployment before the update reaches the entire campus.
In this case, the problem with Microsoft Outlook 2013 didn’t surface until after the security updates were approved for the entire campus. The good news is that, because of our tiered approval and distribution system, we had not yet approved and distributed Microsoft Office updates.
I began testing on Saturday morning by removing updates one by one and checking whether Outlook started working. After checking a couple dozen updates over the course of several hours, though, I was unable to identify which one was breaking Outlook. Even more confusing, Outlook wasn’t broken 100% of the time: sometimes it would work, then stop working for a while, and then start working again. The problem was intermittent, with no detectable rhyme or reason.
After several hours, and with my Outlook 2013 working just fine (though I knew it would break again eventually), I bundled up my notes and screenshots and sent them to the Enterprise Systems and Operations group, who began work on Monday both by troubleshooting updates to Microsoft Outlook and by opening a support case with Microsoft. I also emailed a patch management distribution list, which is made up of IT professionals from around the world, to begin discussing the problem and find out how others were dealing with it.
By Tuesday afternoon, we were fairly confident that Microsoft Update KB2837618 was involved. In addition, this update corrupted the Microsoft Outlook 2013 profile, which means the email profile of the mailbox had to be removed and recreated in the local client to start working again. But KB2837618 couldn’t be the sole contributor because the problem continued to occur intermittently even after it was removed.
Tuesday night, Microsoft confirmed that KB2837618 was the problem, but I was convinced it couldn’t be the entirety of the problem because Outlook 2013 still stopped working intermittently with the update removed and the profile rebuilt. I continued researching and working on the problem Tuesday night and Wednesday, and on Wednesday morning one of the Systems Administrators in Enterprise Systems and Operations and I identified another Microsoft update within minutes of each other. This second update was KB2837643, which I blocked from WSUS no more than a minute before I received an email from Enterprise Systems about it. We were definitely on the right track.
KB2837643 is a roll-up update to Microsoft Outlook, which means it is a single update that contains a number of other updates. This makes version control and patch testing/approval difficult because we can’t test each patch individually, and it is a large part of why this problem was so hard to track down. KB2837643 contains KB2837618, along with a dozen other updates, and contributes to the errors we were seeing with automatic replies, free/busy information, and shared calendars. We deduced that both updates must be uninstalled: if you have one or the other installed, you’ll have intermittent access to those resources, and if you have both installed, they won’t work at all. In addition, both updates corrupt the Microsoft Outlook 2013 profile, which then has to be recreated.
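The behavior we deduced can be written down as a small decision rule. The Python below is only a sketch of what we observed in our own environment (Outlook 2013 against Exchange 2007); the function name `expected_behavior` and the three labels are my own shorthand, not anything Microsoft publishes.

```python
# The two updates we found to be involved.
PROBLEM_UPDATES = frozenset({"KB2837618", "KB2837643"})


def expected_behavior(installed_updates):
    """Summarize what we observed for automatic replies, free/busy, and shared calendars.

    "working" assumes the Outlook profile was recreated after the updates were
    removed, because both updates corrupt the existing profile.
    """
    present = PROBLEM_UPDATES & set(installed_updates)
    if len(present) == 2:
        return "broken"
    if present:
        return "intermittent"
    return "working"


print(expected_behavior(["KB2837618", "KB2837643"]))  # broken
print(expected_behavior(["KB2837618"]))               # intermittent
print(expected_behavior([]))                          # working
```

The middle case is what made this so hard to diagnose: removing one update looked like a fix for a little while and then stopped looking like one.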
This profile corruption appears to happen at the time that the autoconfiguration.xml on the Exchange 2007 server is read, which means that creating a new profile and launching Outlook 2013 while these updates are installed will let Outlook work as expected for anywhere between a few seconds and a few minutes. And due to the intermittent nature of the problem, I have hypothesized that the autoconfiguration.xml is setting something on the client that breaks the connection to those three services, but that the settings get purged from the cache eventually. At that time, everything would start working again until the next sync with the autoconfiguration.xml, at which time it would break again. It is possible that removing the updates and letting Outlook 2013 run for a while would work, but we’ve been recreating the profile as a fast and relatively painless fix.
Enterprise Systems and Operations completed their server-side verification with Microsoft support on Thursday morning, and Microsoft signed off on our servers being set up correctly: they couldn’t find anything in our Exchange 2007 environment that would contribute to this problem. At that time, the support call was passed to me and I shared with the Microsoft support person what I had deduced so far:
- Uninstalling KB2837618 and KB2837643 followed by recreating the Outlook profile appears to fix the problem with Automatic Replies, Free/Busy messages, and Shared Calendaring.
- Both updates have to be uninstalled. We’ve used multiple computers to leave one and uninstall the other, but both have to go. Also of note, KB2837643 is the roll-up that installs KB2837618.
- Both updates will likely appear twice in the list of Installed Updates. If KB2837643, the roll-up, is still installed, it may silently re-install KB2837618 because the approval for KB2837618 is actually granted by the approval of KB2837643. This happens even though KB2837618 can be explicitly unapproved in WSUS. It’s KB2837643 that appears to matter.
- Uninstalling KB2837618 over the last several days resulted in Automatic Replies, Free/Busy messages, and Shared Calendaring working intermittently. So far, uninstalling KB2837643 in addition to KB2837618 has resulted in a permanent solution.
- This only applies to staff and faculty accounts. It appears that Outlook 2013 with these two updates installed can still connect to Office 365 for Students and manage Automatic Replies, Free/Busy messages, and Shared Calendaring with no problems. It is only when connecting to Missouri State University Exchange and Autoconfigure that Outlook 2013 with these updates has problems.
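The findings above suggest a simple per-computer checklist. Here is a hedged Python sketch that builds one from an inventory of installed updates. The inventory format, the computer names, and the function name `remediation_plan` are all hypothetical, and listing the roll-up first is my assumption, based on the observation that the roll-up may re-install KB2837618 if it is left in place.

```python
ROLLUP = "KB2837643"     # the roll-up update
CONTAINED = "KB2837618"  # the update included in the roll-up


def remediation_plan(inventory):
    """Map each affected computer to an ordered list of steps.

    `inventory` maps a computer name to the set of KB numbers installed on it.
    Computers with neither update installed are left out of the plan.
    """
    plan = {}
    for computer, installed in inventory.items():
        # Assumption: remove the roll-up before the update it contains.
        to_remove = [kb for kb in (ROLLUP, CONTAINED) if kb in installed]
        if to_remove:
            steps = ["uninstall " + kb for kb in to_remove]
            steps.append("recreate the Outlook 2013 profile")
            plan[computer] = steps
    return plan


# Hypothetical inventory, for illustration only.
inventory = {
    "LAB-01": {"KB2837618", "KB2837643"},
    "LAB-02": {"KB2837618"},
    "LAB-03": set(),
}
for computer, steps in remediation_plan(inventory).items():
    print(computer, steps)
```

Every affected computer ends with the profile step because, in our testing, removing the updates alone was not a reliable fix.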
The Microsoft support person was unaware that KB2837643 was a contributing factor, but added that KB2885093 also needs to be uninstalled. We hadn’t yet approved KB2885093 for the campus, though, so it wasn’t a factor in our environment. She told me that she would take the above information, as well as the testing notes and screenshots I sent her, to Microsoft’s debugging team for review, and then asked me what I wanted them to do. I asked for one of two things:
- For KB2837618 and KB2837643 to be patched so they don’t cause a problem between Outlook 2013 and Exchange 2007, or
- For Exchange 2007 to be patched so the security updates to Outlook 2013 can be applied
Thursday night, Microsoft wrote to us that they had investigated KB2837643 with regard to this problem and discovered that it was, indeed, a contributing factor. They also confirmed that both updates must be uninstalled, because uninstalling only one will result in intermittent service. Finally, this solution is really a temporary workaround. The email concluded with news that this problem will be patched and resolved in Service Pack 1 for Microsoft Outlook 2013. We might see Outlook 2013 SP1 in the first quarter of 2014.
Since then, we have learned that Outlook 2013, when connecting to Office 365 for Students over IMAP connections, has similar problems to what we saw when using Outlook 2013 in our Exchange environment.
Many thanks to Moris Montejo in User Support, who helped me troubleshoot this throughout the day on Saturday the 16th, and to the folks in Enterprise Systems who worked diligently on this and initiated the support ticket with Microsoft: Pat Day, Lynn Dickison, Chris Rees, Joe Arens, and Brad Walters. In addition, Jeremy Blades has used SCCM (Microsoft System Center Configuration Manager) to identify which computers on campus have these updates installed and has begun working on a way to remove them proactively so people don’t run into problems. I also greatly appreciate the folks on the patch management DL to which I subscribe for sharing their feedback and experiences. Identifying the problem and resolving it has been a group effort, and I really appreciate everyone who helped.