Once upon a time, updating Windows meant testing an update, leaving it for a couple of weeks in case any problems were reported, and then installing it in your production environment. This usually went well, and with the advent of virtualisation you always had a snapshot to make doubly sure that, if anything went wrong, you could roll back to the point before you made the change. However, it seems that with Windows 2008, and particularly R2, rather more subtle issues can arise. They may not cause major problems, but the fix involved is pretty arduous, especially if you don’t notice the problem straight away.

The problem

I first noticed this when an update was applied to a Windows 2008 R2 server and appeared to go through fine: it reported as being installed successfully, and tests of this patch had shown no issues. However, after rebooting the server I noticed on the console that the Updating screen never went away to give a CTRL+ALT+DEL screen, even though I could access the server remotely. Logging on via Remote Desktop seemed OK and the server was working, except that the Roles and Features section in Server Manager was reporting an error and wouldn’t allow me to modify these features.

I first rebooted the server (as you do) and, sure enough, the problem with the update screen cleared and a CTRL+ALT+DEL prompt appeared on the server console. However, Server Manager was still showing a problem, which meant it was investigation time.

Checking the following log (Event Viewer -> Applications and Services Logs -> Microsoft -> Windows -> ServerManager -> Operational) as instructed in the Server Manager error message brought forward the typically helpful MS error message:

Could not discover the state of the system. An unexpected exception was found: 

System.Runtime.InteropServices.COMException (0x800F0818): Exception from HRESULT: 0x800F0818

This is the point you realise your day is no longer going to be straightforward, so it was time to look for other people who had experienced this problem. After some searching (what would we do without Google?), the consensus view appeared to be that a package manifest had become corrupted during the application of one of the updates. However in all cases it appeared that trying to reinstall the package didn’t work because the corruption meant it couldn’t be uninstalled! So at this point I realised that it was time to get the lid off Windows and rummage around inside – not really something I relish.

The fix involves identifying the problem files in the package, extracting them from a freshly downloaded copy of the package, and putting them in the right place. The process is shown below.

The fix

To fix such a failed update, you will need to download the System Update Readiness Tool from Microsoft KB947821 (http://support.microsoft.com/kb/947821). Once you run it, and it successfully “installs”, you can read the log report it creates in C:\Windows\Logs\CBS. The file we’re interested in is CheckSUR.log. It might take some time for the tool to finish its report; MS says it will usually take less than 15 minutes to run.

Note that you can run this tool multiple times. Each time, it will “install”, and create a new log file. The CheckSUR.persist.log contains all previous copies of the log file, so you can see the changes between each test.

The log file for my problem looked as follows:

=================================

Checking System Update Readiness.

Binary Version 6.1.7601.21645

Package Version 11.0

2011-04-12 11:33

Checking Windows Servicing Packages

Checking Package Manifests and Catalogs

(f)           CBS MUM Corrupt           0x00000000                servicing\Packages\Package_for_KB2393802_RTM~31bf3856ad364e35~amd64~~6.1.1.1.mum                   Expected file name Microsoft-Windows-Foundation-Package~31bf3856ad364e35~amd64~~6.1.7600.16385.mum does not match the actual file name

Checking Package Watchlist

Checking Component Watchlist

Checking Packages

Checking Component Store

Summary:

Seconds executed: 307

Found 1 errors

CBS MUM Corrupt Total count: 1

Unavailable repair files:

servicing\packages\Package_for_KB2393802_RTM~31bf3856ad364e35~amd64~~6.1.1.1.mum

servicing\packages\Package_for_KB2393802_RTM~31bf3856ad364e35~amd64~~6.1.1.1.cat

(w)         Unable to get system disk properties     0x0000045D        IOCTL_STORAGE_QUERY_PROPERTY      Disk Cache

 

From this we can clearly see that the package with the problem is KB2393802, and indeed this was one of the latest to be installed. The “unavailable” files are actually present and the right size, so I do not know why they are reported as such.
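If you have several servers to check, reading the log by eye gets tedious. The following is a minimal Python sketch, not part of the tool itself, that pulls the KB numbers out of the fatal “(f)” lines and collects the entries listed under “Unavailable repair files”. The function name summarise_checksur is made up for illustration, and the parsing assumes the log layout shown above.

```python
import re

# KB numbers look like "KB2393802" inside the package file name.
KB_PATTERN = re.compile(r"KB\d+", re.IGNORECASE)


def summarise_checksur(log_text):
    """Return (kb_numbers, unavailable_files) found in CheckSUR.log text.

    Hypothetical helper: it assumes fatal entries start with "(f)" and
    that the unavailable files follow an "Unavailable repair files:" line,
    as in the sample log above.
    """
    kbs = []
    unavailable = []
    in_unavailable = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # the log may contain blank separator lines
        if line.startswith("(f)"):
            # Fatal entry: remember each KB number once, in order seen.
            for kb in KB_PATTERN.findall(line):
                kb = kb.upper()
                if kb not in kbs:
                    kbs.append(kb)
            in_unavailable = False
        elif line.lower().startswith("unavailable repair files"):
            in_unavailable = True
        elif in_unavailable:
            if line.startswith("("):
                # A new "(w)" or "(f)" entry ends the file list.
                in_unavailable = False
            else:
                unavailable.append(line)
    return kbs, unavailable
```

You would feed it the contents of C:\Windows\Logs\CBS\CheckSUR.log; for the log above it should report KB2393802 and the two Package_for_KB2393802_RTM files.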

Therefore this fix package needs to be downloaded again, which can sometimes be just a case of going to the right support area on the Microsoft site. In this case, however, http://support.microsoft.com/kb/2393802 gave me lots of information about the patch but no way of downloading it again! I therefore needed to go to the Microsoft Update Catalog at http://catalog.update.microsoft.com/v7/site/Search.aspx?q=kb2393802 (you may need to install the add-on first if you have not visited this site before), which provided a list of packages for this update. It was then just a case of downloading the correct one for the server OS, Windows 2008 R2 in my case.

Once downloaded you will see a file with a .msu extension and this needs to be renamed as a .cab so you can extract files from it. Once I had extracted the archive, there was another .cab inside containing the package files (the file name should be pretty obvious, as in my case it was Windows6.1-KB2393802-x64.cab).
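As an alternative to renaming the file, the built-in Windows expand command can unpack a .msu directly with the -F:* switch. The Python sketch below only builds the two command lines (outer .msu, then inner .cab) so that you can review them before running anything. The function name expand_commands and the “package” sub-folder are made up for illustration, and it assumes the inner .cab has the same base name as the .msu, which was true in my case but is not guaranteed.

```python
from pathlib import PureWindowsPath


def expand_commands(msu_path, work_dir):
    """Build the two 'expand' command lines for an update package.

    Hypothetical helper. Assumption: the inner .cab shares the .msu's
    base name (e.g. Windows6.1-KB2393802-x64.msu contains
    Windows6.1-KB2393802-x64.cab).
    """
    msu = PureWindowsPath(msu_path)
    work = PureWindowsPath(work_dir)
    inner_cab = work / (msu.stem + ".cab")
    package_dir = work / "package"
    return [
        # Step 1: unpack everything in the .msu into the work folder.
        ["expand", "-F:*", str(msu), str(work)],
        # Step 2: unpack the inner .cab, which holds the .mum/.cat files.
        ["expand", "-F:*", str(inner_cab), str(package_dir)],
    ]


# On the Windows server itself, each command could then be run with
# subprocess.run(cmd, check=True). Both destination folders must
# already exist, because expand does not create them.
```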

Now, there are a couple of things that must be done. Remember that, back in the log file, it was complaining about a corrupt manifest file?

The file it was complaining about was Package_for_KB2393802_RTM~31bf3856ad364e35~amd64~~6.1.1.1.mum. At the bottom of the log file, it was also complaining about some unavailable repair files. Find these “unavailable” files in the folder extracted from the inner .cab, and copy them to a separate directory. I also had to copy the “update” files (update.cat and update.mum), which should be renamed to the correct name for this package.

In my case, it was Package_for_KB2393802~31bf3856ad364e35~amd64~~6.1.1.1.

Once I had done this, I had a folder with the following files:

/Package_for_KB2393802~31bf3856ad364e35~amd64~~6.1.1.1.cat

/Package_for_KB2393802~31bf3856ad364e35~amd64~~6.1.1.1.mum

/Package_for_KB2393802_RTM~31bf3856ad364e35~amd64~~6.1.1.1.cat

/Package_for_KB2393802_RTM~31bf3856ad364e35~amd64~~6.1.1.1.mum
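The staging step described above (copy the “unavailable” files as they are, then copy update.mum and update.cat under the package name) can be sketched in Python as follows. This is only an illustration under my own assumptions: the function name stage_repair_files and its parameters are made up, and it expects the inner .cab to have been extracted into a single folder already.

```python
import shutil
from pathlib import Path


def stage_repair_files(extracted_dir, staging_dir, package_name, unavailable_names):
    """Copy the repair files into a staging folder and return their names.

    Hypothetical helper: 'unavailable_names' are the bare file names from
    the CheckSUR log, 'package_name' is the name that update.mum and
    update.cat should take (without extension).
    """
    extracted = Path(extracted_dir)
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    staged = []

    # Copy the files CheckSUR listed as unavailable, keeping their names.
    for name in unavailable_names:
        shutil.copy2(extracted / name, staging / name)
        staged.append(name)

    # Copy update.mum / update.cat under the package's own name. Plain
    # string concatenation is used because the package name contains
    # dots (6.1.1.1), which would confuse Path.with_suffix().
    for ext in (".mum", ".cat"):
        target_name = package_name + ext
        shutil.copy2(extracted / ("update" + ext), staging / target_name)
        staged.append(target_name)

    return sorted(staged)
```

Working in a staging folder first means nothing under the Windows directory is touched until you have checked that all four files are present.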

These files have to be copied to %windir%\servicing\packages, replacing any existing files in the destination directory. Note that these files will be locked to the SYSTEM user, so you’ll have to take ownership and change the permissions on them to be able to overwrite them.
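I did the ownership and permission changes through the file properties dialog, but the same thing can be done from an elevated command prompt with the built-in takeown and icacls commands. The Python sketch below only builds the command lines for one file so they can be checked before use. The function name ownership_commands is made up, and the group name “Administrators” is an assumption that will differ on non-English installations.

```python
def ownership_commands(file_path, group="Administrators"):
    """Build takeown/icacls command lines for a single file.

    Hypothetical helper. Assumption: the local administrators group is
    called 'Administrators' (it is localised on some systems).
    """
    return [
        # /F names the file, /A gives ownership to the administrators
        # group rather than to the current user.
        ["takeown", "/F", file_path, "/A"],
        # Grant the group full control so the file can be overwritten.
        ["icacls", file_path, "/grant", group + ":F"],
    ]


# On the server, from an elevated session, each command could be run
# with subprocess.run(cmd, check=True) before copying the new file in.
```

It is worth noting the original owner and permissions beforehand so they can be put back afterwards.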

With the right files replaced, you should be able to open the Server Manager and see the server roles and features without any issues, which in my case was exactly what happened. No restart is required, but obviously this was done outside prime hours and again with a snapshot!

You can then run the Update Readiness Tool again, to make sure that the packages are now all valid.

=================================

Checking System Update Readiness.

Binary Version 6.1.7601.21645

Package Version 11.0

2011-04-12 12:37

Checking Windows Servicing Packages

Checking Package Manifests and Catalogs

Checking Package Watchlist

Checking Component Watchlist

Checking Packages

Checking Component Store

Summary:

Seconds executed: 337

No errors detected

(w)         Unable to get system disk properties     0x0000045D        IOCTL_STORAGE_QUERY_PROPERTY      Disk Cache
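If you want to confirm the result across more than one server, the summary section of the log is easy to check automatically. This is a small Python sketch, with a made-up function name (checksur_error_count), that relies only on the two summary lines seen in the logs above: “Found N errors” and “No errors detected”.

```python
import re


def checksur_error_count(log_text):
    """Return the error count from a CheckSUR.log summary.

    Returns 0 for "No errors detected", N for "Found N errors", and
    None if neither summary line is present (for example, if the tool
    has not finished writing the log yet).
    """
    if re.search(r"^\s*No errors detected\s*$", log_text, re.MULTILINE):
        return 0
    match = re.search(r"^\s*Found (\d+) errors?\s*$", log_text, re.MULTILINE)
    if match:
        return int(match.group(1))
    return None
```

Note that the “(w)” warning about system disk properties appears in both the failing and the clean log, so it is deliberately ignored here.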

So there you are: fixed. What a job though, and to think this could happen to any server at any time in a farm of hundreds of servers. This has increased my resolve that updates to VMs, at least, will only be applied to snapshotted VMs, and only after a quick check of Server Manager will I consider the patches fully installed and remove the snapshot. On a physical server this may not be so easy, but you can make sure you have a recovery point by taking a manual system restore point before running the updates, just in case the automatic restore point the updates are supposed to take doesn’t work either.

However it does beg the question “Is the Windows Update system up to the job?”. One thing an admin doesn’t need is a patch saying it has installed and then breaking something, not because the patch is faulty, but because it corrupted during the update and didn’t report a failure – not clever.

Hopefully this problem will not be common, but at least it is good to know there is a way out if you discover the error after users have been let back on and the system is returned to production use.