SCOM Installer Failure with RC4 Protocol Disabled

I need to start by tipping my hat to a couple of colleagues: Louise Willis, for pointing me to Ryan Christman, and Ryan himself, who dealt with the same issue about a month prior and saved me a support call.

To set the stage, we were doing a SCOM 2016 install on a hardened Server 2016 OS with SQL 2016 running in the background. I want to emphasize that the OS was hardened, so those of you doing SCOM installs in a higher security environment will likely face this issue. There’s not much out there on the subject. The install failed at the account validation step, with the UI stating that none of the four SCOM accounts could be validated. Switching to Local System also failed, for what it’s worth. The following errors were prevalent in the OpsMgrSetupWizard log:

Error:     :GetCrackNameResult() DS_NAME_RESULT_ITEM crack failed with error = DS_NAME_ERROR_NOT_FOUND

Error:     :ValidateEssentialsAdministratorAccount() failed to crack NT4 format.

Info:      :ValidateEssentialsAdministratorAccount() Try to crack account with directory searcher.

Info:      :No need to validate Data Reader and Data Writer are the same as the Management Group.
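If you want to check quickly whether a setup log shows the same signature, a minimal sketch along these lines may help. The two message fragments are taken from the excerpt above; the function name is made up for illustration, and the matching is a plain substring check, so treat it as a starting point and not a diagnostic tool.

```python
from typing import Iterable, List

# Message fragments copied from the log excerpt above.
SIGNATURES = (
    "DS_NAME_ERROR_NOT_FOUND",
    "failed to crack NT4 format",
)


def find_account_validation_errors(lines: Iterable[str]) -> List[str]:
    """Return the error lines that contain one of the known fragments.

    Hypothetical helper: it only looks for 'Error:' plus a fragment,
    so any timestamp or prefix in front of the message is tolerated.
    """
    hits = []
    for line in lines:
        text = line.strip()
        if "Error:" in text and any(sig in text for sig in SIGNATURES):
            hits.append(text)
    return hits


# Example with two lines from the excerpt above.
sample = [
    "Error:     :ValidateEssentialsAdministratorAccount() failed to crack NT4 format.",
    "Info:      :ValidateEssentialsAdministratorAccount() Try to crack account with directory searcher.",
]
for hit in find_account_validation_errors(sample):
    print(hit)
```

To run it against a real log, pass the open file object to the function. The log's location and text encoding vary, so both are left to you here.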

Switching to the SCOM 2012 R2 media produced the same results. There really wasn’t much in terms of public documentation on the issue. I was aware of some issues with the SCOM installer failing with TLS shut off, but that didn’t appear to be the problem here, as enabling TLS did not allow us to proceed. Perhaps that was due to a tattooed registry GPO or something like that, but the culprit ultimately ended up being RC4.

The registry key in question is HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Kerberos\Parameters\

The DWORD value is “SupportedEncryptionTypes”, which needs to be set to a decimal value of 2147483644 (0x7FFFFFFC).
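That number is a bitmask, so it helps to see what it turns on. Below is a minimal sketch that decodes it, using the bit assignments Microsoft documents for the Kerberos encryption types policy. The function names are my own, and reading the value from the registry is left out.

```python
# Bit flags for SupportedEncryptionTypes, per the documented values of the
# "Network security: Configure encryption types allowed for Kerberos" policy.
ENCRYPTION_TYPES = {
    0x1: "DES_CBC_CRC",
    0x2: "DES_CBC_MD5",
    0x4: "RC4_HMAC_MD5",
    0x8: "AES128_HMAC_SHA1",
    0x10: "AES256_HMAC_SHA1",
}
FUTURE_TYPES_MASK = 0x7FFFFFE0  # "Future encryption types"
RC4_BIT = 0x4


def decode_supported_encryption_types(value: int) -> list:
    """List the encryption types enabled by a SupportedEncryptionTypes value."""
    names = [name for bit, name in ENCRYPTION_TYPES.items() if value & bit]
    if value & FUTURE_TYPES_MASK == FUTURE_TYPES_MASK:
        names.append("Future encryption types")
    return names


def rc4_allowed(value: int) -> bool:
    """True when the RC4_HMAC_MD5 bit is set."""
    return bool(value & RC4_BIT)


print(decode_supported_encryption_types(2147483644))
# ['RC4_HMAC_MD5', 'AES128_HMAC_SHA1', 'AES256_HMAC_SHA1', 'Future encryption types']
```

In other words, 2147483644 leaves both DES types off and enables RC4, both AES types, and future encryption types. A value of 2147483640 would be the same set minus RC4, and `rc4_allowed(2147483640)` returns False.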

Via GPO, this can be addressed by adding RC4 to the following GPO setting:

Computer Configuration >> Windows Settings >> Security Settings >> Local Policies >> Security Options >> “Network security: Configure encryption types allowed for Kerberos”. If this setting is enabled and RC4 (listed there as RC4_HMAC_MD5) is missing, you will want to change it.

There are a couple of things worth noting about this. It appears to affect only the install, so you can back the change out once the install is done. It’s an issue with the installer itself that prevents account validation.

Also, this is a fix that may need to be applied at two endpoints. In speaking with Ryan, I learned that in their case a hardened domain controller was effectively blocking it. On our end, it was the SCOM 2016 server itself that had the offending policy set. If you come across this, you may need to address the domain controller servicing the authentication requests as well.

SCOM Agent Stuck in a Not Monitored State

I ran into a rather peculiar issue with a SCOM agent, and after speaking to Ainsley Blackmon in SCOM support, it was pretty clear that this hadn’t been seen before. Hopefully that means it is something you won’t ever see. Still, it had enough similarities to the TLS/Schannel issues I occasionally observe with SCOM agents that it’s worth writing down, especially since all of the log information was rather cryptic about what was actually going on.

First, the scenario. It was straightforward. We were deploying SCOM 2016 as part of a migration from a 2012 R2 environment. All systems checked in except for one, which remained stuck on “Not Monitored”. A quick trip to the system showed the standard authentication errors that you usually see in this situation: the connection was immediately being closed. The server was joined to the same domain as the management server, so there was nothing to troubleshoot with authentication. Reinstalling the agent, both manually and via the console, and repairing the agent all reported success, but the end result was the same. On the management server, I did dig up a rather cryptic error about the agent not having what it needs to open communication, and after some digging the cause became obvious. The agent was missing its self-signed certificate.

For background, that self-signed cert is something SCOM uses to (as I understand it) encrypt communication between an agent and a management server so that things such as Run As account passwords can be securely transmitted between them. You don’t need to do anything with this particular certificate. The health service will generate it when it starts. It’s self-signed, and it just sits in your certificate store. The below screenshot is an example of this from my 2012 environment. Note that in 2016, the folder name changes from “Operations Manager” (as shown below) to “Microsoft Monitoring Agent”.

image

In this particular agent’s case, the Microsoft Monitoring Agent folder and certificate were missing. That seemed odd. The logs weren’t very helpful on this issue either. There was nothing in the application or system logs. There were, however, a bunch of 5061 audit failures in the security log. I couldn’t get a screenshot here, so I grabbed an example off the internet.

image

The major difference was in the Operation field (highlighted above). During a health service restart, the event would appear as pictured above. During an agent install, the same event would be generated, but the value in the Operation field was “Create Key”.

We eventually had to take to procmon to figure this one out, but ultimately, LSASS was getting denied access to a single folder in the OS:

C:\ProgramData\Microsoft\Crypto\Keys

In this case, the permissions on this folder were corrupted. Again, I don’t think this will be a common issue, but I suspect that with the move away from TLS, these might pop up from time to time. For the record, this is what the permissions to that folder should be:

image
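If you have a healthy server to compare against, one way to spot a damaged ACL is to dump the folder’s permissions on both machines and diff them. The sketch below is only that, a sketch: it assumes the standard `icacls <folder>` output layout on an English-language system, the helper names are mine, and it does not encode any “correct” permission set.

```python
import subprocess
from typing import List, Set

KEYS_FOLDER = r"C:\ProgramData\Microsoft\Crypto\Keys"


def dump_acl(folder: str = KEYS_FOLDER) -> str:
    """Run `icacls <folder>` on a Windows host (display only, no changes)."""
    result = subprocess.run(
        ["icacls", folder], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_aces(icacls_output: str, folder: str = KEYS_FOLDER) -> Set[str]:
    """Extract the access control entries from icacls output."""
    aces = set()
    for line in icacls_output.splitlines():
        text = line.strip()
        # The first entry shares a line with the folder path.
        if text.startswith(folder):
            text = text[len(folder):].strip()
        # Skip blank lines and the English summary line icacls prints last.
        if not text or text.startswith("Successfully processed"):
            continue
        aces.add(text)
    return aces


def missing_aces(reference_output: str, suspect_output: str,
                 folder: str = KEYS_FOLDER) -> List[str]:
    """Entries present on the reference server but absent on the suspect one."""
    return sorted(
        parse_aces(reference_output, folder) - parse_aces(suspect_output, folder)
    )
```

Run `dump_acl()` on each server, save the text, and feed both strings to `missing_aces()`. Anything it returns is an entry the broken server lacks.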