AutoPilot, Timeouts, and PowerShell Scripting

I don’t normally write about autopilot, and I’m not going to try and crowd the space that Michael Niehaus has done an excellent job documenting, but when I find myself spending large amounts of time troubleshooting something that is ultimately not documented or not well documented, it’s probably worth a quick post…

The scenario was pretty straightforward. I had to automate the configuration of some OEM images for a customer that involved configuration profiles, compliance policies, app installs, and PowerShell scripts. What I found during the initial build was that there were a lot of timeouts. Troubleshooting wasn’t much better, as the logs I was able to obtain weren’t showing any errors. It was as if AutoPilot had simply stopped… All I saw from the enrollment status page was an 0x800705B4 error indicating that it had exceeded the time limit configured by the administrator.

In order to troubleshoot this, I created a separate group, added a test device to it, and slowly added my targeted items. It eventually came down to a couple of PowerShell scripts that removed Windows features that required a reboot (Internet Explorer and Internet Printing if you’re curious). For whatever reason, even with the -NoRestart switch on the Disable-WindowsOptionalFeature command, AutoPilot simply didn’t know what to do with it. I would note that removing optional features that didn’t require reboots processed normally. I’m not going to pretend to be an AutoPilot expert, but I’m guessing anything that could potentially force a restart could be in play here. So if there’s a moral to that story, pay close attention to PowerShell scripts where the result may require reboots.

Using Cloud Shell to Fix a Dead VM

So for the first time, I get to blog about an Azure related experience that might be worth a read. As a MSFT employee, I have a small lab in Azure that I use to test out changes to Security Monitoring. That lab happens to have an offline CA with an enterprise subordinate CA allowing me to play around with ADCS.  As it happens, my enterprise CA’s certificate was due to expire and so I went to fire up my root CA and get that process started. I was a bit surprised to find out that I had left this on as I normally turn it off, but when I RDPed to it, I got nothing. Ping and any other attempt to connect got nothing… as did a reboot… nothing. I was locked out of this VM.

I initially followed the advice of a colleague and deleted the VM and attached the drive as a data disk to another server… Looking at the event logs, this running server hadn’t logged a thing since February… well that’s not good. I ran a check disk on the drive and attempted to re-attach and reboot… and nothing… That left me with a completely dead root CA that I could not access to save my life… I did some checking around and stumbled on this cool feature. It wasn’t the only one I found detailing the steps, but this one had the most correct information.

Step 1, turn on cloud shell.

That’s pretty straightforward. From your Azure console, simply click the shell button:


This is going to create a separate resource group with its own storage account, if that matters.

Step 2, and missing from most of the guides I found, is to install the repair commands.

They aren’t installed by default, so if you find the wrong guide, you’re going to get an error saying the command doesn’t exist. Run the following line:

az extension add -n vm-repair
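
If you end up rerunning these steps, a small guard keeps the install idempotent. This is only a sketch: ensure_vm_repair_extension is a made-up helper name, and it assumes that az extension show exits non-zero when the extension is not installed.

```shell
# Hypothetical helper: add the vm-repair extension only when it is missing.
# Assumes `az extension show` returns a non-zero exit code if the
# extension is not installed.
ensure_vm_repair_extension() {
  if az extension show --name vm-repair >/dev/null 2>&1; then
    echo "vm-repair already installed"
  else
    az extension add --name vm-repair
  fi
}

# Usage (uncomment to run):
# ensure_vm_repair_extension
```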

Step 3, create a repair VM.

This is pretty straightforward from the guide. What it will do is create a new resource group with a single virtual machine named repair-<name of dead VM>. It will copy the OS disk of the dead VM (this copy will be located in the dead VM’s resource group) and attach it as a data disk to your repair VM. The repair VM will have a public IP and RDP access, so if you want to do some sleuthing once you create it, you can.

az vm repair create -g <RG of Dead VM> -n <VM Name of Dead VM> --repair-username <Local admin name of your choosing> --repair-password <local admin password of your choosing> --verbose

Step 4, start doing repairs.

The run IDs are not well documented, and I’d add that the example they gave you doesn’t really do anything… So I’d start by running the list-scripts command:

az vm repair list-scripts

This was quite useful, as it showed me all of the Windows and Linux options available. This is what they are as of this post:


I highlighted the useful ones… in my case, I’m pretty sure the sfc script is the one I needed to run, but I also did the bcdedit script as well:

az vm repair run -g <RG of Dead VM> -n <VM Name of Dead VM> --run-on-repair --run-id win-sfc-sf-corruption --verbose

There was a bit of a surprise in this step… SFC takes a while to run, and your cloud shell only stays connected for 20 minutes. That means the session times out while the script is running, defeating the purpose of watching it in the console. I did not find an easy way to adjust that timeout (admittedly, I didn’t spend much time looking). Fortunately, the timeout does not kill the command. I did see the system file checker running in task manager on the repair VM even after the shell timed out. My only real way of knowing that it finished was to try and run another run ID… that fails as long as the current task is running. That’s not ideal, but it works.
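
The "try another run ID until it stops failing" check can be automated with a small retry loop. This is only a sketch: retry_until_ok is a hypothetical helper name, the attempt count and delay are arbitrary, and it assumes az exits non-zero while the previous script is still running, which is inferred from the failure described above rather than confirmed. It also needs a session that stays alive, so a local terminal with the Azure CLI installed is probably a better home for it than Cloud Shell.

```shell
# Hypothetical helper: retry a command until it succeeds.
# Usage: retry_until_ok <max_attempts> <sleep_seconds> <command> [args...]
retry_until_ok() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  # Run the command; on failure, wait and try again up to max_attempts times.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example, with the same placeholders as the commands above
# (30 attempts, 2 minutes apart; both numbers are arbitrary):
# retry_until_ok 30 120 az vm repair run -g "<RG of Dead VM>" -n "<VM Name of Dead VM>" \
#   --run-on-repair --run-id "<next run ID>" --verbose
```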

Repeat for any additional run ID that you want to run, changing only the value passed to --run-id.

Step 5, restore your VM.

This also didn’t work quite right. It failed on the first restore attempt. You can do this manually by detaching the data drive from the repair VM and then swapping the OS drive on the dead VM with the newly repaired disk… I was eventually able to get the restore command working, which was nice because it cleaned up the repair resource group along with it… That’s kind of important if you care about managing costs, which I’m guessing most people do. It also swaps the OS disk for you on the dead VM, so the old OS disk will be detached and you’ll be booting off of the disk copy:

az vm repair restore -g <RG of Dead VM> -n <VM Name of Dead VM> --verbose

One last thing though, it does leave the old OS drive behind… so back to that cost thing… once you get your restored VM up and running, you may want to get rid of the old disk.
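
To find that leftover disk, you can list the managed disks in the resource group that are not attached to anything. A rough sketch: list_unattached_disks is a made-up helper name, and it assumes the old OS disk is a managed disk, so its managedBy property is empty once it has been detached. Review the list before deleting anything, since other unattached disks in that resource group will show up as well.

```shell
# Hypothetical helper: print the IDs of managed disks in a resource group
# that are not attached to any VM (managedBy is null for those).
list_unattached_disks() {
  local rg=$1
  az disk list --resource-group "$rg" --query '[?managedBy==`null`].id' --output tsv
}

# Usage (uncomment to run; placeholders as in the commands above):
# list_unattached_disks "<RG of Dead VM>"
# After reviewing the output, delete a specific disk by ID.
# --yes skips the confirmation prompt:
# az disk delete --ids "<disk id from the list>" --yes
```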

Anyways, after that I had a bootable RootCA… so crisis averted.

SCOM 2019 and Later Versions of 2016 No Longer Need FIPS Configuration

I’m a bit surprised by this, as our documentation does not imply that this is the case, and I know personally I’ve had to set up FIPS for SCOM 2016 on numerous occasions. But I recently ran into a couple of cases where newer versions of SCOM 2016 configured for FIPS were working even though the steps the instructions call for had not been done.

I decided to test this in my lab and configure FIPS on my Web Console server without going through the process I detailed on my blog. To my surprise, the console continues to work. I did get an authentication screen asking for credentials at first, which doesn’t always happen, so that may be worth watching. It also seems that later versions of 2016 will work with FIPS on as well. I’m not sure when that transition was made.

Security Monitoring: New Account Lockout Report

This was a customer request. There’s not much to it, but I had a customer ask if they could get an account lockout report displaying locked out accounts. I’ve added a collection rule and a report that does this for them. That is straightforward. There’s also a report that will list the accounts that locked out, the source of the lockout, and the date the account was locked out. This will be in the 1.8.x release of the MP. As always, if you have any questions, feedback, or feature requests, feel free to reach out to me on LinkedIn and I will gladly do what I can to improve this product.

It looks like this:


Cyber-Security for the IT Professional: Part 4 Asking the Right Questions and Implementing Easy Wins

You can find Part 1 of this series here.

You can find Part 2 of this series here.

You can find Part 3 of this series here.

I want to start by simply noting that I don’t want to give the impression that mitigating against exploits is a bad thing. I’d note though that technical exploits typically fall on the vendor to mitigate against, and to that extent, understanding their formal guidance for patching and design for these is critical. As such, items such as a robust patching process are a given. Simple mitigations against PtH are also a given since there’s nothing in place from a technology standpoint that can stop it on Server 2008 or 2012 R1. A lot of mitigations to technical vulnerabilities are as simple as staying current, which IT organizations often struggle to do.

That said, the anatomy of a modern attack follows a fairly straightforward plan:

  1. Compromise an asset via some means (usually, but not exclusively, phishing).
  2. Add some sort of persistence mechanism to allow ease of returning to the asset.
  3. Harvest any credentials that they can.
  4. Use those credentials on other systems and repeat step 2 and 3 as needed.
  5. Continue until they get the credentials that they need (i.e. administrators).
  6. Go do what they initially set out to do (steal your data, ransomware, whatever).

Let’s revisit our assumptions for a second. Assumed breach notes that we cannot prevent step 1. It’s going to happen. I’m not saying don’t educate end users, but I am saying that ultimately we need to prepare well beyond step 1. We have to keep the attackers on that system and that system alone. Keep in mind that most attackers are organizations with a limited amount of capital, just like our organizations. We cannot necessarily stop them from doing anything, but what we can do cheaply and easily as a first step to securing our environment is to make it very expensive for them to do what they set out to do. Keep that in mind, because with the right measures in place, they won’t waste their time on you. I’ll admit that if the organization is determined and has deep enough pockets, you will likely have a long road ahead, but this is also a rare scenario. Commoditizing a zero day vulnerability, for example, is very expensive for them to do… but a nation state, for instance, could have the pockets to do it if they thought it would achieve their goal. The average attacker, however, will not be willing to undertake said costs but instead will be quite happy to continue exploiting the same vulnerabilities.

The key to stopping a bad guy is to address the design vulnerabilities listed in part 2 of this series. Restricting movement at any tier can be implemented cheaply and with relative ease:

  • Randomize your local admin passwords using a tool like LAPS. This way the attacker can no longer reuse the local admin hash. There are also GPO settings that can be configured to restrict local admin usage so that it’s not being used across the network.
  • Block all inbound connections at the local firewall and only allow administrative connections from designated administrative addresses (i.e., your PAWs).

Dealing with administrative credentials is a bit harder to do, but there are still some things we can implement that can greatly improve security posture.

  • With all accounts, we really need to identify which administrative accounts we need and make a conscious effort to limit them in a least privilege setting. There should not, for instance, be very many people that need domain admin accounts. Very few accounts should need to be admins across the entire data tier (T1) either. A few administrators might need those rights, and perhaps a deployment account for software deployment, but by and large, there shouldn’t be many accounts with these types of needs. For the most part service accounts should not need these rights, and accounts that need to run in memory should only have rights to log on to the systems that need those accounts. A good strategy here will have a measurable effect on reducing the attack surface of most organizations, as the bad guys have fewer accounts that they can compromise. This can be a political issue across many companies and it’s a training issue as well, but if security is a focus, this is a great place to start, and this one needs to start with management.
  • As well, immediately implementing a PAW structure is another thing that can prevent lateral movement. It’s worth noting that you don’t necessarily need a separate PAW for each tier (one PAW can be configured to boot from multiple VHDs for instance), but a dedicated PAW is a very good idea. Administration of any tier should be done from the tier specific PAW. That allows you to restrict administration to the IP addresses of the PAW machines and prevent administrator logons from the Tier 2 environment. This PAW should be hardened with no productivity applications on it. Internet should be restricted if at all possible, and ideally the devices should be built using known good media (i.e., download the media from the vendor site and validate the hashes, do not grab it off of your software share). At this point, you’ve split your tiers cleanly. If a bad guy is on a T2 system, even if a T2 admin signs on to it, those credentials won’t be useful because they can only be reused from a PAW. If that is hardened properly, they won’t be able to get into it. They’ll also never be able to scrape a T1 or T0 cred off of said asset because those creds are also restricted to their PAW’s location.

Those are the easy wins. Long term, implementing tools such as Just in Time administration, Credential Guard, identity management, and more advanced monitoring will all prove to be beneficial to various degrees. Microsoft has plenty of guidance on this subject, and while getting everything in place in a short time is a tall order, you can focus on the items that mitigate the most pressing vulnerabilities facing your organization. The first steps are not foolproof obviously, as attackers can still try other things from that compromised machine, but their ability to move will be greatly restricted, so much so that they are much more likely to give up.

So what do we do to continue hardening our environment? The question I think we need to ask is probably more philosophical. What problem are we attempting to solve? Are we closing a vulnerability that will make it difficult for attackers to move through the environment? Are we simply hardening a particular system? Oftentimes security purchases or decisions are made to solve issues around a particular exploit. Pass the hash is a good example of this. It’s without question one of the most exploited technical vulnerabilities on the market today. While there is some sense to mitigating it, I would argue that resources would be better dedicated towards eliminating credential leakage and unrestricted movement. Ultimately, if an organization were to take care of those issues, mitigating pass the hash in particular is not nearly as important because the attacker won’t have as many credentials to steal, nor will they be able to easily move in your environment with the ones they’ve acquired.

It’s also worth asking how common a particular exploit is before you mitigate against it. There are probably tens of thousands (if not more) of exploits out there, and many are simply theoretical. The real question that needs to be answered is whether or not said exploit is in use. A commonly used exploit makes much more sense to mitigate against because it has effectively been commoditized, and the more of those you mitigate, the more likely an attacker is to go elsewhere. Rushing to stop a zero day may have merit if you’re particularly vulnerable to it or have adversaries with deep enough pockets to exploit it, but not surprisingly, attackers are usually exploiting technical vulnerabilities that have had patches available for years, which means your patching strategy is something that should be heavily scrutinized.

Another place I would start is simply asking which (or all) of these design vulnerabilities your particular organization faces and whether or not the solution in question addresses them. I’m going to pick on admin password randomization software for a minute (note, not local admin randomization). Will something like this make it harder to brute force your passwords? Yes. But how often have we seen a bad guy on the inside of an organization brute forcing passwords? It doesn’t happen. Attackers don’t need to brute force your password. They have a number of means to get your password without guessing it. If you’ve seen password randomization systems in use, you’ll understand some of the other problems as well. What typically happens when they get implemented? I’ve seen this before: an administrator opens up notepad and pastes their password in clear text on to their machine so that they can use it as needed, since memorizing said password is usually out of the question. Attackers can still get that info if they want it. They can still install a key logger and get it in that capacity. This type of approach fails in large part because the password still exists on Tier 2. That would be true, I might add, even without these systems if PAWs are not in use. The bottom line is that password randomization systems don’t secure Tier 2, which means you’re still exposed to the underlying design vulnerability that allows the bad guys to steal your credentials.

The point is that we need to do a better job of asking questions and understanding where we are exposed. We aren’t going to be able to mitigate against every vulnerability, but understanding the important vulnerabilities and how to mitigate them will do wonders in improving our posture.

Cyber-Security for the IT Professional: Part 3 Design around Vulnerabilities not exploits

You can find Part 1 of this series here.

You can find Part 2 of this series here.

The title of this piece is fairly self-explanatory, but how that works in practice is much more difficult. I’d like to say that this would come naturally, but it doesn’t. Right now, pass the hash is all the rage. It got that status for good reason, as this is the exploit that attackers have most commonly used, so to that extent there’s definitely good reason to mitigate against it. But do understand that pass the hash is an exploit of a technical vulnerability, and mitigating it does not necessarily mitigate the underlying design vulnerability. Some form of PtH will always be a risk because people are not going to want to reauthenticate every time a token expires or every time they open a new app. Eliminating single sign on will destroy the usability of most technology. Like most exploits, pass the hash requires administrative access to a system. I’m not bringing this up to recommend that users not be allowed to be admins (remember that there are hundreds if not thousands of application/OS layer flaws that can allow a bad guy to get administrative rights to a system). I’m bringing it up to remind us that once an attacker has access to your system, they can do anything they want to it; this includes self-elevation.

If we could completely eliminate pass the hash with a magic pill, a bad guy would simply adjust. They could install a key logger, for instance, and get credentials that way if organizations continue to practice poor credential hygiene. One of Microsoft’s initial mitigations to PtH was Credential Guard. It didn’t take long, however, for the author of Mimikatz to turn around and add key logging capabilities to his publicly available hacking/security tool. This shouldn’t at all be surprising. So while pass the hash gets all of the press within the IT world, we have to remember that it is nothing more than an exploit of an underlying technical AND design vulnerability. In this particular case, PtH exploits a technical vulnerability across Windows and cross-platform systems, but more importantly a design vulnerability related to mixing credentials that is baked into the design of nearly every IT organization. In my pre-Microsoft days, every single environment I worked in allowed me to do my administration from my desktop. Even though they all followed best practices using separate administrative accounts, I would do what every other IT professional does or has done in their careers: I would check my email and use productivity applications on the same system from which I administered servers and domain controllers. That means my admin credentials were exposed if a bad guy happened to be on it. If I was the one clicking on the phishing email, my org would be compromised within minutes. Even if I was a perfect admin and never clicked on something containing malicious code, all an attacker had to do was find their way to my system to harvest my administrative credentials. This is the impetus behind the push for PAWs. They isolate administrative credentials and allow you to restrict where administrative credentials can be used. They’re hardened too, with a much lower attack surface than a typical productivity device. Productivity systems are the hardest to secure. They have end users who, as I mentioned earlier, have varying levels of skill and education. Someone is bound to allow the bad guy into your environment. From there, the bad guys follow the same script: move and collect.

IT organizations have responded to attackers in all the wrong ways… Here are some examples:

  • Obscuring server naming convention (your attackers probably aren’t in the US, don’t necessarily speak English, and most of what they do harvests what they need using scripts. It doesn’t even slow them down).
  • Randomizing Tier 1 and Tier 0 passwords and storing them in a system that forces an administrator to look them up (the passwords are still used on Tier 2 devices)
  • Not randomizing local admin passwords or restricting their usage… Note LAPS is free and easy to implement. Everyone should be doing this. Every system has a local admin password not associated with the domain. They are generally standardized for ease of management, but this also provides attackers with the ability to quickly move through your environment, as they can harvest the password hashes.
  • Increasing the complexity requirements to passwords and requiring them to be changed more often (this encourages bad practice such as writing down passwords).
  • Focusing design around mitigating specific exploits.

We’ve all seen this to some extent or another. Every organization does it, and that’s ultimately because we don’t have a good grasp of the vulnerabilities we’re looking to mitigate or an understanding of how an attack works. Password strategies in organizations in particular can be frustrating, and never address credential leakage across tiers. They also don’t address free movement across the environment, and ultimately they provide the illusion of security without any improvement in an organization’s security posture. It’s why most security professionals are reminding people that the concept of a password has created a system of use that is difficult for users to remember and easy for computers to guess (side note, this is why MSFT is moving in a passwordless direction focusing on MFA using certificates and biometrics). Likewise, this type of approach will usually lead to a security measure that accomplishes nothing but making things more difficult to use… and sadly, that will be because someone read an article somewhere about an exploit related to a hardware or software flaw that attackers can take advantage of. That’s fine to a point, but these all miss the fundamental vulnerabilities that are baked into nearly every network. I think we’ve all likely seen that to some extent. It’s not hard to do, and these problems aren’t limited to password strategies.

Ultimately, we need to look at our networks the way an attacker does. That means looking at how we’re protecting our credentials, as those are what they are after. Many in the cyber world have lost focus on that, myself included sometimes, as it is very easy to focus on technical exploits while overlooking the fact that a bad guy can still get what they want because we never mitigated the underlying vulnerability. The moral of this piece is simple. Design around vulnerabilities… and make sure your designs don’t have vulnerabilities baked into them.

This brings me to my last point, which will be covered in the last piece in this series. As IT professionals, we need to look at Cyber a bit differently and start asking the right questions.

Part 4 can be found here.

Cyber-Security for the IT Professional: Part 2 Sobering Statistics and Organizational Design. Why Identity is the New Perimeter.

You can find Part 1 of this series here.

I’m going to start this section with some rather sobering numbers. These should scare you whether you’re in the C-Suite or a day to day operator, as they say a lot about how poorly we’ve secured our environments and why every week we hear about another big name company that’s in the news due to compromise. It’s worth noting that these are 2014 numbers, so I wouldn’t consider them entirely accurate today. I did some quick research to find something a bit more current with no luck. From what I understand, the modern numbers are trending in the right direction, but are not significantly better.

  • The median amount of time an attacker will be in a compromised environment without detection is 146 days.
  • 81% of the time, it’s not the compromised organization that detects the attack, but someone else.
  • 60% of attacks, from initial penetration to Domain Admin rights, are accomplished in minutes.
  • The average cost per breach is $3.8M.

That’s bad. There’s no way to sugar coat those numbers, as by the time an attacker is discovered, they have already accomplished everything they set out to do. It also speaks to how poor typical design is for stopping an intruder. Most compromises are achieved in minutes, and the rest are typically finished in no more than a day or two, leaving an attacker 5 months on average to accomplish what they set out to do while being undetected.

It’s probably worth noting that the vulnerabilities that have allowed many of these data breaches to happen were built into IT networks from their inception, as the concept of a tiered model and protecting identity didn’t exist. Since most networks typically follow the same design and have the same needs, the vulnerabilities that attackers are ultimately using exist in nearly every corporate network. If I were to give a few design vulnerabilities that are not mitigated well or at all, it would be as follows.

  • Movement within an organization is largely unrestricted.
  • Administrative credential usage is also largely unrestricted.
  • There are way too many administrative credentials in most environments.

Understand that these vulnerabilities were effectively design vulnerabilities; and because of this, no matter what the security posture, your organization is likely exposed in some capacity to these vulnerabilities.

When networks were developed, there was little in terms of management software. That paved the way for a basic directory infrastructure, but the management of said infrastructure focused a lot more on ease of use. Known vulnerabilities at that time had more to do with outside threats, as the main entry point into your organization was typically through the external firewall or viruses that followed typical signature patterns. As such, networks were effectively built in a similar capacity to a medieval castle with high walls and perhaps an expansive moat with dangerous animals in it to prevent you from swimming across. We did something similar using expensive firewalls and IDS/IPS solutions (intrusion detection/intrusion prevention). That worked… for a while. But ultimately the threat actors evolved. In the early to mid-2000s we saw network based worms/viruses such as Nimda and Zotob (in some cases written by teenagers) which could take down entire networks in a matter of hours. As one of the managers on my team at that time put it, our networks were hard on the outside but soft and chewy on the inside. He wasn’t wrong… and the solution? Again… more firewalls. This time, it was a host based firewall which built a similar wall around each and every endpoint. That effectively mitigated those threats. Attackers, however, continued to evolve (side note, please don’t construe this as me saying that firewalls aren’t the answer. They are still a very key part of any good cyber approach).

Modern attacks still tell a similar story. However, instead of bored teenagers writing viruses or code to potentially penetrate an organization, your attackers are now sophisticated entities that operate in much the same capacity as a typical business. These organizations have their own C-suite and management infrastructure, and in most cases have effectively commoditized their malware. They might be using the latest copy of Mimikatz, but more likely they’ve written their own tools so that they have a signature unknown to traditional antivirus… Those tools are built specifically to go after your data. Sometimes, it’s even nation states who have much deeper pockets and can effectively throw an unlimited amount of resources at you until they get what you have. But the general premise behind an attack is much the same: all they need to do is penetrate one node and the vulnerabilities I listed above will make it very easy for them to get what they want. Much like artillery rendered the city wall obsolete, phishing scams and various other techniques have lessened the value of a firewall. Firewalls are certainly still useful for stopping denial of service attacks, stopping scripted methods to brute force your environment, and if configured properly can be used to restrict movement within an environment. However, all it takes is one person to click on a link they shouldn’t click on to give an attacker a foothold in your environment. This is why Microsoft operates under what is called an assumed breach model. We assume at any given time that some system in your organization is compromised. If an attacker has decided they want into your environment, preventing them from getting to this step is nearly impossible, and from there they have only one goal until they have compromised the environment: collect and reuse as many credentials as they can.

Since assumed breach should be anyone’s approach, we now need to look a lot closer into securing our identities and restricting movement. Why? The answer lies in the first piece in this series. Once an attacker has a system, they can do anything they want to it. An endpoint system is loaded with productivity applications, any one of which can be exploited to give them full system access. This is true no matter how tightly an organization locks down a system. Once the bad guys have full system access on one computer, harvesting credentials through techniques like pass the hash is relatively easy. Even if said end user is only a low level administrative assistant or factory operator with little to no user rights, an attacker can still reuse those credentials and rapidly move through an organization until they get to a workstation that is running more powerful credentials. They can do this because user movement is typically not restricted to a specific set of endpoints. They can do this because the type of user logon is also rarely restricted (i.e., any user can do a network or RDP logon to another machine in most environments, regardless of need for this type of access). They can do this because communication between machines is not restricted. They can do this because local admin passwords are almost always the same. They can do this because the types of local admin logon (this should never be anything but interactive to the machine itself) are not restricted. And lastly, and probably most important, they can do this because they will eventually make their way to a workstation where an administrator sits, and if they happen to be on an admin workstation when an administrator uses an administrative credential as opposed to a user credential, they now own that person’s access. The same premise, I’d add, is typically true of system administrative access. There are virtually no restrictions on where that can come from and how it can move, and for various reasons, most organizations have too many overprovisioned accounts, making it very easy for an attacker to find one.

If you ever decide to take the time to look at Microsoft’s security papers, you will see a common theme: Identity is the New Perimeter. This is even more true in today’s environment where BYOD (bring your own device) and IoT (Internet of Things) devices are incredibly prevalent and getting more and more usage in a corporate environment. There are simply too many endpoints for an organization to control. Attackers are quite capable of identifying who works at your organization. It’s all over social media or it can be deduced very easily. Trying to stop their access here is pretty much a waste of time. Attackers may presently rely on phishing, but remember that even if a magic cure were found for that tomorrow, they would soon have another way of establishing a toehold in your organization. Our success as cyber professionals does not depend on preventing these things; it depends on making sure that the attackers have no options at their disposal once they get said toehold.

You can find part 3 here.

You can find part 4 here.

Cyber-Security for the IT Professional: Part 1 Terms and Assumptions

For those of you that do not know, I’ve been given the privilege to be a speaker at SCOMathon later this month focusing on SCOM and how it relates to Cyber-Security. As I was putting my presentation together for that conference, one thing that came to mind was that the average attendee is going to be a typical IT professional and not someone versed in the details of IT Security. That’s not meant to be disparaging in any way, but it simply acknowledges that IT is a pretty big field and that security represents only one component of the job description. While we would all agree that security is a focus, perhaps even the primary focus, the number of compromises we’ve observed over the last decade tells us that this is not something we do well from a practical standpoint. In my opinion this is in large part because we as professionals don’t necessarily know where to start. Sure, we understand basics such as “don’t click on links you don’t trust”, “reset default passwords”, and “patch your systems regularly”, but this only covers a small part of what it means to operate in a secure manner. When it comes to topics such as specific premises, the anatomy of an attack, vulnerabilities, exploits, and where to allocate IT resources, that discussion gets much more confusing. The meanings of specific terms can sometimes be blended, and oftentimes the problem is not the technology itself, but how we designed it. And then there is a noise component. The sheer volume of opportunities that an attacker can take advantage of is somewhat mindboggling, and of course with every exploit, there’s also someone willing to sell you something designed to mitigate those specific risks, whether the solution is something your organization needs or not.

As such, I’m going to make an attempt with this multi-part series to take the complex subject of cyber-security and boil it down into something that can be a bit more useful for the average IT person. My hope in doing this will be to shed light on some common design flaws that are usually the root causes behind a typical breach, as good design can make up for many of the flaws that will inevitably be found in any technological solution.

While it will be somewhat boring, for this first part we’re going to start with some terms and key assumptions. I’ll be using these terms somewhat frequently across this series; and as such, I think it’s very important that we are all on the same page as to what they mean, or at least how I define them:

Tier Model: This is really a series of terms, but the reality is that all IT organizations are generally broken into 3 tiers. Organizations can go into a bit more depth with this if need be, but for the most part, the 3 tier system is true for all organizations. This is in all of our formal documentation, so it might not be new, but it should be defined in case you’re not familiar with it.

  • Tier 0 – this tier is the god tier so to speak. Users who have access to this tier effectively own all of the information technology of an organization. Logically speaking, we recognize that these are our Domain Administrators, but we often forget that any system that touches a domain controller also qualifies in that sense. The reason for that is that if that system is compromised, your attacker is now a domain admin, even if they haven’t compromised a specific DA account. SCOM or SCCM, for instance, can be a Tier 0 system if the Microsoft Monitoring Agent/SCCM client is installed on a domain controller. If either of those systems is compromised, your domain controllers will be as well. This will be true for your antivirus and configuration management systems as well. If it touches a DC in any way, it’s a Tier 0 system because at the end of the day, an administrator of this system has direct control over the Tier 0 environment whether they are a domain administrator or not (note: this is why we recommend separate instances of these kinds of systems if they are going to manage domain controllers). Tier 0 typically represents the ultimate goal for an attacker. Once they have this tier, they own your environment because they can get to anything. As such, a lot of effort has been (rightfully) directed towards securing Tier 0.
  • Tier 1 – this is your data tier. While getting to Tier 0 means an attacker now owns you, we need to remember that the typical attacker is actually interested in what is stored in this tier. You have trade secrets and personally identifiable information in this tier. This is where financial data is stored, as well as your email, intranet, management software, supply chain, etc. Pretty much any system that your organization uses resides in this tier. While we will prioritize Tier 0 from a command and control standpoint, it’s worth noting that if an attacker has breached this tier, then they are already likely to have access to what they’ve come for. That said, an attacker who has only breached this tier may have fewer tools at their disposal than they would as a domain admin, which may still limit them in some capacity depending on what they’re after.
  • Tier 2 – this is your user tier. Effectively, it’s your desktop computing environment. To your average cyber professional, productivity machines are equivalent to the wild wild west. This tier has internet access and it has users. When an organization is breached, the breach usually starts here, and that’s in large part because your users are not IT Professionals and productivity machines are loaded with commercial applications that all have their own unique set of vulnerabilities, and as such they represent a weak link… which brings me to another term.
  • Crossing tiers (or credential bleed or poor credential hygiene) is what happens when a single system (such as SCOM for instance) is configured across multiple tiers. This also happens when you have an account from one tier being used on a system that is in another tier. I’ll simply note from a cyber perspective that this is very bad. This series will go into a lot more detail about that, but I’m going to contend throughout this series that this remains the biggest vulnerability that most organizations possess, and until it is appropriately addressed, your organization is at risk of easy compromise, no matter what security posture you take.
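
To make the idea of crossing tiers a bit more concrete, here is a minimal Python sketch of how one might flag it. Everything in it is an assumption for illustration: the account names, system names, tier assignments, and the Logon record are all hypothetical, and a real environment would pull this data from its own inventory and logon events.

```python
from dataclasses import dataclass

# Hypothetical tier assignments, for illustration only.
ACCOUNT_TIER = {
    "da-alice": 0,   # domain admin account
    "srv-bob": 1,    # server/application admin account
    "alice": 2,      # everyday user account
}
SYSTEM_TIER = {
    "DC01": 0,       # domain controller
    "SCOM01": 0,     # management server with agents on domain controllers
    "SQL01": 1,      # data tier server
    "WS-ALICE": 2,   # productivity workstation
}


@dataclass(frozen=True)
class Logon:
    """A single (hypothetical) logon record: which account, on which system."""
    account: str
    system: str


def crosses_tiers(logon: Logon) -> bool:
    """Return True when an account is used on a system outside its own tier."""
    return ACCOUNT_TIER[logon.account] != SYSTEM_TIER[logon.system]


def find_tier_violations(logons):
    """Collect every logon where the account tier and system tier differ."""
    return [entry for entry in logons if crosses_tiers(entry)]


logons = [
    Logon("da-alice", "DC01"),      # Tier 0 account on a Tier 0 system
    Logon("da-alice", "WS-ALICE"),  # Tier 0 credential exposed on Tier 2
    Logon("alice", "WS-ALICE"),     # Tier 2 account on a Tier 2 system
    Logon("srv-bob", "SQL01"),      # Tier 1 account on a Tier 1 system
]

violations = find_tier_violations(logons)
for v in violations:
    print(f"{v.account} (Tier {ACCOUNT_TIER[v.account]}) used on "
          f"{v.system} (Tier {SYSTEM_TIER[v.system]})")
```

Note that the sketch places the SCOM server in Tier 0 because, per the definition above, it has an agent on a domain controller. The check itself is deliberately simple: any mismatch between account tier and system tier counts as a crossing.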

PAW: PAW stands for Privileged Access Workstation. I’ll go into the setup in a bit more detail later on, but it is one of the main premises behind all of this in that it is a hardened workstation dedicated to administrative use. It has no business productivity applications (e.g., email) and its sole purpose is to keep tier administration within the same tier (i.e., a Tier 0 PAW is dedicated to administering domain controllers, a Tier 1 PAW will only administer Tier 1 systems, etc.).

Assumed Breach: This is both an assumption and a term, but as a cyber professional, it’s critically important to recognize that we cannot prevent a breach at Tier 2. No amount of user education is going to change the fact that someone is going to fall for that phishing scam, which currently represents the largest vector into your organization. Someone is going to visit a site on their work computer that they have no business visiting. Someone is going to download something from a non-trusted source. I’m not saying that education in these areas is a bad thing, but it’s worth noting that this should never be a primary defense. The bottom line is that the desktop is the most unpredictable device in the environment. Users range from technically competent to completely uneducated, and oftentimes the more competent the user, the more dangerous they are. Setting the users aside, desktops also have a much larger attack surface. Unlike servers, desktops run dozens of productivity applications, all of which have vulnerabilities that can be exploited. Speaking of which…

Vulnerabilities: A vulnerability is a flaw. This can be a flaw due to design or inherent to the technology that has been implemented, and that distinction is something we need to recognize, as we often try to mitigate the technical vulnerability and not the design vulnerability. A good example of a technical vulnerability is SQL. All SQL, whether Microsoft or another version of it, is vulnerable to a SQL injection attack due to the nature of structured query language. As such this vulnerability has to be mitigated. Operating systems are vulnerable to pass the hash because single sign on allows them to store credentials. A good example of a design vulnerability is credential theft in a broader sense. If a higher level credential is being used on a lower level system, an attacker has numerous exploits at their disposal to acquire said credential. In my opinion design vulnerabilities are typically much more dangerous than technical vulnerabilities. While this won’t be true every time, technical vulnerabilities can be patched, or the vendor will provide mitigation guidance. That doesn’t excuse poor coding by vendors or poor documentation (such as saying a service account must be a domain admin), but good architectural design can significantly mitigate technical vulnerabilities. Vulnerabilities related to a specific system will always exist, and while we should be patching and staying up to date with security practices as related to said system, we need to think about our design first.

Exploits: This is something the bad guys do to take advantage of a vulnerability. Pass the hash is an exploit that takes advantage of that stored credential on an operating system. Pass the ticket is similar to pass the hash except that the bad guy is stealing a Kerberos ticket instead. SQL Injection is an exploit that takes advantage of the way SQL processes statements, and the list goes on. There are a lot of exploits. One of the big mistakes we make on this is missing the forest for the trees. An organization cannot reasonably mitigate against every exploit. There are simply too many of them. To some extent we rely on our vendors. This is why we patch, but even the best patching strategy will still leave an attacker with a number of exploits at their disposal.
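
For readers who have not seen SQL injection up close, the following is a minimal Python sketch using the standard library sqlite3 module and an in-memory database. The table, the user names, and the two lookup functions are hypothetical and exist only for illustration. The first function pastes user input straight into the statement (the vulnerability); the second passes it as a bound parameter (one common mitigation).

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice-secret"), ("bob", "bob-secret")],
)


def lookup_unsafe(name):
    """Vulnerable: the input becomes part of the SQL statement itself."""
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(name):
    """Mitigated: the input is bound as data and never parsed as SQL."""
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# The exploit: input crafted so the WHERE clause is always true.
payload = "nobody' OR '1'='1"

unsafe_rows = lookup_unsafe(payload)  # every row in the table comes back
safe_rows = lookup_safe(payload)      # no user is literally named that

print("unsafe:", unsafe_rows)
print("safe:", safe_rows)
```

The same input produces very different results because in the first case the database is asked to interpret the attacker’s text as part of the statement, while in the second it is only ever compared against the name column.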

Risk: This is what the organization assumes with any vulnerability. Risk effectively amounts to a cost and should ultimately determine the spend in terms of design for security. If system X is compromised, what is the cost to the organization? Understand that risk can be eliminated, mitigated, or assumed. That’s also pretty straightforward. You can eliminate certain types of risk based on your design. In some cases, you’re only mitigating it because you’ve reduced your exposure to said risk in some way, but there is still risk of compromise. When you assume risk, you’re simply acknowledging that you cannot or will not fix it… and most importantly, it’s worth noting that ignoring said risk is the same as assuming it. Too often, risk is ignored.

These terms aren’t necessarily exciting to the average IT professional, but do understand that when speaking to managers or C level individuals, these are the terms they care about. There are only a limited number of dollars available in the org, and while security budgets are finally growing due to the cost of compromise, they aren’t limitless. Management is ultimately concerned about what it will cost to mitigate the risks vs the potential cost if they don’t.
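
As a rough illustration of that trade-off, here is a small Python sketch that compares the yearly cost of a mitigation against the loss it is expected to avoid. It uses the common annualized loss approach (cost of a single compromise multiplied by how often it is expected per year). The function names and all of the dollar figures and likelihoods are made up for the example; they are assumptions, not data from any real organization.

```python
def annualized_loss(single_loss_cost, yearly_likelihood):
    """Expected yearly loss: cost of one compromise times expected events per year."""
    return single_loss_cost * yearly_likelihood


def mitigation_pays_off(single_loss_cost, likelihood_before,
                        likelihood_after, yearly_mitigation_cost):
    """Return True when the expected loss avoided exceeds what the mitigation costs."""
    before = annualized_loss(single_loss_cost, likelihood_before)
    after = annualized_loss(single_loss_cost, likelihood_after)
    return (before - after) > yearly_mitigation_cost


# Hypothetical numbers: a compromise of system X costs 2,000,000, and a
# redesign is assumed to cut the yearly likelihood from 0.5 to 0.25.
worth_it = mitigation_pays_off(
    single_loss_cost=2_000_000,
    likelihood_before=0.5,
    likelihood_after=0.25,
    yearly_mitigation_cost=150_000,
)
print("Mitigation pays off:", worth_it)
```

The hard part in practice is estimating the inputs, not the arithmetic, but framing the conversation this way tends to map directly onto what management is asking.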

My last point for this piece is a bit more than an assumption. It’s a fact and one that should not be forgotten. Namely this: once an attacker has control of a system, they can do pretty much anything they want to it. Keep this thought in mind when it comes to terms like assumed breach and solution design. Ultimately, the biggest flaw in an organization’s security posture is not its technology and tools, but ignoring this fact when designing the environment.

You can find part 2 here.

You can find part 3 here.

You can find part 4 here.

Security Monitoring 1.7.x is up

There isn’t much to this year’s update. I didn’t get a ton of feature requests, but I did get a couple and built them in. This is the change log.

  • Updated Local Admin Change rule to account for GPO enforced Local Admin Settings.
  • Fixed a couple of alert replacement bugs.
  • Added more override options for some PowerShell rules.
  • Updated Log Clearing alerts to allow for a user account override.
  • Added an exclusion to PowerShell logging for an Azure path as well as SCOM 2019 default path.
  • Fixed a bug with the alert description for the PowerShell running in memory rule.
  • Added rule for suspicious user logons.
  • Added an exclusion for WindowsAzureNetAgent on the service creation on DC rule.

Also worth noting that I’ve moved all content off of the TechNet Gallery and on to GitHub. I’m not a GitHub expert by any means, so I’m still figuring out the pull requests and fun stuff associated with that, but this could eventually become a community project with the right volunteers. Here is a link to both the previous and current content.