For all its benefits, the drive to virtualize everything has created a very big security issue: Virtualization creates a single target for a potential security breach. When a host runs 50 virtual machines (VMs) and is attacked, then you have a real problem. One compromised host compromises the 50 VMs running on it, and now you have what I lovingly call a “holy s**t” moment. Because you virtualized, you turned a whole bunch of servers and operating systems into just a couple of files that are super easy to steal.
The industry needs a way to protect against online and offline attacks that could compromise entire farms of VMs. Microsoft has done some work in this area in Windows Server 2016 with the shielded virtual machine, and its sister service, the Host Guardian Service (HGS). Let’s look at what the folks in Redmond have done.
Understanding the security problem with virtualization
Let’s frame the problem as a set of challenges that need to be solved for a security solution to mitigate the issues virtualization poses.
- On any platform, a local administrator can do anything on a system. Anything a guest does to protect itself, like encryption, can be undone by a local administrator. This is comparable to a data center, where all of the access control lists and fancy stuff you do on the inside of an operating system running on a racked server don’t matter when someone can plug hacking tools into a USB port, boot off it, and copy everything there. Or they can take the system off the rack, drive off with it, and boot it up at home. Even drive encryption can be bypassed by some of these tools by injecting malware into boot sequences and stealing keys out of memory.
- Any seized or infected host administrator accounts can access guest VMs. As you might predict, the bad guys know this and target these individuals with increasingly sophisticated phishing attacks and other attempts to gain privileged access. The prized targets are no longer individual desktops and poorly protected home machines. The hacking target market has matured. The new targets are VM hosts in cloud data centers, public and private, with 10 or 15 guests on them, almost always packed to the gills with important information and the fabric administrator accounts that control those hosts. This virtualization fabric has to be protected, since more than just the host administrator has the ability to do harm. With VMs, the server administrator, storage administrator, network administrator, backup operator, and fabric administrator all have virtually unfettered access.
- Tenant VMs hosted on a cloud provider’s infrastructure (fabric) are exposed to storage and network attacks while unencrypted. The two main points here are: First, being encrypted at rest while not booted is worthless when your VM is infected while it is running in production. Second, the best offline defenses are worthless against network and storage attacks that execute while a machine is on.
- As technology currently stands, it is impossible to identify legitimate hosts without hardware-based verification. There is no way you can tell a good host from a bad host without some type of function keying off a property of a piece of silicon.
Microsoft’s answer to these four points is new to Windows Server 2016—the shielded VM and the Host Guardian Service.
What is a shielded virtual machine (VM)?
A shielded VM protects against inspection, theft, and tampering from both malware and data center administrators, including fabric administrators, storage administrators, virtualization host administrators, and other network administrators.
Let me explain how a shielded VM works: It is a Generation 2 VM. The main data file for the VM, the VHDX file, is encrypted with BitLocker so that the contents of the virtual drives are protected. The big problem to overcome is that you must put the decryption key somewhere. If you put the key on the virtualization host, administrators can view the key and the encryption is worthless. The key has to be stored off-host in a siloed area.
The solution is to equip the Generation 2 VM with a virtual trusted platform module (vTPM) and have that vTPM secure the BitLocker encryption keys just like a regular silicon TPM would handle the keys to decrypt BitLocker on an ordinary laptop. Shielded VMs run on guarded hosts, or regular Hyper-V hosts that are operating in virtual secure mode—a setting that provides process and memory access protection from the host by establishing a tiny enclave off to the side of the kernel. (It doesn’t even run in the kernel, and all it does is talk with the guardian service to carry out the instructions about releasing or holding on to the decryption key.)
What is the Host Guardian Service?
How does the VM know when to release the key? Enter the Host Guardian Service (HGS), a cluster of machines that generally provide two services: attestation, which double-checks that only trusted Hyper-V hosts can run shielded VMs; and the Key Protection Service, which holds the power to release or deny the decryption key needed to start the shielded VMs in question. The HGS checks out the shielded virtual machines, checks out the fabric on which they are attempting to be started and run, and says, “Yes, this is an approved fabric and these hosts look like they have not been compromised. Release the Kraken! I mean keys.” The whole shebang is then decrypted and run on the guarded hosts. If any one of these checks fails, then the keys are not released, decryption is not performed, and the shielded VM fails to launch.
How does the HGS know whether a virtual machine is permitted to run on a fabric? The VM’s creator—the owner of the data—designates that a host must be healthy and pass a certain number of checks to be able to run the VM. The HGS attests to the health of the host requesting permission to run the VM before it releases the keys to decrypt the shielded VM. The protections are rooted in hardware as well, making them almost surely the most secure solution on the market today.
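To make the two HGS roles concrete, here is a minimal Python sketch of the decision flow described above: attest the host first, then release the key only if attestation passes. It is a toy model, not Microsoft's implementation. The class names, fields, and the `start_shielded_vm` helper are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class HostReport:
    """What a host presents when it asks to start a shielded VM (hypothetical shape)."""
    name: str
    is_known_host: bool         # assumption: the host is registered with the guardian service
    passed_health_checks: bool  # assumption: the host met the owner's health policy


class HostGuardianSketch:
    """Toy stand-in for the two HGS roles: attestation and key protection."""

    def __init__(self, vm_keys: Dict[str, bytes]) -> None:
        # Keys live with the guardian service, never on the virtualization host.
        self._vm_keys = vm_keys

    def attest(self, report: HostReport) -> bool:
        # Attestation: only known, healthy hosts are trusted.
        return report.is_known_host and report.passed_health_checks

    def release_key(self, report: HostReport, vm_id: str) -> Optional[bytes]:
        # Key protection: the key is released only after attestation succeeds.
        if not self.attest(report):
            return None
        return self._vm_keys.get(vm_id)


def start_shielded_vm(hgs: HostGuardianSketch, report: HostReport, vm_id: str) -> str:
    """Return the outcome of a launch attempt on the given host."""
    key = hgs.release_key(report, vm_id)
    if key is None:
        return "failed to launch"
    return "running"
```

The point of the design is that the host never holds the key on its own authority. It has to ask every time, and a host that fails any check gets nothing back.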
How to create shielded virtual machines
Creating shielded VMs is not that different from creating a standard VM. The real difference, apart from being a Generation 2 VM, is the presence of shielding data. Shielding data is an encrypted lump of secrets created on a trusted workstation. This lump of secrets can include administrator credentials, RDP credentials, and a volume signature catalog that prevents malware from being slipped into the template disk from which future shielded VMs are created. This catalog helps validate that the template has not been modified since it was created. A wizard, called the Shielding Data File Wizard, lets you create these bundles. A Protected Template Disk Creation Wizard makes that process run a little more smoothly as well.
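The idea behind the volume signature catalog can be sketched in a few lines of Python: record a fingerprint of the template disk when it is created, and refuse to use the template if the fingerprint no longer matches. This is only an illustration of the concept. The real catalog format is not described here, so the dictionary layout, the choice of SHA-256, and the function names below are assumptions.

```python
import hashlib
import hmac


def fingerprint(disk_bytes: bytes) -> str:
    """Hash the template disk contents (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(disk_bytes).hexdigest()


def build_catalog(template_name: str, disk_bytes: bytes) -> dict:
    """Record the template's fingerprint at creation time (hypothetical layout)."""
    return {"template": template_name, "sha256": fingerprint(disk_bytes)}


def template_is_unmodified(catalog: dict, disk_bytes: bytes) -> bool:
    """Check that the disk still matches the fingerprint stored in the catalog."""
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(catalog["sha256"], fingerprint(disk_bytes))
```

Because the catalog travels inside the encrypted shielding data, someone who tampers with the template disk on the fabric cannot also quietly update the expected fingerprint.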
Differences between shielded VMs and regular VMs
A shielded VM truly is shielded even from the fabric administrator, to the point where in System Center Virtual Machine Manager or even the bare Hyper-V Manager, you simply cannot connect via VM console to a shielded VM. You must use RDP and authenticate to the guest operating system, where the owner of the VM can decide who should be allowed to access the VM console session directly.
The fabric administrator doesn’t get automatic access. This effectively means that the administrator on the guest operating system of the VM ends up being the virtualization administrator in shielded VM scenarios, not the owner of the host infrastructure as would be the case with typical standard virtualization deployment. This makes shielded VMs a perfect choice for domain controllers, certificate services, and any other VM running a workload with a particularly high business impact.
This transfer of virtualization administrator capabilities raises the question of what to do, then, when a VM is borked and you can no longer access it over the network. This is what the “repair garage” is for. An administrator can park a broken VM inside another shielded VM that is functional and use nested virtualization (Hyper-V within Hyper-V) to run it, connect to the shielded repair garage over RDP like any other shielded VM, and make repairs to the nested broken VM within the safe confines of the shielded garage VM. Once repairs are complete, the fabric administrator can back the newly repaired VM out of the shielded repair garage and put it back onto the protected fabric as if nothing had happened.
The guarded fabric can run in a couple of modes: First, to make initial adoption simpler, there is a mode where the fabric administrator role is still trusted. You can set up an Active Directory trust and a group in which these machines can register, and then you can add Hyper-V host machines to that group to gain permission to run shielded VMs. This is a weaker version of the full protection, since the fabric administrator is trusted, there is no hardware-rooted trust, and there are no attestation checks for boot and code integrity.
The full version is when you register each Hyper-V host’s TPM with the host guardian service and establish a baseline code integrity policy for each different piece of hardware that will host shielded VMs. With the full model, the fabric administrator is not trusted, the trust of the guarded hosts is rooted in a physical TPM, and the guarded hosts have to comply with the code integrity policy for keys to decrypt the shielded VMs to be released.
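The difference between the two modes is easiest to see side by side. The Python sketch below models each mode as a policy check: the weaker mode only asks whether the host is in the trusted Active Directory group, while the full mode asks whether the host's TPM is registered and whether its code integrity policy matches an approved baseline. The type names, field names, and mode labels are illustrative assumptions, not the product's actual data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class AttestationMode(Enum):
    ADMIN_TRUSTED = "admin-trusted"  # fabric administrator is trusted (AD group membership)
    TPM_TRUSTED = "tpm-trusted"      # trust is rooted in the host's physical TPM


@dataclass(frozen=True)
class Host:
    name: str
    tpm_id: str                 # hypothetical identifier for the host's physical TPM
    code_integrity_policy: str  # hypothetical identifier for the policy the host booted with


@dataclass(frozen=True)
class GuardianConfig:
    mode: AttestationMode
    trusted_group: FrozenSet[str] = frozenset()      # host names in the AD group
    registered_tpms: FrozenSet[str] = frozenset()    # TPMs registered with the HGS
    approved_policies: FrozenSet[str] = frozenset()  # baseline code integrity policies


def host_may_run_shielded_vms(cfg: GuardianConfig, host: Host) -> bool:
    """Apply the check that matches the configured attestation mode."""
    if cfg.mode is AttestationMode.ADMIN_TRUSTED:
        # Weaker mode: whoever controls group membership controls trust.
        return host.name in cfg.trusted_group
    # Full mode: both the hardware identity and the code integrity policy must match.
    return (host.tpm_id in cfg.registered_tpms
            and host.code_integrity_policy in cfg.approved_policies)
```

In this model, a host whose code integrity policy has drifted still passes the group-membership check but fails the TPM-rooted one, which is the gap the full mode is meant to close.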
Other notes about how shielded VMs behave and requirements for running them:
- Guarded hosts require you to be running Windows Server 2016 Datacenter edition—the more expensive one, of course. This feature does not exist in Standard edition.
- Windows Nano Server is not only supported in this scenario, it is recommended. Nano Server can be the guest operating system within a shielded VM, handle the guarded Hyper-V host role, and run the HGS. Nano Server is a great lightweight choice for the latter two roles, in my opinion.
- Shielded VMs can only be Generation 2 VMs, which means the guest operating system must be Windows 8 or Windows Server 2012 or newer (including Windows 10, Server 2012 and 2012 R2, and Server 2016).
- Contrary to what you might think, the vTPM is not tied to the physical TPM on any particular server. First, dividing up a physical TPM securely would be a real challenge. Second, the vTPM has to move with the VM so that shielded VMs maintain all of the high availability and fault tolerance capabilities (Live Migration and so on) that regular VMs have.
The last word
The rush to virtualize all things has left a key attack vector virtually unprotected until now. Using shielded VMs adds a strong layer of security to the applications that you have right now, even those that are running on Linux. Think of shielded VMs as the anti-Edward Snowden: protection against the rogue administrator. It could make Windows Server 2016 easily worth the price of admission for your business.