SGX After Spectre and Meltdown: Status, Analysis and Remediations

Much has been written about the recently disclosed micro-architectural cache probing attacks named in the title of this document. These attacks, while known as a possibility for some time, have created significant concern and remediation activity in the industry because of the serious confidentiality threats they pose. They are particularly problematic since they evade long-standing protections that the industry has used as foundational constructs in the security design of modern operating systems.

While the threats to operating system protections have undergone significant discussion, there has been little official information surrounding the impact of this new threat class on Intel’s Software Guard Extensions (SGX) technology. This document is intended to support system security architects and software engineers in assessing the impact of this new class of attack on SGX security guarantees. The development of this document was inspired by dialogue on the Intel SGX developer’s forum surrounding whether or not enclaves provide credible security guarantees in the face of these new threats.

Hardware and microcode enhancements introduced in the Intel Skylake micro-architecture provide the framework for the SGX Trusted Execution Environment (TEE). The SGX security architecture uses the notion of an enclave, which is an area of memory containing data and code that can only be referenced by the enclave itself. Unauthorized access to these protected memory regions is blocked regardless of the privilege level of the execution context attempting the access. As a result, the premise is that enclaves will provide confidentiality and integrity guarantees even if the hardware, BIOS, hypervisor or operating system are compromised.

In addition to providing protections against malware or compromised operating platforms, the technology focuses on providing security guarantees for ‘cloud’ based computing models. The notion is that clients can forward computational tasks, encased in enclaves, into cloud based infrastructures with an expectation that data and computational activity inside the enclave will be protected from possible surveillance by the cloud service provider.

The most important security finding currently available is that there is no credible engineering rationale to support the contention that SGX enclaves will provide confidentiality guarantees in the face of these new micro-architectural cache probing attacks. This is disappointing for a technology that was designed to provide security guarantees in the face of an Iago threat model or in the previously described service provider models.

The analysis supporting this conclusion is derived from several foundational sources of SGX architecture and design information, and readers are encouraged to consult them for additional background. The foundational document for readers interested in the SGX architecture is the MIT SGX review paper by Costan and Devadas. The design and implementation of the SGX Memory Encryption Engine (MEE), which is the foundation for the SGX integrity and confidentiality guarantees, is fully described by Shay Gueron in his paper on the engine. Finally, a comprehensive review of the x86 micro-architecture has been done by Agner Fog.

The prima facie rationale for security concerns by SGX developers is that a functional implementation of Variant 1 (conditional branch misprediction) of the Spectre vulnerability has been published. An independent implementation of this exploit was confirmed in our lab using the Linux operating system and an alternative implementation of the Intel SGX Platform SoftWare (PSW). The exploit demonstrates an 88% accuracy rate in exfiltrating secrets from an enclave.

At the time of this writing it is unclear how the published exploit would be functional, given what appears to be an error in its implementation. Since this may have been the intent of the authors, we are currently not disclosing the modifications needed to achieve our functional implementation.

Since the Variant 1 implementation essentially involves an application spying on itself, the concern is not over the exploit itself but rather its demonstration that SGX enclaves do not enjoy unfettered guarantees with respect to protection of their contents from the surrounding host system. With this finding, security architects must operate under the premise that content in an enclave enjoys no more protection from micro-architectural cache probing attacks than data in standard user or kernel space.

While this finding is initially disturbing, given the design intent of SGX, it is not without precedent, nor is it unexpected after a review of the SGX literature. The original description of the potential for SGX to provide disruptive security guarantees was followed by a sobering study of the vulnerability of TEEs to controlled side channel attacks. Official guidance from Intel advises that algorithms and software design strategies used in SGX enclaves need to be resistant to side-channel timing attacks.

An obvious response to these findings is to question how this could have happened from an engineering perspective. A review of the SGX security model and architecture is useful in understanding the rationale for the vulnerability and the measures available for responding to this new threat scenario.

Without descending deeply into the technical details of the SGX architecture, the micro-architectural vulnerabilities, while significant and troubling, are not inconsistent with the design decisions behind the technology. The fundamental underpinning in any trusted system is the designation of the root of trust for the system. Reviewing the SGX trust model is helpful in understanding the nature of the vulnerability.

In the development of SGX, the decision was made to trust the processor itself, under the notion that the processor had to be inherently trusted. The micro-architectural vulnerabilities can be regarded as an example of the processor failing to properly implement protection domains, and thus as a failure in the security design of the processor. As a result, the impact of these vulnerabilities on SGX would not be unexpected, since it could be argued that the processor could not be ‘trusted’.

The root of the protection domain violation can be understood by tracing backwards to fundamental design decisions that date back decades. The x86 Instruction Set Architecture (ISA) initially carried a taxonomical classification as a Complex Instruction Set Computer (CISC). The design intent of such architectures was to maximize programmer productivity by providing high level abstractions for desired computational functions.

The competing design taxonomy was the Reduced Instruction Set Computer (RISC) architecture. In this model instructions are designed to be much more concise and targeted with respect to their functionality, with an objective of decreasing the complexity of decoding and executing these instructions.

In approximately the Pentium era Intel merged these architectures and the CISC ISA was implemented as a RISC architecture with respect to the physical implementation of the processor logic. This can be thought of, in the abstract sense, as the operating system and applications running on a virtualized CISC architecture implemented on a physical RISC architecture. This model unlocked the potential for pipelined, superscalar architectures capable of implementing out-of-order and speculative execution.

For the purposes of clarity, this document will mint the term ‘representational level’ to refer to the CISC emulation level of the processor as seen by the OS and application layer. The term ‘operational level’ will be applied to the RISC implementation level of the processor.

Virtual memory, the fundamental protection tenet required for secure and stable operating system environments, is notionally implemented at the representational level of the processor. Illegal memory accesses are denied by delivering an exception to the OS or application when the operational level completes execution of the representational form of the instruction.

The security design of SGX is based on a virtual memory implementation strategy. As a result, its integrity and confidentiality guarantees rely on the ability of the processor to enforce strict security domain partitioning between the operational and representational levels of the processor with respect to memory access protection. Since the micro-architectural probing attacks are a breakdown in the integrity of this partitioning, it would not be unexpected for SGX security guarantees to be compromised.

A brief review of the SGX virtual memory protection architecture is useful in understanding these issues. In an SGX capable platform the BIOS allocates an array of ‘protected’ memory referred to as Processor Reserved Memory (PRM). This memory region can only be accessed by microcode running on the processor or by a virtual to physical memory mapping which has been defined by a validated enclave loading and initialization process.

PRM is sub-divided into two separate regions, referred to as Enclave Page Cache (EPC) memory and Enclave Page Cache Map (EPCM) memory. EPC contains memory which is allocated to enclaves and accessible through validated virtual memory mappings. EPCM memory is only accessible by processor microcode and contains metadata that is used by the Memory Encryption Engine (MEE) and page fault handlers to validate memory that is fetched from the EPC.

The processor Page Miss Handler (PMH) is implemented in microcode and runs at the operational level of the processor. When an instruction at the representational level of the processor requires access to a page of EPC memory, the SGX modified PMH verifies that the access is permitted by referencing metadata in the EPCM. A denied access is implemented at the operational level by the delivery of an exception fault to the executing instruction.

These access protections operate AFTER the operational level of the processor has conducted the actual memory fetch with consequent population of the on-chip caches. As a result, SGX protection guarantees do not extend to memory accesses conducted at the operational level of the processor. Any breakdown in the ability of the processor to implement effective partitioning between the operational and representational levels would thus be expected to result in a compromise of SGX security guarantees.

The micro-architectural probing attacks are based on the fact that the operational level of the processor is capable of reading memory in advance of the representational level of the processor. This is a basic requirement needed to support both out-of-order and speculative execution. The operational level of the processor thus possesses data that may not be technically allowed at the representational level of the processor. As these attacks demonstrate, this data can be exfiltrated across the operational/representational security domain barrier by probing the cache state of the processor.
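The sequence described above can be illustrated with a deliberately simplified model. This is a toy sketch, not exploit code: the `ToyCPU` class, its method names, and the idea of representing a cache as a set of resident line numbers are all assumptions made for illustration, and a real attack infers residency by timing reloads rather than by asking the cache directly.

```python
# Toy model only: real caches, speculation windows and timing are far more complex.

class AccessFault(Exception):
    """Raised at the representational level when an access is not permitted."""


class ToyCPU:
    def __init__(self, memory, protected):
        self.memory = memory        # address -> byte value
        self.protected = protected  # addresses the caller may not read
        self.cache = set()          # probe lines currently resident in the cache

    def load_then_use(self, addr, probe_base):
        # Operational level: the fetch and the dependent access happen first.
        value = self.memory[addr]
        self.cache.add(probe_base + value)  # dependent load touches one probe line
        # Representational level: the permission check is enforced afterwards.
        if addr in self.protected:
            raise AccessFault(addr)         # the architectural result is discarded
        return value

    def is_cached(self, line):
        # Stand-in for timing a reload of the line (fast would mean cached).
        return line in self.cache


def recover_byte(cpu, addr, probe_base=0x1000):
    cpu.cache.clear()                       # "flush" step
    try:
        cpu.load_then_use(addr, probe_base)
    except AccessFault:
        pass                                # the value is never returned directly
    for guess in range(256):                # "reload" step: find the resident line
        if cpu.is_cached(probe_base + guess):
            return guess
    return None
```

In this model the fault is delivered exactly as the architecture promises, yet the cache state left behind by the operational level is enough to recover the protected byte.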

This analysis thus supports the premise that the virtual memory based SGX security guarantees would be ineffectual in the face of these attacks. This would seem to be in contrast to a finding that the SGX virtual memory security model has been formally proven.

The MIT review paper contains a formal and extensive proof of the SGX virtual memory security model. The proof relies heavily on the invariant premise that the processor Translation Lookaside Buffer (TLB) can only contain mappings consistent with the execution domain (trusted or untrusted) that the processor is operating in. Entry and exit of an enclave results in the TLB being invalidated, which forces the PMH to validate subsequent page accesses against the SGX virtual memory security constraints as they apply to the current execution domain.
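The invariant the proof depends on can be sketched as a small state machine. The `ToyMMU` class below is a hypothetical illustration, assuming the EPCM can be reduced to a map from EPC page to owning enclave; the real EPCM holds considerably more metadata, and the real checks are performed in microcode.

```python
class ToyMMU:
    def __init__(self, epcm):
        self.epcm = epcm      # EPC page -> owning enclave id (stand-in for EPCM metadata)
        self.tlb = set()      # pages with a translation cached for the current domain
        self.domain = None    # None means untrusted, otherwise the running enclave id
        self.pmh_checks = 0   # how many times the page miss handler has run

    def _switch(self, domain):
        self.tlb.clear()      # enclave entry and exit both invalidate the TLB
        self.domain = domain

    def eenter(self, enclave_id):
        self._switch(enclave_id)

    def eexit(self):
        self._switch(None)

    def access(self, page):
        if page in self.tlb:
            return True                   # TLB hit: no re-validation takes place
        self.pmh_checks += 1              # TLB miss: the page miss handler runs
        owner = self.epcm.get(page)       # None means an ordinary, non-EPC page
        if owner is not None and owner != self.domain:
            return False                  # denied, and nothing enters the TLB
        self.tlb.add(page)
        return True
```

Because every domain switch empties the TLB, a translation validated for one domain can never be reused by another. Note that the model, like the proof, says nothing about data already sitting in the caches.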

As with all formal proofs, the correctness of the proof is dependent on the validity of the assumptions that the proof is based on. The proof assumes that data transfers will occur only by virtual memory mediated transfers and its validity thus fails in the face of data transfers which can occur without violating the virtual memory security constraints embodied in the design of SGX. Exfiltration of data from the processor caches is an example of such a transfer.

A central tenet of SGX security protections is that the contents of EPC memory are encrypted and integrity protected by the MEE. This is designed to thwart cold-boot and bus snooping attacks. This proves to be ineffective against the micro-architectural probing attacks since the MEE sits at the ‘edge’ of the memory controller and thus at the ‘base’ of the cache hierarchy. Operational level memory fetches, in support of out-of-order or speculative execution, result in the decryption of the data, which is then available in plaintext form in the cache hierarchy. As a result the plaintext form of the data is available to be targeted by cache probing strategies.

The important question to the SGX community is the possibility for mitigating this security regression, either through external action or by a change in SGX itself.

At the time of this writing, a rogue data cache load, aka Meltdown, style attack has not been demonstrated against SGX enclaves. Given that this attack works by probing for operational level induced changes in cache state, prudence would dictate that page table isolation techniques be immediately deployed on all platforms implementing security guarantees through the use of SGX enclaves. If proven to be successful, this attack would be particularly devastating to enclaves, as is the case with standard operating system protections.

The SGX architecture literature indicates that its implementation was done in microcode in order to support security changes if that were to become necessary. The ability to induce major functional changes would presumably be limited by the minimal amount of space that is available for microcode patching.

With that being said, there is reason to believe that SGX will be an attractive environment for the implementation of Spectre style mitigations. As part of a general mitigation strategy for these attacks, Intel has used microcode modifications to implement three ‘barrier’ strategies to thwart the ability for speculation to be used to probe memory. These enhancements implement an Indirect Branch Prediction Barrier (IBPB), Single Thread Indirect Branch Predictors (STIBP) and Indirect Branch Restricted Speculation (IBRS) support.
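For readers who want to confirm which of these controls a given microcode revision exposes, the enumeration can be sketched as follows. The CPUID bit positions and MSR numbers reflect Intel's published enumeration for these features; the function name, and the assumption that the EDX value has already been read by some other means (for example a small native helper executing the cpuid instruction), are ours.

```python
# Model specific registers used by the speculation controls.
IA32_SPEC_CTRL = 0x48  # bit 0 enables IBRS, bit 1 enables STIBP
IA32_PRED_CMD = 0x49   # writing bit 0 issues an IBPB


def decode_spec_ctrl_support(cpuid_7_0_edx):
    """Decode CPUID.(EAX=7,ECX=0):EDX into the controls it enumerates.

    The caller is assumed to have obtained the raw EDX value elsewhere.
    """
    ibrs_ibpb = bool((cpuid_7_0_edx >> 26) & 1)  # bit 26: IBRS and IBPB supported
    stibp = bool((cpuid_7_0_edx >> 27) & 1)      # bit 27: STIBP supported
    return {"IBRS": ibrs_ibpb, "IBPB": ibrs_ibpb, "STIBP": stibp}
```

A platform that reports none of these controls is most likely running microcode that predates the mitigations.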

The IBRS functionality is used to protect regions of higher privilege from speculation in lower privileged regions. Available guidance specifically indicates that IBRS is not required in order to isolate branch prediction for either SGX enclaves or System Management Mode (SMM) routines.

The ENCLU[EENTER] and ENCLU[EEXIT] instructions are implemented in microcode. It would thus be a reasonable assumption that an effort will be made to modify the behavior of these instructions to implement appropriate speculation barriers on enclave entry or exit. This would be consistent with the current virtual memory protection architecture, which issues a full TLB invalidation as a component of the operational implementation of these instructions.

This would presumably thwart the transfer of covert speculation information between trusted and untrusted execution domains. It should be possible to validate this by testing the Spectre variant 1 attack on an SGX capable platform running updated microcode. Given the stability concerns surrounding current microcode updates, any such implementation may not yet be available or may be very unstable. If this is the approach being taken, it would be beneficial to the SGX community to have solid guidance on potential microcode interventions or improvements.

In advance of the availability of such fixes, standard software development mitigation practices can also be employed. Support is emerging for compilers which implement ‘retpoline’ support as a means of blocking indirect branch prediction poisoning. Index masking and the use of explicit fencing instructions (lfence and/or mfence) can also be employed.
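Of these practices, index masking is the easiest to show in isolation. The sketch below reproduces only the arithmetic of a branch-free mask, assuming a 64-bit word and a table size below 2^63; Python does not speculate, so it demonstrates the computation rather than the protection itself, and the function names are illustrative. In enclave code the same expression would be written in C so that it compiles without a conditional branch.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def index_mask(index, size):
    """Return all ones if 0 <= index < size, otherwise zero, without branching."""
    # When index >= size, (size - 1 - index) wraps around and sets the top bit.
    # When index itself has the top bit set, the OR keeps it set.
    combined = (index | (size - 1 - index)) & WORD_MASK
    top_bit = combined >> (WORD_BITS - 1)  # 1 when out of bounds, 0 when in bounds
    return (top_bit - 1) & WORD_MASK       # 0 becomes all ones, 1 becomes zero


def read_element(table, index):
    if index < len(table):                       # the architectural bounds check
        index &= index_mask(index, len(table))   # a mispredicted index collapses to 0
        return table[index]
    return None
```

The point of the mask is that even if the bounds check is mispredicted, the index that reaches the load has already been forced into range by data flow rather than control flow.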

If microcode based speculation fencing on enclave entry and exit becomes available, the case can be made that defensive practices inside of the enclave may not be strictly necessary. The Spectre vulnerabilities represent a class of attack dependent on the execution of untrusted code. It would be a reasonable assumption that code being executed in enclave context would be fully trusted.

The SGX enclave environment provides an additional, immediately practical, defensive measure as well. Within an enclave the ENCLU[EGETKEY] instruction provides the basis for generating either platform or enclave specific sealing keys. These keys can be used to encrypt high value data within the enclave, thus blocking its exfiltration, since the data would then exist in the operational level of the processor as ciphertext. The protected data would need to be decrypted before use, but a speculative attack would then need to cope with a highly dynamic target in what could be engineered to be an extremely narrow security vulnerability window.
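The pattern of keeping a secret wrapped and exposing plaintext only for the duration of a single operation can be sketched as follows. Everything here is a stand-in: `derive_sealing_key` is a hypothetical substitute for ENCLU[EGETKEY], the HMAC based keystream is used only because it needs nothing beyond the standard library and is not a vetted cipher, and Python cannot guarantee that no other copies of the plaintext exist. Real enclave code would use the SDK's authenticated encryption and explicit memory clearing.

```python
import hashlib
import hmac
import os


def derive_sealing_key(enclave_identity, platform_secret):
    # Hypothetical stand-in for ENCLU[EGETKEY]; the real derivation occurs in hardware.
    return hmac.new(platform_secret, b"seal" + enclave_identity, hashlib.sha256).digest()


def _keystream(key, nonce, length):
    # Illustrative keystream from HMAC-SHA256 in counter mode; not a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def wrap(key, plaintext):
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce, bytes(p ^ s for p, s in zip(plaintext, stream))


def use_secret(key, nonce, ciphertext, operation):
    # Decrypt into a mutable buffer, run one operation, then overwrite the buffer
    # so that the plaintext is resident for as short a time as possible.
    stream = _keystream(key, nonce, len(ciphertext))
    buf = bytearray(c ^ s for c, s in zip(ciphertext, stream))
    try:
        return operation(buf)
    finally:
        for i in range(len(buf)):
            buf[i] = 0
```

The design choice is to make the plaintext lifetime a property of `use_secret` rather than of its callers, so the exposure window is bounded by one operation.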

It is important to note that all of these micro-architectural attacks are only capable of breaching confidentiality, not integrity of data. By retaining their strong integrity guarantees SGX enclaves provide a framework which guarantees the integrity of mitigation modalities such as intra-enclave encryption.

An important consideration moving forward is the possibility of enclaves being used as a vehicle for the remote exfiltration of data obtained by micro-architectural probing. This is possible because the SGX memory model allows a process running in trusted enclave context full access to untrusted memory. A preliminary review of the SGX2 architecture suggests this may be a concern on such platforms if interventions are not implemented.

We will update this document with additional findings and guidance as more information becomes available.

Dr. Greg
