We report a preliminary performance evaluation of AMD SEV (Secure Encrypted Virtualization) technologies. We run virtual machines using QEMU/KVM as the hypervisor on a host we control. Details about our environment can be found in the Environment section.
Confidential Computing technologies may become predominant in the future, as more and more customers with sensitive computing workloads move their code from on-premise hardware to public cloud vendors. An important aspect of successfully migrating workloads will be identifying new performance bottlenecks. After explaining in detail how the technologies introduced by AMD work, we run measurements to assess their impact on micro-benchmarks and on traditional workloads such as the compilation of popular open-source projects.
We find that Confidential Computing technologies cause some slowdowns for memory operations, while the performance degradation in CPU-intensive workloads is generally negligible.
Confidential Computing has become increasingly important in recent years. Over the last two decades the way software is released to production has changed radically: the majority of software deployed nowadays is hosted by cloud providers (Google Cloud Platform, Amazon Web Services, Microsoft Azure, etc.), while some code is still run on-premise for privacy or safety reasons. Confidential Computing aims to provide a safe environment for developers to run highly sensitive code on hosts they do not fully control.
Cloud providers' customers want to be sure that no one can access their disks, memory or CPU registers: neither other customers running virtual machines on the same hardware, nor whoever controls the hypervisor, be it the cloud vendor or, in the worst case, malicious actors who have compromised the physical machines. In the appendix we present a simple demo attack in which a process running on the host reads memory belonging to a virtual machine.
Encryption at rest, designed to prevent an attacker from accessing unencrypted data by ensuring that data is encrypted while on disk [1], has been around for a long time and is supported by all major providers. Almost by definition, however, it leaves some components used in everyday computing unencrypted, namely RAM and CPU registers. To tackle this issue, major chip producers started developing technologies that enable Confidential Computing, such as AMD Secure Encrypted Virtualization (SEV), Intel Trust Domain Extensions (TDX) and Arm Confidential Compute Architecture (CCA).
In this article we will focus on summarizing the AMD technologies before running some measurements to evaluate the impact of Confidential Computing technologies on performance.
AMD Secure Memory Encryption (SME)
AMD Secure Memory Encryption (SME) is the basic building block for the more sophisticated technologies we cover later, so it is important to understand how it works. On a machine with SME enabled, memory encryption is performed by dedicated hardware integrated into the same SoC as the CPU cores. AMD EPYC™ processors introduced two hardware security components:
- an AES-128 hardware encryption engine embedded in the memory controller, the component responsible for encrypting data when it is written to main memory and decrypting it when it is read back. See the figure below to better understand how data is managed when SME is used. Since the memory controller sits inside the EPYC SoC, memory lines belonging to encrypted pages are always encrypted when they leave the SoC.
- the AMD Secure Processor (SP), a small processor providing cryptographic functionality to perform secure key generation and key management.
Memory Encryption Behavior, from https://www.amd.com/system/files/TechDocs/memory-encryption-white-paper.pdf
The AES engine needs a key, which is generated securely by the AMD Secure Processor, a 32-bit microcontroller that is not accessible to software running on the main CPU. Furthermore, SME does not require software running on the main CPU to participate in key management, which makes the design more secure by reducing the attack surface.
The C-bit is a bit present in every page table entry that indicates whether the page it maps is to be encrypted. Its position, together with some additional information, can be retrieved by running the cpuid command to inspect leaf 0x8000001F, as specified by the AMD reference manual [4]:
$ cpuid -1 -l 0x8000001F
AMD Secure Encryption (0x8000001f):
SME: secure memory encryption support = true
SEV: secure encrypted virtualize support = true
VM page flush MSR support = true
SEV-ES: SEV encrypted state support = true
SEV-SNP: SEV secure nested paging = true
VMPL: VM permission levels = true
Secure TSC supported = true
virtual TSC_AUX supported = false
hardware cache coher across enc domains = true
SEV guest exec only from 64-bit host = true
restricted injection = true
alternate injection = true
full debug state swap for SEV-ES guests = true
disallowing IBS use by host = true
VTE: SEV virtual transparent encryption = true
VMSA register protection = true
encryption bit position in PTE = 0x33 (51)
physical address space width reduction = 0x5 (5)
number of VM permission levels = 0x4 (4)
number of SEV-enabled guests supported = 0x1fd (509)
minimum SEV guest ASID = 0x80 (128)
The output shows that the CPU supports SME, SEV, SEV-ES and SEV-SNP, and that the C-bit on this machine is bit 51 of the page table entry.
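To make the register layout concrete, here is a minimal Python sketch that decodes the raw register values of leaf 0x8000001F, following the bit layout in the AMD reference manual: EAX bit 0 is SME, bit 1 SEV, bit 3 SEV-ES, bit 4 SEV-SNP; EBX bits 5:0 hold the C-bit position, bits 11:6 the physical address width reduction and bits 15:12 the number of VM permission levels; ECX holds the number of SEV-enabled guests and EDX the minimum SEV guest ASID. The function name and the sample register values are hypothetical, chosen to reproduce the cpuid output above; on a real host the values come from the CPUID instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SevCpuInfo:
    sme: bool
    sev: bool
    sev_es: bool
    sev_snp: bool
    c_bit_position: int
    phys_addr_reduction: int
    num_vmpls: int
    num_encrypted_guests: int
    min_sev_asid: int

    @property
    def c_bit_mask(self) -> int:
        """Mask to OR into a page table entry to mark the page as encrypted."""
        return 1 << self.c_bit_position


def decode_leaf_8000001f(eax: int, ebx: int, ecx: int, edx: int) -> SevCpuInfo:
    """Decode CPUID Fn8000_001F (AMD memory encryption capabilities)."""
    return SevCpuInfo(
        sme=bool(eax & (1 << 0)),
        sev=bool(eax & (1 << 1)),
        sev_es=bool(eax & (1 << 3)),
        sev_snp=bool(eax & (1 << 4)),
        c_bit_position=ebx & 0x3F,               # EBX[5:0]
        phys_addr_reduction=(ebx >> 6) & 0x3F,   # EBX[11:6]
        num_vmpls=(ebx >> 12) & 0xF,             # EBX[15:12]
        num_encrypted_guests=ecx,
        min_sev_asid=edx,
    )


# Hypothetical register values matching the cpuid output above.
info = decode_leaf_8000001f(eax=0x3F, ebx=0x4173, ecx=0x1FD, edx=0x80)
print(info.c_bit_position)   # 51
print(hex(info.c_bit_mask))  # 0x8000000000000

# Marking a (hypothetical) page table entry as encrypted.
pte = 0x0000_0001_2345_6007
print(hex(pte | info.c_bit_mask))
```

The last line shows why the C-bit position matters to the OS or hypervisor: encrypting a page is just a matter of setting that bit in its page table entry.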
SME is a very powerful mechanism for memory encryption, but a major downside is that it requires support from the operating system or hypervisor. For this reason another technology has been developed: Transparent SME (TSME), which encrypts every memory page regardless of the C-bit. As the name suggests, TSME provides encryption without any modification to the OS or hypervisor. In certain scenarios this may be crucial: neither operating system nor hypervisor developers have to implement and maintain additional code, and older operating systems can be protected with TSME without any further intervention.
AMD Secure Encrypted Virtualization (SEV)
AMD SEV aims to make virtual machines more secure by encrypting the memory of each virtual machine, and it enables a new security model that protects guest code from higher-privileged software, such as the hypervisor or privileged code running on the physical machine that hosts the virtual machines. This makes it possible to run code in a secure environment without having to trust the hypervisor.
SEV security layers
When SEV is enabled, the data of each virtual machine is tagged with its ASID (an identifier unique to every VM running on the same host), which is used inside the SoC to isolate the workload from other VMs and from the hypervisor. When data leaves the SoC, it is encrypted with the AES-128 key associated with that VM, generated and managed by the AMD Secure Processor, and is thus protected.
These mechanisms provide strong cryptographic isolation between VMs run by the same hypervisor, and between VMs and the hypervisor itself. SEV guests choose which pages to encrypt by setting the C-bit, as described above for SME. Only memory pages explicitly meant for communication with the hypervisor are considered shared and are therefore not encrypted. SEV is discussed in more depth in the introductory white paper [2].
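The toy model below illustrates ASID-tagged encryption: the memory controller holds one key per ASID, encrypts a private page (C-bit set) with the key of the VM that writes it, and stores shared pages (C-bit clear) in plaintext. Everything here is a hypothetical illustration: the class and method names are invented, ASID 0 plays the role of the hypervisor, and a SHA-256 keystream stands in for the real AES-128 engine, so this is not usable cryptography.

```python
import hashlib
import os


class ToyMemoryController:
    """Toy model of ASID-keyed memory encryption. NOT real cryptography."""

    def __init__(self) -> None:
        self._keys: dict[int, bytes] = {}  # ASID -> per-VM key
        self._dram: dict[int, bytes] = {}  # address -> bytes as stored in DRAM

    def launch_vm(self, asid: int) -> None:
        # On real hardware the AMD Secure Processor generates and holds the key.
        self._keys[asid] = os.urandom(16)

    def _keystream(self, asid: int, addr: int, length: int) -> bytes:
        # Stand-in for AES-128: a SHA-256 keystream bound to (key, address).
        out = b""
        counter = 0
        while len(out) < length:
            block = self._keys[asid] + addr.to_bytes(8, "little") + counter.to_bytes(4, "little")
            out += hashlib.sha256(block).digest()
            counter += 1
        return out[:length]

    def _xor(self, asid: int, addr: int, data: bytes) -> bytes:
        return bytes(d ^ k for d, k in zip(data, self._keystream(asid, addr, len(data))))

    def write(self, asid: int, addr: int, data: bytes, c_bit: bool) -> None:
        # Private pages (C-bit set) are encrypted with the writer's key;
        # shared pages (C-bit clear) are stored in plaintext.
        self._dram[addr] = self._xor(asid, addr, data) if c_bit else data

    def read(self, asid: int, addr: int, c_bit: bool) -> bytes:
        stored = self._dram[addr]
        if c_bit and asid in self._keys:
            return self._xor(asid, addr, stored)
        return stored  # raw DRAM contents


mc = ToyMemoryController()
mc.launch_vm(asid=1)
mc.launch_vm(asid=2)
mc.write(asid=1, addr=0x1000, data=b"hi from SEV!", c_bit=True)    # private page
mc.write(asid=1, addr=0x2000, data=b"shared buffer", c_bit=False)  # shared page

print(mc.read(asid=1, addr=0x1000, c_bit=True))   # owner: b'hi from SEV!'
print(mc.read(asid=0, addr=0x1000, c_bit=False))  # hypervisor: ciphertext
print(mc.read(asid=2, addr=0x1000, c_bit=True))   # other VM: wrong key, garbage
print(mc.read(asid=0, addr=0x2000, c_bit=False))  # shared page: b'shared buffer'
```

Binding the keystream to the address as well as the key mirrors the fact that identical plaintext at different addresses should not produce identical ciphertext.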
AMD Secure Encrypted Virtualization-Encrypted State (SEV-ES)
The technologies presented so far only protect data stored in memory; a crucial part of the system is still left unprotected, namely the CPU registers. AMD SEV-ES encrypts the contents of all CPU registers when a VM stops running and decrypts them when the VM resumes.
Protecting CPU registers can be a daunting task, because a hypervisor sometimes needs to access a VM's registers to provide services such as device emulation. These accesses must be controlled: SEV-ES lets the guest VM decide which register values are exposed to the hypervisor, in the same way it chooses which memory pages are encrypted via the C-bit.
With SEV-ES, switches between the hypervisor and the guest are performed atomically by hardware: when the VMRUN instruction is executed for a guest, the CPU loads all of the guest's registers, and when the VM stops running (VMEXIT), the register state is automatically saved back to memory. Atomicity is crucial, because it prevents a malicious actor from sneaking into the process and altering it. This way SEV-ES ensures that the guest register state cannot be leaked to or modified by the hypervisor.
Whenever the hardware saves the registers, it encrypts them with the same AES-128 key mentioned before. In addition, the CPU computes an integrity-check value and stores it in a memory region that is not accessible to software; on the next VMRUN the saved data is checked to ensure the register state was not tampered with.
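The save-and-check cycle can be modeled in a few lines. This is a conceptual sketch, not AMD's implementation: the class and method names are hypothetical, encryption of the saved state is omitted, and HMAC-SHA256 stands in for the hardware integrity-check value. It shows that register state modified while the guest is not running is rejected at the next VMRUN.

```python
import hashlib
import hmac
import json
import os


class IntegrityError(Exception):
    """Stands in for the hardware refusing to run a tampered guest."""


class ToyVmsa:
    """Toy model of SEV-ES register save/restore with an integrity check.

    Encryption of the saved state is omitted; only the integrity check is modeled.
    """

    def __init__(self) -> None:
        self._key = os.urandom(32)  # known only to the "hardware"
        self.saved_state = b""      # saved registers, reachable by the hypervisor
        self._check_value = b""     # integrity-check value, not reachable by software

    def _mac(self, blob: bytes) -> bytes:
        return hmac.new(self._key, blob, hashlib.sha256).digest()

    def vmexit(self, registers: dict[str, int]) -> None:
        # The full register state is saved in one step, together with its check value.
        self.saved_state = json.dumps(registers, sort_keys=True).encode()
        self._check_value = self._mac(self.saved_state)

    def vmrun(self) -> dict[str, int]:
        # The saved state is verified before it is loaded back into the CPU.
        if not hmac.compare_digest(self._mac(self.saved_state), self._check_value):
            raise IntegrityError("guest register state was tampered with")
        return json.loads(self.saved_state)


vmsa = ToyVmsa()
vmsa.vmexit({"rip": 0x401000, "rax": 42})
print(vmsa.vmrun()["rax"])  # 42

# A malicious hypervisor edits the saved state between VMEXIT and VMRUN.
vmsa.saved_state = vmsa.saved_state.replace(b'"rax": 42', b'"rax": 43')
try:
    vmsa.vmrun()
except IntegrityError as err:
    print("VMRUN refused:", err)
```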
Like SEV, SEV-ES is completely transparent to application code: only guest operating system developers and hypervisor developers need to implement support for these protection features.
Further information about communication between the guest and the hypervisor can be found in the SEV-ES white paper [5] and in chapter 15 of the AMD reference manual [4].
AMD Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP)
After SEV and SEV-ES, AMD introduced a newer generation of SEV called Secure Nested Paging (SEV-SNP), which builds on top of the technologies we have seen and extends them with strong memory integrity protection to prevent hypervisor-based attacks such as replay, memory remapping, data corruption and memory aliasing. Let's first explain what these attacks are before covering how SEV-SNP mitigates them.
- a replay attack happens when a malicious actor captures the contents of memory at a certain moment and later writes those old values back;
- a data corruption attack happens when the attacker, knowing they cannot read memory, corrupts it anyway to trick the machine into unexpected and possibly dangerous behavior;
- a memory aliasing attack happens when an external actor maps multiple guest pages to the same physical page;
- a memory remapping attack happens when the intruder maps a guest page to a different physical page.
These attacks are a problem because a running program has no notion of memory integrity and it could end up in a state that was not originally considered by the developers and this may lead to security issues.
The basic principle of SEV-SNP integrity is that if a VM is able to read a private (encrypted) page of memory, it must always read the value it last wrote. In other words, if the memory a process is trying to access has been tampered with by an external actor, the VM should get an exception instead of silently reading the modified data.
In this threat model we consider:
- the AMD System-on-Chip (SoC) hardware, the AMD Secure Processor (AMD-SP) and the VM as fully trusted; to this end, the VM should enable Full Disk Encryption (FDE) at rest, for example with LUKS, which major cloud providers have supported for a long time;
- the BIOS on the host system, the hypervisor, device drivers and other VMs as untrusted: the threat model assumes they are malicious and may conspire to compromise the security of our confidential virtual machine.
The way SEV-SNP ensures protection against the attacks mentioned before is by introducing a new data structure, a Reverse Map Table (RMP) that tracks ownership of memory pages. Using the RMP it is possible to enforce that only the owner of a certain memory page can alter it.
A page can be owned by the VM, the hypervisor or the AMD Secure Processor. The RMP is used in conjunction with the standard x86 page tables to enforce memory restrictions and page access rights. RMP checks on memory writes mitigate replay, remapping and data corruption attacks.
To prevent memory remapping, SEV-SNP introduces a technique called Page Validation. Each RMP entry contains a Validated bit. A page assigned to a guest whose Validated bit is not set cannot be used by that guest; the guest can only use the page after setting the Validated bit through the PVALIDATE instruction. The guest VM makes sure that it never validates the SPA (System Physical Address) backing a given GPA (Guest Physical Address) more than once.
More details are discussed in the introductory white paper [6].
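A toy Reverse Map Table helps to see how ownership checks and page validation fit together. This is a heavily simplified, hypothetical model: the class names, fields and error type are invented, the faults stand in for the exceptions real hardware raises, and only a subset of the real RMP checks is represented. Each system physical page records its owner, the GPA it is assigned to and the Validated bit; writes from anyone but the owner fault, a guest cannot use a page it has not validated, and a second validation is treated as a sign of tampering.

```python
from dataclasses import dataclass
from typing import Optional


class RmpFault(Exception):
    """Stands in for the faults raised when an RMP check fails."""


@dataclass
class RmpEntry:
    owner: str = "hypervisor"   # "hypervisor", a guest name or "amd-sp"
    gpa: Optional[int] = None   # guest physical address the page is assigned to
    validated: bool = False


class ToyRmp:
    """One entry per system physical page (SPA); a heavily simplified RMP."""

    def __init__(self, num_pages: int) -> None:
        self.entries = [RmpEntry() for _ in range(num_pages)]

    def assign(self, spa: int, guest: str, gpa: int) -> None:
        # The hypervisor hands a page to a guest; the Validated bit starts cleared.
        self.entries[spa] = RmpEntry(owner=guest, gpa=gpa, validated=False)

    def pvalidate(self, guest: str, spa: int, gpa: int) -> None:
        entry = self.entries[spa]
        if entry.owner != guest or entry.gpa != gpa:
            raise RmpFault("page is not assigned to this guest at this GPA")
        if entry.validated:
            # Guest policy: validating twice hints at remapping or aliasing.
            raise RmpFault("page already validated")
        entry.validated = True

    def write(self, who: str, spa: int, gpa: Optional[int] = None) -> None:
        # `spa` is what the nested page walk resolved; a guest also passes its GPA.
        entry = self.entries[spa]
        if entry.owner != who:
            raise RmpFault(f"{who} may not write a page owned by {entry.owner}")
        if who in ("hypervisor", "amd-sp"):
            return  # only guest-owned pages carry a meaningful Validated bit here
        if not entry.validated or entry.gpa != gpa:
            raise RmpFault("guest access to an unvalidated or remapped page")


rmp = ToyRmp(num_pages=4)
rmp.assign(spa=1, guest="vm1", gpa=0x10)
rmp.pvalidate("vm1", spa=1, gpa=0x10)
rmp.write("vm1", spa=1, gpa=0x10)  # allowed: owned by vm1 and validated

for name, attack in [
    ("hypervisor overwrites guest page", lambda: rmp.write("hypervisor", spa=1)),
    ("page validated twice", lambda: rmp.pvalidate("vm1", spa=1, gpa=0x10)),
]:
    try:
        attack()
    except RmpFault as err:
        print(f"{name}: blocked ({err})")

# Remapping: the hypervisor silently backs GPA 0x10 with another physical page.
rmp.assign(spa=2, guest="vm1", gpa=0x10)
try:
    rmp.write("vm1", spa=2, gpa=0x10)
except RmpFault as err:
    print(f"remapped page: blocked ({err})")
```

The remapping case is caught because the freshly assigned page has never been validated by the guest, which is exactly the role Page Validation plays.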
Having introduced the main building blocks of AMD's effort to popularize Confidential Computing, we now run some benchmarks to measure whether and how these technologies impact performance.
Environment
We run our experiments in QEMU/KVM virtual machines with 16 GB of RAM and 16 vCPUs each. The table below contains more details about our hardware and software versions. The code to set up the environment and launch the VMs can be found at https://github.com/rcastellotti/gr.
- QEMU is a generic open-source machine emulator and virtualizer [7]; we use QEMU together with KVM, the Kernel-based Virtual Machine, to virtualize our machines;
- OVMF is a project [8], based on EDK II, that enables UEFI support for virtual machines. We use OVMF to generate the executable firmware and the non-volatile variable store. It is imperative to create a VM-specific copy of OVMF_vars.fd, because the variable store should be private to each virtual machine. UEFI support is mandatory to run a SEV-SNP machine.
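The per-VM copy of the variable store can be sketched in Python as follows. The function name, the template path and the one-directory-per-VM layout are hypothetical and not necessarily what the repository linked above does; only the idea of giving every VM a private OVMF_vars.fd comes from the text.

```python
import shutil
from pathlib import Path


def prepare_vm_vars(template_vars: Path, vm_dir: Path) -> Path:
    """Create a private copy of the OVMF variable store for one VM.

    The variable store is writable (UEFI variables such as the boot order are
    persisted there), so sharing one file would let VMs overwrite each other's
    variables.
    """
    vm_dir.mkdir(parents=True, exist_ok=True)
    vm_vars = vm_dir / "OVMF_vars.fd"
    if not vm_vars.exists():  # keep existing variables across VM restarts
        shutil.copyfile(template_vars, vm_vars)
    return vm_vars
```

Calling prepare_vm_vars once per VM, for example with directories named after the sev and nosev machines used in the appendix, yields two independent variable stores, while the read-only firmware code image can still be shared.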
| Component | Version / model |
|---|---|
| CPU | AMD EPYC 7713P 64-Cores |
| RAM | HMAA8GR7AJR4N-XN (Hynix) 3200MHz, 8 x 64 GB (512 GB) |
| Host kernel | 6.3.0-rc2 #1-NixOS SMP PREEMPT_DYNAMIC (NixOS 23.05) commit: |
| QEMU | 8.0.0 (AMD) (patched) commit: |
| OVMF | Stable 202211 (patched) commit: |
| Guest kernel | 5.19.0-41-generic #42-Ubuntu SMP PREEMPT_DYNAMIC (Ubuntu 22.10) |
We expect the encryption techniques used by SEV, SEV-ES and SEV-SNP to degrade performance, especially for memory-intensive workloads. Additionally, we want to investigate whether these technologies slow down CPU-intensive workloads and disk operations. To quantify the performance we have to trade off for a more secure virtual environment, we use rcastellotti/tinyben, an experimental benchmarking tool meant to replace the popular Phoronix Test Suite [9], to run three distinct kinds of benchmarks: compilation, memory and I/O benchmarks.
Compilation Benchmarks: we use compilation as an all-around benchmark because it is a "real world" workload. We compile some popular open-source projects, such as the Godot game engine, Linux (defconfig) and the entire LLVM project (using ninja), and measure how long the compilation takes. The figure below shows a barely measurable slowdown in completion times, by a factor of 0.96 on average.
Compilation benchmarks, in milliseconds (Lower is Better)
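The factors quoted in this section can be read as the performance of the SEV machine relative to the baseline, so that values below 1 mean a slowdown. The sketch below shows one way to compute them; tinyben's exact definition is not stated here, so treat it as an assumption: for time-to-completion metrics (lower is better) the factor is baseline time divided by SEV time, for throughput metrics (higher is better) it is SEV throughput divided by baseline throughput, and per-benchmark factors are averaged with a geometric mean.

```python
from math import prod


def relative_performance(baseline: float, sev: float, higher_is_better: bool) -> float:
    """Performance of the SEV VM relative to the baseline VM (< 1.0 means slower).

    For lower-is-better metrics (e.g. compile time) this is baseline / sev;
    for higher-is-better metrics (e.g. requests per second) it is sev / baseline.
    """
    if baseline <= 0 or sev <= 0:
        raise ValueError("measurements must be positive")
    return sev / baseline if higher_is_better else baseline / sev


def geometric_mean(factors: list[float]) -> float:
    """Average of ratios; treats speedups and slowdowns symmetrically."""
    if not factors:
        raise ValueError("need at least one factor")
    return prod(factors) ** (1.0 / len(factors))
```

For example, a hypothetical build taking 125 s on the SEV VM against 100 s on the baseline gives relative_performance(100, 125, higher_is_better=False), which returns 0.8, the same value obtained for 80,000 against 100,000 requests per second with higher_is_better=True.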
Memory Benchmarks: to benchmark memory we use ssvb/tinymembench, a tool that measures memory throughput and latency. We are mainly interested in the bandwidth of the MEMCPY and MEMSET operations. In the figure below we can see that the bandwidth of MEMSET is barely impacted, while the slowdown of MEMCPY is a little more evident, by a factor of approximately 0.92 on average. We suspect it might be related to lookups in the Reverse Map Table, but we cannot rule out that the bottleneck lies somewhere else entirely.
tinymembench bandwidth benchmark, MEMCPY (Higher is Better)
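For intuition about what a copy bandwidth figure measures, here is a rough Python sketch that times repeated copies of a large buffer and reports MiB copied per second. It is not how tinymembench works (tinymembench is written in C and compares several copy routines); the buffer size and repetition count are arbitrary assumptions, and interpreter overhead makes the absolute numbers only indicative. Running it in both VMs gives a crude comparison.

```python
import time


def copy_bandwidth_mib_s(size_mib: int = 64, repeats: int = 10) -> float:
    """Rough memory copy bandwidth in MiB/s, measured with bytearray slice copies."""
    size = size_mib * 1024 * 1024
    src = bytearray(b"\x5a") * size  # filled with data, so every source page is populated
    dst = bytearray(size)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src                 # one bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    # Keep the fastest run: the first one also pays for page faults on `dst`.
    return size_mib / best


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_mib_s():.0f} MiB/s")
```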
I/O Benchmarks: to measure I/O performance we run two different benchmarks: first we perform 2500 SQLite insertions into a table, then we run redis-benchmark, a tool included in Redis. We are interested in requests per second and latency for the main operations (namely SET and GET). The figures below show the results. The Redis benchmarks highlight some performance degradation: in terms of requests per second, this translates into a slowdown by a factor of approximately 0.89 for SET operations and 0.91 for GET operations, and the same trend is visible in the minimum latency plot.
Miscellaneous benchmarks: LZ4 compression and decompression, and SQLite insertions, time to completion (Lower is Better)
Redis-benchmark, SET and GET operations Requests Per Second (Higher is Better)
Redis-benchmark, SET and GET operations minimum latency (milliseconds) (Lower is Better)
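The SQLite part of the I/O benchmark can be approximated with the sqlite3 module from the Python standard library. This is a hypothetical sketch, not the tinyben implementation: the table schema, the database path and committing after every insertion (so that each row is written to disk before the next one, which is what makes the test I/O-bound) are our assumptions; only the count of 2500 insertions comes from the setup described above.

```python
import sqlite3
import time


def time_sqlite_inserts(db_path: str, count: int = 2500) -> float:
    """Insert `count` rows, committing each one, and return the elapsed seconds."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS bench (id INTEGER PRIMARY KEY, payload TEXT)")
        start = time.perf_counter()
        for i in range(count):
            conn.execute("INSERT INTO bench (payload) VALUES (?)", (f"row {i}",))
            conn.commit()  # one transaction per row: each insertion hits the disk
        return time.perf_counter() - start
    finally:
        conn.close()
```

To compare the two machines, the function would be run once in each VM against a database file on the virtual disk (not on a tmpfs such as a RAM-backed /tmp, which would bypass the storage path being measured).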
As we can see, the overhead introduced by Confidential Computing technologies across the three categories of benchmarks is generally mild in almost every benchmark, with the exception of LZ4 compression, where a slowdown by a factor of approximately 0.8 is noticeable. Further investigation, varying the amount of resources available to each machine, is necessary to estimate the exact performance degradation correctly.
One reason why CPU-intensive workloads are barely penalized by SEV-SNP may be that the processor does not do much additional work when running this kind of workload on a confidential machine. The instructions that really slow down execution are those that switch between the guest and the hypervisor, such as VMRUN; since compiling a program should not involve many of them, our workloads are barely impacted.
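A back-of-the-envelope model makes this argument explicit. It is a simplification introduced here for illustration, not a measured result: assume the only extra cost of a confidential VM is a fixed additional cost Δc per world switch (a VMEXIT followed by a VMRUN), and let N be the number of world switches during a run that takes time T on the baseline machine.

```latex
T_{\mathrm{SEV}} \approx T + N\,\Delta c
\qquad\Longrightarrow\qquad
\frac{T}{T_{\mathrm{SEV}}} \approx \frac{1}{1 + N\,\Delta c / T}
```

Under this model the relative performance stays close to 1 as long as N Δc is much smaller than T. Compilation runs almost entirely inside the guest, so N is small; workloads that frequently exit to the hypervisor, for example for device I/O, are where a larger effect would be expected.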
To investigate storage-related operations we ran additional benchmarks with different storage virtualization techniques, namely virtio-scsi, virtio-blk and nvme. Virtio is the main I/O virtualization platform in KVM, providing a common framework for hypervisors to virtualize I/O.
- nvme is the user-space NVMe driver that enables virtual machines to interact with NVMe devices;
- virtio-blk devices are very simple block devices, the front-end driver reads and writes by appending commands to the virtualization queue so that the back-end driver can process them on the host;
- virtio-scsi aims to overcome some limitations of virtio-blk: it supports more devices per guest (needing one PCI device per disk is no longer a limiting factor) and technologies such as multiqueueing, while keeping the performance of virtio-blk; additionally, virtio-scsi provides a pass-through technology to present physical storage devices directly to guests [10].
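For reference, here is a sketch of how the three front ends might be attached to a QEMU VM, written as a Python helper that returns the extra command-line arguments. The device and option names (-drive with if=none, virtio-blk-pci, virtio-scsi-pci with scsi-hd, and nvme, which requires a serial number) are standard QEMU options, but the helper, the drive id, the serial and the image path are hypothetical, and the exact flags used in our scripts may differ. Note that QEMU's nvme device emulates an NVMe controller, which is not necessarily the same setup as the user-space NVMe driver mentioned above.

```python
def storage_args(kind: str, image: str, drive_id: str = "disk0") -> list[str]:
    """Return QEMU arguments attaching `image` through the given storage front end."""
    # cache=none opens the image with O_DIRECT on the host, bypassing the host page cache.
    drive = ["-drive", f"file={image},if=none,id={drive_id},format=raw,cache=none"]
    if kind == "virtio-blk":
        devices = ["-device", f"virtio-blk-pci,drive={drive_id}"]
    elif kind == "virtio-scsi":
        devices = ["-device", "virtio-scsi-pci,id=scsi0",
                   "-device", f"scsi-hd,drive={drive_id},bus=scsi0.0"]
    elif kind == "nvme":
        # Emulated NVMe controller; QEMU requires a serial number for it.
        devices = ["-device", f"nvme,serial=bench0,drive={drive_id}"]
    else:
        raise ValueError(f"unknown storage front end: {kind}")
    return drive + devices
```

Keeping the -drive definition identical across the three variants means only the guest-visible front end changes between runs.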
We perform measurements using the FIO [11] benchmarking tool, with a setup similar to the one used in the Spool paper [12], to evaluate bandwidth, average latency and IOPS. We set the --direct=1 flag to use unbuffered I/O. Results are reported below, together with a table of the benchmark configurations.
Storage Bandwidth Benchmarks (Higher is Better)
Average Latency Benchmarks (Lower is Better)
IOPS benchmarks (Higher is Better)
| Block size (bs) | Pattern (rw) | iodepth | numjobs |
|---|---|---|---|
| 128K | read | 128 | 1 |
| 128K | write | 128 | 1 |
| 4K | randread | 32 | 4 |
| 4K | randread 70% | 32 | 4 |
| 4K | randwrite 30% | 32 | 4 |
| 4K | randwrite | 32 | 4 |
| 4K | randread | 1 | 1 |
| 4K | randwrite | 1 | 1 |
| 4K | read | 1 | 1 |
| 4K | write | 1 | 1 |
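The configurations in the table map directly onto fio command-line options. The sketch below builds one fio invocation per configuration, assuming that the two 70%/30% rows come from a single mixed randrw job with --rwmixread=70; the target device, the libaio engine, the runtime, the JSON output and the job names are hypothetical choices, not taken from our scripts.

```python
# (bs, rw, iodepth, numjobs) as in the table; "randrw" is the assumed 70/30 mix.
FIO_CONFIGS = [
    ("128K", "read", 128, 1),
    ("128K", "write", 128, 1),
    ("4K", "randread", 32, 4),
    ("4K", "randrw", 32, 4),
    ("4K", "randwrite", 32, 4),
    ("4K", "randread", 1, 1),
    ("4K", "randwrite", 1, 1),
    ("4K", "read", 1, 1),
    ("4K", "write", 1, 1),
]


def fio_command(bs: str, rw: str, iodepth: int, numjobs: int,
                target: str = "/dev/vdb", runtime_s: int = 60) -> list[str]:
    """Build a fio command line for one configuration (unbuffered I/O via --direct=1)."""
    cmd = [
        "fio",
        f"--name={rw}-{bs}-qd{iodepth}-j{numjobs}",
        f"--filename={target}",
        f"--rw={rw}",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--direct=1",
        "--ioengine=libaio",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
        "--output-format=json",
    ]
    if rw == "randrw":
        cmd.append("--rwmixread=70")  # 70% reads, 30% writes
    return cmd


for config in FIO_CONFIGS:
    print(" ".join(fio_command(*config)))
```

Write jobs against a raw block device overwrite its contents, so the target should be a dedicated scratch disk attached to the VM.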
The FIO benchmarks highlight some interesting patterns. Bandwidth is severely impacted, especially in read workloads; further investigation is needed to pinpoint the bottleneck causing this. The number of I/O operations per second is always lower on the SEV machine, as expected, but the difference is relatively contained: on average, the confidential machines reach about 0.8 times the IOPS of the baseline. Average latency is almost always worse on SEV machines; in some cases it is marginally lower, but we believe this is a measurement artifact and not relevant.
Final Remarks and Conclusion
Overall, this first evaluation suggests that SEV, SEV-ES and SEV-SNP do not cause a major degradation in performance. The only notable exception is read bandwidth across all storage virtualization techniques; further evaluation is needed to pinpoint what causes this decrease in performance.
Code and scripts to reproduce the measurements are available at https://github.com/rcastellotti/gr.
Future Work
Further performance evaluation for SEV/SEV-ES/SEV-SNP confidential virtual machines might include:
- repeating micro-benchmarks and FIO benchmarks while varying the machine configuration (vCPUs, RAM, storage virtualization);
- analyzing I/O performance when using SEV Trusted I/O [14];
- analyzing whether running multiple SEV machines on the same physical host incurs a large overhead;
- analyzing the performance overhead caused by VMEXIT when SEV-ES is enabled;
- investigating unikernels that support SEV, such as microsoft/monza;
- exploring the Confidential Containers project [15] to identify potential bottlenecks;
- studying the AMD Secure VM Service Module (SVSM) [13], a module that offloads sensitive operations to a privileged component running within the guest, to see whether it leads to a performance improvement.
Demo: reading Virtual Machine's data from the host machine
First, we start two machines, sev and nosev; the former has SEV-SNP enabled, as we can check:
rc@sev:~$ sudo dmesg | grep SEV
Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP
SEV: Using SNP CPUID table, 31 entries present.
SEV: SNP guest platform device initialized.
We write something into a file and cat it, so that the data is loaded into memory:
rc@sev:~$ echo "hi from SEV!" > sev.txt
rc@sev:~$ cat sev.txt
hi from SEV!
rc@nosev:~$ echo "hi from NOSEV!" > nosev.txt
rc@nosev:~$ cat nosev.txt
hi from NOSEV!
Now we can dump the memory of the two QEMU processes from the host machine using gcore, and search the dumps with grep:
rc@ryan:~$ sudo gcore -o mem-dump <SEV_PID>
rc@ryan:~$ grep -rnw mem-dump.<SEV_PID> -e "hi from SEV!"
rc@ryan:~$ sudo gcore -o mem-dump <NOSEV_PID>
rc@ryan:~$ grep -rnw mem-dump.<NOSEV_PID> -e "hi from NOSEV!"
grep: mem-dump.<NOSEV_PID>: binary file matches
From the host machine we can find the string in the memory of the nosev machine, while the same search returns no match for the machine with SEV enabled, because its memory is encrypted.
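The grep step can also be written as a small Python function that searches a core dump for a byte string and reports where it occurs. It is a hypothetical helper, equivalent in spirit to the grep invocation above; it reads the dump in chunks so that multi-gigabyte dumps do not have to fit in memory.

```python
from pathlib import Path


def find_in_dump(dump: Path, needle: bytes, chunk_size: int = 1 << 20) -> list[int]:
    """Return every file offset where `needle` occurs, reading the dump in chunks."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    offsets: list[int] = []
    tail = b""
    base = 0  # file offset of the first byte of the current window
    with dump.open("rb") as f:
        while chunk := f.read(chunk_size):
            window = tail + chunk
            i = window.find(needle)
            while i != -1:
                offsets.append(base + i)
                i = window.find(needle, i + 1)
            # Carry the last len(needle) - 1 bytes over so that matches spanning two
            # chunks are found; a tail this short can never hold a full match twice.
            tail = window[-overlap:] if overlap else b""
            base += len(window) - len(tail)
    return offsets
```

With the placeholders used above, find_in_dump(Path("mem-dump.<NOSEV_PID>"), b"hi from NOSEV!") is expected to return a non-empty list, while searching the SEV dump for b"hi from SEV!" is expected to return an empty one, matching the grep results; the string may appear at several offsets, for example in the page cache and in terminal buffers.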