Security
========

Overview
--------

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.
							
								
Security Requirements
---------------------

QEMU supports many different use cases, some of which have stricter security
requirements than others.  The community has agreed on the overall security
requirements that users may depend on.  These requirements define what is
considered supported from a security perspective.

Virtualization Use Case
'''''''''''''''''''''''

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization.  These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

- Guest
- User-facing interfaces (e.g. VNC, SPICE, WebSocket)
- Network protocols (e.g. NBD, live migration)
- User-supplied files (e.g. disk images, kernels, device trees)
- Passthrough devices (e.g. PCI, USB)

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases and treated as security bugs if this is the case.

Non-virtualization Use Case
'''''''''''''''''''''''''''

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG).  In principle, the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case.  However, for historical reasons, much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time.  Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.
							
								
Architecture
------------

This section describes the design principles that ensure the security
requirements are met.

Guest Isolation
'''''''''''''''

Guest isolation is the confinement of guest code to the virtual machine.  When
guest code gains control of execution on the host, this is called escaping the
virtual machine.  Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network.  Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU.  Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU.  At this point the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them.  A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

Principle of Least Privilege
''''''''''''''''''''''''''''

The principle of least privilege states that each component only has access to
the privileges necessary for its function.  In the case of QEMU this means that
each QEMU process only has access to resources belonging to its own guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest.  This way the guest does not gain anything by escaping into the
QEMU process since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements.  For example, guest A only has access to its own disk image file
``a.img`` and not guest B's disk image file ``b.img``.
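
As a rough illustration of the ``a.img`` and ``b.img`` example, the following
Python sketch checks that a disk image is reachable only by its owning user
under the traditional UNIX permission model.  The helper name
``is_private_to_owner`` and the idea of one host account per guest are
assumptions made for this example; QEMU itself does not provide or require
them.

```python
import os
import stat


def is_private_to_owner(path: str, expected_uid: int) -> bool:
    """Return True if `path` is owned by `expected_uid` and grants no
    permissions to group or others (hypothetical helper for illustration)."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    # 0o077 masks the group and other permission bits.
    return st.st_uid == expected_uid and (mode & 0o077) == 0
```

A management tool could run such a check on ``a.img`` with the UID that guest
A's QEMU process runs as, and refuse to start the guest if the check fails.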
							
								
In reality, certain resources are inaccessible to the guest but must be
available to QEMU to perform its function.  For example, host system calls are
necessary for QEMU but are not exposed to guests.  A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

Isolation mechanisms
''''''''''''''''''''

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege.  With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt.  They are also platform-specific, so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users.  Sometimes it seems more convenient to launch QEMU as
root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
huge security risk.  File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes.
Some Linux distributions already ship with UNIX groups for these devices by
default.
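
File descriptor passing can be sketched in Python as follows.  This is a
minimal sketch, not a description of how any particular management tool works:
the parent opens a resource and hands only the open descriptor to a child
process, which never opens the path itself.  The function names are
hypothetical, and an ordinary file stands in for a host device; a real launcher
would open and configure the tap device before starting QEMU.  The ``-netdev
tap`` option with its ``fd=`` parameter is the QEMU side of the same idea.

```python
import os
import subprocess
import sys
from typing import List


def run_with_inherited_fd(path: str) -> str:
    """Open `path` in the parent and let a child read it through the
    inherited descriptor only; the child never opens the path itself."""
    fd = os.open(path, os.O_RDONLY)
    try:
        child_code = (
            "import os, sys; "
            "print(os.read(int(sys.argv[1]), 64).decode(), end='')"
        )
        result = subprocess.run(
            [sys.executable, "-c", child_code, str(fd)],
            pass_fds=(fd,),  # keep this descriptor open in the child
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        os.close(fd)


def tap_netdev_args(fd: int) -> List[str]:
    """QEMU arguments that use an already-open tap descriptor
    (the IDs `net0` and the NIC model are illustrative choices)."""
    return ["-netdev", f"tap,id=net0,fd={fd}",
            "-device", "virtio-net-pci,netdev=net0"]
```

The descriptor number seen by the child matches the one in the parent because
``pass_fds`` keeps the listed descriptors open across the ``exec``.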
							
								
- SELinux and AppArmor make it possible to confine processes beyond the
  traditional UNIX process and file permissions model.  They restrict the QEMU
  process from accessing processes and files on the host system that are not
  needed by QEMU.

- Resource limits and cgroup controllers provide throughput and utilization
  limits on key resources such as CPU time, memory, and I/O bandwidth.

- Linux namespaces can be used to make process, file system, and other system
  resources unavailable to QEMU.  A namespaced QEMU process is restricted to only
  those resources that were granted to it.

- Linux seccomp is available via the QEMU ``--sandbox`` option.  It disables
  system calls that are not needed by QEMU, thereby reducing the host kernel
  attack surface.
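
Of the mechanisms listed above, seccomp is the one enabled on the QEMU command
line itself.  The Python sketch below builds the sandbox argument a launcher
might append; the helper name ``sandbox_args`` is hypothetical.  The stricter
sub-options may conflict with features that spawn helper processes or change
privileges, so their suitability should be checked against the QEMU version
and configuration in use.

```python
from typing import List


def sandbox_args(strict: bool = True) -> List[str]:
    """Build the `-sandbox` argument pair for a QEMU command line."""
    if not strict:
        return ["-sandbox", "on"]
    options = [
        "on",
        "obsolete=deny",           # block obsolete system calls
        "elevateprivileges=deny",  # block set*uid/set*gid style calls
        "spawn=deny",              # block fork and exec
        "resourcecontrol=deny",    # block priority and affinity changes
    ]
    return ["-sandbox", ",".join(options)]
```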
							
								
Sensitive configurations
------------------------

There are aspects of QEMU that can have security implications which users and
management applications must be aware of.

Monitor console (QMP and HMP)
'''''''''''''''''''''''''''''

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation. Many of the
commands exposed will instruct QEMU to access content on the host file system
and/or trigger spawning of external processes.

For example, the ``migrate`` command allows for the spawning of arbitrary
processes for the purpose of tunnelling the migration data stream. The
``blockdev-add`` command instructs QEMU to open arbitrary files, exposing
their content to the guest as a virtual disk.
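
To make the reach of the monitor concrete, the sketch below shows how little a
QMP client needs: once connected, it reads the greeting, negotiates
capabilities, and can then issue any command.  It assumes the default QMP
framing of one JSON message per line, and the function names are hypothetical.
``query-status`` is used here only because it is harmless; the same call path
would carry ``migrate`` or ``blockdev-add``.

```python
import json
import socket
from typing import Optional


def qmp_execute(sock_file, command: str, arguments: Optional[dict] = None) -> dict:
    """Send one QMP command and return the first non-event reply."""
    request = {"execute": command}
    if arguments:
        request["arguments"] = arguments
    sock_file.write(json.dumps(request) + "\n")
    sock_file.flush()
    while True:
        line = sock_file.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" not in reply:  # skip asynchronous events
            return reply


def qmp_session(sock: socket.socket):
    """Wrap a connected socket, read the greeting, and leave
    capabilities negotiation mode."""
    sock_file = sock.makefile("rw", encoding="utf-8", newline="\n")
    greeting = json.loads(sock_file.readline())
    if "QMP" not in greeting:
        raise ValueError("unexpected greeting")
    qmp_execute(sock_file, "qmp_capabilities")
    return sock_file


# Usage against a monitor socket (the path is a placeholder):
#   sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
#   sock.connect("/path/to/qmp.sock")
#   monitor = qmp_session(sock)
#   print(qmp_execute(monitor, "query-status"))
```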
							
								
Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

It is further important to consider the security of the character device backend
over which the monitor console is exposed. It needs to have protection against
malicious third parties which might try to make unauthorized connections, or
perform man-in-the-middle attacks. Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only. Use of the TCP-based
character device backend is inappropriate unless configured to use both TLS
encryption and authorization control policy on client connections.
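
A launcher following this recommendation might look roughly like the Python
sketch below.  It places the QMP socket in a directory that only the owning
user can enter and builds the matching ``-qmp`` argument.  The helper names and
the directory layout are assumptions for illustration, and the sketch assumes
the socket path contains no commas.

```python
import os
import stat
from typing import List


def prepare_monitor_dir(directory: str) -> None:
    """Create (or tighten) a directory that only the owner can enter."""
    os.makedirs(directory, mode=0o700, exist_ok=True)
    # makedirs applies the umask and leaves existing directories as-is,
    # so set the final mode explicitly.
    os.chmod(directory, 0o700)


def qmp_unix_args(socket_path: str) -> List[str]:
    """QEMU arguments exposing QMP on a UNIX domain socket only."""
    return ["-qmp", f"unix:{socket_path},server=on,wait=off"]
```

Restricting the directory rather than the socket file avoids depending on the
permissions QEMU applies when it creates the socket.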
							
								
In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.