With Apple Silicon available to the masses, it's a good time to add support
for driving its virtualization extensions from QEMU.
This patch adds all necessary architecture specific code to get basic VMs
working, including save/restore.
Known limitations:
- WFI handling is missing (follows in later patch)
- No watchpoint/breakpoint support
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-5-agraf@csgraf.de
[PMM: added missing #include]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We added a stub for the arch_type global in commit 5964ed56d9 so
that we could compile blockdev.c into the tools. However, in commit
9db1d3a2be we removed the only use of arch_type from blockdev.c.
The stub is therefore no longer needed, and we can delete it again,
together with the QEMU_ARCH_NONE value that only the stub was using.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730105947.28215-9-peter.maydell@linaro.org
When Hexagon was added we forgot to add it to the QEMU_ARCH_*
enumeration. This doesn't cause a visible effect because at the
moment Hexagon is linux-user only and the QEMU_ARCH_* constants are
only used in softmmu, but we might as well add it in, since it's the
only architecture currently missing from the list.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Taylor Simpson <tsimpson@quicinc.com>
Message-id: 20210730105947.28215-6-peter.maydell@linaro.org
The kvm_available() function reports whether KVM support was
compiled into the QEMU binary; it returns the value of the
CONFIG_KVM define.
The only place in the codebase where we use this function is
in qmp_query_kvm(). Now that accelerators are based on QOM
classes we can instead use accel_find("kvm") and remove the
kvm_available() function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730105947.28215-3-peter.maydell@linaro.org
The xen_available() function is used only to produce an error
for some Xen-specific command line options in QEMU binaries where
Xen support was not compiled in: it just returns the value of
the CONFIG_XEN define.
Now that accelerators are QOM classes, we can check for
"does this binary have Xen compiled in" with accel_find("xen"),
and drop the xen_available() function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730105947.28215-2-peter.maydell@linaro.org
The `aio-max-batch` parameter will be propagated to AIO engines
and it will be used to control the maximum number of queued requests.
When there are in queue a number of requests equal to `aio-max-batch`,
the engine invokes the system call to forward the requests to the kernel.
This parameter allows us to control the maximum batch size to reduce
the latency that requests might accumulate while queued in the AIO
engine queue.
If `aio-max-batch` is equal to 0 (default value), the AIO engine will
use its default maximum batch size value.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20210721094211.69853-3-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For block host devices, I/O can happen through either the kernel file
descriptor I/O system calls (preadv/pwritev, io_submit, io_uring)
or the SCSI passthrough ioctl SG_IO.
In the latter case, the size of each transfer can be limited by the
HBA, while for file descriptor I/O the kernel is able to split and
merge I/O in smaller pieces as needed. Applying the HBA limits to
file descriptor I/O results in more system calls and suboptimal
performance, so this patch splits the max_transfer limit in two:
max_transfer remains valid and is used in general, while max_hw_transfer
is limited to the maximum hardware size. max_hw_transfer can then be
included by the scsi-generic driver in the block limits page, to ensure
that the stricter hardware limit is used.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* avoid deprecation warnings for SASL on macOS 10.11 or newer
* fix -readconfig when config blocks have an id (like [chardev "qmp"])
* Error* initialization fixes
* Improvements to ESP emulation (Mark)
* Allow creating noreserve memory backends (David)
* Improvements to query-memdev (David)
* Bump compiler to C11 (Richard)
* First round of SVM fixes from GSoC project (Lara)
# gpg: Signature made Wed 16 Jun 2021 16:37:49 BST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini-gitlab/tags/for-upstream: (45 commits)
configure: Remove probe for _Static_assert
qemu/compiler: Remove QEMU_GENERIC
include/qemu/lockable: Use _Generic instead of QEMU_GENERIC
util: Use unique type for QemuRecMutex in thread-posix.h
util: Pass file+line to qemu_rec_mutex_unlock_impl
util: Use real functions for thread-posix QemuRecMutex
softfloat: Use _Generic instead of QEMU_GENERIC
configure: Use -std=gnu11
target/i386: Added Intercept CR0 writes check
target/i386: Added consistency checks for CR0
target/i386: Added consistency checks for VMRUN intercept and ASID
target/i386: Refactored intercept checks into cpu_svm_has_intercept
configure: map x32 to cpu_family x86_64 for meson
hmp: Print "reserve" property of memory backends with "info memdev"
qmp: Include "reserve" property of memory backends
hmp: Print "share" property of memory backends with "info memdev"
qmp: Include "share" property of memory backends
qmp: Clarify memory backend properties returned via query-memdev
hostmem: Wire up RAM_NORESERVE via "reserve" property
util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Let's provide a way to control the use of RAM_NORESERVE via memory
backends using the "reserve" property which defaults to true (old
behavior).
Only Linux currently supports clearing the flag (and support is checked at
runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
Windows and other POSIX systems will bail out with "reserve=false".
The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM. This essentially allows
avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using
virtio-mem and also supporting hugetlbfs in the future.
As really only Linux implements RAM_NORESERVE right now, let's expose
the property only with CONFIG_LINUX. Setting the property to "false"
will then only fail in corner cases -- for example on very old kernels
or when memory overcommit was completely disabled by the admin.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-11-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Until now, Hypervisor.framework has only been available on x86_64 systems.
With Apple Silicon shipping now, it extends its reach to aarch64. To
prepare for support for multiple architectures, let's start moving common
code out into its own accel directory.
This patch splits the vcpu init and destroy functions into a generic and
an architecture specific portion. This also allows us to move the generic
functions into the generic hvf code, removing exported functions.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-8-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Until now, Hypervisor.framework has only been available on x86_64 systems.
With Apple Silicon shipping now, it extends its reach to aarch64. To
prepare for support for multiple architectures, let's start moving common
code out into its own accel directory.
This patch moves a few internal struct and constant defines over.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-5-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Until now, Hypervisor.framework has only been available on x86_64 systems.
With Apple Silicon shipping now, it extends its reach to aarch64. To
prepare for support for multiple architectures, let's start moving common
code out into its own accel directory.
This patch moves CPU and memory operations over. While at it, make sure
the code is consumable on non-i386 systems.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-4-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Until now, Hypervisor.framework has only been available on x86_64 systems.
With Apple Silicon shipping now, it extends its reach to aarch64. To
prepare for support for multiple architectures, let's start moving common
code out into its own accel directory.
This patch moves assert_hvf_ok() and introduces generic build infrastructure.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-2-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
kvm_physical_sync_dirty_bitmap() calculates the ramblock offset in an
awkward way from the MemoryRegionSection that passed in from the
caller. The truth is for each KVMSlot the ramblock offset never
change for the lifecycle. Cache the ramblock offset for each KVMSlot
into the structure when the KVMSlot is created.
With that, we can further simplify kvm_physical_sync_dirty_bitmap()
with a helper to sync KVMSlot dirty bitmap to the ramblock dirty
bitmap of a specific KVMSlot.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-6-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Provide a helper kvm_slot_get_dirty_log() to make the function
kvm_physical_sync_dirty_bitmap() clearer. We can even cache the as_id
into KVMSlot when it is created, so that we don't even need to pass it
down every time.
Since at it, remove return value of kvm_physical_sync_dirty_bitmap()
because it should never fail.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-5-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Per-kml slots_lock will bring some trouble if we want to take all slots_lock of
all the KMLs, especially when we're in a context that we could have taken some
of the KML slots_lock, then we even need to figure out what we've taken and
what we need to take.
Make this simple by merging all KML slots_lock into a single slots lock.
Per-kml slots_lock isn't anything that helpful anyway - so far only x86 has two
address spaces (so, two slots_locks). All the rest archs will be having one
address space always, which means there's actually one slots_lock so it will be
the same as before.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are already poisoning CONFIG_KVM since this switch is not working
in common code. Do the same with the other accelerator switches, too
(except for CONFIG_TCG, which is special, since it is also defined in
config-host.h).
Message-Id: <20210414112004.943383-2-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Target lm32 was deprecated in commit d849800512, v5.2.0. See there
for rationale.
Some of its code lives on in device models derived from milkymist
ones: hw/char/digic-uart.c and hw/display/bcm2835_fb.c.
Cc: Michael Walle <michael@walle.cc>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210503084034.3804963-2-armbru@redhat.com>
Acked-by: Michael Walle <michael@walle.cc>
[Trivial conflicts resolved, reST markup fixed]
There are no known users of this CPU anymore, and there are no
binaries available online which could be used for regression tests,
so the code has likely completely bit-rotten already. It's been
marked as deprecated since two releases now and nobody spoke up
that there is still a need to keep it, thus let's remove it now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210430160355.698194-1-thuth@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Commit message typos fixed, trivial conflicts resolved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Both os-win32.h and os-posix.h include system header files. Instead
of having osdep.h include them inside its 'extern "C"' block, make
these headers handle that themselves, so that we don't include the
system headers inside 'extern "C"'.
This doesn't fix any current problems, but it's conceptually the
right way to handle system headers.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Similarly to 5f629d943c ("s390x: fix s390 virtio aliases"),
define the virtio aliases.
This allows to start machines with virtio devices without
knowledge of the implementation type.
For instance, we can use "-device virtio-scsi" on
m68k, s390x or PC, and the device will be respectively
"virtio-scsi-device", "virtio-scsi-ccw" or "virtio-scsi-pci".
This already exists for s390x and -ccw interfaces, add them
for m68k and MMIO (-device) interfaces.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20210319202335.2397060-3-laurent@vivier.eu>
Message-Id: <20210323165308.15244-18-alex.bennee@linaro.org>
Currently get_naturally_aligned_size() is used by the intel iommu
to compute the maximum invalidation range based on @size which is
a power of 2 while being aligned with the @start address and less
than the maximum range defined by @gaw.
This helper is also useful for other iommu devices (virtio-iommu,
SMMUv3) to make sure IOMMU UNMAP notifiers only are called with
power of 2 range sizes.
Let's move this latter into dma-helpers.c and rename it into
dma_aligned_pow2_mask(). Also rewrite the helper so that it
accomodates UINT64_MAX values for the size mask and max mask.
It now returns a mask instead of a size. Change the caller.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-id: 20210309102742.30442-3-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Pull request
# gpg: Signature made Wed 10 Mar 2021 21:56:09 GMT
# gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg: issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C
* remotes/vivier2/tags/trivial-branch-for-6.0-pull-request: (22 commits)
sysemu: Let VMChangeStateHandler take boolean 'running' argument
sysemu/runstate: Let runstate_is_running() return bool
hw/lm32/Kconfig: Have MILKYMIST select LM32_DEVICES
hw/lm32/Kconfig: Rename CONFIG_LM32 -> CONFIG_LM32_DEVICES
hw/lm32/Kconfig: Introduce CONFIG_LM32_EVR for lm32-evr/uclinux boards
qemu-common.h: Update copyright string to 2021
tests/fp/fp-test: Replace the word 'blacklist'
qemu-options: Replace the word 'blacklist'
seccomp: Replace the word 'blacklist'
scripts/tracetool: Replace the word 'whitelist'
ui: Replace the word 'whitelist'
virtio-gpu: Adjust code space style
exec/memory: Use struct Object typedef
fuzz-test: remove unneccessary debugging flags
net: Use id_generate() in the network subsystem, too
MAINTAINERS: Fix the location of tools manuals
vhost_user_gpu: Drop dead check for g_malloc() failure
backends/dbus-vmstate: Fix short read error handling
target/hexagon/gen_tcg_funcs: Fix a typo
hw/elf_ops: Fix a typo
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Record/replay provides REPLAY_CLOCK_LOCKED macro to access
the clock when vm_clock_seqlock is locked. This macro is
needed because replay internals operate icount. In locked case
replay use icount_get_raw_locked for icount request, which prevents
excess locking which leads to deadlock. But previously only
record code used *_locked function and replay did not.
Therefore sometimes clock access lead to deadlocks.
This patch fixes clock access for replay too and uses *_locked
icount access function.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <161347990483.1313189.8371838968343494161.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When SEV-ES is enabled, it is not possible modify the guests register
state after it has been initially created, encrypted and measured.
Normally, an INIT-SIPI-SIPI request is used to boot the AP. However, the
hypervisor cannot emulate this because it cannot update the AP register
state. For the very first boot by an AP, the reset vector CS segment
value and the EIP value must be programmed before the register has been
encrypted and measured. Search the guest firmware for the guest for a
specific GUID that tells Qemu the value of the reset vector to use.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <22db2bfb4d6551aed661a9ae95b4fdbef613ca21.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
OVMF is developing a mechanism for depositing a GUIDed table just
below the known location of the reset vector. The table goes
backwards in memory so all entries are of the form
<data>|len|<GUID>
Where <data> is arbtrary size and type, <len> is a uint16_t and
describes the entire length of the entry from the beginning of the
data to the end of the guid.
The foot of the table is of this form and <len> for this case
describes the entire size of the table. The table foot GUID is
defined by OVMF as 96b582de-1fb2-45f7-baea-a366c55a082d and if the
table is present this GUID is just below the reset vector, 48 bytes
before the end of the firmware file.
Add a parser for the ovmf reset block which takes a copy of the block,
if the table foot guid is found, minus the footer and a function for
later traversal to return the data area of any specified GUIDs.
Signed-off-by: James Bottomley <jejb@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210204193939.16617-2-jejb@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently the "memory-encryption" property is only looked at once we
get to kvm_init(). Although protection of guest memory from the
hypervisor isn't something that could really ever work with TCG, it's
not conceptually tied to the KVM accelerator.
In addition, the way the string property is resolved to an object is
almost identical to how a QOM link property is handled.
So, create a new "confidential-guest-support" link property which sets
this QOM interface link directly in the machine. For compatibility we
keep the "memory-encryption" property, but now implemented in terms of
the new property.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
When AMD's SEV memory encryption is in use, flash memory banks (which are
initialed by pc_system_flash_map()) need to be encrypted with the guest's
key, so that the guest can read them.
That's abstracted via the kvm_memcrypt_encrypt_data() callback in the KVM
state.. except, that it doesn't really abstract much at all.
For starters, the only call site is in code specific to the 'pc'
family of machine types, so it's obviously specific to those and to
x86 to begin with. But it makes a bunch of further assumptions that
need not be true about an arbitrary confidential guest system based on
memory encryption, let alone one based on other mechanisms:
* it assumes that the flash memory is defined to be encrypted with the
guest key, rather than being shared with hypervisor
* it assumes that that hypervisor has some mechanism to encrypt data into
the guest, even though it can't decrypt it out, since that's the whole
point
* the interface assumes that this encrypt can be done in place, which
implies that the hypervisor can write into a confidential guests's
memory, even if what it writes isn't meaningful
So really, this "abstraction" is actually pretty specific to the way SEV
works. So, this patch removes it and instead has the PC flash
initialization code call into a SEV specific callback.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>