The root cause for this crash is the ioeventfd not stopped while the VM stop.
The callback for vmstate_change was not implement in virtio-mmio bus
Reproduce step
load the vm with
-M microvm \
-netdev tap,id=net0,vhostforce,script=no,downscript=no \
-device virtio-net-device,netdev=net0\
After the VM boot, login the vm and then shutdown the vm
System will crash
[Current thread is 1 (Thread 0x7ffff6edde00 (LWP 374378))]
(gdb) bt
0 0x00005555558f18b4 in qemu_flush_or_purge_queued_packets (purge=false, nc=0x55500252e850) at ../net/net.c:636
1 qemu_flush_queued_packets (nc=0x55500252e850) at ../net/net.c:656
2 0x0000555555b6c363 in virtio_queue_notify_vq (vq=0x7fffe7e2b010) at ../hw/virtio/virtio.c:2339
3 virtio_queue_host_notifier_read (n=0x7fffe7e2b08c) at ../hw/virtio/virtio.c:3583
4 0x0000555555de7b5a in aio_dispatch_handler (ctx=ctx@entry=0x5555567c5780, node=0x555556b83fd0) at ../util/aio-posix.c:329
5 0x0000555555de8454 in aio_dispatch_ready_handlers (ready_list=<optimized out>, ctx=<optimized out>) at ../util/aio-posix.c:359
6 aio_poll (ctx=0x5555567c5780, blocking=blocking@entry=false) at ../util/aio-posix.c:662
7 0x0000555555cce0cc in monitor_cleanup () at ../monitor/monitor.c:645
8 0x0000555555b06bd2 in qemu_cleanup () at ../softmmu/runstate.c:822
9 0x000055555586e693 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:51
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20211109023744.22387-1-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The GICv3/v4 pseudocode has a function IsSpecial() which returns true
if passed a "special" interrupt ID number (anything between 1020 and
1023 inclusive). We open-code this condition in a couple of places,
so abstract it out into a new function gicv3_intid_is_special().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The logic of gicv3_redist_update() is as follows:
* it must be called in any code path that changes the state of
(only) redistributor interrupts
* if it finds a redistributor interrupt that is (now) higher
priority than the previous highest-priority pending interrupt,
then this must be the new highest-priority pending interrupt
* if it does *not* find a better redistributor interrupt, then:
- if the previous state was "no interrupts pending" then
the new state is still "no interrupts pending"
- if the previous best interrupt was not a redistributor
interrupt then that remains the best interrupt
- if the previous best interrupt *was* a redistributor interrupt,
then the new best interrupt must be some non-redistributor
interrupt, but we don't know which so must do a full scan
In commit 17fb5e36aa we effectively added the LPI interrupts
as a kind of "redistributor interrupt" for this purpose, by adding
cs->hpplpi to the set of things that gicv3_redist_update() considers
before it gives up and decides to do a full scan of distributor
interrupts. However we didn't quite get this right:
* the condition check for "was the previous best interrupt a
redistributor interrupt" must be updated to include LPIs
in what it considers to be redistributor interrupts
* every code path which updates the LPI state which
gicv3_redist_update() checks must also call gicv3_redist_update():
this is cs->hpplpi and the GICR_CTLR ENABLE_LPIS bit
This commit fixes this by:
* correcting the test on cs->hppi.irq in gicv3_redist_update()
* making gicv3_redist_update_lpi() always call gicv3_redist_update()
* introducing a new gicv3_redist_update_lpi_only() for the one
callsite (the post-load hook) which must not call
gicv3_redist_update()
* making gicv3_redist_lpi_pending() always call gicv3_redist_update(),
either directly or via gicv3_redist_update_lpi()
* removing a couple of now-unnecessary calls to gicv3_redist_update()
from some callers of those two functions
* calling gicv3_redist_update() when the GICR_CTLR ENABLE_LPIS
bit is cleared
(This means that the not-file-local gicv3_redist_* LPI related
functions now all take care of the updates of internally cached
GICv3 information, in the same way the older functions
gicv3_redist_set_irq() and gicv3_redist_send_sgi() do.)
The visible effect of this bug was that when the guest acknowledged
an LPI by reading ICC_IAR1_EL1, we marked it as not pending in the
LPI data structure but still left it in cs->hppi so we would offer it
to the guest again. In particular for setups using an emulated GICv3
and ITS and using devices which use LPIs (ie PCI devices) a Linux
guest would complain "irq 54: nobody cared" and then hang. (The hang
was intermittent, presumably depending on the timing between
different interrupts arriving and being completed.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20211124202005.989935-1-peter.maydell@linaro.org
The virt machine has properties to enable MTE and Nested Virtualization
support. However, its check to ensure the backing accel implementation
supports it today only looks for KVM and bails out if it finds it.
Extend the checks to HVF as well as it does not support either today.
This will cause QEMU to print a useful error message rather than
silently ignoring the attempt by the user to enable either MTE or
the Virtualization extensions.
Reported-by: saar amar <saaramar5@gmail.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-id: 20211123122859.22452-1-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit 18f6290a6a ("hw/intc: GICv3 ITS initial framework")
incremented version_id and minimum_version_id fields of
VMStateDescription vmstate_its. This breaks the migration between
6.2 and 6.1 with the following message:
qemu-system-aarch64: savevm: unsupported version 1 for 'arm_gicv3_its' v0
qemu-system-aarch64: load of migration failed: Invalid argument
Revert that change.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20211122171020.1195483-1-eric.auger@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Once a "One Time Programmable" is programmed, it shouldn't be reset.
Do not re-initialize the OTP content in the DeviceReset handler,
initialize it once in the DeviceRealize one.
Fixes: 9fb45c62ae ("riscv: sifive: Implement a model for SiFive FU540 OTP")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20211119104757.331579-1-f4bug@amsat.org>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Configuring a drive with "if=none" is meant for creation of a backend
only, it should not get automatically assigned to a device frontend.
Use "if=pflash" for the One-Time-Programmable device instead (like
it is e.g. also done for the efuse device in hw/arm/xlnx-zcu102.c).
Since the old way of configuring the device has already been published
with the previous QEMU versions, we cannot remove this immediately, but
have to deprecate it and support it for at least two more releases.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20211119102549.217755-1-thuth@redhat.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
The ESCC datasheet states that SPEC_ALLSENT is always set in sync mode and set
in async mode once all characters have cleared the transmitter. Since writes to
SERIAL_DATA use a synchronous chardev API, the guest can never see the state when
transmission is in progress so it is possible to set SPEC_ALLSENT in the
R_SPEC register unconditionally.
This fixes a hang when using the Sun PROM as it attempts to enumerate the
onboard serial devices, and a similar hang in OpenBSD SPARC32 where in both cases
the boot process will not proceed until SPEC_ALLSENT has been set after writing
to W_TXCTRL1.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20211118181835.18497-3-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
The "Transmit Interrupts and Transmit Buffer Empty Bit" section of the ESCC
datasheet states the following about the STATUS_TXEMPTY bit: "After a hardware
reset (including a hardware reset by software), or a channel reset, this bit
is set to 1".
Update escc_reset() to set the STATUS_TXEMPTY bit in the R_STATUS register
on device reset as described which fixes a regression whereby the Sun PROM
checks this bit early on startup and gets stuck in an infinite loop if it is
not set.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20211118181835.18497-2-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Bugfixes for 6.2.
# gpg: Signature made Fri 19 Nov 2021 10:33:29 AM CET
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
chardev/wctable: don't free the instance in wctablet_chr_finalize
meson.build: Support ncurses on MacOS and OpenBSD
docs: Spell QEMU all caps
qtest/am53c974-test: add test for reset before transfer
esp: ensure that async_len is reset to 0 during esp_hard_reset()
nvmm: Fix support for stable version
meson: fix botched compile check conversions
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
ppc 6.2 queue:
* fix pmu vmstate
* Fix compile of byte_reverse on new compilers
# gpg: Signature made Fri 19 Nov 2021 12:49:30 PM CET
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-ppc-20211119' of https://github.com/legoater/qemu:
tests/tcg/ppc64le: Fix compile flags for byte_reverse
pmu: fix pmu vmstate subsection list
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Change namespaces to be shared namespaces by default (parameter
shared=on). Keep shared=off for older machine types.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
namespaces get moved from the controller to the subsystem if one
is specified.
That keeps the namespaces alive after a controller hot-unplug, but
after a controller hotplug we have to reconnect the namespaces
from the subsystem to the controller.
Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
Cc: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
[k.jensen: only attach to shared and non-detached namespaces]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
While activating device in vmxnet3_acticate_device(), it does not
validate guest supplied configuration values against predefined
minimum - maximum limits. This may lead to integer overflow or
OOB access issues. Add checks to avoid it.
Fixes: CVE-2021-20203
Buglink: https://bugs.launchpad.net/qemu/+bug/1913873
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The subsection is not closed by a NULL marker so this can trigger
a segfault when the pmu vmstate is saved.
This can be easily shown with:
$ ./qemu-system-ppc64 -dump-vmstate vmstate.json
Segmentation fault (core dumped)
Fixes: d811d61fbc ("mac_newworld: add PMU device")
Cc: mark.cave-ayland@ilande.co.uk
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
target-arm queue:
* Support multiple redistributor regions for TCG GICv3
* Send RTC_CHANGE QMP event from pl031
# gpg: Signature made Mon 15 Nov 2021 07:53:40 PM CET
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [full]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full]
* tag 'pull-target-arm-20211115-1' of https://git.linaro.org/people/pmaydell/qemu-arm:
hw/rtc/pl031: Send RTC_CHANGE QMP event
hw/intc/arm_gicv3: Support multiple redistributor regions
hw/intc/arm_gicv3: Set GICR_TYPER.Last correctly when nb_redist_regions > 1
hw/intc/arm_gicv3: Move checking of redist-region-count to arm_gicv3_common_realize
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
pci,pc,virtio: bugfixes
pci power management fixes
acpi hotplug fixes
misc other fixes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 15 Nov 2021 05:15:09 PM CET
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
* tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
pcie: expire pending delete
pcie: fast unplug when slot power is off
pcie: factor out pcie_cap_slot_unplug()
pcie: add power indicator blink check
pcie: implement slot power control for pcie root ports
pci: implement power state
vdpa: Check for existence of opts.vhostdev
vdpa: Replace qemu_open_old by qemu_open at
virtio: use virtio accessor to access packed event
virtio: use virtio accessor to access packed descriptor flags
tests: bios-tables-test update expected blobs
hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC
bios-tables-test: Allow changes in DSDT ACPI tables
hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type
pcie: rename 'native-hotplug' to 'x-native-hotplug'
hw/mem/pc-dimm: Restrict NUMA-specific code to NUMA machines
vhost: Fix last vq queue index of devices with no cvq
vhost: Rename last_index to vq_index_end
softmmu/qdev-monitor: fix use-after-free in qdev_set_id()
net/vhost-vdpa: fix memory leak in vhost_vdpa_get_max_queue_pairs()
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The PL031 currently is not able to report guest RTC change to the QMP
monitor as opposed to mc146818 or spapr RTCs. This patch adds the call
to qapi_event_send_rtc_change() when the Load Register is written. The
value which is reported corresponds to the difference between the guest
reference time and the reference time kept in softmmu/rtc.c.
For instance adding 20s to the guest RTC value will report 20. Adding
an extra 20s to the guest RTC value will report 20 + 20 = 40.
The inclusion of qapi/qapi-types-misc-target.h in hw/rtl/pl031.c
require to compile the PL031 with specific_ss.add() to avoid
./qapi/qapi-types-misc-target.h:18:13: error: attempt to use poisoned
"TARGET_<ARCH>".
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210920122535.269988-1-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Our GICv3 QOM interface includes an array property
redist-region-count which allows board models to specify that the
registributor registers are not in a single contiguous range, but
split into multiple pieces. We implemented this for KVM, but
currently the TCG GICv3 model insists that there is only one region.
You can see the limit being hit with a setup like:
qemu-system-aarch64 -machine virt,gic-version=3 -smp 124
Add support for split regions to the TCG GICv3. To do this we switch
from allocating a simple array of MemoryRegions to an array of
GICv3RedistRegion structs so that we can use the GICv3RedistRegion as
the opaque pointer in the MemoryRegion read/write callbacks. Each
GICv3RedistRegion contains the MemoryRegion, a backpointer allowing
the read/write callback to get hold of the GICv3State, and an index
which allows us to calculate which CPU's redistributor is being
accessed.
Note that arm_gicv3_kvm always passes in NULL as the ops argument
to gicv3_init_irqs_and_mmio(), so the only MemoryRegion read/write
callbacks we need to update to handle this new scheme are the
gicv3_redist_read/write functions used by the emulated GICv3.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
The 'Last' bit in the GICR_TYPER GICv3 redistributor register is
supposed to be set to 1 if this is the last redistributor in a series
of contiguous redistributor pages. Currently we set Last only for
the redistributor for CPU (num_cpu - 1). This only works if there is
a single redistributor region; if there are multiple redistributor
regions then we need to set the Last bit for the last redistributor
in each region.
This doesn't cause any problems currently because only the KVM GICv3
supports multiple redistributor regions, and it ignores the value in
GICv3State::gicr_typer. But we need to fix this before we can enable
support for multiple regions in the emulated GICv3.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
The GICv3 devices have an array property redist-region-count.
Currently we check this for errors (bad values) in
gicv3_init_irqs_and_mmio(), just before we use it. Move this error
checking to the arm_gicv3_common_realize() function, where we
sanity-check all of the other base-class properties. (This will
always be before gicv3_init_irqs_and_mmio() is called, because
that function is called in the subclass realize methods, after
they have called the parent-class realize.)
The motivation for this refactor is:
* we would like to use the redist_region_count[] values in
arm_gicv3_common_realize() in a subsequent patch, so we need
to have already done the sanity-checking first
* this removes the only use of the Error** argument to
gicv3_init_irqs_and_mmio(), so we can remove some error-handling
boilerplate
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Add an expire time for pending delete, once the time is over allow
pressing the attention button again.
This makes pcie hotplug behave more like acpi hotplug, where one can
try sending an 'device_del' monitor command again in case the guest
didn't respond to the first attempt.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-7-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In case the slot is powered off (and the power indicator turned off too)
we can unplug right away, without round-trip to the guest.
Also clear pending attention button press, there is nothing to care
about any more.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-6-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With this patch hot-plugged pci devices will only be visible to the
guest if the guests hotplug driver has enabled slot power.
This should fix the hot-plug race which one can hit when hot-plugging
a pci device at boot, while the guest is in the middle of the pci bus
scan.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-3-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows to power off pci devices. In "off" state the devices will
not be visible. No pci config space access, no pci bar access, no dma.
Default state is "on", so this patch (alone) should not change behavior.
Use case: Allows hotplug controllers implement slot power. Hotplug
controllers doing so should set the inital power state for devices in
the ->plug callback.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-2-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We used to access packed descriptor event and off_wrap via
address_space_{write|read}_cached(). When we hit the cache, memcpy()
is used which is not atomic which may lead a wrong value to be read or
wrote.
This patch fixes this by switching to use
virito_{stw|lduw}_phys_cached() to make sure the access is atomic.
Fixes: 683f766567 ("virtio: event suppression support for packed ring")
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20211111063854.29060-2-jasowang@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We used to access packed descriptor flags via
address_space_{write|read}_cached(). When we hit the cache, memcpy()
is used which is not an atomic operation which may lead a wrong value
is read or wrote.
So this patch switches to use virito_{stw|lduw}_phys_cached() to make
sure the aceess is atomic.
Fixes: 86044b24e8 ("virtio: basic packed virtqueue support")
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20211111063854.29060-1-jasowang@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There are two ways to enable ACPI PCI Hot-plug:
* Disable the Hot-plug Capable bit on PCIe slots.
This was the first approach which led to regression [1-2], as
I/O space for a port is allocated only when it is hot-pluggable,
which is determined by HPC bit.
* Leave the HPC bit on and disable PCIe Native Hot-plug in _OSC
method.
This removes the (future) ability of hot-plugging switches with PCIe
Native hotplug since ACPI PCI Hot-plug only works with cold-plugged
bridges. If the user wants to explicitely use this feature, they can
disable ACPI PCI Hot-plug with:
--global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off
Change the bit in _OSC method so that the OS selects ACPI PCI Hot-plug
instead of PCIe Native.
[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20211112110857.3116853-5-imammedo@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Mark property as experimental/internal adding 'x-' prefix.
Property was introduced in 6.1 and it should have provided
ability to turn on native PCIE hotplug on port even when
ACPI PCI hotplug is in use is user explicitly sets property
on CLI. However that never worked since slot is wired to
ACPI hotplug controller.
Another non-intended usecase: disable native hotplug on slot
when APCI based hotplug is disabled, which works but slot has
'hotplug' property for this taks.
It should be relatively safe to rename it to experimental
as no users should exist for it and given that the property
is broken we don't really want to leave it around for much
longer lest users start using it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <20211112110857.3116853-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ppc 6.2 queue :
* Fix of a regression in floating point load instructions (Matheus)
* Associativity fix for pseries machine (Daniel)
* tlbivax fix for BookE machines (Danel)
# gpg: Signature made Fri 12 Nov 2021 12:11:29 PM CET
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-ppc-20211112' of https://github.com/legoater/qemu:
ppc/mmu_helper.c: do not truncate 'ea' in booke206_invalidate_ea_tlb()
spapr_numa.c: fix FORM1 distance-less nodes
target/ppc: Fix register update on lf[sd]u[x]/stf[sd]u[x]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* Fixes for SGX
* force_rcu notifiers
# gpg: Signature made Wed 10 Nov 2021 10:57:48 PM CET
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
sgx: Reset the vEPC regions during VM reboot
numa: avoid crash with SGX and "info numa"
accel/tcg: Register a force_rcu notifier
rcu: Introduce force_rcu notifier
target/i386: sgx: mark device not user creatable
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When trying to use the pc-dimm device on a non-NUMA machine, we get:
$ qemu-system-arm -M none -cpu max -S \
-object memory-backend-file,id=mem1,size=1M,mem-path=/tmp/1m \
-device pc-dimm,id=dimm1,memdev=mem1
Segmentation fault (core dumped)
(gdb) bt
#0 pc_dimm_realize (dev=0x555556da3e90, errp=0x7fffffffcd10) at hw/mem/pc-dimm.c:184
#1 0x0000555555fe1f8f in device_set_realized (obj=0x555556da3e90, value=true, errp=0x7fffffffce18) at hw/core/qdev.c:531
#2 0x0000555555feb4a9 in property_set_bool (obj=0x555556da3e90, v=0x555556e54420, name=0x5555563c3c41 "realized", opaque=0x555556a704f0, errp=0x7fffffffce18) at qom/object.c:2257
To avoid that crash, restrict the pc-dimm NUMA check to machines
supporting NUMA, and do not allow the use of 'node' property on
non-NUMA machines.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20211106145016.611332-1-f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The -1 assumes that cvq device model is accounted in data_queue_pairs,
if cvq does not exists, but it's actually the opposite: Devices with
!cvq are ok but devices with cvq does not add the last queue to
data_queue_pairs.
This is not a problem to vhost-net, but it is to vhost-vdpa:
* Devices with cvq gets initialized at last data vq device model, not
at cvq one.
* Devices with !cvq never gets initialized, since last_index is the
first queue of the last device model.
Because of that, the right change in last_index is to actually add the
cvq, not to remove the missing one.
This is not a problem to vhost-net, but it is to vhost-vdpa, which
device model trust to reach the last index to finish starting the
device.
Also, as the previous commit, rename it to index_end.
Tested with vp_vdpa with host's vhost=on and vhost=off, with ctrl_vq=on
and ctrl_vq=off.
Fixes: 049eb15b5f ("vhost: record the last virtqueue index for the virtio device")
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20211104085625.2054959-3-eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot. For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.
Qemu can invoke the ioctl to bring its vEPC pages back to uninitialized
state. There is a possibility that some pages fail to be removed if they
are SECS pages, and the child and SECS pages could be in separate vEPC
regions. Therefore, the ioctl returns the number of EREMOVE failures,
telling Qemu to try the ioctl again after it's done with all vEPC regions.
The related kernel patches:
Link: https://lkml.kernel.org/r/20211021201155.1523989-3-pbonzini@redhat.com
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 71e6fae3a9 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. This happens when QEMU adds a default
NUMA node and when the user adds NUMA nodes without specifying the
distances.
During the discussions of the forementioned patch [1] it was found that
FORM1 guests were behaving in a strange way in the same scenario, with
the kernel seeing the distances between the nodes as '160', as we can
see in this example with 4 NUMA nodes without distance information:
$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node 0 1 2 3
0: 10 160 160 160
1: 160 10 160 160
2: 160 160 10 160
3: 160 160 160 10
Turns out that we have the same problem with FORM1 guests - we are
calculating associativity domain using zeroed values. And as it also
turns out, the solution from 71e6fae3a9 applies to FORM1 as well.
This patch creates a wrapper called 'get_numa_distance' that contains
the logic used in FORM2 to define node distances when this information
is absent. This helper is then used in all places where we need to read
distance information from machine_state->numa_state->nodes. That way
we'll guarantee that the NUMA node distance is always being curated
before being used.
After this patch, the FORM1 guest mentioned above will have the
following topology:
$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 20
2: 20 20 10 20
3: 20 20 20 10
This is compatible with what FORM2 guests and other archs do in this
case.
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html
Fixes: 690fbe4295 ("spapr_numa: consider user input when defining associativity")
CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
CC: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Add the MEMORY_DEVICE_INFO_KIND_SGX_EPC case, so that enclave
memory is included in the output of "info numa" instead of crashing
the monitor.
Fixes: a7c565a941 ("sgx-epc: Add the fill_device_info() callback support", 2021-09-30)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The device is created by the machine based on the sgx-epc property.
It should not be created by users.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>