It uses [offset, offset + size - 1] to indicate that the length of range is
size in most places in vfio trace code (such as
trace_vfio_region_region_mmap()) execpt trace_vfio_region_sparse_mmap_entry().
So change it for trace_vfio_region_sparse_mmap_entry(), but if size is zero,
the trace will be weird with an underflow, so move the trace and trace it
only if size is not zero.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1650100104-130737-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In migration resume phase, all unmasked msix vectors need to be
setup when loading the VF state. However, the setup operation would
take longer if the VM has more VFs and each VF has more unmasked
vectors.
The hot spot is kvm_irqchip_commit_routes, it'll scan and update
all irqfds that are already assigned each invocation, so more
vectors means need more time to process them.
vfio_pci_load_config
vfio_msix_enable
msix_set_vector_notifiers
for (vector = 0; vector < dev->msix_entries_nr; vector++) {
vfio_msix_vector_do_use
vfio_add_kvm_msi_virq
kvm_irqchip_commit_routes <-- expensive
}
We can reduce the cost by only committing once outside the loop.
The routes are cached in kvm_state, we commit them first and then
bind irqfd for each vector.
The test VM has 128 vcpus and 8 VF (each one has 65 vectors),
we measure the cost of the vfio_msix_enable for each VF, and
we can see 90+% costs can be reduce.
VF Count of irqfds[*] Original With this patch
1st 65 8 2
2nd 130 15 2
3rd 195 22 2
4th 260 24 3
5th 325 36 2
6th 390 44 3
7th 455 51 3
8th 520 58 4
Total 258ms 21ms
[*] Count of irqfds
How many irqfds that already assigned and need to process in this
round.
The optimization can be applied to msi type too.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-6-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Commit ecebe53fe9 ("vfio: Avoid disabling and enabling vectors
repeatedly in VFIO migration") avoids inefficiently disabling and
enabling vectors repeatedly and lets the unmasked vectors be enabled
one by one.
But we want to batch multiple routes and defer the commit, and only
commit once outside the loop of setting vector notifiers, so we
cannot enable the vectors one by one in the loop now.
Revert that commit and we will take another way in the next patch,
it can not only avoid disabling/enabling vectors repeatedly, but
also satisfy our requirement of defer to commit.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-5-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When pulling or pushing an OS context from/to a CPU, we should
re-evaluate the state of the External interrupt signal. Otherwise, we
can end up catching the External interrupt exception in hypervisor
mode, which is unexpected.
The problem is best illustrated with the following scenario:
1. an External interrupt is raised while the guest is on the CPU.
2. before the guest can ack the External interrupt, an hypervisor
interrupt is raised, for example the Hypervisor Decrementer or
Hypervisor Virtualization interrupt. The hypervisor interrupt forces
the guest to exit while the External interrupt is still pending.
3. the hypervisor handles the hypervisor interrupt. At this point, the
External interrupt is still pending. So it's very likely to be
delivered while the hypervisor is running. That's unexpected and can
result in an infinite loop where the hypervisor catches the External
interrupt, looks for an interrupt in its hypervisor queue, doesn't
find any, exits the interrupt handler with the External interrupt
still raised, repeat...
The fix is simply to always lower the External interrupt signal when
pulling an OS context. It means it needs to be raised again when
re-pushing the OS context. Fortunately, it's already the case, as we
now always call xive_tctx_ipb_update(), which will raise the signal if
needed.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Message-Id: <20220429071620.177142-3-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The Post Interrupt Priority Register (PIPR) is not restored like the
other OS-context related fields of the TIMA when pushing an OS context
on the CPU. It's not needed because it can be calculated from the
Interrupt Pending Buffer (IPB), which is saved and restored. The PIPR
must therefore always be recomputed when pushing an OS context.
This patch fixes a path on P9 and P10 where it was not done. If there
was a pending interrupt when the OS context was pulled, the IPB was
saved correctly. When pushing back the context, the code in
xive_tctx_need_resend() was checking for a interrupt raised while the
context was not on the CPU, saved in the NVT. If one was found, then
it was merged with the saved IPB and the PIPR updated and everything
was fine. However, if there was no interrupt found in the NVT, then
xive_tctx_ipb_update() was not being called and the PIPR was not
updated. This patch fixes it by always calling xive_tctx_ipb_update().
Note that on P10 (xive2.c) and because of the above, there's no longer
any need to check the CPPR value so it can go away.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Message-Id: <20220429071620.177142-2-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
target-arm queue:
* Enable read access to performance counters from EL0
* Enable SCTLR_EL1.BT0 for aarch64-linux-user
* Refactoring of cpreg handling
# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmJzlJYZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3rQXEACqtWhESD9ZJ0T1DfiWh7HX
# KXZuvB5C4kEdY8KXPJsdFM47KGMB29AI1pfqN5oRvalGG40ROM1HNTO44LSjKgUr
# b+aEq0bcbOJQuhfc5EPoh3b9wekowxlBYsH3Zq251J6ua6dRd1iqdeGXFIZbn02x
# RY5lXB2wgWh8LnF+qwoLiIrqJWsJ8PSOolyl0LrKjI3Z22UboK1Y5K0sbJBlavX4
# xKEyd4Af1Jq+1GcleSymAjcNF1iO+38w6rrFSgMWj+f3HSjKCk+MHU78rfqVNa88
# ESRjBj1x3c8kRzNzy+Q8ntJ5QzREvFDpUYBC9lvnoLKQ6xRJWDvvZQw2YJGsH8sB
# Xgg8fQ75iYEQdN4SHLWn24OwZpKuzTZ4QYm0d02GiAZCGXgAFEIKG62lBd3UJTAy
# 6wTUdjuLv/KA+Lc3qdvmFfOVxfPh728VvFl55IoGXZv9FFrxvrluLEgr3TIje9W3
# 0r1FcjtAuuTHzKiaf8UsmvMW9nR550L1xQ+uMY8GKQvQgSvkf050srVZS05GFItH
# DqCUv++hsyi0b44J377cUKkAEOdH/rhV20pvvfoJthRgmHLNN5LG61JI9eK9JXzl
# +AYpbxAC3R6f0dp6/31D0ZRhW7wcC/rt1EVK/iACVKoGo8hZf3lC64y2+3TVoApF
# DdCadVNnR9eUFWh1inGXKQ==
# =Q7ra
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 05 May 2022 04:10:46 AM CDT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [full]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full]
* tag 'pull-target-arm-20220505' of https://git.linaro.org/people/pmaydell/qemu-arm: (23 commits)
target/arm: read access to performance counters from EL0
target/arm: Add isar_feature_{aa64,any}_ras
target/arm: Add isar predicates for FEAT_Debugv8p2
target/arm: Remove HOST_BIG_ENDIAN ifdef in add_cpreg_to_hashtable
target/arm: Reformat comments in add_cpreg_to_hashtable
target/arm: Perform override check early in add_cpreg_to_hashtable
target/arm: Hoist isbanked computation in add_cpreg_to_hashtable
target/arm: Use bool for is64 and ns in add_cpreg_to_hashtable
target/arm: Consolidate cpreg updates in add_cpreg_to_hashtable
target/arm: Hoist computation of key in add_cpreg_to_hashtable
target/arm: Merge allocation of the cpreg and its name
target/arm: Store cpregs key in the hash table directly
target/arm: Drop always-true test in define_arm_vh_e2h_redirects_aliases
target/arm: Name CPSecureState type
target/arm: Name CPState type
target/arm: Change cpreg access permissions to enum
target/arm: Avoid bare abort() or assert(0)
target/arm: Reorg ARMCPRegInfo type field bits
target/arm: Make some more cpreg data static const
target/arm: Replace sentinels with ARRAY_SIZE in cpregs.h
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The spec clarifies now that QEMU should not send a file descriptor in a
request to remove a memory region. Change it accordingly.
For libvhost-user, this is a bug fix that makes it compatible with
rust-vmm's implementation that doesn't send a file descriptor. Keep
accepting, but ignoring a file descriptor for compatibility with older
QEMU versions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220407133657.155281-4-kwolf@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The qemu_*block() functions are meant to be be used with sockets (the
win32 implementation expects SOCKET)
Over time, those functions where used with Win32 SOCKET or
file-descriptors interchangeably. But for portability, they must only be
used with socket-like file-descriptors. FDs can use
g_unix_set_fd_nonblocking() instead.
Rename the functions with "socket" in the name to prevent bad usages.
This is effectively reverting commit f9e8cacc55 ("oslib-posix:
rename socket_set_nonblock() to qemu_set_nonblock()").
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Those calls are non-socket fd, or are POSIX-specific. Use the dedicated
GLib API. (qemu_set_nonblock() is for socket-like)
(this is a preliminary patch before renaming qemu_set_nonblock())
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Per ast1030_v7.pdf, AST1030 HACE engine is identical to AST2600's HACE
engine.
Signed-off-by: Steven Lee <steven_lee@aspeedtech.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
I was setting gpioV4-7 to "1110" using the QOM pin property handler and
noticed that lowering gpioV7 was inadvertently lowering gpioV4-6 too.
(qemu) qom-set /machine/soc/gpio gpioV4 true
(qemu) qom-set /machine/soc/gpio gpioV5 true
(qemu) qom-set /machine/soc/gpio gpioV6 true
(qemu) qom-get /machine/soc/gpio gpioV4
true
(qemu) qom-set /machine/soc/gpio gpioV7 false
(qemu) qom-get /machine/soc/gpio gpioV4
false
An expression in aspeed_gpio_set_pin_level was using a logical NOT
operator instead of a bitwise NOT operator:
value &= !pin_mask;
The original author probably intended to make a bitwise NOT expression
"~", but mistakenly used a logical NOT operator "!" instead. Some
programming languages like Rust use "!" for both purposes.
Fixes: 4b7f956862 ("hw/gpio: Add basic Aspeed GPIO model for AST2400 and
AST2500")
Signed-off-by: Peter Delevoryas <pdel@fb.com>
Message-Id: <20220502080827.244815-1-pdel@fb.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The aspeed ast2600 accumulative mode is described in datasheet
ast2600v10.pdf section 25.6.4:
1. Allocating and initiating accumulative hash digest write buffer
with initial state.
* Since QEMU crypto/hash api doesn't provide the API to set initial
state of hash library, and the initial state is already set by
crypto library (gcrypt/glib/...), so skip this step.
2. Calculating accumulative hash digest.
(a) When receiving the last accumulative data, software need to add
padding message at the end of the accumulative data. Padding
message described in specific of MD5, SHA-1, SHA224, SHA256,
SHA512, SHA512/224, SHA512/256.
* Since the crypto library (gcrypt/glib) already pad the
padding message internally.
* This patch is to remove the padding message which fed byguest
machine driver.
Signed-off-by: Troy Lee <troy_lee@aspeedtech.com>
Signed-off-by: Steven Lee <steven_lee@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220426021120.28255-3-steven_lee@aspeedtech.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Current fmc model of AST2500 EVB and AST2600 EVB can't emulate quad
mode properly so fix them using equivalent mx25l25635e and mx66u51235f
respectively.
These default settings still can be overridden using the 'fmc-model'
command line option.
Reported-by: Graeme Gregory <quic_ggregory@quicinc.com>
Signed-off-by: Jae Hyun Yoo <quic_jaehyoo@quicinc.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220402184427.4010304-1-quic_jaehyoo@quicinc.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Guest code (u-boot) pokes at this on boot. No functionality is required
for guest code to work correctly, but it helps to document the region
being read from.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220318092211.723938-1-joel@jms.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
In order to correctly report secure boot running firmware, these values
must be set. They are taken from a running machine when secure boot is
enabled.
We don't yet have documentation from ASPEED on what they mean. Set the
raw values for now, and in the future improve the model with properties
to set these on a per-machine basis.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220310052159.183975-1-joel@jms.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
AST2600's HPLL register offset and bit definition are different from
AST2500. Add a hpll calculation function and an apb frequency calculation
function based on SCU200 register description in ast2600v11.pdf.
Signed-off-by: Steven Lee <steven_lee@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: checkpatch fixes ]
Message-Id: <20220315075753.8591-2-steven_lee@aspeedtech.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
When mapped POSIX ACL is used, we are ignoring errors when trying
to remove a POSIX ACL xattr that does not exist. On Linux hosts we
would get ENODATA in such cases, on macOS hosts however we get
ENOATTR instead.
As we can be sure that ENOATTR is defined as being identical on Linux
hosts (at least by qemu/xattr.h), it is safe to fix this issue by
simply comparing against ENOATTR instead of ENODATA.
This patch fixes e.g. a command on Linux guest like:
cp --preserve=mode old new
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Link: https://lore.kernel.org/qemu-devel/2866993.yOYK24bMf6@silver/
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-Id: <34f81e9bffd7a3e65fb7aab5b56c107bd0aac960.1651228001.git.qemu_oss@crudebyte.com>
Linux and macOS only share some errno definitions with equal macro
name and value. In fact most mappings for errno are completely
different on the two systems.
This patch converts some important errno values from macOS host to
corresponding Linux errno values before eventually sending such error
codes along with 'Rlerror' replies (if 9p2000.L is used that is). Not
having translated errnos before violated the 9p2000.L protocol spec,
which says:
"
size[4] Rlerror tag[2] ecode[4]
... ecode is a numerical Linux errno.
"
https://github.com/chaos/diod/wiki/protocol#lerror----return-error-code
This patch fixes a bunch of misbehaviours when running a Linux client
on macOS host. For instance this patch fixes:
mount -t 9p -o posixacl ...
on Linux guest if security_mode=mapped was used for 9p server, which
refused to mount successfully, because macOS returned ENOATTR==93
when client tried to retrieve POSIX ACL xattrs, because errno 93
is defined as EPROTONOSUPPORT==93 on Linux, so Linux client believed
that xattrs were not supported by filesystem on host in general.
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Link: https://lore.kernel.org/qemu-devel/20220421124835.3e664669@bahia/
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-Id: <b322ab298a62069e527d2b032028bdc9115afacd.1651228001.git.qemu_oss@crudebyte.com>
The 'rdev' field in 9p reponse 'Rgetattr' is of type dev_t,
which is actually a system dependant type and therefore both the
size and encoding of dev_t differ between macOS and Linux.
So far we have sent 'rdev' to guest in host's dev_t format as-is,
which caused devices to appear with wrong device numbers on
guests running on macOS hosts, eventually leading to various
misbehaviours on guest in conjunction with device files.
This patch fixes this issue by converting the device number from
host's dev_t format to Linux dev_t format. As 9p request
'Tgettattr' is exclusive to protocol version 9p2000.L, it should
be fair to assume that 'rdev' field is assumed to be in Linux dev_t
format by client as well.
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Link: https://lore.kernel.org/qemu-devel/20220421093056.5ab1e7ed@bahia/
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-Id: <b3a430c2c382ba69a7405e04c0b090ab0d86f17e.1651228001.git.qemu_oss@crudebyte.com>
The 'synth' driver's root node and the 'synth' driver's first
subdirectory node falsely share the same inode number (zero), which
makes it impossible for 9p clients (i.e. 9p test cases) to distinguish
root node and first subdirectory from each other by comparing their QIDs
(which are derived by 9p server from driver's inode numbers).
Fix this issue by using prefix-increment instead of postfix-increment
operator while generating new inode numbers for subdirectories and files.
Link: https://lore.kernel.org/qemu-devel/3859307.hTDP4D0zbi@silver/
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <E1nTpyU-0000yR-9o@lizzy.crudebyte.com>
Imply the TPM sysbus devices. This allows users to add TPM devices to
the RISC-V virt board.
This was tested by first creating an emulated TPM device:
swtpm socket --tpm2 -t -d --tpmstate dir=/tmp/tpm \
--ctrl type=unixio,path=swtpm-sock
Then launching QEMU with:
-chardev socket,id=chrtpm,path=swtpm-sock \
-tpmdev emulator,id=tpm0,chardev=chrtpm \
-device tpm-tis-device,tpmdev=tpm0
The TPM device can be seen in the memory tree and the generated device
tree.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/942
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20220427234146.1130752-7-alistair.francis@opensource.wdc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>