Add the direct-io migration parameter that tells the migration code to
use O_DIRECT when opening the migration stream file whenever possible.
This is currently only used for the secondary channels of fixed-ram
migration, which can guarantee that writes are page aligned.
However the parameter could be made to affect other types of
file-based migrations in the future.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
For the incoming fixed-ram migration we need to read the ramblock
headers, get the pages bitmap and send the host address of each
non-zero page to the multifd channel thread for writing.
To read from the migration file we need a preadv function that can
read into the iovs in segments of contiguous pages because (as in the
writing case) the file offset applies to the entire iovec.
Usage on HMP is:
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability fixed-ram on
(qemu) migrate_set_parameter max-bandwidth 0
(qemu) migrate_set_parameter multifd-channels 8
(qemu) migrate_incoming file:migfile
(qemu) info status
(qemu) c
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The new fixed-ram stream format uses a file transport and puts ram
pages in the migration file at their respective offsets and can be
done in parallel by using the pwritev system call which takes iovecs
and an offset.
Add support to enabling the new format along with multifd to make use
of the threading and page handling already in place.
This requires multifd to stop sending headers and leaving the stream
format to the fixed-ram code. When it comes time to write the data, we
need to call a version of qio_channel_write that can take an offset.
Usage on HMP is:
(qemu) stop
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability fixed-ram on
(qemu) migrate_set_parameter max-bandwidth 0
(qemu) migrate_set_parameter multifd-channels 8
(qemu) migrate file:migfile
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Some functionalities of multifd are incompatible with the 'fixed-ram'
migration format.
The MULTIFD_FLUSH flag in particular is not used because in fixed-ram
there is no sinchronicity between migration source and destination so
there is not need for a sync packet. In fact, fixed-ram disables
packets in multifd as a whole.
Make sure RAM_SAVE_FLAG_MULTIFD_FLUSH is never emitted when fixed-ram
is enabled.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We'll need to set the shadow_bmap bits from outside ram.c soon and
TARGET_PAGE_BITS is poisoned, so add a wrapper to it.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
For the upcoming support to fixed-ram migration with multifd, we need
to be able to accept an iovec array with non-contiguous data.
Add a pwritev and preadv version that splits the array into contiguous
segments before writing. With that we can have the ram code continue
to add pages in any order and the multifd code continue to send large
arrays for reading and writing.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
Since iovs can be non contiguous, we'd need a separate array on the
side to carry an extra file offset for each of them, so I'm relying on
the fact that iovs are all within a same host page and passing in an
encoded offset that takes the host page into account.
Currently multifd does not need to have knowledge of pages on the
receiving side because all the information needed is within the
packets that come in the stream.
We're about to add support to fixed-ram migration, which cannot use
packets because it expects the ramblock section in the migration file
to contain only the guest pages data.
Add a pointer to MultiFDPages in the multifd_recv_state and use the
pages similarly to what we already do on the sending side. The pages
are used to transfer data between the ram migration code in the main
migration thread and the multifd receiving threads.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
On the receiving side we don't need to differentiate between main
channel and threads, so whichever channel is defined first gets to be
the main one. And since there are no packets, use the atomic channel
count to index into the params array.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Allow multifd to open file-backed channels. This will be used when
enabling the fixed-ram migration stream format which expects a
seekable transport.
The QIOChannel read and write methods will use the preadv/pwritev
versions which don't update the file offset at each call so we can
reuse the fd without re-opening for every channel.
Note that this is just setup code and multifd cannot yet make use of
the file channels.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
For the upcoming support to the new 'fixed-ram' migration stream
format, we cannot use multifd packets because each write into the
ramblock section in the migration file is expected to contain only the
guest pages. They are written at their respective offsets relative to
the ramblock section header.
There is no space for the packet information and the expected gains
from the new approach come partly from being able to write the pages
sequentially without extraneous data in between.
The new format also doesn't need the packets and all necessary
information can be taken from the standard migration headers with some
(future) changes to multifd code.
Use the presence of the fixed-ram capability to decide whether to send
packets. For now this has no effect as fixed-ram cannot yet be enabled
with multifd.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add a completion tracepoint that provides basic stats for
debug. Displays throughput (MB/s and pages/s) and total time (ms).
Usage:
$QEMU ... -trace migration_status
Output:
migration_status 1506 MB/s, 436725 pages/s, 8698 ms
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add the necessary code to parse the format changes for the 'fixed-ram'
capability.
One of the more notable changes in behavior is that in the 'fixed-ram'
case ram pages are restored in one go rather than constantly looping
through the migration stream.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
(farosas) reused more of the common code by making the fixed-ram
function take only one ramblock and calling it from inside
parse_ramblock.
Implement the outgoing migration side for the 'fixed-ram' capability.
A bitmap is introduced to track which pages have been written in the
migration file. Pages are written at a fixed location for every
ramblock. Zero pages are ignored as they'd be zero in the destination
migration as well.
The migration stream is altered to put the dirty pages for a ramblock
after its header instead of having a sequential stream of pages that
follow the ramblock headers. Since all pages have a fixed location,
RAM_SAVE_FLAG_EOS is no longer generated on every migration iteration.
Without fixed-ram (current):
ramblock 1 header|ramblock 2 header|...|RAM_SAVE_FLAG_EOS|stream of
pages (iter 1)|RAM_SAVE_FLAG_EOS|stream of pages (iter 2)|...
With fixed-ram (new):
ramblock 1 header|ramblock 1 fixed-ram header|ramblock 1 pages (fixed
offsets)|ramblock 2 header|ramblock 2 fixed-ram header|ramblock 2
pages (fixed offsets)|...|RAM_SAVE_FLAG_EOS
where:
- ramblock header: the generic information for a ramblock, such as
idstr, used_len, etc.
- ramblock fixed-ram header: the new information added by this
feature: bitmap of pages written, bitmap size and offset of pages
in the migration file.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
To facilitate the implementation of the 'fixed-ram' migration restore,
factor out the code responsible for parsing the ramblocks
headers. This also makes ram_load_precopy easier to comprehend.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add a new migration capability 'fixed-ram'.
The core of the feature is to ensure that each ram page has a specific
offset in the resulting migration stream. The reason why we'd want
such behavior are two fold:
- When doing a 'fixed-ram' migration the resulting file will have a
bounded size, since pages which are dirtied multiple times will
always go to a fixed location in the file, rather than constantly
being added to a sequential stream. This eliminates cases where a vm
with, say, 1G of ram can result in a migration file that's 10s of
GBs, provided that the workload constantly redirties memory.
- It paves the way to implement DIRECT_IO-enabled save/restore of the
migration stream as the pages are ensured to be written at aligned
offsets.
For now, enabling the capability has no effect. The next couple of
patches implement the core funcionality.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The fixed-ram migration format needs a channel that supports seeking
to be able to write each page to an arbitrary offset in the migration
stream.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add utility methods that will be needed when implementing 'fixed-ram'
migration capability.
qemu_file_is_seekable
qemu_put_buffer_at
qemu_get_buffer_at
qemu_set_offset
qemu_get_offset
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
fixed total_transferred accounting
restructured to use qio_channel_file_preadv instead of the _full
variant
The upcoming 'fixed-ram' feature will require qemu to write data to
(and restore from) specific offsets of the migration file.
Add a minimal implementation of pwritev/preadv and expose them via the
io_pwritev and io_preadv interfaces.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Introduce basic pwritev/preadv support in the generic channel layer.
Specific implementation will follow for the file channel as this is
required in order to support migration streams with fixed location of
each ram page.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add a generic QIOChannel feature SEEKABLE which would be used by the
qemu_file* apis. For the time being this will be only implemented for
file channels.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
We don't need to do this in two pieces. One single function makes it
easier to grasp, specially since it removes one indirection in the
return value.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
It makes a bit more send to have the zero page handling of xbzrle
right where we save the zero page.
This also makes save_zero_page() follow the same format as
save_compress_page() at the top level of ram_save_target_page_legacy().
Signed-off-by: Fabiano Rosas <farosas@suse.de>
'rs' is not used in that function. Commit 9360447d34 ("ram: Use
MigrationStats for statistics") forgot to remove it.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Adapt the file migration tests to take into account the auto-pause
feature.
The test currently has a flag 'stop_src' that is used to know if the
test itself should stop the VM. Add a new flag 'auto_pause' to enable
QEMU to stop the VM instead.. The two in combination allow us to
migrate a already stopped VM and check that it is still stopped on the
destination (auto-pause in effect restoring the original state).
By adding a more precise tracking of migration state changes, we can
also make sure that auto-pause is actually stopping the VM right after
qmp_migrate(), as opposed to the vm_stop() that happens at
migration_complete().
When resuming the destination a similar situation occurs, we use
'stop_src' to have a stopped VM and check that the destination does
not get a "resume" event.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The file migration is asynchronous, so it benefits from being done
with a stopped VM. Allow the file migration to take benefit of the
auto-pause capability.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add a capability that allows the management layer to delegate to QEMU
the decision of whether to pause a VM and perform a non-live
migration. Depending on the type of migration being performed, this
could bring performance benefits.
Note that the capability is enabled by default but at this moment no
migration scheme is making use of it.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
There are some situations during migration when we want to change the
runstate of the VM, but don't actually want the new runstate to be put
on the wire to be restored on the destination VM. In those cases, the
pattern is to use global_state_store() to save the state for migration
before changing it.
One scenario where this happens is when switching the source VM into
the FINISH_MIGRATE state. This state only makes sense on the source
VM. Another situation is when pausing the source VM prior to migration
completion.
We are about to introduce a third scenario when the whole migration
should be performed with a paused VM. In this case we will want to
save the VM runstate at the very start of the migration and that state
will be the one restored on the destination regardless of all the
runstate changes that happen in between.
To achieve that we need to make sure that the other two calls to
global_state_store() do not overwrite the state that is to be
migrated.
Introduce a version of global_state_store() that only saves the state
if no other state has already been saved.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
There is a pattern of calling runstate_get() to store the current
runstate and calling global_state_store() to save the current runstate
for migration. Since global_state_store() also calls runstate_get(),
make it return the runstate instead.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add support for waiting for a migration state change event to
happen. This can help disambiguate between runstate changes that
happen during VM lifecycle.
Specifically, the next couple of patches want to know whether STOP
events happened at the migration start or end. Add the "setup" and
"active" migration states for that purpose.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Move the QTestMigrationState into QTestState so we don't have to pass
it around to the wait_for_* helpers anymore. Since QTestState is
private to libqtest.c, move the migration state struct to libqtest.h
and add a getter.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Define a state object to capture events seen by migration tests, to allow
more events to be captured in a subsequent patch, and simplify event
checking in wait_for_migration_pass. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
There is currently no way to write a test for errors that happened in
qmp_migrate before the migration has started.
Add a version of qmp_migrate that ensures an error happens. To make
use of it a test needs to set MigrateCommon.result as
MIG_TEST_QMP_ERROR.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We are sending a migration event of MIGRATION_STATUS_SETUP at
qemu_start_incoming_migration but never actually setting the state.
This creates a window between qmp_migrate_incoming and
process_incoming_migration_co where the migration status is still
MIGRATION_STATUS_NONE. Calling query-migrate during this time will
return an empty response even though the incoming migration command
has already been issued.
Commit 7cf1fe6d68 ("migration: Add migration events on target side")
has added support to the 'events' capability to the incoming part of
migration, but chose to send the SETUP event without setting the
state. I'm assuming this was a mistake.
This introduces a change in behavior, any QMP client waiting for the
SETUP event will hang, unless it has previously enabled the 'events'
capability. Having the capability enabled is sufficient to continue to
receive the event.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Use the new migrate_incoming_qmp helper in the places that currently
open-code calling migrate-incoming.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
file-based migration requires the target to initiate its migration after
the source has finished writing out the data in the file. Currently
there's no easy way to initiate 'migrate-incoming', allow this by
introducing migrate_incoming_qmp helper, similarly to migrate_qmp.
Also make sure migration events are enabled and wait for the incoming
migration to start before returning. This avoid a race when querying
the migration status too soon after issuing the command.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The following patch will make use of this function from within
migrate-helpers.c, so move it there.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Allow an offset option to be specified as part of the file URI, in
the form "file:filename,offset=offset", where offset accepts the common
size suffixes, or the 0x prefix, but not both. Migration data is written
to and read from the file starting at offset. If unspecified, it defaults
to 0.
This is needed by libvirt to store its own data at the head of the file.
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Extend the migration URI to support file:<filename>. This can be used for
any migration scenario that does not require a reverse path. It can be
used as an alternative to 'exec:cat > file' in minimized containers that
do not contain /bin/sh, and it is easier to use than the fd:<fdname> URI.
It can be used in HMP commands, and as a qemu command-line parameter.
For best performance, guest ram should be shared and x-ignore-shared
should be true, so guest pages are not written to the file, in which case
the guest may remain running. If ram is not so configured, then the user
is advised to stop the guest first. Otherwise, a busy guest may re-dirty
the same page, causing it to be appended to the file multiple times,
and the file may grow unboundedly. That issue is being addressed in the
"fixed-ram" patch series.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
We've found the source of flakiness in this test, so re-enable it.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
This wasn't noticed because the test is currently disabled.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Fixes: 02f56e3de ("tests/qtest: massively speed up migration-test")
This doubly linked list is common for all the multifd and migration
threads so we need to avoid concurrent access.
Add a mutex to protect the data from concurrent access. This fixes a
crash when removing two MigrationThread objects from the list at the
same time during cleanup of multifd threads.
Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We're about to add more functions to this file so make it use the same
coding style as the rest of the code.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Update $linux_arch to keep using the shared linux-headers/asm-riscv/
include path.
Fixes: e3e477c3bc ("configure: Fix cross-building for RISCV host")
Reported-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[rth: Missed v5, so now applying the diff between v4 and v5.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
While when building on native Linux the host architecture
is reported as "riscv32" or "riscv64":
Host machine cpu family: riscv64
Host machine cpu: riscv64
Found pkg-config: /usr/bin/pkg-config (0.29.2)
Since commit ba0e733362 ("configure: Merge riscv32 and riscv64
host architectures"), when cross-compiling it is detected as
"riscv". Meson handles the cross-detection but displays a warning:
WARNING: Unknown CPU family riscv, please report this at https://github.com/mesonbuild/meson/issues/new
Host machine cpu family: riscv
Host machine cpu: riscv
Target machine cpu family: riscv
Target machine cpu: riscv
Found pkg-config: /usr/bin/riscv64-linux-gnu-pkg-config (1.8.1)
Now since commit 278c1bcef5 ("target/riscv: Only unify 'riscv32/64'
-> 'riscv' for host cpu in meson") Meson expects the cpu to be in
[riscv32, riscv64]. So when cross-building (for example on our
cross-riscv64-system Gitlab-CI job) we get:
WARNING: Unknown CPU family riscv, please report this at https://github.com/mesonbuild/meson/issues/new
Host machine cpu family: riscv
Host machine cpu: riscv
Target machine cpu family: riscv
Target machine cpu: riscv
../meson.build:684:6: ERROR: Problem encountered: Unsupported CPU riscv, try --enable-tcg-interpreter
Fix by partially revert commit ba0e733362 so when cross-building
the ./configure script passes the proper host architecture to meson.
Fixes: ba0e733362 ("configure: Merge riscv32 and riscv64 host architectures")
Fixes: 278c1bcef5 ("target/riscv: Only unify 'riscv32/64' -> 'riscv' for host cpu in meson")
Reported-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230711110619.56588-1-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
pc,pci,virtio: cleanups, fixes, features
vhost-user-gpu: edid
vhost-user-scmi device
vhost-vdpa: _F_CTRL_RX and _F_CTRL_RX_EXTRA support for svq
cleanups, fixes all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmSsjYMPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRp2vYH/20u6TAMssE/UAJoUU0ypbJkbHjDqiqDeuZN
# qDYazLUWIJTUbDnSfXAiRcdJuukEpEFcoHa9O6vgFE/SNod51IrvsJR9CbZxNmk6
# D+Px9dkMckDE/yb8f6hhcHsi7/1v04I0oSXmJTVYxWSKQhD4Km6x8Larqsh0u4yd
# n6laZ+VK5H8sk6QvI5vMz+lYavACQVryiWV/GAigP21B0eQK79I5/N6y0q8/axD5
# cpeTzUF+m33SfLfyd7PPmibCQFYrHDwosynSnr3qnKusPRJt2FzWkzOiZgbtgE2L
# UQ/S4sYTBy8dZJMc0wTywbs1bSwzNrkQ+uS0v74z9wCUYTgvQTA=
# =RsOh
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 11 Jul 2023 12:00:19 AM BST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (66 commits)
vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ
vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature
vdpa: Allow VIRTIO_NET_F_CTRL_RX in SVQ
vdpa: Avoid forwarding large CVQ command failures
vdpa: Accessing CVQ header through its structure
vhost: Fix false positive out-of-bounds
vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature
vdpa: Restore MAC address filtering state
vdpa: Use iovec for vhost_vdpa_net_load_cmd()
pcie: Specify 0 for ARI next function numbers
pcie: Use common ARI next function number
include/hw/virtio: document some more usage of notifiers
include/hw/virtio: add kerneldoc for virtio_init
include/hw/virtio: document virtio_notify_config
hw/virtio: fix typo in VIRTIO_CONFIG_IRQ_IDX comments
include/hw: document the device_class_set_parent_* fns
include: attempt to document device_class_set_props
vdpa: Fix possible use-after-free for VirtQueueElement
pcie: Add hotplug detect state register to cmask
virtio-iommu: Rework the traces in virtio_iommu_set_page_size_mask()
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Due to the size limitation of the out buffer sent to the vdpa device,
which is determined by vhost_vdpa_net_cvq_cmd_len(), excessive CVQ
command is truncated in QEMU. As a result, the vdpa device rejects
this flawd CVQ command.
However, the problem is that, the VIRTIO_NET_CTRL_MAC_TABLE_SET
CVQ command has a variable length, which may exceed
vhost_vdpa_net_cvq_cmd_len() if the guest sets more than
`MAC_TABLE_ENTRIES` MAC addresses for the filter table.
This patch solves this problem by following steps:
* Increase the out buffer size to vhost_vdpa_net_cvq_cmd_page_len(),
which represents the size of the buffer that is allocated and mmaped.
This ensures that everything works correctly as long as the guest
sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() -
sizeof(struct virtio_net_ctrl_hdr)
- 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses.
Considering the highly unlikely scenario for the guest setting
more than that number of MAC addresses for the filter table, this
should work fine for the majority of cases.
* If the CVQ command exceeds vhost_vdpa_net_cvq_cmd_page_len(),
instead of directly sending this CVQ command, QEMU should send
a VIRTIO_NET_CTRL_RX_PROMISC CVQ command to vdpa device. Addtionally,
a fake VIRTIO_NET_CTRL_MAC_TABLE_SET command including
(`MAC_TABLE_ENTRIES` + 1) non-multicast MAC addresses and
(`MAC_TABLE_ENTRIES` + 1) multicast MAC addresses should be provided
to the device model.
By doing so, the vdpa device turns promiscuous mode on, aligning
with the VirtIO standard. The device model marks
`n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
which aligns with the state of the vdpa device.
Note that the bug cannot be triggered at the moment, since
VIRTIO_NET_F_CTRL_RX feature is not enabled for SVQ.
Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <267e15e4eed2d7aeb9887f193da99a13d22a2f1d.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
QEMU uses vhost_svq_translate_addr() to translate addresses
between the QEMU's virtual address and the SVQ IOVA. In order
to validate this translation, QEMU checks whether the translated
range falls within the mapped range.
Yet the problem is that, the value of `needle_last`, which is calculated
by `needle.translated_addr + iovec[i].iov_len`, should represent the
exclusive boundary of the translated range, rather than the last
inclusive addresses of the range. Consequently, QEMU fails the check
when the translated range matches the size of the mapped range.
This patch solves this problem by fixing the `needle_last` value to
the last inclusive address of the translated range.
Note that this bug cannot be triggered at the moment, because QEMU
is unable to translate such a big range due to the truncation of
the CVQ command in vhost_vdpa_net_handle_ctrl_avail().
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <ee31c5420ffc8e6a29705ddd30badb814ddbae1d.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to VirtIO standard, "The driver MUST follow
the VIRTIO_NET_CTRL_MAC_TABLE_SET command by a le32 number,
followed by that number of non-multicast MAC addresses,
followed by another le32 number, followed by that number
of multicast addresses."
Considering that these data is not stored in contiguous memory,
this patch refactors vhost_vdpa_net_load_cmd() to accept
scattered data, eliminating the need for an addtional data copy or
packing the data into s->cvq_cmd_out_buffer outside of
vhost_vdpa_net_load_cmd().
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <3482cc50eebd13db4140b8b5dec9d0cc25b20b1b.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The current implementers of ARI are all SR-IOV devices. The ARI next
function number field is undefined for VF according to PCI Express Base
Specification Revision 5.0 Version 1.0 section 9.3.7.7. The PF still
requires some defined value so end the linked list formed with the field
by specifying 0 as required for any ARI implementation according to
section 7.8.7.2.
For migration, the field will keep having 1 as its value on the old
QEMU machine versions.
Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt")
Fixes: 44c2c09488 ("hw/nvme: Add support for SR-IOV")
Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230710153838.33917-3-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I'm still not sure how I achieve by use case of the parent class
defining the following properties:
static Property vud_properties[] = {
DEFINE_PROP_CHR("chardev", VHostUserDevice, chardev),
DEFINE_PROP_UINT16("id", VHostUserDevice, id, 0),
DEFINE_PROP_UINT32("num_vqs", VHostUserDevice, num_vqs, 1),
DEFINE_PROP_END_OF_LIST(),
};
But for the specialisation of the class I want the id to default to
the actual device id, e.g.:
static Property vu_rng_properties[] = {
DEFINE_PROP_UINT16("id", VHostUserDevice, id, VIRTIO_ID_RNG),
DEFINE_PROP_UINT32("num_vqs", VHostUserDevice, num_vqs, 1),
DEFINE_PROP_END_OF_LIST(),
};
And so far the API for doing that isn't super clear.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230710153522.3469097-2-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
QEMU uses vhost_handle_guest_kick() to forward guest's available
buffers to the vdpa device in SVQ avail ring.
In vhost_handle_guest_kick(), a `g_autofree` `elem` is used to
iterate through the available VirtQueueElements. This `elem` is
then passed to `svq->ops->avail_handler`, specifically to the
vhost_vdpa_net_handle_ctrl_avail(). If this handler fails to
process the CVQ command, vhost_handle_guest_kick() regains
ownership of the `elem`, and either frees it or requeues it.
Yet the problem is that, vhost_vdpa_net_handle_ctrl_avail()
mistakenly frees the `elem`, even if it fails to forward the
CVQ command to vdpa device. This can result in a use-after-free
for the `elem` in vhost_handle_guest_kick().
This patch solves this problem by refactoring
vhost_vdpa_net_handle_ctrl_avail() to only freeing the `elem` if
it owns it.
Fixes: bd907ae4b0 ("vdpa: manual forward CVQ buffers")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <e3f2d7db477734afe5c6a5ab3fa8b8317514ea34.1688746840.git.yin31149@gmail.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When trying to migrate a machine type pc-q35-6.0 or lower, with this
cmdline options,
-device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
-device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
the following bug happens after all ram pages were sent:
qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
qemu-kvm: load of migration failed: Invalid argument
This happens on pc-q35-6.0 or lower because of:
{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
hotplug for the guest. After a while the guest will deal with this hotplug
and qemu will clear the above bit.
Then, during migration, get_pci_config_device() will compare the
configs of both the freshly created device and the one that is being
received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
and cause the bug to reproduce.
To avoid this fake incompatibility, there are tree fields in PCIDevice that
can help:
- wmask: Used to implement R/W bytes, and
- w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
- cmask: Used to enable config checks on load.
According to PCI Express® Base Specification Revision 5.0 Version 1.0,
table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
listed as RO (read-only), so it only makes sense to make use of the cmask
field.
So, clear PCI_EXP_SLTSTA_PDS bit on cmask, so the fake incompatibility on
get_pci_config_device() does not abort the migration.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20230706045546.593605-3-leobras@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
The current error messages in virtio_iommu_set_page_size_mask()
sound quite similar for different situations and miss the IOMMU
memory region that causes the issue.
Clarify them and rework the comment.
Also remove the trace when the new page_size_mask is not applied as
the current frozen granule is kept. This message is rather confusing
for the end user and anyway the current granule would have been used
by the driver.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20230705165118.28194-3-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
When running on a 64kB page size host and protecting a VFIO device
with the virtio-iommu, qemu crashes with this kind of message:
qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
with mask 0x20010000
qemu: hardware error: vfio: DMA mapping failed, unable to continue
This is due to the fact the IOMMU MR corresponding to the VFIO device
is enabled very late on domain attach, after the machine init.
The device reports a minimal 64kB page size but it is too late to be
applied. virtio_iommu_set_page_size_mask() fails and this causes
vfio_listener_region_add() to end up with hw_error();
To work around this issue, we transiently enable the IOMMU MR on
machine init to collect the page size requirements and then restore
the bypass state.
Fixes: 90519b9053 ("virtio-iommu: Add bypass mode support to assigned device")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20230705165118.28194-2-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
PCIe downstream ports only have a single device 0, so PCI Express devices can
only be plugged into slot 0 on a PCIe port. Add a warning to let users know
when the invalid configuration is used. We may enforce this more strongly later
once we get more clarity on whether we are introducing a bad regression for
users currently using the wrong configuration.
The change has been tested to not break or alter behaviors of ARI capable
devices by instantiating seven vfs on an emulated igb device (the maximum
number of vfs the igb device supports). The vfs are instantiated correctly
and are seen to have non-zero device/slot numbers in the conventional PCI BDF
representation.
CC: jusual@redhat.com
CC: imammedo@redhat.com
CC: mst@redhat.com
CC: akihiko.odaki@daynix.com
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Julia Suvorova <jusual@redhat.com>
Message-Id: <20230705115925.5339-6-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
The test attaches a SCSI controller to a non-zero slot and a pcie-to-pci bridge
on slot 0 on the same pcie-root-port. Since a downstream device can be attached
to a pcie-root-port only on slot 0, the above test configuration is not allowed.
Additionally using pcie.0 as id for pcie-to-pci bridge is incorrect as that id
is reserved only for the root bus.
In the test scenario, there is no need to attach a pcie-root-port to the
root complex. A SCSI controller can be attached to a pcie-to-pci bridge
which can then be directly attached to the root bus (pcie.0).
Fix the test and simplify it.
CC: mst@redhat.com
CC: imammedo@redhat.com
CC: Michael Labiuk <michael.labiuk@virtuozzo.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230705115925.5339-5-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCIE ports only have one slot, slot 0. Hence, non-zero slots are not available
for PCIE devices on PCIE root ports. Fix test_acpi_q35_tcg_no_acpi_hotplug()
so that the test does not use them.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230705115925.5339-3-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With TPM CRM device, vhost-vdpa reports an error when it tries
to register a listener for a non aligned memory region:
qemu-system-x86_64: vhost_vdpa_listener_region_add received unaligned region
qemu-system-x86_64: vhost_vdpa_listener_region_del received unaligned region
This error can be confusing for the user whereas we only need to skip
the region (as it's already done after the error_report())
Rather than introducing a special case for TPM CRB memory section
to not display the message in this case, simply replace the
error_report() by a trace function (with more information, like the
memory region name).
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20230704071931.575888-2-lvivier@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to VirtIO standard, "The class, command and
command-specific-data are set by the driver,
and the device sets the ack byte.
There is little it can do except issue a diagnostic
if ack is not VIRTIO_NET_OK."
Therefore, QEMU should stop sending the queued SVQ commands and
cancel the device startup if the device's ack is not VIRTIO_NET_OK.
Yet the problem is that, vhost_vdpa_net_load_offloads() returns 1 based on
`*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
As a result, net->nc->info->load() also returns 1, this makes
vhost_net_start_one() incorrectly assume the device state is
successfully loaded by vhost_vdpa_net_load() and return 0, instead of
goto `fail` label to cancel the device startup, as vhost_net_start_one()
only cancels the device startup when net->nc->info->load() returns a
negative value.
This patch fixes this problem by returning -EIO when the device's
ack is not VIRTIO_NET_OK.
Fixes: 0b58d3686a ("vdpa: Add vhost_vdpa_net_load_offloads()")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <b0396b80e96322b86f1a0b10c098fc1edd947d72.1688438055.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to VirtIO standard, "The class, command and
command-specific-data are set by the driver,
and the device sets the ack byte.
There is little it can do except issue a diagnostic
if ack is not VIRTIO_NET_OK."
Therefore, QEMU should stop sending the queued SVQ commands and
cancel the device startup if the device's ack is not VIRTIO_NET_OK.
Yet the problem is that, vhost_vdpa_net_load_mq() returns 1 based on
`*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
As a result, net->nc->info->load() also returns 1, this makes
vhost_net_start_one() incorrectly assume the device state is
successfully loaded by vhost_vdpa_net_load() and return 0, instead of
goto `fail` label to cancel the device startup, as vhost_net_start_one()
only cancels the device startup when net->nc->info->load() returns a
negative value.
This patch fixes this problem by returning -EIO when the device's
ack is not VIRTIO_NET_OK.
Fixes: f64c7cda69 ("vdpa: Add vhost_vdpa_net_load_mq")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <ec515ebb0b4f56368751b9e318e245a5d994fa72.1688438055.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to VirtIO standard, "The class, command and
command-specific-data are set by the driver,
and the device sets the ack byte.
There is little it can do except issue a diagnostic
if ack is not VIRTIO_NET_OK."
Therefore, QEMU should stop sending the queued SVQ commands and
cancel the device startup if the device's ack is not VIRTIO_NET_OK.
Yet the problem is that, vhost_vdpa_net_load_mac() returns 1 based on
`*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
As a result, net->nc->info->load() also returns 1, this makes
vhost_net_start_one() incorrectly assume the device state is
successfully loaded by vhost_vdpa_net_load() and return 0, instead of
goto `fail` label to cancel the device startup, as vhost_net_start_one()
only cancels the device startup when net->nc->info->load() returns a
negative value.
This patch fixes this problem by returning -EIO when the device's
ack is not VIRTIO_NET_OK.
Fixes: f73c0c43ac ("vdpa: extract vhost_vdpa_net_load_mac from vhost_vdpa_net_load")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <a21731518644abbd0c495c5b7960527c5911f80d.1688438055.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
pci_new() automatically retains a reference to a virtual function when
registering it so we need to release the reference when unregistering.
Fixes: 7c0fa8dff8 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230411090408.48366-1-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
There is also pci_new() which creates non-multifunction PCI devices.
Accordingly the parameter is always set to true when a multi function PCI
device is to be created.
The reason for the parameter's existence seems to be that it is used in the
internal PCI code as well which is the only location where it gets set to
false. This one usage can be resolved by factoring out an internal helper
function.
Remove this redundant, error-prone parameter.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230304114043.121024-6-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The modern, declarative way to set up VM state handling is to assign to
DeviceClass::vmsd attribute.
There shouldn't be any change in behavior since dc->vmsd causes
vmstate_register_with_alias_id() to be called on the instance during
the instance init phase. vmstate_register() was also called during the
instance init phase which forwards to vmstate_register_with_alias_id()
internally. Checking the migration schema before and after this patch confirms:
before:
> qemu-system-x86_64 -S
> qemu > migrate -d exec:cat>before.mig
after:
> qemu-system-x86_64 -S
> qemu > migrate -d exec:cat>after.mig
> analyze-migration.py -d desc -f before.mig > before.json
> analyze-migration.py -d desc -f after.mig > after.json
> diff before.json after.json
-> empty
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230531211043.41724-8-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Every TYPE_PCI_IDE device performs the same not-so-trivial bit manipulation by
copy'n'paste code. Extract this into bmdma_status_writeb(), mirroring
bmdma_cmd_writeb().
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20230531211043.41724-6-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The instruction implements SAD (sum-absolute-difference) operation which
is used in motion estimation algorithms. The instruction handles four
8-bit data in parallel.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-34-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions do parallel quad 8-bit multiply and accumulate.
They are close to existing Q8MUL Q8MULSU so the generation
function modified to support all of them.
Also the patch fixes decoding of Q8MULSU according to tests on
hardware.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-30-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are:
- single 32-bit
- dual 16-bit packed
- quad 8-bit packed
conditional moves.
They are grouped in pool20 in the source code.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-29-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are dual 32-bit arithmetic shift right and
pack LSBs to 2x 16-bit into a MXU register.
The difference is the shift amount source: immediate or GP reg.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-25-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are all load/store a halfword from memory
and put it into/get it from MXU register in various combinations.
I-suffix instructions modify the base address GPR by offset provided.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-22-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are all load/store a byte from memory
and put it into/get it from MXU register in various combinations.
I-suffix instructions modify the base address GPR by offset provided.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-21-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are all dual 8-bit addition/subtraction in
various combinations. Most instructions are grouped in pool14,
see the opcode organization in the file.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-20-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are all dual 16-bit addition/subtraction in
various combinations. The instructions are grouped in pool13,
see the opcode organization in the file.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-19-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The instruction adds two 32-bit values with respect
to corresponding carry flags in MXU_CR.
XRa += XRb + LeftCarry flag;
XRd += XRc + RightCarry flag;
Suddenly, it doesn't modify carry flags as a result of addition.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-18-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are all dual 32-bit addition/subtraction in
various combinations. The instructions are grouped in pool12,
see the opcode organization in the file.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-17-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The instruction adds/subtracts two 32-bit values in XRb and XRc.
Placing results in XRa and XRd and updates carry bits for each
path in the MXU control register.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-16-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are part of pool3, see the grand tree above
in the file.
The instructions are close to D16MUL so common generation function
provided.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-11-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions are part of pool1, see the grand tree above
in the file. Q8ADD is part of pool1 too but belong to another
category of instructions, thus will be made in later patches.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-8-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These instructions used to multiply 2x32-bit GPR sources & accumulate
result into 64-bit pair of XRF registers.
These instructions stain HI/LO registers with the final result.
Their opcode is close to the MIPS32R1 MADD[U]/MSUB[U], so it have to
call decode_opc_special2_legacy when failing to find MXU opcode.
Moreover, it solves issue with reinventing MUL and malfunction
MULU/CLZ/CLO instructions.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-5-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
XBurstR1 - is the MIPS32R1 CPU which aims to cover all Ingenic SoCs
older than JZ4770 and some newer.
XBurstR2 - is the MIPS32R2 CPU which aims to cover all Ingenic SoCs
starting from to JZ4770.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-3-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add support for emulating:
- S32LDDV and S32LDDVR
- S32STD and S32STDR
- S32STDV and S32STDVR
MXU instructions.
Add support for emulating MXU instructions with address register
post-modify counterparts:
- S32LDI and S32LDIR
- S32LDIV and S32LDIVR
- S32SDI and S32SDIR
- S32SDIV and S32SDIVR
Refactor support for emulating the S32LDD and S32LDDR instructions.
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Message-Id: <20230608104222.1520143-2-lis8215@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
After implemented CPUCFG and CSR, we are now able to boot Linux
kernel with Loongson-3A4000 CPU, so there is no point to restrict
CPU type to 3A1000 only, instead we just check for presence of
INSN_LOONGSON3A.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20230521214832.20145-3-jiaxun.yang@flygoat.com>
[JY: Check for cpu_type_supports_isa(INSN_LOONGSON3A)]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Loongson introduced CSR instructions since 3A4000, which looks
similar to IOCSR and CPUCFG instructions we seen in LoongArch.
Unfortunately we don't have much document about those instructions,
bit fields of CPUCFG instructions and IOCSR registers can be found
at 3A4000's user manual, while instruction encodings can be found
at arch/mips/include/asm/mach-loongson64/loongson_regs.h from
Linux Kernel.
Our predefined CPUCFG bits are differ from actual 3A4000, since
we can't emulate all CPUCFG features present in 3A4000 for now,
we just enable bits for what we have in TCG.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20230521214832.20145-2-jiaxun.yang@flygoat.com>
[JY: Fixed typo in ase_lcsr_available(),
retrict GEN_FALSE_TRANS]
[PMD: Fix meson's mips_softmmu_ss -> mips_system_ss,
restrict AddressSpace/MemoryRegion to SysEmu]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Third RISC-V PR for 8.1
* Use xl instead of mxl for disassemble
* Factor out extension tests to cpu_cfg.h
* disas/riscv: Add vendor extension support
* disas/riscv: Add support for XVentanaCondOps
* disas/riscv: Add support for XThead* instructions
* Fix mstatus related problems
* Fix veyron-v1 CPU properties
* Fix the xlen for data address when MPRV=1
* opensbi: Upgrade from v1.2 to v1.3
* Enable 32-bit Spike OpenSBI boot testing
* Support the watchdog timer of HiFive 1 rev b
* Only build qemu-system-riscv$$ on rv$$ host
* Add RVV registers to log
* Restrict ACLINT to TCG
* Add syscall riscv_hwprobe
* Add support for BF16 extensions
* KVM_RISCV_SET_TIMER macro is not configured correctly
* Generate devicetree only after machine initialization is complete
* virt: Convert fdt_load_addr to uint64_t
* KVM: fixes and enhancements
* Add support for the Zfa extension
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmSr+ekACgkQr3yVEwxT
# gBMMGg//ZCcyH3KXB49c2KUIFO6FKYUxN9uC3giZCtuGyEH8T2yDgZVVXnxwU+Ij
# +3Ej6T/ZdWMpePC9qf+xKzHWZk7Qc8Tcg+JgQbga573894yZInRwYl8HsSlEKA+Z
# vlqSBPxTlp9rlDwGP/LjGljyIFqL4konk9zi3FL4ZXTF1iHUGrh/953Y3wIreEfl
# KX5UznnWcgy2BqQT1vihMbM8qCVK6iryH+QZ6LiAsPMSX1rIzk8ectQryILzoIYh
# bMiwCLVMyr4ZrUXjmGTF+7/WcOWwhhyfpdstf2iotKALelZtVHit0wHcty2GYQde
# nvN83jJWu04DGXkPBUsqCUQXczGo1QHjJUH3RIRJzfOby/lGt4pSzHAfKA+iNUht
# ikM3SdBsXMO+ogjTtTcCMb7/m2vsMoQP60VRts9Mh3YVD0cgr7RqpqRoEMugVYnr
# ca8Vijf71mB+y+pq477eV1Q8BoKpr8xa1OlFkNKPC17uMD7HoDMI44QgFOgtYp10
# TMsqqyB75q6PZhSEwm63xbmH0Zpo8kSqT/E3MTtGTyPeuL8TNNNSkCmFaGYmRrbI
# XEp7vG2RaDJOvDomS3nUhA5ruc8SaXd0q25q2gLYQfCsehfFqZAwuNB5xf1zS0M0
# ov1/gwaqU93t6nLbo2cCbb0plkIFKwwJ9KKjD06wJ4KPe0TGFzk=
# =3XFD
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 10 Jul 2023 01:30:33 PM BST
# gpg: using RSA key 6AE902B6A7CA877D6D659296AF7C95130C538013
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6AE9 02B6 A7CA 877D 6D65 9296 AF7C 9513 0C53 8013
* tag 'pull-riscv-to-apply-20230710-1' of https://github.com/alistair23/qemu: (54 commits)
riscv: Add support for the Zfa extension
target/riscv/kvm.c: read/write (cbom|cboz)_blocksize in KVM
target/riscv/kvm.c: add kvmconfig_get_cfg_addr() helper
target/riscv: update multi-letter extension KVM properties
target/riscv/cpu.c: create KVM mock properties
target/riscv/cpu.c: remove priv_ver check from riscv_isa_string_ext()
target/riscv/cpu.c: add satp_mode properties earlier
target/riscv/kvm.c: add multi-letter extension KVM properties
target/riscv/kvm.c: update KVM MISA bits
target/riscv: add KVM specific MISA properties
target/riscv/cpu: add misa_ext_info_arr[]
target/riscv/kvm.c: init 'misa_ext_mask' with scratch CPU
target/riscv: handle mvendorid/marchid/mimpid for KVM CPUs
target/riscv: read marchid/mimpid in kvm_riscv_init_machine_ids()
target/riscv: use KVM scratch CPUs to init KVM properties
target/riscv/cpu.c: restrict 'marchid' value
target/riscv/cpu.c: restrict 'mimpid' value
target/riscv/cpu.c: restrict 'mvendorid' value
hw/riscv/virt.c: skip 'mmu-type' FDT if satp mode not set
target/riscv: skip features setup for KVM CPUs
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There is also pci_create_simple() which creates non-multifunction PCI
devices. Accordingly the parameter is always set to true when a multi
function PCI device is to be created.
The reason for the parameter's existence seems to be that it is used in the
internal PCI code as well which is the only location where it gets set to
false. This one usage can be replaced by trivial code.
Remove this redundant, error-prone parameter.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230304114043.121024-5-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I440FX realization is currently mixed with PIIX3 creation. Furthermore, it is
common practice to only set properties between a device's qdev_new() and
qdev_realize(). Clean up to resolve both issues.
Since I440FX spawns a PCI bus let's also move the pci_bus initialization there.
Note that when running `qemu-system-x86_64 -M pc -S` before and after this
patch, `info mtree` in the QEMU console doesn't show any differences except that
the ordering is different.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230630073720.21297-18-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
i440fx_init() is a legacy init function. The previous patches worked towards
TYPE_I440FX_PCI_HOST_BRIDGE to be instantiated the QOM way. Do this now by
transforming the parameters passed to i440fx_init() into property assignments.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230630073720.21297-17-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
I440FX needs a different PCI device model if the "igd-passthru" property is
enabled. The type name is currently passed as a parameter to i440fx_init(). This
parameter will be replaced by a property assignment once i440fx_init() gets
resolved.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230630073720.21297-16-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce the properties in anticipation of QOM'ification; Q35 has the same
properties.
Note that we want to avoid a "ram size" property in the QOM interface since it
seems redundant to both properties introduced in this change. Thus the removal
of the ram_size parameter. We assume the invariant of both properties to sum up
to "ram size" which is already asserted in pc_memory_init(). Under Xen the
invariant seems to hold as well, so we now also check it there.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230630073720.21297-15-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The goal is to eliminate i440fx_init() which is a legacy init function. This
neccessitates the memory regions to be properties, like in Q35, which will be
assigned in board code.
Since i440fx needs different PCI devices in Xen mode, and since i440fx shall
be self-contained, the PCI device will be created during realization of the
host. Thus the pointers need to be moved to the host structure to be usable as
properties.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230630073720.21297-13-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
i440fx_realize() realizes the PCI device inside the host bridge
(PCII440FXState), but is implemented between i440fx_pcihost_realize() and
i440fx_init() which deal with the host bridge itself (I440FXState). Since we
want to append i440fx_init() to i440fx_pcihost_realize() later let's move
i440fx_realize() out of the way.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230630073720.21297-12-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The parent-child relation is usually established near a child's qdev_new(). For
i440fx this allows for reusing the machine parameter, thus avoiding
qdev_get_machine() which relies on a global variable.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230630073720.21297-9-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The Q35 PCI host already has a PCI_HOST_BYPASS_IOMMU property. However, the
host initializes this property itself by accessing global machine state,
thereby assuming it to be a PC machine. Avoid this by having board code
set this property.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230630073720.21297-6-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The Q35 PCI host currently sets the PC machine's PCI bus attribute
through global state, thereby assuming the machine to be a PC machine.
The Q35 machine code already holds on to Q35's pci bus attribute, so can
easily set its own property while preserving encapsulation.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230630073720.21297-4-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes the following clangd warning (-Winitializer-overrides):
q35.c:297:19: Initializer overrides prior initialization of this subobject
q35.c:292:19: previous initialization is here
Settle on little endian which is consistent with using pci_host_conf_le_ops.
Fixes: bafc90bdc5 ("q35: implement TSEG")
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230630073720.21297-3-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A device reset is issued per device, not per VQ. The legacy device reset
message, VHOST_USER_RESET_OWNER, is already a per device message. Therefore,
this change adds the proper message, VHOST_USER_RESET_DEVICE, to per device
messages.
Signed-off-by: Tom Lonergan <tom.lonergan@nutanix.com>
Message-Id: <20230628163927.108171-3-tom.lonergan@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Some devices, like virtio-scsi, consist of one vhost_dev, while others, like
virtio-net, contain multiple vhost_devs. The QEMU vhost-user code has a
concept of one-time messages which is misleading. One-time messages are sent
once per operation on the device, not once for the lifetime of the device.
Therefore, as discussed in [1], vhost_user_one_time_request should be
renamed to vhost_user_per_device_request and the relevant comments updated
to match the real functionality.
[1] https://lore.kernel.org/qemu-devel/20230127083027-mutt-send-email-mst@kernel.org/
Signed-off-by: Tom Lonergan <tom.lonergan@nutanix.com>
Message-Id: <20230628163927.108171-2-tom.lonergan@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
>From SMBIOS 3.0 specification, core count field means:
Core Count is the number of cores detected by the BIOS for this
processor socket. [1]
Before 003f230e37 ("machine: Tweak the order of topology members in
struct CpuTopology"), MachineState.smp.cores means "the number of cores
in one package", and it's correct to use smp.cores for core count.
But 003f230e37 changes the smp.cores' meaning to "the number of cores
in one die" and doesn't change the original smp.cores' use in smbios as
well, which makes core count in type4 go wrong.
Fix this issue with the correct "cores per socket" caculation.
[1] SMBIOS 3.0.0, section 7.5.6, Processor Information - Core Count
Fixes: 003f230e37 ("machine: Tweak the order of topology members in struct CpuTopology")
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20230628135437.1145805-5-zhao1.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>From SMBIOS 3.0 specification, thread count field means:
Thread Count is the total number of threads detected by the BIOS for
this processor socket. It is a processor-wide count, not a
thread-per-core count. [1]
So here we should use threads per socket other than threads per core.
[1] SMBIOS 3.0.0, section 7.5.8, Processor Information - Thread Count
Fixes: c97294ec1b ("SMBIOS: Build aggregate smbios tables and entry point")
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20230628135437.1145805-4-zhao1.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
smp.sockets is the number of sockets which is configured by "-smp" (
otherwise, the default is 1). Trying to recalculate it here with another
rules leads to errors, such as:
1. 003f230e37 ("machine: Tweak the order of topology members in struct
CpuTopology") changes the meaning of smp.cores but doesn't fix
original smp.cores uses.
With the introduction of cluster, now smp.cores means the number of
cores in one cluster. So smp.cores * smp.threads just means the
threads in a cluster not in a socket.
2. On the other hand, we shouldn't use smp.cpus here because it
indicates the initial number of online CPUs at the boot time, and is
not mathematically related to smp.sockets.
So stop reinventing the another wheel and use the topo values that
has been calculated.
Fixes: 003f230e37 ("machine: Tweak the order of topology members in struct CpuTopology")
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20230628135437.1145805-3-zhao1.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The number of cores/threads per socket are needed for smbios, and are
also useful for other modules.
Provide the helpers to wrap the calculation of cores/threads per socket
so that we can avoid calculation errors caused by other modules miss
topology changes.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20230628135437.1145805-2-zhao1.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't have a virtio-scmi implementation in QEMU and only support a
vhost-user backend. This is very similar to virtio-gpio and we add the same
set of tests, just passing some vhost-user messages over the control socket.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230628100524.342666-4-mzamazal@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows is to instantiate a vhost-user-scmi device as part of a PCI bus.
It is mostly boilerplate similar to the other vhost-user-*-pci boilerplates
of similar devices.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Message-Id: <20230628100524.342666-3-mzamazal@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Implement the virtio-gpu feature in contrib/vhost-user-gpu, which was
unsupported until now.
In this implementation, the feature is enabled inconditionally to avoid
creating another optional config argument.
Similarly to get_display_info, vhost-user-gpu sends a message back to
the frontend to have access to all the display information. In the
case of get_edid, it also needs to pass which scanout we should
retrieve the edid for.
The VHOST_USER_GPU_PROTOCOL_F_EDID protocol feature is required if the
frontend sets the VIRTIO_GPU_F_EDID virtio-gpu feature. If the frontend
sets the virtio-gpu feature but does not support the protocol feature,
the backend will abort with an error.
Signed-off-by: Erico Nunes <ernunes@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230626164708.1163239-4-ernunes@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VHOST_USER_GPU_GET_EDID is defined as a message from the backend to the
frontend to retrieve the EDID data for a given scanout.
The VHOST_USER_GPU_PROTOCOL_F_EDID protocol feature is defined as a way
to check whether this new message is supported or not.
Signed-off-by: Erico Nunes <ernunes@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230626164708.1163239-3-ernunes@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Previous implementation of MIPS cp0_timer computes a
cp0_count_ns based on input clock. However rounding
error of cp0_count_ns can affect precision of cp0_timer.
Using clock API and a divider for cp0_timer, so we can
use clock_ns_to_ticks/clock_ns_to_ticks to avoid rounding
issue.
Also workaround the situation that in such handler flow:
count = read_c0_count()
write_c0_compare(count)
If timer had not progressed when compare was written, the
interrupt would trigger again.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230521110037.90049-1-jiaxun.yang@flygoat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* s390x instruction emulation fixes and corresponding TCG tests
* Extend the readconfig qtest
* Introduce "-run-with chroot=..." and deprecate the old "-chroot" option
* Speed up migration tests
* Fix coding style in the coding style document
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmSsCYQRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbUhLw/+Mg74FGODwb/kdPSSY+ahEmutRaQG5z74
# zWnHFYTB0xLRxu5gwV09wcFt88RjkkdsKORtp1LBRahVaKYzYSq3PxMYsDii2pdr
# Ma58RLZC/42shrzZmXEyl3ilxCCHjq2UCezX+4ca/zuTl/83znVN6Mrq28GUmp7v
# 8yI78mPpZXEkLEN3cnnK3v7AsLwz439aHd3ADZ1IWUohGHQdDAj4nn5Yxp4SeIUj
# sOmCcEfLj3emNM/TTL2suohuZNwYjyLQ5iqQJ/B7v/S88PbWQUA9Cq/KpEGBLk/D
# fxDjbQ7+zpTTSQ+XihShtGdEnl4uPPixvJX43vriYDBQFsHKS7Y38cSAFVTDrQvh
# 4fELCAPg8wXeoyMu7WZWINDA6dVdInCdmljHYpK+mQg7AtHu/CliPWzVUZyeW3XD
# lwybNCoyJQcA4KPAyYrkau74JrLRGtLJJQ5XtQEDsK791xjeHt1hr42QY4YeHyjM
# Utf6inp4D7RZ3O9p5EeKNVpFin5AE+RTvNZKLJicFRb0hFziUkCK61nRwS5gmvXA
# I41av1L+mLI7jvu0M2ID1CfIhFf+/w4GKNkUlcutux7uz5mzxIj0oifsONEZGNo+
# NlVKKNxfQv2eRl+9sZPWNl8q11K3bvZbpvXZS5oSLIererWIIROaxcgzxpU+EGLT
# 8HhF7RZdO8w=
# =LLmM
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 10 Jul 2023 02:37:08 PM BST
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [undefined]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [undefined]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* tag 'pull-request-2023-07-10v2' of https://gitlab.com/thuth/qemu: (21 commits)
docs/devel: Fix coding style in style.rst
tests/qtest: massively speed up migration-test
tests/tcg/s390x: Fix test-svc with clang
meson.build: Skip C++ detection unless we're targeting Windows
os-posix: Allow 'chroot' via '-run-with' and deprecate the old '-chroot' option
tests/qtest/readconfig: Test the docs/config/q35-*.cfg files
tests/qtest: Move mkimg() and have_qemu_img() from libqos to libqtest
tests/qtest/readconfig-test: Allow testing for arbitrary memory sizes
tests/tcg/s390x: Test MVCRL with a large value in R0
tests/tcg/s390x: Test MDEB and MDEBR
tests/tcg/s390x: Test LRA
tests/tcg/s390x: Test LARL with a large offset
tests/tcg/s390x: Test EPSW
target/s390x: Fix relative long instructions with large offsets
target/s390x: Fix LRA when DAT is off
target/s390x: Fix LRA overwriting the top 32 bits on DAT error
target/s390x: Fix MVCRL with a large value in R0
target/s390x: Fix MDEB and MDEBR
target/s390x: Fix EPSW CC reporting
linux-user: elfload: Add more initial s390x PSW bits
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The migration test cases that actually exercise live migration want to
ensure there is a minimum of two iterations of pre-copy, in order to
exercise the dirty tracking code.
Historically we've queried the migration status, looking for the
'dirty-sync-count' value to increment to track iterations. This was
not entirely reliable because often all the data would get transferred
quickly enough that the migration would finish before we wanted it
to. So we massively dropped the bandwidth and max downtime to
guarantee non-convergance. This had the unfortunate side effect
that every migration took at least 30 seconds to run (100 MB of
dirty pages / 3 MB/sec).
This optimization takes a different approach to ensuring that a
mimimum of two iterations. Rather than waiting for dirty-sync-count
to increment, directly look for an indication that the source VM
has dirtied RAM that has already been transferred.
On the source VM a magic marker is written just after the 3 MB
offset. The destination VM is now montiored to detect when the
magic marker is transferred. This gives a guarantee that the
first 3 MB of memory have been transferred. Now the source VM
memory is monitored at exactly the 3MB offset until we observe
a flip in its value. This gives us a guaranteed that the guest
workload has dirtied a byte that has already been transferred.
Since we're looking at a place that is only 3 MB from the start
of memory, with the 3 MB/sec bandwidth, this test should complete
in 1 second, instead of 30 seconds.
Once we've proved there is some dirty memory, migration can be
set back to full speed for the remainder of the 1st iteration,
and the entire of the second iteration at which point migration
should be complete.
On a test machine this further reduces the migration test time
from 8 minutes to 1 minute 40.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230601161347.1803440-11-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
clang does not support expressions involving symbols in instructions
like lghi yet, so building hello-s390x-asm.S with it fails.
Move the expression to the literal pool and load it from there.
Fixes: be4a4cb429 ("tests/tcg/s390x: Test single-stepping SVC")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230707154242.457706-1-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The only C++ code that we currently still have in the repository
is the code in qga/vss-win32/ - so we can skip the C++ detection
unless we are compiling binaries for Windows.
Message-Id: <20230705133639.146073-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
We recently introduced "-run-with" for options that influence the
runtime behavior of QEMU. This option has the big advantage that it
can group related options (so that it is easier for the users to spot
them) and that the options become introspectable via QMP this way.
So let's start moving more switches into this option group, starting
with "-chroot" now.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-Id: <20230703074447.17044-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Test that we can successfully parse the docs/config/q35-emulated.cfg,
docs/config/q35-virtio-graphical.cfg and docs/config/q35-virtio-serial.cfg
config files (the "...-serial.cfg" file is a subset of the graphical
config file, so we skip that in quick mode).
These config files use two hard-coded image names which we have to
replace with unique temporary files to avoid race conditions in case
the tests are run in parallel. So after creating the temporary image
files, we also have to create a copy of the config file where we
replaced the hard-coded image names.
If KVM is not available, we also have to disable the "accel" lines.
Once everything is in place, we can start QEMU with the modified
config file and check that everything is available in QEMU.
Message-Id: <20230704071655.75381-4-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The expression "imm * 2" in gen_ri2() can wrap around if imm is large
enough.
Fix by casting imm to int64_t, like it's done in disas_jdest().
Fixes: e8ecdfeb30 ("Fix EXECUTE of relative branches")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230704081506.276055-8-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When a DAT error occurs, LRA is supposed to write the error information
to the bottom 32 bits of R1, and leave the top 32 bits of R1 alone.
Fix by passing the original value of R1 into helper and copying the
top 32 bits to the return value.
Fixes: d8fe4a9c28 ("target-s390: Convert LRA")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20230704081506.276055-6-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Using a large R0 causes an assertion error:
qemu-s390x: target/s390x/tcg/mem_helper.c:183: access_prepare_nf: Assertion `size > 0 && size <= 4096' failed.
Even though PoP explicitly advises against using more than 8 bits for the
size, an emulator crash is never a good thing.
Fix by truncating the size to 8 bits.
Fixes: ea0a1053e2 ("s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20230704081506.276055-5-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Make the PSW look more similar to the real s390x userspace PSW.
Except for being there, the newly added bits should not affect the
userspace code execution.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230704081506.276055-2-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Protected Virtualization (PV) is not a real hardware device:
it is a feature of the firmware on s390x that is exposed to
userspace via the KVM interface.
Move the pv.c/pv.h files to target/s390x/kvm/ to make this clearer.
Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230624200644.23931-1-philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch introduces the RISC-V Zfa extension, which introduces
additional floating-point instructions:
* fli (load-immediate) with pre-defined immediates
* fminm/fmaxm (like fmin/fmax but with different NaN behaviour)
* fround/froundmx (round to integer)
* fcvtmod.w.d (Modular Convert-to-Integer)
* fmv* to access high bits of float register bigger than XLEN
* Quiet comparison instructions (fleq/fltq)
Zfa defines its instructions in combination with the following extensions:
* single-precision floating-point (F)
* double-precision floating-point (D)
* quad-precision floating-point (Q)
* half-precision floating-point (Zfh)
Since QEMU does not support the RISC-V quad-precision floating-point
ISA extension (Q), this patch does not include the instructions that
depend on this extension. All other instructions are included in this
patch.
The Zfa specification can be found here:
https://github.com/riscv/riscv-isa-manual/blob/master/src/zfa.tex
The Zfa specifciation is frozen and is in public review since May 3, 2023:
https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/SED4ntBkabg
The patch also includes a TCG test for the fcvtmod.w.d instruction.
The test cases test for correct results and flag behaviour.
Note, that the Zfa specification requires fcvtmod's flag behaviour
to be identical to a fcvt with the same operands (which is also
tested).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Message-Id: <20230710071243.282464-1-christoph.muellner@vrull.eu>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
If we don't set a proper cbom_blocksize|cboz_blocksize in the FDT the
Linux Kernel will fail to detect the availability of the CBOM/CBOZ
extensions, regardless of the contents of the 'riscv,isa' DT prop.
The FDT is being written using the cpu->cfg.cbom|z_blocksize attributes,
so let's expose them as user properties like it is already done with
TCG.
This will also require us to determine proper blocksize values during
init() time since the FDT is already created during realize(). We'll
take a ride in kvm_riscv_init_multiext_cfg() to do it. Note that we
don't need to fetch both cbom and cboz blocksizes every time: check for
their parent extensions (icbom and icboz) and only read the blocksizes
if needed.
In contrast with cbom|z_blocksize properties from TCG, the user is not
able to set any value that is different from the 'host' value when
running KVM. KVM can be particularly harsh dealing with it: a ENOTSUPP
can be thrown for the mere attempt of executing kvm_set_one_reg() for
these 2 regs.
Hopefully we don't need to call kvm_set_one_reg() for these regs.
We'll check if the user input matches the host value in
kvm_cpu_set_cbomz_blksize(), the set() accessor for both blocksize
properties. We'll fail fast since it's already known to not be
supported.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-21-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
We're now ready to update the multi-letter extensions status for KVM.
kvm_riscv_update_cpu_cfg_isa_ext() is called called during vcpu creation
time to verify which user options changes host defaults (via the 'user_set'
flag) and tries to write them back to KVM.
Failure to commit a change to KVM is only ignored in case KVM doesn't
know about the extension (-EINVAL error code) and the user wanted to
disable the given extension. Otherwise we're going to abort the boot
process.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-19-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
KVM-specific properties are being created inside target/riscv/kvm.c. But
at this moment we're gathering all the remaining properties from TCG and
adding them as is when running KVM. This creates a situation where
non-KVM properties are setting flags to 'true' due to its default
settings (e.g. Zawrs). Users can also freely enable them via command
line.
This doesn't impact runtime per se because KVM doesn't care about these
flags, but code such as riscv_isa_string_ext() take those flags into
account. The result is that, for a KVM guest, setting non-KVM properties
will make them appear in the riscv,isa DT.
We want to keep the same API for both TCG and KVM and at the same time,
when running KVM, forbid non-KVM extensions to be enabled internally. We
accomplish both by changing riscv_cpu_add_user_properties() to add a
mock boolean property for every non-KVM extension in
riscv_cpu_extensions[]. Then, when running KVM, users are still free to
set extensions at will, but we'll error out if a non-KVM extension is
enabled. Setting such extension to 'false' will be ignored.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-18-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
riscv_isa_string_ext() is being used by riscv_isa_string(), which is
then used by boards to retrieve the 'riscv,isa' string to be written in
the FDT. All this happens after riscv_cpu_realize(), meaning that we're
already past riscv_cpu_validate_set_extensions() and, more important,
riscv_cpu_disable_priv_spec_isa_exts().
This means that all extensions that needed to be disabled due to
priv_spec mismatch are already disabled. Checking this again during
riscv_isa_string_ext() is unneeded. Remove it.
As a bonus, riscv_isa_string_ext() can now be used with the 'host'
KVM-only CPU type since it doesn't have a env->priv_ver assigned and it
would fail this check for no good reason.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-17-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
riscv_cpu_add_user_properties() ended up with an excess of "#ifndef
CONFIG_USER_ONLY" blocks after changes that added KVM properties
handling.
KVM specific properties are required to be created earlier than their
TCG counterparts, but the remaining props can be created at any order.
Move riscv_add_satp_mode_properties() to the start of the function,
inside the !CONFIG_USER_ONLY block already present there, to remove the
last ifndef block.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-16-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Let's add KVM user properties for the multi-letter extensions that KVM
currently supports: zicbom, zicboz, zihintpause, zbb, ssaia, sstc,
svinval and svpbmt.
As with MISA extensions, we're using the KVMCPUConfig type to hold
information about the state of each extension. However, multi-letter
extensions have more cases to cover than MISA extensions, so we're
adding an extra 'supported' flag as well. This flag will reflect if a
given extension is supported by KVM, i.e. KVM knows how to handle it.
This is determined during KVM extension discovery in
kvm_riscv_init_multiext_cfg(), where we test for EINVAL errors. Any
other error will cause an abort.
The use of the 'user_set' is similar to what we already do with MISA
extensions: the flag set only if the user is changing the extension
state.
The 'supported' flag will be used later on to make an exception for
users that are disabling multi-letter extensions that are unknown to
KVM.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Message-Id: <20230706101738.460804-15-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Our design philosophy with KVM properties can be resumed in two main
decisions based on KVM interface availability and what the user wants to
do:
- if the user disables an extension that the host KVM module doesn't
know about (i.e. it doesn't implement the kvm_get_one_reg() interface),
keep booting the CPU. This will avoid users having to deal with issues
with older KVM versions while disabling features they don't care;
- for any other case we're going to error out immediately. If the user
wants to enable a feature that KVM doesn't know about this a problem that
is worth aborting - the user must know that the feature wasn't enabled
in the hart. Likewise, if KVM knows about the extension, the user wants
to enable/disable it, and we fail to do it so, that's also a problem we
can't shrug it off.
In the case of MISA bits we won't even try enabling bits that aren't
already available in the host. The ioctl() is so likely to fail that
it's not worth trying. This check is already done in the previous patch,
in kvm_cpu_set_misa_ext_cfg(), thus we don't need to worry about it now.
In kvm_riscv_update_cpu_misa_ext() we'll go through every potential user
option and do as follows:
- if the user didn't set the property or set to the same value of the
host, do nothing;
- Disable the given extension in KVM. Error out if anything goes wrong.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-14-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Using all TCG user properties in KVM is tricky. First because KVM
supports only a small subset of what TCG provides, so most of the
cpu->cfg flags do nothing for KVM.
Second, and more important, we don't have a way of telling if any given
value is an user input or not. For TCG this has a small impact since we
just validating everything and error out if needed. But for KVM it would
be good to know if a given value was set by the user or if it's a value
already provided by KVM. Otherwise we don't know how to handle failed
kvm_set_one_regs() when writing the configurations back.
These characteristics make it overly complicated to use the same user
facing flags for both KVM and TCG. A simpler approach is to create KVM
specific properties that have specialized logic, forking KVM and TCG use
cases for those cases only. Fully separating KVM/TCG properties is
unneeded at this point - in fact we want the user experience to be as
equal as possible, regardless of the acceleration chosen.
We'll start this fork with the MISA properties, adding the MISA bits
that the KVM driver currently supports. A new KVMCPUConfig type is
introduced. It'll hold general information about an extension. For MISA
extensions we're going to use the newly created getters of
misa_ext_infos[] to populate their name and description. 'offset' holds
the MISA bit (RVA, RVC, ...). We're calling it 'offset' instead of
'misa_bit' because this same KVMCPUConfig struct will be used to
multi-letter extensions later on.
This new type also holds a 'user_set' flag. This flag will be set when
the user set an option that's different than what is already configured
in the host, requiring KVM intervention to write the regs back during
kvm_arch_init_vcpu(). Similar mechanics will be implemented for
multi-letter extensions as well.
There is no need to duplicate more code than necessary, so we're going
to use the existing kvm_riscv_init_user_properties() to add the KVM
specific properties. Any code that is adding a TCG user prop is then
changed slightly to verify first if there's a KVM prop with the same
name already added.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-13-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Next patch will add KVM specific user properties for both MISA and
multi-letter extensions. For MISA extensions we want to make use of what
is already available in misa_ext_cfgs[] to avoid code repetition.
misa_ext_info_arr[] array will hold name and description for each MISA
extension that misa_ext_cfgs[] is declaring. We'll then use this new
array in KVM code to avoid duplicating strings. Two getters were added
to allow KVM to retrieve the 'name' and 'description' for each MISA
property.
There's nothing holding us back from doing the same with multi-letter
extensions. For now doing just with MISA extensions is enough.
It is worth documenting that even using the __bultin_ctz() directive to
populate the misa_ext_info_arr[] we are forced to assign 'name' and
'description' during runtime in riscv_cpu_add_misa_properties(). The
reason is that some Gitlab runners ('clang-user' and 'tsan-build') will
throw errors like this if we fetch 'name' and 'description' from the
array in the MISA_CFG() macro:
../target/riscv/cpu.c:1624:5: error: initializer element is not a
compile-time constant
MISA_CFG(RVA, true),
^~~~~~~~~~~~~~~~~~~
../target/riscv/cpu.c:1619:53: note: expanded from macro 'MISA_CFG'
{.name = misa_ext_info_arr[MISA_INFO_IDX(_bit)].name, \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
gcc and others compilers/builders were fine with that change. We can't
ignore failures in the Gitlab pipeline though, so code was changed to
make every runner happy.
As a side effect, misa_ext_cfg[] is no longer a 'const' array because
it must be set during runtime.
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-12-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
At this moment we're retrieving env->misa_ext during
kvm_arch_init_cpu(), leaving env->misa_ext_mask behind.
We want to set env->misa_ext_mask, and we want to set it as early as
possible. The reason is that we're going to use it in the validation
process of the KVM MISA properties we're going to add next. Setting it
during arch_init_cpu() is too late for user validation.
Move the code to a new helper that is going to be called during init()
time, via kvm_riscv_init_user_properties(), like we're already doing for
the machine ID properties. Set both misa_ext and misa_ext_mask to the
same value retrieved by the 'isa' config reg.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-11-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
After changing user validation for mvendorid/marchid/mimpid to guarantee
that the value is validated on user input time, coupled with the work in
fetching KVM default values for them by using a scratch CPU, we're
certain that the values in cpu->cfg.(mvendorid|marchid|mimpid) are
already good to be written back to KVM.
There's no need to write the values back for 'host' type CPUs since the
values can't be changed, so let's do that just for generic CPUs.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-9-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Allow 'marchid' and 'mimpid' to also be initialized in
kvm_riscv_init_machine_ids().
After this change, the handling of mvendorid/marchid/mimpid for the
'host' CPU type will be equal to what we already have for TCG named
CPUs, i.e. the user is not able to set these values to a different val
than the one that is already preset.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-8-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Certain validations, such as the validations done for the machine IDs
(mvendorid/marchid/mimpid), are done before starting the CPU.
Non-dynamic (named) CPUs tries to match user input with a preset
default. As it is today we can't prefetch a KVM default for these cases
because we're only able to read/write KVM regs after the vcpu is
spinning.
Our target/arm friends use a concept called "scratch CPU", which
consists of creating a vcpu for doing queries and validations and so on,
which is discarded shortly after use [1]. This is a suitable solution
for what we need so let's implement it in target/riscv as well.
kvm_riscv_init_machine_ids() will be used to do any pre-launch setup for
KVM CPUs, via riscv_cpu_add_user_properties(). The function will create
a KVM scratch CPU, fetch KVM regs that work as default values for user
properties, and then discard the scratch CPU afterwards.
We're starting by initializing 'mvendorid'. This concept will be used to
init other KVM specific properties in the next patches as well.
[1] target/arm/kvm.c, kvm_arm_create_scratch_host_vcpu()
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-7-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
'marchid' shouldn't be set to a different value as previously set for
named CPUs.
For all other CPUs it shouldn't be freely set either - the spec requires
that 'marchid' can't have the MSB (most significant bit) set and every
other bit set to zero, i.e. 0x80000000 is an invalid 'marchid' value for
32 bit CPUs.
As with 'mimpid', setting a default value based on the current QEMU
version is not a good idea because it implies that the CPU
implementation changes from one QEMU version to the other. Named CPUs
should set 'marchid' to a meaningful value instead, and generic CPUs can
set to any valid value.
For the 'veyron-v1' CPU this is the error thrown if 'marchid' is set to
a different val:
$ ./build/qemu-system-riscv64 -M virt -nographic -cpu veyron-v1,marchid=0x80000000
qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.marchid=0x80000000:
Unable to change veyron-v1-riscv-cpu marchid (0x8000000000010000)
And, for generics CPUs, this is the error when trying to set to an
invalid val:
$ ./build/qemu-system-riscv64 -M virt -nographic -cpu rv64,marchid=0x8000000000000000
qemu-system-riscv64: can't apply global rv64-riscv-cpu.marchid=0x8000000000000000:
Unable to set marchid with MSB (64) bit set and the remaining bits zero
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-6-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Following the same logic used with 'mvendorid' let's also restrict
'mimpid' for named CPUs. Generic CPUs keep setting the value freely.
Note that we're getting rid of the default RISCV_CPU_MARCHID value. The
reason is that this is not a good default since it's dynamic, changing
with with every QEMU version, regardless of whether the actual
implementation of the CPU changed from one QEMU version to the other.
Named CPU should set it to a meaningful value instead and generic CPUs
can set whatever they want.
This is the error thrown for an invalid 'mimpid' value for the veyron-v1
CPU:
$ ./qemu-system-riscv64 -M virt -nographic -cpu veyron-v1,mimpid=2
qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.mimpid=2:
Unable to change veyron-v1-riscv-cpu mimpid (0x111)
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-5-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
We're going to change the handling of mvendorid/marchid/mimpid by the
KVM driver. Since these are always present in all CPUs let's put the
same validation for everyone.
It doesn't make sense to allow 'mvendorid' to be different than it
is already set in named (vendor) CPUs. Generic (dynamic) CPUs can have
any 'mvendorid' they want.
Change 'mvendorid' to be a class property created via
'object_class_property_add', instead of using the DEFINE_PROP_UINT32()
macro. This allow us to define a custom setter for it that will verify,
for named CPUs, if mvendorid is different than it is already set by the
CPU. This is the error thrown for the 'veyron-v1' CPU if 'mvendorid' is
set to an invalid value:
$ qemu-system-riscv64 -M virt -nographic -cpu veyron-v1,mvendorid=2
qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.mvendorid=2:
Unable to change veyron-v1-riscv-cpu mvendorid (0x61f)
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-4-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
The absence of a satp mode in riscv_host_cpu_init() is causing the
following error:
$ ./qemu/build/qemu-system-riscv64 -machine virt,accel=kvm \
-m 2G -smp 1 -nographic -snapshot \
-kernel ./guest_imgs/Image \
-initrd ./guest_imgs/rootfs_kvm_riscv64.img \
-append "earlycon=sbi root=/dev/ram rw" \
-cpu host
**
ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should not be
reached
Bail out! ERROR:../target/riscv/cpu.c:320:satp_mode_str: code should
not be reached
Aborted
The error is triggered from create_fdt_socket_cpus() in hw/riscv/virt.c.
It's trying to get satp_mode_str for a NULL cpu->cfg.satp_mode.map.
For this KVM cpu we would need to inherit the satp supported modes
from the RISC-V host. At this moment this is not possible because the
KVM driver does not support it. And even when it does we can't just let
this broken for every other older kernel.
Since mmu-type is not a required node, according to [1], skip the
'mmu-type' FDT node if there's no satp_mode set. We'll revisit this
logic when we can get satp information from KVM.
[1] https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/riscv/cpus.yaml
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230706101738.460804-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
As it is today it's not possible to use '-cpu host' if the RISC-V host
has RVH enabled. This is the resulting error:
$ ./qemu/build/qemu-system-riscv64 \
-machine virt,accel=kvm -m 2G -smp 1 \
-nographic -snapshot -kernel ./guest_imgs/Image \
-initrd ./guest_imgs/rootfs_kvm_riscv64.img \
-append "earlycon=sbi root=/dev/ram rw" \
-cpu host
qemu-system-riscv64: H extension requires priv spec 1.12.0
This happens because we're checking for priv spec for all CPUs, and
since we're not setting env->priv_ver for the 'host' CPU, it's being
default to zero (i.e. PRIV_SPEC_1_10_0).
In reality env->priv_ver does not make sense when running with the KVM
'host' CPU. It's used to gate certain CSRs/extensions during translation
to make them unavailable if the hart declares an older spec version. It
doesn't have any other use. E.g. OpenSBI version 1.2 retrieves the spec
checking if the CSR_MCOUNTEREN, CSR_MCOUNTINHIBIT and CSR_MENVCFG CSRs
are available [1].
'priv_ver' is just one example. We're doing a lot of feature validation
and setup during riscv_cpu_realize() that it doesn't apply to KVM CPUs.
Validating the feature set for those CPUs is a KVM problem that should
be handled in KVM specific code.
The new riscv_cpu_realize_tcg() helper contains all validation logic that
are applicable to TCG CPUs only. riscv_cpu_realize() verifies if we're
running TCG and, if it's the case, proceed with the usual TCG realize()
logic.
[1] lib/sbi/sbi_hart.c, hart_detect_features()
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706101738.460804-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
fdt_load_addr was previously declared as uint32_t which doe not match
with the return type of riscv_compute_fdt_addr().
This patch modifies the fdt_load_addr type from a uint32_t to a uint64_t
to match the riscv_compute_fdt_addr() return type.
This fixes calculating the fdt address when DRAM is mapped to higher
64-bit address.
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Lakshmi Bai Raja Subramanian <lakshmi.bai.rajasubramanian@bodhicomputing.com>
[ Change by AF:
- Cleanup commit title and message
]
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <168872495192.6334.3845988291412774261-1@git.sr.ht>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
If the devicetree is created before machine initialization is complete,
it misses dynamic devices. Specifically, the tpm device is not added
to the devicetree file and is therefore not instantiated in Linux.
Load/create devicetree in virt_machine_done() to solve the problem.
Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.c>
Fixes: 325b7c4e75 hw/riscv: Enable TPM backends
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230706035937.1870483-1-linux@roeck-us.net>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
The privileged spec states:
For a memory access made to support VS-stage address translation (such as
to read/write a VS-level page table), permissions are checked as though
for a load or store, not for the original access type. However, any
exception is always reported for the original access type (instruction,
load, or store/AMO).
The current implementation converts the access type to LOAD if implicit
G-stage translation fails which results in only reporting "Load guest-page
fault". This commit removes the convertion of access type, so the reported
exception conforms to the spec.
Signed-off-by: Jason Chien <jason.chien@sifive.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230627074915.7686-1-jason.chien@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Upgrade OpenSBI from v1.2 to v1.3 and the pre-built bios images.
The v1.3 release includes the following commits:
440fa81 treewide: Replace TRUE/FALSE with true/false
6509127 Makefile: Remove -N ldflag to prevent linker RWX warning
65638f8 lib: utils/sys: Allow custom HTIF base address for RV32
f14595a lib: sbi: Allow platform to influence cold boot HART selection
6957ae0 platform: generic: Allow platform_override to select cold boot HART
cb7e7c3 platform: generic: Allow platform_override to perform firmware init
8020df8 generic/starfive: Add Starfive JH7110 platform implementation
6997552 lib: sbi_hsm: Rename 'priv' argument to 'arg1'
9e397e3 docs: domain_support: Use capital letter for privilege modes
9e0ba09 include: sbi: Fine grain the permissions for M and SU modes
aace1e1 lib: sbi: Use finer permission semantics for address validation
22dbdb3 lib: sbi: Add permissions for the firmware start till end
1ac14f1 lib: sbi: Use finer permission sematics to decide on PMP bits
44f736c lib: sbi: Modify the boot time region flag prints
20646e0 lib: utils: Use SU-{R/W/X} flags for region permissions during parsing
3e2f573 lib: utils: Disallow non-root domains from adding M-mode regions
59a08cd lib: utils: Add M-mode {R/W} flags to the MMIO regions
001106d docs: Update domain's region permissions and requirements
da5594b platform: generic: allwinner: Fix PLIC array bounds
ce2a834 docs: generic.md: fix typo of andes-ae350
8ecbe6d lib: sbi_hsm: handle failure when hart_stop returns SBI_ENOTSUPP
b1818ee include: types: add always inline compiler attribute
9c4eb35 lib: utils: atcsmu: Add Andes System Management Unit support
787296a platform: andes/ae350: Implement hart hotplug using HSM extension
7aaeeab lib: reset/fdt_reset_atcwdt200: Use defined macros and function in atcsmu.h
a990309 lib: utils: Fix reserved memory node for firmware memory
fefa548 firmware: Split RO/RX and RW sections
2f40a99 firmware: Move dynsym and reladyn sections to RX section
c10e3fe firmware: Add RW section offset in scratch
b666760 lib: sbi: Print the RW section offset
230278d lib: sbi: Add separate entries for firmware RX and RW regions
dea0922 platform: renesas/rzfive: Configure Local memory regions as part of root domain
33bf917 lib: utils: Add fdt_add_cpu_idle_states() helper function
c45992c platform: generic: allwinner: Advertise nonretentive suspend
c8ea836 firmware: Fix fw_rw_offset computation in fw_base.S
8050081 firmware: Not to clear all the MIP
84d15f4 lib: sbi_hsm: Use csr_set to restore the MIP
199189b lib: utils: Mark only the largest region as reserved in FDT
66b0e23 lib: sbi: Ensure domidx_to_domain_table is null-terminated
642f3de Makefile: Add missing .dep files for fw_*.elf.ld
09b34d8 include: Add support for byteorder/endianness conversion
680bea0 lib: utils/fdt: Use byteorder conversion functions in libfdt_env.h
b224ddb include: types: Add typedefs for endianness
aa5dafc include: sbi: Fix BSWAPx() macros for big-endian host
e3bf1af include: Add defines for SBI debug console extension
0ee3a86 lib: sbi: Add sbi_nputs() function
4e0572f lib: sbi: Add sbi_ngets() function
eab48c3 lib: sbi: Add sbi_domain_check_addr_range() function
5a41a38 lib: sbi: Implement SBI debug console extension
c43903c lib: sbi: Add console_puts() callback in the console device
29285ae lib: utils/serial: Implement console_puts() for semihosting
65c2190 lib: sbi: Speed-up sbi_printf() and friends using nputs()
321293c lib: utils/fdt: Fix fdt_pmu.c header dependency
aafcc90 platform: generic/allwinner: Fix sun20i-d1.c header dependency
745aaec platform: generic/andes: Fix ae350.c header dependency
99d09b6 include: fdt/fdt_helper: Change fdt_get_address() to return root.next_arg1
6861ee9 lib: utils: fdt_fixup: Fix compile error
4f2be40 docs: fix typo in fw.md
30ea806 lib: sbi_hart: Enable hcontext and scontext
81adc62 lib: sbi: Align SBI vendor extension id with mvendorid CSR
31b82e0 include: sbi: Remove extid parameter from vendor_ext_provider() callback
c100951 platform: generic: renesas: rzfive: Add support to configure the PMA
2491242 platform: generic: renesas: rzfive: Configure the PMA region
67b2a40 lib: sbi: sbi_ecall: Check the range of SBI error
5a75f53 lib: sbi/sbi_domain: cosmetic style fixes
bc06ff6 lib: utils/fdt/fdt_domain: Simplify region access permission check
17b3776 docs: domain_support: Update the DT example
1364d5a lib: sbi_hsm: Factor out invalid state detection
40f16a8 lib: sbi_hsm: Don't try to restore state on failed change
c88e039 lib: sbi_hsm: Ensure errors are consistent with spec
b1ae6ef lib: sbi_hsm: Move misplaced comment
07673fc lib: sbi_hsm: Remove unnecessary include
8a40306 lib: sbi_hsm: Export some functions
73623a0 lib: sbi: Add system suspend skeleton
c9917b6 lib: sbi: Add system_suspend_allowed domain property
7c964e2 lib: sbi: Implement system suspend
37558dc docs: Correct opensbi-domain property name
5ccebf0 platform: generic: Add system suspend test
908be1b gpio/starfive: add gpio driver and support gpio reset
4b28afc make: Add a command line option for debugging OpenSBI
e9d08bd lib: utils/i2c: Add minimal StarFive jh7110 I2C driver
568ea49 platform: starfive: add PMIC power ops in JH7110 visionfive2 board
506144f lib: serial: Cadence: Enable compatibility for cdns,uart-r1p8
1fe8dc9 lib: sbi_pmu: add callback for counter width
51951d9 lib: sbi_pmu: Implement sbi_pmu_counter_fw_read_hi
60c358e lib: sbi_pmu: Reserve space for implementation specific firmware events
548e4b4 lib: sbi_pmu: Rename fw_counter_value
b51ddff lib: sbi_pmu: Update sbi_pmu dev ops
641d2e9 lib: sbi_pmu: Use dedicated event code for platform firmware events
57d3aa3 lib: sbi_pmu: Introduce fw_counter_write_value API
c631a7d lib: sbi_pmu: Add hartid parameter PMU device ops
d56049e lib: sbi: Refactor the calls to sbi_hart_switch_mode()
e8e9ed3 lib: sbi: Set the state of a hart to START_PENDING after the hart is ready
c6a092c lib: sbi: Clear IPIs before init_warm_startup in non-boot harts
ed88a63 lib: sbi_scratch: Optimize the alignment code for alloc size
73ab11d lib: sbi: Fix how to check whether the domain contains fw_region
f64dfcd lib: sbi: Introduce sbi_entry_count() function
30b9e7e lib: sbi_hsm: Fix sbi_hsm_hart_start() for platform with hart hotplug
8e90259 lib: sbi_hart: clear mip csr during hart init
45ba2b2 include: Add defines for SBI CPPC extension
33caae8 lib: sbi: Implement SBI CPPC extension
91767d0 lib: sbi: Print the CPPC device name
edc9914 lib: sbi_pmu: Align the event type offset as per SBI specification
ee016a7 docs: Correct FW_JUMP_FDT_ADDR calculation example
2868f26 lib: utils: fdt_fixup: avoid buffer overrun
66fa925 lib: sbi: Optimize sbi_tlb
24dde46 lib: sbi: Optimize sbi_ipi
80078ab sbi: tlb: Simplify to tlb_process_count/tlb_process function
bf40e07 lib: sbi: Optimize sbi_tlb queue waiting
eeab500 platform: generic: andes/renesas: Add SBI EXT to check for enabling IOCP errata
f692289 firmware: Optimize loading relocation type
e41dbb5 firmware: Change to use positive offset to access relocation entries
bdb3c42 lib: sbi: Do not clear active_events for cycle/instret when stopping
674e019 lib: sbi: Fix counter index calculation for SBI_PMU_CFG_FLAG_SKIP_MATCH
f5dfd99 lib: sbi: Don't check SBI error range for legacy console getchar
7919530 lib: sbi: Add debug print when sbi_pmu_init fails
4e33530 lib: sbi: Remove unnecessary semicolon
6bc02de lib: sbi: Simplify sbi_ipi_process remove goto
dc1c7db lib: sbi: Simplify BITS_PER_LONG definition
f58c140 lib: sbi: Introduce register_extensions extension callback
e307ba7 lib: sbi: Narrow vendor extension range
042f0c3 lib: sbi: pmu: Remove unnecessary probe function
8b952d4 lib: sbi: Only register available extensions
767b5fc lib: sbi: Optimize probe of srst/susp
c3e31cb lib: sbi: Remove 0/1 probe implementations
33f1722 lib: sbi: Document sbi_ecall_extension members
d4c46e0 Makefile: Dereference symlinks on install
8b99a7f lib: sbi: Fix return of sbi_console_init
264d0be lib: utils: Improve fdt_serial_init
9a0bdd0 lib: utils: Improve fdt_ipi
122f226 lib: utils: Improve fdt_timer
df75e09 lib: utils/ipi: buffer overrun aclint_mswi_cold_init
bdde2ec lib: sbi: Align system suspend errors with spec
aad7a37 include: sbi_scratch: Add helper macros to access data type
5cf9a54 platform: Allow platforms to specify heap size
40d36a6 lib: sbi: Introduce simple heap allocator
2a04f70 lib: sbi: Print scratch size and usage at boot time
bbff53f lib: sbi_pmu: Use heap for per-HART PMU state
ef4542d lib: sbi: Use heap for root domain creation
66daafe lib: sbi: Use scratch space to save per-HART domain pointer
fa5ad2e lib: utils/gpio: Use heap in SiFive and StartFive GPIO drivers
903e88c lib: utils/i2c: Use heap in DesignWare and SiFive I2C drivers
5a8cfcd lib: utils/ipi: Use heap in ACLINT MSWI driver
3013716 lib: utils/irqchip: Use heap in PLIC, APLIC and IMSIC drivers
7e5636a lib: utils/timer: Use heap in ACLINT MTIMER driver
3c1c972 lib: utils/fdt: Use heap in FDT domain parsing
acbd8fc lib: utils/ipi: Use scratch space to save per-HART MSWI pointer
f0516be lib: utils/timer: Use scratch space to save per-HART MTIMER pointer
b3594ac lib: utils/irqchip: Use scratch space to save per-HART PLIC pointer
1df52fa lib: utils/irqchip: Don't check hartid in imsic_update_hartid_table()
355796c lib: utils/irqchip: Use scratch space to save per-HART IMSIC pointer
524feec docs: Add OpenSBI logo and use it in the top-level README.md
932be2c README.md: Improve project copyright information
8153b26 platform/lib: Set no-map attribute on all PMP regions
d64942f firmware: Fix find hart index
27c957a lib: reset: Move fdt_reset_init into generic_early_init
8bd666a lib: sbi: check A2 register in ecall_dbcn_handler.
2552799 include: Bump-up version to 1.3
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Message-Id: <20230630160717.843044-1-bmeng@tinylab.org>
Tested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Pointer mask is also affected by MPRV which means cur_pmbase/pmmask
should also take MPRV into consideration. As pointer mask for instruction
is not supported currently, so we can directly update cur_pmbase/pmmask
based on address related mode and xlen affected by MPRV now.
Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn>
Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-Id: <20230614032547.35895-3-liweiwei@iscas.ac.cn>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Commit 7f0bdfb5bf ("target/riscv/cpu.c: remove cfg setup from
riscv_cpu_init()") removed code that was enabling mmu, pmp, ext_ifencei
and ext_icsr from riscv_cpu_init(), the init() function of
TYPE_RISCV_CPU, parent type of all RISC-V CPUss. This was done to force
CPUs to explictly enable all extensions and features it requires,
without any 'magic values' that were inherited by the parent type.
This commit failed to make appropriate changes in the 'veyron-v1' CPU,
added earlier by commit e1d084a852. The result is that the veyron-v1
CPU has ext_ifencei, ext_icsr and pmp set to 'false', which is not the
case.
The reason why it took this long to notice (thanks LIU Zhiwei for
reporting it) is because Linux doesn't mind 'ifencei' and 'icsr' being
absent in the 'riscv,isa' DT, implying that they're both present if the
'i' extension is enabled. OpenSBI also doesn't error out or warns about
the lack of 'pmp', it'll just not protect memory pages.
Fix it by setting them to 'true' in rv64_veyron_v1_cpu_init() like
7f0bdfb5bf already did with other CPUs.
Reported-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Fixes: 7f0bdfb5bf ("target/riscv/cpu.c: remove cfg setup from riscv_cpu_init()")
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Message-Id: <20230620152443.137079-1-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This patch moves the extension test functions that are used
to gate vendor extension decoders, into cpu_cfg.h.
This allows to reuse them in the disassembler.
This patch does not introduce new functionality.
However, the patch includes a small change:
The parameter for the extension test functions has been changed
from 'DisasContext*' to 'const RISCVCPUConfig*' to keep
the code in cpu_cfg.h self-contained.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn>
Message-Id: <20230612111034.3955227-3-christoph.muellner@vrull.eu>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Disassemble function(plugin_disas, target_disas, monitor_disas) will
always call set_disas_info before disassembling instructions.
plugin_disas and target_disas will always be called under a TB, which
has the same XLEN.
We can't ensure that monitor_disas will always be called under a TB,
but current XLEN will still be a better choice, thus we can ensure at
least the disassemble of the nearest one TB is right.
Signed-off-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Message-Id: <20230612111034.3955227-2-christoph.muellner@vrull.eu>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
vfio queue:
* Fixes in error handling paths of VFIO PCI devices
* Improvements of reported errors for VFIO migration
* Linux header update
* Enablement of AtomicOps completers on root ports
* Fix for unplug of passthrough AP devices
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmSrug0ACgkQUaNDx8/7
# 7KHYCRAAt6UeZi8nKPlN+cs6guOagCcAJOu13nm7XN0bFxjYf/Q2t618cpM7PLSk
# h+4VGsMUVJ1dumcCkBmv7LAn0G6CpVR3VDi5QuGfMODRhpWfSoaypPIizRgrbarL
# lSyaVaPIaddlDZ4AIfFA9Ebnytvm5/ecsyTr0cv7OejVKWI/jN6bC/v36AmNQKKQ
# J5RCDpQ6fOsdqf0Dzvn7xjuHRE4DYtsWkVoslDoBQMgPWHLF8UwRu/OPD6cBQYAR
# /fmgoOkkNDMdN3laqwAyfAUjKfOFpLuZzJ5KNFjtkBiktm66dw4Y8/lWoChVR+S6
# PRZ3nk0HxyzB96zCytfggBX905PBD54LIuockRaYKTlTxT19C3fDjDz5tsjKNhLR
# aFec4KiJaUJj0fa/Vw8DB/WUbCgbOXGHiWhY8vNdpVoc9AZe8xj9z4nB3hmzx1i/
# lZhsM/s3kTNHpVGlW7vTfbToFBmt1eoglu+ILe/HeHLi8LjzCsHy+wR5c0n0/HVI
# fLUuUS1AGQvi8+HCCUi7gwzpJkl4rPJsPx51wfXJk+q/3GQ8g9Mg9qotHNHm4N60
# zq/I5VqqEkJzdaMjup04ZqsMAWqGrnU2f4aNPvBhgaeO9CQE/buIsA34buQRwiG4
# wTodqm0jrkx0Z59jliZ0mFU/LxMvhMaQCEh+OdyZ9vRtfLBjF4c=
# =U2Hc
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 10 Jul 2023 08:58:05 AM BST
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-vfio-20230710' of https://github.com/legoater/qemu:
vfio/pci: Enable AtomicOps completers on root ports
pcie: Add a PCIe capability version helper
s390x/ap: Wire up the device request notifier interface
linux-headers: update to v6.5-rc1
vfio: Fix null pointer dereference bug in vfio_bars_finalize()
vfio/migration: Return bool type for vfio_migration_realize()
vfio/migration: Remove print of "Migration disabled"
vfio/migration: Free resources when vfio_migration_realize fails
vfio/migration: Change vIOMMU blocker from global to per device
vfio/pci: Disable INTx in vfio_realize error path
hw/vfio/pci-quirks: Sanitize capability pointer
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Dynamically enable Atomic Ops completer support around realize/exit of
vfio-pci devices reporting host support for these accesses and adhering
to a minimal configuration standard. While the Atomic Ops completer
bits in the root port device capabilities2 register are read-only, the
PCIe spec does allow RO bits to change to reflect hardware state. We
take advantage of that here around the realize and exit functions of
the vfio-pci device.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Robin Voetter <robin@streamhpc.com>
Tested-by: Robin Voetter <robin@streamhpc.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
vfio_realize() has the following flow:
1. vfio_bars_prepare() -- sets VFIOBAR->size.
2. msix_early_setup().
3. vfio_bars_register() -- allocates VFIOBAR->mr.
After vfio_bars_prepare() is called msix_early_setup() can fail. If it
does fail, vfio_bars_register() is never called and VFIOBAR->mr is not
allocated.
In this case, vfio_bars_finalize() is called as part of the error flow
to free the bars' resources. However, vfio_bars_finalize() calls
object_unparent() for VFIOBAR->mr after checking only VFIOBAR->size, and
thus we get a null pointer dereference.
Fix it by checking VFIOBAR->mr in vfio_bars_finalize().
Fixes: 89d5202edc ("vfio/pci: Allow relocating MSI-X MMIO")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Property enable_migration supports [on/off/auto].
In ON mode, error pointer is passed to errp and logged.
In OFF mode, we doesn't need to log "Migration disabled" as it's intentional.
In AUTO mode, we should only ever see errors or warnings if the device
supports migration and an error or incompatibility occurs while further
probing or configuring it. Lack of support for migration shoundn't
generate an error or warning.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
to free resources allocated in vfio_realize(); when vfio_realize()
fails, vfio_exitfn() is never called and we need to free resources
in vfio_realize().
In the case that vfio_migration_realize() fails,
e.g: with -only-migratable & enable-migration=off, we see below:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
0000:81:11.1: Migration disabled
Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device
If we hotplug again we should see same log as above, but we see:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
Error: vfio 0000:81:11.1: device is already attached
That's because some references to VFIO device isn't released.
For resources allocated in vfio_migration_realize(), free them by
jumping to out_deinit path with calling a new function
vfio_migration_deinit(). For resources allocated in vfio_realize(),
free them by jumping to de-register path in vfio_realize().
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Fixes: a22651053b ("vfio: Make vfio-pci device migration capable")
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Contrary to multiple device blocker which needs to consider already-attached
devices to unblock/block dynamically, the vIOMMU migration blocker is a device
specific config. Meaning it only needs to know whether the device is bypassing
or not the vIOMMU (via machine property, or per pxb-pcie::bypass_iommu), and
does not need the state of currently present devices. For this reason, the
vIOMMU global migration blocker can be consolidated into the per-device
migration blocker, allowing us to remove some unnecessary code.
This change also makes vfio_mig_active() more accurate as it doesn't check for
global blocker.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When vfio realize fails, INTx isn't disabled if it has been enabled.
This may confuse host side with unhandled interrupt report.
Fixes: c5478fea27 ("vfio/pci: Respond to KVM irqchip change notifier")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Coverity reports a tained scalar when traversing the capabilities
chain (CID 1516589). In practice I've never seen a device with a
chain so broken as to cause an issue, but it's also pretty easy to
sanitize.
Fixes: f6b30c1984 ("hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
linux-user: Fix fcntl64() and accept4() for 32-bit targets
A set of 3 patches:
The first two patches fix fcntl64() and accept4().
the 3rd patch enhances the strace output for pread64/pwrite64().
This pull request does not includes Richard's mmap2 patch:
https://patchew.org/QEMU/20230630132159.376995-1-richard.henderson@linaro.org/20230630132159.376995-12-richard.henderson@linaro.org/
Changes:
v3:
- added r-b from Richard to patches #1 and #2
v2:
- rephrased commmit logs
- return O_LARGFILE for fcntl() syscall too
- dropped #ifdefs in accept4() patch
- Dropped my mmap2() patch (former patch #3)
- added r-b from Richard to 3rd patch
Helge
# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZKl5RQAKCRD3ErUQojoP
# X82sAQDnW53s7YkU4sZ1YREPWPVoCXZXgm587jTrmwT4v9AenQEAlbKdsw4hzzr/
# ptuKvgZfZaIp5QjBUl/Dh/CI5aVOLgc=
# =hd4O
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 08 Jul 2023 03:57:09 PM BST
# gpg: using EDDSA key BCE9123E1AD29F07C049BBDEF712B510A23A0F5F
# gpg: Good signature from "Helge Deller <deller@gmx.de>" [unknown]
# gpg: aka "Helge Deller <deller@kernel.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4544 8228 2CD9 10DB EF3D 25F8 3E5F 3D04 A7A2 4603
# Subkey fingerprint: BCE9 123E 1AD2 9F07 C049 BBDE F712 B510 A23A 0F5F
* tag 'linux-user-fcntl64-pull-request' of https://github.com/hdeller/qemu-hppa:
linux-user: Improve strace output of pread64() and pwrite64()
linux-user: Fix accept4(SOCK_NONBLOCK) syscall
linux-user: Fix fcntl() and fcntl64() to return O_LARGEFILE for 32-bit targets
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This method uses one uint32_t * 256 table instead of 4,
which means its data cache overhead is less.
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This implements the AES64DSM instruction. This was the last use
of aes64_operation and its support macros, so remove them all.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This implements the AESIMC instruction. We have converted everything
to crypto/aes-round.h; crypto/aes.h is no longer needed.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The Linux accept4() syscall allows two flags only: SOCK_NONBLOCK and
SOCK_CLOEXEC, and returns -EINVAL if any other bits have been set.
Change the qemu implementation accordingly, which means we can not use
the fcntl_flags_tbl[] translation table which allows too many other
values.
Beside the correction in behaviour, this actually fixes the accept4()
emulation for hppa, mips and alpha targets for which SOCK_NONBLOCK is
different than TARGET_SOCK_NONBLOCK (aka O_NONBLOCK).
The fix can be verified with the testcase of the debian lwt package,
which hangs forever in a read() syscall without this patch.
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
When running a 32-bit guest on a 64-bit host, fcntl[64](F_GETFL) should
return with the TARGET_O_LARGEFILE flag set, because all 64-bit hosts
support large files unconditionally.
But on 64-bit hosts, O_LARGEFILE has the value 0, so the flag
translation can't be done with the fcntl_flags_tbl[]. Instead add the
TARGET_O_LARGEFILE flag afterwards.
Note that for 64-bit guests the compiler will optimize away this code,
since TARGET_O_LARGEFILE is zero.
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Split these helpers so that we are not passing 'decrypt'
within the simd descriptor.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add a primitive for InvSubBytes + InvShiftRows +
InvMixColumns + AddRoundKey.
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Start adding infrastructure for accelerating guest AES.
Begin with a SubBytes + ShiftRows + AddRoundKey primitive.
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These macros will constant fold and avoid the indirection through
memory when fully unrolling some new primitives.
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We do not currently have a table in crypto/ for just MixColumns.
Move both tables for consistency.
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move the code from tcg/. Fix a bug in that PPC_FEATURE2_ARCH_3_10
is actually spelled PPC_FEATURE2_ARCH_3_1.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
virt-acpi-build.c uses warn_report. However, it doesn't include
qemu/error-report.h directly, it include qemu/error-report.h via trace.h
if we enable log trace backend. But if we disable the log trace backend
(e.g., --enable-trace-backends=nop), then virt-acpi-build.c will not
include qemu/error-report.h any more and it will lead to build errors.
Include qemu/error-report.h directly in virt-acpi-build.c to avoid the
errors.
Fixes: 451b157041 ("acpi: Align the size to 128k")
Signed-off-by: Peng Liang <tcx4c70@gmail.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(mjt: move the #include higher as suggested by Ani Sinha)
The description of the options starts at column 16, so fix
this in some runaway lines for a more uniform output.
While we're at it, replace the capital "NOTE" with "Note"
since this seems to be the more common capitalization in
qemu-options.hx.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch sorts the vdpa_feature_bits array
alphabetically in ascending order to avoid future duplicates.
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This entry was duplicated on referenced commit. Removing it.
Fixes: 402378407d ("vhost-vdpa: multiqueue support")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
pci_nic_init_nofail() calls qemu_find_nic_model(), and this function
sets nd->model = g_strdup(default_model) if it has not been initialized
yet. So we don't have to set nd->model to the default_nic in the
calling sites.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This commit addresses a bug in the AVR interrupt handling code.
The modification involves replacing the usage of the ctz32 function
with ctz64 to ensure proper handling of interrupts above 33 in the AVR
target.
Previously, timers 3, 4, and 5 interrupts were not functioning correctly
because most of their interrupt vectors are numbered above 33.
Signed-off-by: Lucas Dietrich <ld.adecy@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Rolnik <mrolnik@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: updated subject line to have subsytem prefix)
ppc patch queue for 2023-07-07:
In this last queue for 8.1 we have a lot of fixes and improvements all
around: SMT support for powerNV, XIVE fixes, PPC440 cleanups, exception
handling cleanups and kvm_pph.h cleanups just to name a few.
Thanks everyone in the qemu-ppc community for all the contributions for
the next QEMU 8.1 release.
# -----BEGIN PGP SIGNATURE-----
#
# iIwEABYKADQWIQQX6/+ZI9AYAK8oOBk82cqW3gMxZAUCZKgihBYcZGFuaWVsaGI0
# MTNAZ21haWwuY29tAAoJEDzZypbeAzFksr0A/jrvSDSDxB5mR7bo0dNGndLXcdTo
# ZGr6k6pcMpr7RDOAAQDVeaw7f8djQ4Aaelk6v1wPs5bYfNY2ElF4NsqHJFX2Cg==
# =8lDs
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 07 Jul 2023 03:34:44 PM BST
# gpg: using EDDSA key 17EBFF9923D01800AF2838193CD9CA96DE033164
# gpg: issuer "danielhb413@gmail.com"
# gpg: Good signature from "Daniel Henrique Barboza <danielhb413@gmail.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 17EB FF99 23D0 1800 AF28 3819 3CD9 CA96 DE03 3164
* tag 'pull-ppc-20230707-1' of https://gitlab.com/danielhb/qemu: (59 commits)
ppc/pnv: Add QME region for P10
target/ppc: Remove pointless checks of CONFIG_USER_ONLY in 'kvm_ppc.h'
target/ppc: Restrict 'kvm_ppc.h' to sysemu in cpu_init.c
target/ppc: Define TYPE_HOST_POWERPC_CPU in cpu-qom.h
target/ppc: Move CPU QOM definitions to cpu-qom.h
target/ppc: Reorder #ifdef'ry in kvm_ppc.h
target/ppc: Have 'kvm_ppc.h' include 'sysemu/kvm.h'
target/ppc: Machine check on invalid real address access on POWER9/10
tests/qtest: Add xscom tests for powernv10 machine
ppc/pnv: Set P10 core xscom region size to match hardware
ppc/pnv: Log all unimp warnings with similar message
ppc440_pcix: Rename QOM type define abd move it to common header
ppc4xx_pci: Add define for ppc4xx-host-bridge type name
ppc4xx_pci: Rename QOM type name define
ppc440_pcix: Stop using system io region for PCI bus
ppc440_pcix: Don't use iomem for regs
ppc/sam460ex: Remove address_space_mem local variable
ppc440: Remove ppc460ex_pcie_init legacy init function
ppc440: Add busnum property to PCIe controller model
ppc440: Stop using system io region for PCIe buses
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* Granite Rapids CPU model
* Miscellaneous bugfixes
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSn7uYUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroPi1gf+MJNyMneyyEZgBwlwgs2NYjz+cKwW
# KxtCOHDfew5S1qpq+gyvUnq5K0JJBGZKoFMwS6JwOpHASGx1o6mlF06CgLAk7wKh
# yCf1kzvRA4y3tYbSwvxD5iKV3YSsayIHuJ8q2GslVXBtAZ0xC2cREQLzKLNuEV6M
# rO4bj6QUV2fRc9u9TlurXijsdalUAEjmkIeZhtghhkD+lJo44yzcF7qAROaE3pFa
# IYEp8pTgcbJeiI0BUNFTRk0OlE5f7MT3GIQwTC34WWPO+r/uBXL5FXNqN38svugh
# 7hjOliIMU4I6jpL1t7v2+9Vs38gAEPchJ0Nly4TV+dydh7l1pIn9G7ssoA==
# =OBRZ
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 07 Jul 2023 11:54:30 AM BST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
target/i386: Add new CPU model GraniteRapids
target/i386: Add few security fix bits in ARCH_CAPABILITIES into SapphireRapids CPU model
target/i386: Add new bit definitions of MSR_IA32_ARCH_CAPABILITIES
target/i386: Allow MCDT_NO if host supports
target/i386: Add support for MCDT_NO in CPUID enumeration
target/i386: Adjust feature level according to FEAT_7_1_EDX
qemu_cleanup: begin drained section after vm_shutdown()
meson.build: Remove the logic to link C code with the C++ linker
python: bump minimum requirements so they are compatible with 3.12
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The GraniteRapids CPU model mainly adds the following new features
based on SapphireRapids:
- PREFETCHITI CPUID.(EAX=7,ECX=1):EDX[bit 14]
- AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21]
And adds the following security fix for corresponding vulnerabilities:
- MCDT_NO CPUID.(EAX=7,ECX=2):EDX[bit 5]
- SBDR_SSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 13]
- FBSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 14]
- PSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 15]
- PBRSB_NO MSR_IA32_ARCH_CAPABILITIES[bit 24]
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-7-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MCDT_NO bit indicates HW contains the security fix and doesn't need to
be mitigated to avoid data-dependent behaviour for certain instructions.
It needs no hypervisor support. Treat it as supported regardless of what
KVM reports.
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-4-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CPUID.(EAX=7,ECX=2):EDX[bit 5] enumerates MCDT_NO. Processors enumerate
this bit as 1 do not exhibit MXCSR Configuration Dependent Timing (MCDT)
behavior and do not need to be mitigated to avoid data-dependent behavior
for certain instructions.
Since MCDT_NO is in a new sub-leaf, add a new CPUID feature word
FEAT_7_2_EDX. Also update cpuid_level_func7 by FEAT_7_2_EDX.
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-3-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If FEAT_7_1_EAX is 0 and FEAT_7_1_EDX is non-zero, as is the case
with a Granite Rapids host and
'-cpu host,-avx-vnni,-avx512-bf16,-fzrm,-fsrs,-fsrc,-amx-fp16', we can't
get CPUID_7_1 leaf even though CPUID_7_1_EDX has non-zero value.
Update cpuid_level_func7 according to CPUID_7_1_EDX, otherwise
guest may report wrong maximum number sub-leaves in leaf 07H.
Fixes: eaaa197d5b ("target/i386: Add support for AVX-VNNI-INT8 in CPUID enumeration")
Cc: qemu-stable@nongnu.org
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-2-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
in order to avoid requests being stuck in a BlockBackend's request
queue during cleanup. Having such requests can lead to a deadlock [0]
with a virtio-scsi-pci device using iothread that's busy with IO when
initiating a shutdown with QMP 'quit'.
There is a race where such a queued request can continue sometime
(maybe after bdrv_child_free()?) during bdrv_root_unref_child() [1].
The completion will hold the AioContext lock and wait for the BQL
during SCSI completion, but the main thread will hold the BQL and
wait for the AioContext as part of bdrv_root_unref_child(), leading to
the deadlock [0].
[0]:
> Thread 3 (Thread 0x7f3bbd87b700 (LWP 135952) "qemu-system-x86"):
> #0 __lll_lock_wait (futex=futex@entry=0x564183365f00 <qemu_global_mutex>, private=0) at lowlevellock.c:52
> #1 0x00007f3bc1c0d843 in __GI___pthread_mutex_lock (mutex=0x564183365f00 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2 0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x564183365f00 <qemu_global_mutex>, file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../util/qemu-thread-posix.c:94
> #3 0x000056418247cc2a in qemu_mutex_lock_iothread_impl (file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../softmmu/cpus.c:504
> #4 0x00005641826d5325 in prepare_mmio_access (mr=0x5641856148a0) at ../softmmu/physmem.c:2593
> #5 0x00005641826d6fe7 in address_space_stl_internal (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /home/febner/repos/qemu/memory_ldst.c.inc:318
> #6 0x00005641826d7154 in address_space_stl_le (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0) at /home/febner/repos/qemu/memory_ldst.c.inc:357
> #7 0x0000564182374b07 in pci_msi_trigger (dev=0x56418679b0d0, msg=...) at ../hw/pci/pci.c:359
> #8 0x000056418237118b in msi_send_message (dev=0x56418679b0d0, msg=...) at ../hw/pci/msi.c:379
> #9 0x0000564182372c10 in msix_notify (dev=0x56418679b0d0, vector=8) at ../hw/pci/msix.c:542
> #10 0x000056418243719c in virtio_pci_notify (d=0x56418679b0d0, vector=8) at ../hw/virtio/virtio-pci.c:77
> #11 0x00005641826933b0 in virtio_notify_vector (vdev=0x5641867a34a0, vector=8) at ../hw/virtio/virtio.c:1985
> #12 0x00005641826948d6 in virtio_irq (vq=0x5641867ac078) at ../hw/virtio/virtio.c:2461
> #13 0x0000564182694978 in virtio_notify (vdev=0x5641867a34a0, vq=0x5641867ac078) at ../hw/virtio/virtio.c:2473
> #14 0x0000564182665b83 in virtio_scsi_complete_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:115
> #15 0x00005641826670ce in virtio_scsi_complete_cmd_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:641
> #16 0x000056418266736b in virtio_scsi_command_complete (r=0x7f3bb0010560, resid=0) at ../hw/scsi/virtio-scsi.c:712
> #17 0x000056418239aac6 in scsi_req_complete (req=0x7f3bb0010560, status=2) at ../hw/scsi/scsi-bus.c:1526
> #18 0x000056418239e090 in scsi_handle_rw_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:242
> #19 0x000056418239e13f in scsi_disk_req_check_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:265
> #20 0x000056418239e482 in scsi_dma_complete_noio (r=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:340
> #21 0x000056418239e5d9 in scsi_dma_complete (opaque=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:371
> #22 0x00005641824809ad in dma_complete (dbs=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:107
> #23 0x0000564182480a72 in dma_blk_cb (opaque=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:127
> #24 0x00005641827bf78a in blk_aio_complete (acb=0x7f3bb00021a0) at ../block/block-backend.c:1563
> #25 0x00005641827bfa5e in blk_aio_write_entry (opaque=0x7f3bb00021a0) at ../block/block-backend.c:1630
> #26 0x000056418295638a in coroutine_trampoline (i0=-1342102448, i1=32571) at ../util/coroutine-ucontext.c:177
> #27 0x00007f3bc0caed40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #28 0x00007f3bbd8757f0 in ?? ()
> #29 0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7f3bbe3e9280 (LWP 135944) "qemu-system-x86"):
> #0 __lll_lock_wait (futex=futex@entry=0x5641856f2a00, private=0) at lowlevellock.c:52
> #1 0x00007f3bc1c0d8d1 in __GI___pthread_mutex_lock (mutex=0x5641856f2a00) at ../nptl/pthread_mutex_lock.c:115
> #2 0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
> #3 0x000056418293a140 in qemu_rec_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
> #4 0x00005641829532d5 in aio_context_acquire (ctx=0x5641856f29a0) at ../util/async.c:728
> #5 0x000056418279d5df in bdrv_set_aio_context_commit (opaque=0x5641856e6e50) at ../block.c:7493
> #6 0x000056418294e288 in tran_commit (tran=0x56418630bfe0) at ../util/transactions.c:87
> #7 0x000056418279d880 in bdrv_try_change_aio_context (bs=0x5641856f7130, ctx=0x56418548f810, ignore_child=0x0, errp=0x0) at ../block.c:7626
> #8 0x0000564182793f39 in bdrv_root_unref_child (child=0x5641856f47d0) at ../block.c:3242
> #9 0x00005641827be137 in blk_remove_bs (blk=0x564185709880) at ../block/block-backend.c:914
> #10 0x00005641827bd689 in blk_remove_all_bs () at ../block/block-backend.c:583
> #11 0x0000564182798699 in bdrv_close_all () at ../block.c:5117
> #12 0x000056418248a5b2 in qemu_cleanup () at ../softmmu/runstate.c:821
> #13 0x0000564182738603 in qemu_default_main () at ../softmmu/main.c:38
> #14 0x0000564182738631 in main (argc=30, argv=0x7ffd675a8a48) at ../softmmu/main.c:48
>
> (gdb) p *((QemuMutex*)0x5641856f2a00)
> $1 = {lock = {__data = {__lock = 2, __count = 2, __owner = 135952, ...
> (gdb) p *((QemuMutex*)0x564183365f00)
> $2 = {lock = {__data = {__lock = 2, __count = 0, __owner = 135944, ...
[1]:
> Thread 1 "qemu-system-x86" hit Breakpoint 5, bdrv_drain_all_end () at ../block/io.c:551
> #0 bdrv_drain_all_end () at ../block/io.c:551
> #1 0x00005569810f0376 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:156
> #2 0x00005569810bd3e0 in bdrv_replace_child_noperm (child=0x556982e2d7d0, new_bs=0x0) at ../block.c:2897
> #3 0x00005569810bdef2 in bdrv_root_unref_child (child=0x556982e2d7d0) at ../block.c:3227
> #4 0x00005569810e8137 in blk_remove_bs (blk=0x556982e42880) at ../block/block-backend.c:914
> #5 0x00005569810e7689 in blk_remove_all_bs () at ../block/block-backend.c:583
> #6 0x00005569810c2699 in bdrv_close_all () at ../block.c:5117
> #7 0x0000556980db45b2 in qemu_cleanup () at ../softmmu/runstate.c:821
> #8 0x0000556981062603 in qemu_default_main () at ../softmmu/main.c:38
> #9 0x0000556981062631 in main (argc=30, argv=0x7ffd7a82a418) at ../softmmu/main.c:48
> [Switching to Thread 0x7fe76dab2700 (LWP 103649)]
>
> Thread 3 "qemu-system-x86" hit Breakpoint 4, blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505
> #0 blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505
> #1 0x00005569810e8f36 in blk_wait_while_drained (blk=0x556982e42880) at ../block/block-backend.c:1312
> #2 0x00005569810e9231 in blk_co_do_pwritev_part (blk=0x556982e42880, offset=3422961664, bytes=4096, qiov=0x556983028060, qiov_offset=0, flags=0) at ../block/block-backend.c:1402
> #3 0x00005569810e9a4b in blk_aio_write_entry (opaque=0x556982e2cfa0) at ../block/block-backend.c:1628
> #4 0x000055698128038a in coroutine_trampoline (i0=-2090057872, i1=21865) at ../util/coroutine-ucontext.c:177
> #5 0x00007fe770f50d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #6 0x00007ffd7a829570 in ?? ()
> #7 0x0000000000000000 in ?? ()
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20230706131418.423713-1-f.ebner@proxmox.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are not mixing C++ with C code anymore, the only remaining
C++ code in qga/vss-win32/ is used for a plain C++ executable.
Thus we can remove the hacks for linking C code with the C++ linker
now to simplify meson.build a little bit, and also to avoid that
some C++ code sneaks in by accident again.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230706064736.178962-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are many Python 3.12 issues right now, but a particularly
problematic one when debugging them is that one cannot even use
minreqs.txt in a Python 3.12 virtual environment to test with
locked package versions.
Bump the mypy and wrapt versions to fix this, while remaining
within the realm of versions compatible with Python 3.7.
This requires a workaround for a mypy false positive
qemu/qmp/qmp_tui.py:350: error: Non-overlapping equality check (left operand type: "Literal[Runstate.DISCONNECTING]", right operand type: "Literal[Runstate.IDLE]") [comparison-overlap]
where mypy does not realize that self.disconnect() could change
the value of self.runstate.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Quad Management Engine (QME) manages power related settings for its
quad. The xscom region is separate from the quad xscoms, therefore a new
region is added. The xscoms in a QME select a given core by selecting
the forth nibble.
Implement dummy reads for the stop state history (SSH) and special
wakeup (SPWU) registers. This quietens some sxcom errors when skiboot
boots on p10.
Power9 does not have a QME.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Message-ID: <20230707071213.9924-1-joel@jms.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
# -----BEGIN PGP SIGNATURE-----
# Version: GnuPG v1
#
# iQEcBAABAgAGBQJkp86uAAoJEO8Ells5jWIRX00H/1T20eOfMZ+8ZyO32P1DBl5U
# ZQNl5/rcg5cqjatragwagAHGYzmoegJlY3/JbWju09SPtsgbMT/nQI6EFDfpTHb6
# 9HB2h+43eHq+OBpmPPsmqVRzjuNi9lUmJ20We4aqJe/VM4/DHMtKW3EXGmORb7cF
# wjazN5FVn+YQHgA+pckQ79k6h/lJhtLv+MuainS12o8yyCO8OyqP6Bm4lYPbBNpb
# Im3HXiv05gFuS2P4lD8ZvjcdWalHDzDZW4RzKHlpcic0GBN/rcU3FDqGeOIP8qWL
# oxokpjd2QmW1rX/TwaweiObEjo/3n7ymRu5PofE3T7e+gnAVfAyqDxrgAU6fMjA=
# =CGHw
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 07 Jul 2023 09:37:02 AM BST
# gpg: using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* tag 'net-pull-request' of https://github.com/jasowang/qemu:
igb: Remove obsolete workaround for Windows
e1000e: Add ICR clearing by corresponding IMS bit
net: socket: remove net_init_socket()
net: socket: move fd type checking to its own function
net: socket: prepare to cleanup net_init_socket()
hw/net: ftgmac100: Drop the small packet check in the receive path
hw/net: sunhme: Remove the logic of padding short frames in the receive path
hw/net: sungem: Remove the logic of padding short frames in the receive path
hw/net: rtl8139: Remove the logic of padding short frames in the receive path
hw/net: pcnet: Remove the logic of padding short frames in the receive path
hw/net: ne2000: Remove the logic of padding short frames in the receive path
hw/net: i82596: Remove the logic of padding short frames in the receive path
hw/net: vmxnet3: Remove the logic of padding short frames in the receive path
hw/net: e1000: Remove the logic of padding short frames in the receive path
virtio-net: correctly report maximum tx_queue_size value
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
I confirmed it works with Windows even without this workaround. It is
likely to be a mistake so remove it.
Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The datasheet does not say what happens when interrupt was asserted
(ICR.INT_ASSERT=1) and auto mask is *not* active.
However, section of 13.3.27 the PCIe* GbE Controllers Open Source
Software Developer’s Manual, which were written for older devices,
namely 631xESB/632xESB, 82563EB/82564EB, 82571EB/82572EI &
82573E/82573V/82573L, does say:
> If IMS = 0b, then the ICR register is always clear-on-read. If IMS is
> not 0b, but some ICR bit is set where the corresponding IMS bit is not
> set, then a read does not clear the ICR register. For example, if
> IMS = 10101010b and ICR = 01010101b, then a read to the ICR register
> does not clear it. If IMS = 10101010b and ICR = 0101011b, then a read
> to the ICR register clears it entirely (ICR.INT_ASSERTED = 1b).
Linux does no longer activate auto mask since commit
0a8047ac68e50e4ccbadcfc6b6b070805b976885 and the real hardware clears
ICR even in such a case so we also should do so.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1707441
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Move the file descriptor type checking before doing anything with it.
If it's not usable, don't close it as it could be in use by another
part of QEMU, only fail and report an error.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Use directly net_socket_fd_init_stream() and net_socket_fd_init_dgram()
when the socket type is already known.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, the small packet check logic in the receive
path is no longer needed.
Suggested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, remove the same logic in the NIC codes.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, remove the same logic in the NIC codes.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, remove the same logic in the NIC codes.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, remove the same logic in the NIC codes.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, remove the same logic in the NIC codes.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, remove the same logic in the NIC codes.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, remove the same logic in the NIC codes.
This actually reverts commit 40a87c6c9b.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we have implemented unified short frames padding in the
QEMU networking codes, remove the same logic in the NIC codes.
This actually reverts commit 78aeb23ede.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Maximum value for tx_queue_size depends on the backend type.
1024 for vDPA/vhost-user, 256 for all the others.
The value is returned by virtio_net_max_tx_queue_size() to set the
parameter:
n->net_conf.tx_queue_size = MIN(virtio_net_max_tx_queue_size(n),
n->net_conf.tx_queue_size);
But the parameter checking uses VIRTQUEUE_MAX_SIZE (1024).
So the parameter is silently ignored and ethtool reports a different
value than the one provided by the user.
... -netdev tap,... -device virtio-net,tx_queue_size=1024
# ethtool -g enp0s2
Ring parameters for enp0s2:
Pre-set maximums:
RX: 256
RX Mini: n/a
RX Jumbo: n/a
TX: 256
Current hardware settings:
RX: 256
RX Mini: n/a
RX Jumbo: n/a
TX: 256
... -netdev vhost-user,... -device virtio-net,tx_queue_size=2048
Invalid tx_queue_size (= 2048), must be a power of 2 between 256 and 1024
With this patch the correct maximum value is checked and displayed.
For vDPA/vhost-user:
Invalid tx_queue_size (= 2048), must be a power of 2 between 256 and 1024
For all the others:
Invalid tx_queue_size (= 512), must be a power of 2 between 256 and 256
Fixes: 2eef278b9e ("virtio-net: fix tx queue size for !vhost-user")
Cc: mst@redhat.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The P10 core xscom memory regions overlap because the size is wrong.
The P10 core+L2 xscom region size is allocated as 0x1000 (with some
unused ranges). "EC" is used as a closer match, as "EX" includes L3
which has a disjoint xscom range that would require a different
region if it were implemented.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230706053923.115003-2-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Add the function name so there's an indication as to where the message
is coming from. Change all prints to use the offset instead of the
address.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230706024528.40065-1-joel@jms.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Set the TIR default value with the SMT thread index, and place some
standard limits on SMT configurations. Now powernv is able to boot
skiboot and Linux with a SMT topology, including booting a KVM guest.
There are several SPRs and other features (e.g., broadcast msgsnd)
that are not implemented, but not used by OPAL or Linux and can be
added incrementally.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20230705120631.27670-4-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The Power ISA has the concept of sub-processors:
Hardware is allowed to sub-divide a multi-threaded processor into
"sub-processors" that appear to privileged programs as multi-threaded
processors with fewer threads.
POWER9 and POWER10 have two modes, either every thread is a
sub-processor or all threads appear as one multi-threaded processor. In
the user manuals these are known as "LPAR per thread" / "Thread LPAR",
and "LPAR per core" / "1 LPAR", respectively.
The practical difference is: in thread LPAR mode, non-hypervisor SPRs
are not shared between threads and msgsndp can not be used to message
siblings. In 1 LPAR mode, some SPRs are shared and msgsndp is usable.
Thrad LPAR allows multiple partitions to run concurrently on the same
core, and is a requirement for KVM to run on POWER9/10 (which does not
gang-schedule an LPAR on all threads of a core like POWER8 KVM).
Traditionally, SMT in PAPR environments including PowerVM and the
pseries QEMU machine with KVM acceleration behaves as in 1 LPAR mode.
In OPAL systems, Thread LPAR is used. When adding SMT to the powernv
machine, it is therefore preferable to emulate Thread LPAR.
To account for this difference between pseries and powernv, an LPAR mode
flag is added such that SPRs can be implemented as per-LPAR shared, and
that becomes either per-thread or per-core depending on the flag.
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20230705120631.27670-2-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The low-level functions to access the TIMA take a presenter object as
a first argument. When accessing the TIMA from the IC BAR,
i.e. indirect calls, we currently pass a NULL pointer for the
presenter argument. While it appears ok with the current usage, it's
dangerous. And it's pretty easy to figure out the presenter in that
context, so this patch fixes it.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230705081400.218408-1-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Add the CPU target in the trace when reading/writing the TIMA
space. It was already done for other TIMA ops (notify, accept, ...),
only missing for those 2. Useful for debug and even more now that we
experiment with SMT.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230705110039.231148-1-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
We currently only allow 64-bit operations on the ESB CI pages. There's
no real reason for that limitation, skiboot/linux didn't need
more. However the hardware supports any size, so this patch relaxes
that restriction. It impacts both the ESB pages for "normal"
interrupts as well as the ESB pages for escalation interrupts defined
for the ENDs.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230704144848.164287-1-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Add a PnvQuad class for the P10 powernv machine. No xscoms are
implemented yet, but this allows them to be added.
The size is reduced to avoid the quad region from overlapping with the
core region.
address-space: xscom-0
0000000000000000-00000003ffffffff (prio 0, i/o): xscom-0
0000000100000000-00000001000fffff (prio 0, i/o): xscom-quad.0
0000000100108000-0000000100907fff (prio 0, i/o): xscom-core.3
0000000100110000-000000010090ffff (prio 0, i/o): xscom-core.2
0000000100120000-000000010091ffff (prio 0, i/o): xscom-core.1
0000000100140000-000000010093ffff (prio 0, i/o): xscom-core.0
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Message-ID: <20230704054204.168547-4-joel@jms.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
On the powernv9 and powernv10 machines, the PSIHB interrupts are
currently initialized with a PQ state of 0b01, i.e. interrupts are
disabled. However real hardware initializes them to 0b00 for the
PSIHB. This patch updates it, in case an hypervisor is in the mood of
checking it.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230703081215.55252-3-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The PQ state of a xive interrupt is always initialized to Q=1, which
means the interrupt is disabled. Since a xive source can be embedded
in many objects, this patch adds a property to allow that behavior to
be refined if needed.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230703081215.55252-2-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Direct TIMA operations can be done through 4 pages, each with a
different privilege level dictating what fields can be accessed. On
the other hand, indirect TIMA accesses on P10 are done through a
single page, which is the equivalent of the most privileged page of
direct TIMA accesses.
The offset in the IC bar of an indirect access specifies what hw
thread is targeted (page shift bits) and the offset in the
TIMA being accessed (the page offset bits). When the indirect
access is calling the underlying direct access functions, it is
therefore important to clearly separate the 2, as the direct functions
assume any page shift bits define the privilege ring level. For
indirect accesses, those bits must be 0. This patch fixes the offset
passed to direct TIMA functions.
It didn't matter for SMT1, as the 2 least significant bits of the page
shift are part of the hw thread ID and always 0, so the direct TIMA
functions were accessing the privilege ring 0 page. With SMT4/8, it is
no longer true.
The fix is specific to P10, as indirect TIMA access on P9 was handled
differently.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230703080858.54060-1-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Currently on PPC64 qemu always dumps the guest memory in
Big Endian (BE) format even though the guest running in Little Endian
(LE) mode. So crash tool fails to load the dump as illustrated below:
Log :
$ virsh dump DOMAIN --memory-only dump.file
Domain 'DOMAIN' dumped to dump.file
$ crash vmlinux dump.file
<snip>
crash 8.0.2-1.el9
WARNING: endian mismatch:
crash utility: little-endian
dump.file: big-endian
WARNING: machine type mismatch:
crash utility: PPC64
dump.file: (unknown)
crash: dump.file: not a supported file format
<snip>
This happens because cpu_get_dump_info() passes cpu->env->has_hv_mode
to function ppc_interrupts_little_endian(), the cpu->env->has_hv_mode
always set for powerNV even though the guest is not running in hv mode.
The hv mode should be taken from msr_mask MSR_HVB bit
(cpu->env.msr_mask & MSR_HVB). This patch fixes the issue by passing
MSR_HVB value to ppc_interrupts_little_endian() in order to determine
the guest endianness.
The crash tool also expects guest kernel endianness should match the
endianness of the dump.
The patch was tested on POWER9 box booted with Linux as host in
following cases:
Host-Endianess Qemu-Target-Machine Qemu-Generated-Guest
Memory-Dump-Format
BE powernv(OPAL/PowerNV) LE
BE powernv(OPAL/PowerNV) BE
LE powernv(OPAL/PowerNV) LE
LE powernv(OPAL/PowerNV) BE
LE pseries(OPAL/PowerNV/pSeries) KVMHV LE
LE pseries TCG LE
Fixes: 5609400a42 ("target/ppc: Set the correct endianness for powernv memory
dumps")
Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Message-ID: <20230623072506.34713-1-nnmlinux@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Booting linux on the powernv10 machine logs a few errors like:
Invalid read at addr 0x38, size 1, region 'xive-ic-tm-indirect', reason: invalid size (min:8 max:8)
Invalid write at addr 0x38, size 1, region 'xive-ic-tm-indirect', reason: invalid size (min:8 max:8)
Invalid read at addr 0x38, size 1, region 'xive-ic-tm-indirect', reason: invalid size (min:8 max:8)
Those errors happen when linux is resetting XIVE. We're trying to
read/write the enablement bit for the hardware context and qemu
doesn't allow indirect TIMA accesses of less than 8 bytes. Direct TIMA
access can go through though, as well as indirect TIMA accesses on P9.
So even though there are some restrictions regarding the address/size
combinations for TIMA access, the example above is perfectly valid.
This patch lets indirect TIMA accesses of all sizes go through. The
special operations will be intercepted and the default "raw" handlers
will pick up all other requests and complain about invalid sizes as
appropriate.
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230626094057.1192473-1-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Apple sungem devices are expected to have WOL MMIO registers.
Add a region to prevent transaction failures, and implement the
WOL-disable CSR write because the Linux driver reset writes
this.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-ID: <20230625201628.65231-1-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
TFMR is the Time Facility Management Register which is specific to
POWER CPUs, and used for the purpose of timebase management (generally
by firmware, not the OS).
Add helpers for the TFMR register, which will form part of the core
timebase facility model in future but for now behaviour is unchanged.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20230625120317.13877-3-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
POWER book4 (implementation-specific) SPRs are sometimes in their own
functions, but in other cases are mixed with architected SPRs. Do some
spring cleaning on these.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20230625120317.13877-2-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
We don't emulate the gigabit ethernet part of the chip but the MorphOS
driver accesses these and expects to get some valid looking result
otherwise it hangs. Add some minimal dummy implementation to avoid rhis.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230605215145.29458746335@zero.eik.bme.hu>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The clock update logic reads the clock twice to compute the new clock
value, with a value derived from the later time subtracted from a value
derived from the earlier time. The delta causes time to be lost.
This can ultimately result in time becoming unsynchronized between CPUs
and that can cause OS lockups, timeouts, watchdogs, etc. This can be
seen running a KVM guest (that causes lots of TB updates) on a powernv
SMP machine.
Fix this by reading the clock once.
Cc: qemu-stable@nongnu.org
Fixes: dbdd25065e ("Implement time-base start/stop helpers.")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Message-ID: <20230629020713.327745-1-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
HDEC interrupts are edge-triggered on HDECR underflow (notably different
from DEC which is level-triggered).
HDEC interrupts already clear the irq on delivery so that does not need
to be changed.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230625122045.15544-1-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
skiboot only uses mmio to access the PSI registers (once the BAR is
set) but we don't have any reason to block the accesses through
xscom. This patch enables xscom access to the PSI registers. It
converts the xscom addresses to mmio addresses, which requires a bit
of care for the PSIHB, then reuse the existing mmio ops.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230630102609.193214-1-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
If you build QEMU with the clang sanitizer enabled, you can see it
fire when running the arm-cpu-features test:
$ QTEST_QEMU_BINARY=./build/arm-clang/qemu-system-aarch64 ./build/arm-clang/tests/qtest/arm-cpu-features
[...]
../../target/arm/cpu64.c:125:19: runtime error: shift exponent 64 is too large for 64-bit type 'unsigned long long'
[...]
This happens because the user can specify some incorrect SVE
properties that result in our calculating a max_vq of 0. We catch
this and error out, but before we do that we calculate
vq_mask = MAKE_64BIT_MASK(0, max_vq);$
and the MAKE_64BIT_MASK() call is only valid for lengths that are
greater than zero, so we hit the undefined behaviour.
Change the logic so that if max_vq is 0 we specifically set vq_mask
to 0 without going via MAKE_64BIT_MASK(). This lets us drop the
max_vq check from the error-exit logic, because if max_vq is 0 then
vq_map must now be 0.
The UB only happens in the case where the user passed us an incorrect
set of SVE properties, so it's not a big problem in practice.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230704154332.3014896-1-peter.maydell@linaro.org
We already squash the ID register field for FEAT_SPE (the Statistical
Profiling Extension) because TCG does not implement it and if we
advertise it to the guest the guest will crash trying to look at
non-existent system registers. Do the same for some other features
which a real hardware Neoverse-V1 implements but which TCG doesn't:
* FEAT_TRF (Self-hosted Trace Extension)
* Trace Macrocell system register access
* Memory mapped trace
* FEAT_AMU (Activity Monitors Extension)
* FEAT_MPAM (Memory Partitioning and Monitoring Extension)
* FEAT_NV (Nested Virtualization)
Most of these, like FEAT_SPE, are "introspection/trace" type features
which QEMU is unlikely to ever implement. The odd-one-out here is
FEAT_NV -- we could implement that and at some point we probably
will.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230704130647.2842917-2-peter.maydell@linaro.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
In handle_interrupt() we use level as an index into the interrupt_vector[]
array. This is safe because we have checked it against env->config->nlevel,
but Coverity can't see that (and it is only true because each CPU config
sets its XCHAL_NUM_INTLEVELS to something less than MAX_NLEVELS), so it
complains about a possible array overrun (CID 1507131)
Add an assert() which will make Coverity happy and catch the unlikely
case of a mis-set XCHAL_NUM_INTLEVELS in future.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Message-id: 20230623154135.1930261-1-peter.maydell@linaro.org
This code is only relevant when TCG is present in the build. Building
with --disable-tcg --enable-xen on an x86 host we get:
$ ../configure --target-list=x86_64-softmmu,aarch64-softmmu --disable-tcg --enable-xen
$ make -j$(nproc)
...
libqemu-aarch64-softmmu.fa.p/target_arm_gdbstub.c.o: in function `m_sysreg_ptr':
../target/arm/gdbstub.c:358: undefined reference to `arm_v7m_get_sp_ptr'
../target/arm/gdbstub.c:361: undefined reference to `arm_v7m_get_sp_ptr'
libqemu-aarch64-softmmu.fa.p/target_arm_gdbstub.c.o: in function `arm_gdb_get_m_systemreg':
../target/arm/gdbstub.c:405: undefined reference to `arm_v7m_mrs_control'
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-id: 20230628164821.16771-1-farosas@suse.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Following are done to fix the coverity issues:
1. Change read_data to fix the CID 1512899: Out-of-bounds access (OVERRUN)
2. Fix match_rx_tx_data to fix CID 1512900: Logically dead code (DEADCODE)
3. Replace rand() in generate_random_data() with g_rand_int()
Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Message-id: 20230628202758.16398-1-vikram.garhwal@amd.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Unlike architectures with precise self-modifying code semantics
(e.g. x86) ARM processors do not maintain coherency for instruction
execution and memory, requiring an instruction synchronization
barrier on every core that will execute the new code, and on many
models also the explicit use of cache management instructions.
While this is required to make JITs work on actual hardware, QEMU
has gotten away with not handling this since it does not emulate
caches, and unconditionally invalidates code whenever the softmmu
or the user-mode page protection logic detects that code has been
modified.
Unfortunately the latter does not work in the face of dual-mapped
code (a common W^X workaround), where one page is executable and
the other is writable: user-mode has no way to connect one with the
other as that is only known to the kernel and the emulated
application.
This commit works around the issue by telling software that
instruction cache invalidation is required by clearing the
CPR_EL0.DIC flag (regardless of whether the emulated processor
needs it), and then invalidating code in IC IVAU instructions.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1034
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: John Högberg <john.hogberg@ericsson.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 168778890374.24232.3402138851538068785-1@git.sr.ht
[PMM: removed unnecessary AArch64 feature check; moved
"clear CTR_EL1.DIC" code up a bit so it's not in the middle
of the vfp/neon related tests]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Some assemblers will complain about attempts to access
id_aa64zfr0_el1 and id_aa64smfr0_el1 by name if the test
binary isn't built for the right processor type:
/tmp/ccASXpLo.s:782: Error: selected processor does not support system register name 'id_aa64zfr0_el1'
/tmp/ccASXpLo.s:829: Error: selected processor does not support system register name 'id_aa64smfr0_el1'
However, these registers are in the ID space and are guaranteed to
read-as-zero on older CPUs, so the access is both safe and sensible.
Switch to using the S syntax, as we already do for ID_AA64ISAR2_EL1
and ID_AA64MMFR2_EL1. This allows us to drop the HAS_ARMV9_SME check
and the makefile machinery to adjust the CFLAGS for this test, so we
don't rely on having a sufficiently new compiler to be able to check
these registers.
This means we're actually testing the SME ID register: no released
GCC yet recognizes -march=armv9-a+sme, so that was always skipped.
It also avoids a future problem if we try to switch the "do we have
SME support in the toolchain" check from "in the compiler" to "in the
assembler" (at which point we would otherwise run into the above
errors).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
As recent CVE-2023-2861 (fixed by f6b0de53fb) once again showed, the 9p
'proxy' fs driver is in bad shape. Using the 'proxy' backend was already
discouraged for safety reasons before and we recommended to use the
'local' backend (preferably in conjunction with its 'mapped' security
model) instead, but now it is time to officially deprecate the 'proxy'
backend.
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <E1qDkmw-0007M1-8f@lizzy.crudebyte.com>
When QEMU is built with --enable-modules, the module_block.py script
parses block/*.c to find block drivers that are built as modules. The
script generates a table of block drivers called block_driver_modules[].
This table is used for block driver module loading.
The blkio.c driver uses macros to define its BlockDriver structs. This
was done to avoid code duplication but the module_block.py script is
unable to parse the macro. The result is that libblkio-based block
drivers can be built as modules but will not be found at runtime.
One fix is to make the module_block.py script or build system fancier so
it can parse C macros (e.g. by parsing the preprocessed source code). I
chose not to do this because it raises the complexity of the build,
making future issues harder to debug.
Keep things simple: use the macro to avoid duplicating BlockDriver
function pointers but define .format_name and .protocol_name manually
for each BlockDriver. This way the module_block.py is able to parse the
code.
Also get rid of the block driver name macros (e.g. DRIVER_IO_URING)
because module_block.py cannot parse them either.
Fixes: fd66dbd424 ("blkio: add libblkio block driver")
Reported-by: Qing Wang <qinwang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230704123436.187761-1-stefanha@redhat.com
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The current sbsa-ref cannot use EHCI controller which is only
able to do 32-bit DMA, since sbsa-ref doesn't have RAM below 4GB.
Hence, this uses XHCI to provide a usb controller with 64-bit
DMA capablity instead of EHCI.
We bump the platform version to 0.3 with this change. Although the
hardware at the USB controller address changes, the firmware and
Linux can both cope with this -- on an older non-XHCI-aware
firmware/kernel setup the probe routine simply fails and the guest
proceeds without any USB. (This isn't a loss of functionality,
because the old USB controller never worked in the first place.) So
we can call this a backwards-compatible change and only bump the
minor version.
Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Message-id: 20230621103847.447508-2-wangyuquan1236@phytium.com.cn
[PMM: tweaked commit message; add line to docs about what
changes in platform version 0.3]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Some registers whose 'cooked' writefns induce TLB maintenance do
not have raw_writefn ops defined. If only the writefn ops is set
(ie. no raw_writefn is provided), it is assumed the cooked also
work as the raw one. For those registers it is not obvious the
tlb_flush works on KVM mode so better/safer setting the raw write.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
maintainer updates: testing, fuzz, plugins, docs, gdbstub
- clean up gitlab artefact handling
- ensure gitlab publishes artefacts with coverage data
- reduce testing scope for coverage job
- mention CI pipeline in developer docs
- add ability to add plugin args to check-tcg
- fix some memory leaks and UB in tests
- suppress xcb leaks from fuzzing output
- add a test-fuzz to mirror the CI run
- allow lci-refresh to be run in $SRC
- update lcitool to latest version
- add qemu-minimal package set with gcc-native
- convert riscv64-cross to lcitool
- update sbsa-ref tests
- don't include arm_casq_ptw emulation unless TCG
- convert plugins to use g_memdup2
- ensure plugins instrument SVE helper mem access
- improve documentation of QOM/QDEV
- make gdbstub send stop responses when it should
- report user-mode pid in gdbstub
- add support for info proc mappings in gdbstub
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmSiuH4ACgkQ+9DbCVqe
# KkRt0Qf+N0oD/VuEcRSxK1bWlLtf5nxQpPKKzkRItPc5jqJnLWa/gh21sfQgs5Uq
# BczAT+JfgTnMozbq0mjvQ+uAGI4MHzBs+UAn60+ZcXfk2inyk77XKBEoHOFuK1ry
# rgQ4+p21/hcZedDiDLnLSfbGfUU0KkM/pbAegOz7HO0EQDV0CSXqeAW3WAuM1lne
# +YmXkKwoFI1V8HvslzCT12GFiaUfmSSBtASqWcf67Ief97K24+rpkAVM7JChLm5X
# fC1MOFNuNYV+jO+9U3KIs15P1WH12oMcpNUY+KqQ5ZWovBg83yOLtKY1o3f6Z2Y+
# iQgFJr6F8ZVBdKNJtqVi8DkbiFfbsA==
# =Ho/h
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 03 Jul 2023 02:01:02 PM CEST
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* tag 'pull-maintainer-ominbus-030723-1' of https://gitlab.com/stsquad/qemu: (38 commits)
tests/tcg: Add a test for info proc mappings
docs: Document security implications of debugging
gdbstub: Add support for info proc mappings
gdbstub: Report the actual qemu-user pid
gdbstub: Expose gdb_get_process() and gdb_get_first_cpu_in_process()
linux-user: Emulate /proc/self/smaps
linux-user: Add "safe" parameter to do_guest_openat()
linux-user: Expose do_guest_openat() and do_guest_readlink()
gdbstub: clean-up vcont handling to avoid goto
gdbstub: Permit reverse step/break to provide stop response
gdbstub: lightly refactor connection to avoid snprintf
docs/devel: introduce some key concepts for QOM development
docs/devel: split qom-api reference into new file
docs/devel/qom.rst: Correct code style
include/hw/qdev-core: fixup kerneldoc annotations
include/migration: mark vmstate_register() as a legacy function
docs/devel: add some front matter to the devel index
plugins: update lockstep to use g_memdup2
plugins: fix memory leak while parsing options
plugins: force slow path when plugins instrument memory ops
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Currently the GDB's generate-core-file command doesn't work well with
qemu-user: the resulting dumps are huge [1] and at the same time
incomplete (argv and envp are missing). The reason is that GDB has no
access to proc mappings and therefore has to fall back to using
heuristics for discovering them. This is, in turn, because qemu-user
does not implement the Host I/O feature of the GDB Remote Serial
Protocol.
Implement vFile:{open,close,pread,readlink} and also
qXfer:exec-file:read+. With that, generate-core-file begins to work on
aarch64 and s390x.
[1] https://sourceware.org/pipermail/gdb-patches/2023-May/199432.html
Co-developed-by: Dominik 'Disconnect3d' Czarnota <dominik.b.czarnota@gmail.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230621203627.1808446-7-iii@linux.ibm.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-37-alex.bennee@linaro.org>
/proc/self/smaps is an extension of /proc/self/maps: it provides the
same lines, plus additional information about each range.
GDB uses /proc/self/smaps when available, which means that
generate-core-file tries it first before falling back to
/proc/self/maps. This, in turn, causes it to dump the host mappings,
since /proc/self/smaps is not emulated and is just passed through.
Fix by emulating /proc/self/smaps. Provide true values only for
Size, KernelPageSize, MMUPageSize and VmFlags. Leave all other values
at 0, which is a valid conservative estimate.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230621203627.1808446-4-iii@linux.ibm.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-34-alex.bennee@linaro.org>
Using QOM correctly is increasingly important to maintaining a modern
code base. However the current documentation skips some important
concepts before launching into a simple example. Lets:
- at least mention properties
- mention TYPE_OBJECT and TYPE_DEVICE
- talk about why we have realize/unrealize
- mention the QOM tree
- lightly re-arrange the order we mention things
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-28-alex.bennee@linaro.org>
Lets try and keep the overview of the sub-system digestible by
splitting the core API stuff into a separate file. As QOM and QDEV
work together we should also try and enumerate the qdev_ functions.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-27-alex.bennee@linaro.org>
Fix up the kerneldoc markup and start documenting the various fields
in QDEV related structures. This involved:
- moving overall description to a DOC: comment at top
- fixing various markup issues for types and structures
- adding missing Return: statements
- adding some typedefs to hide QLIST macros in headers
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-25-alex.bennee@linaro.org>
It was hard to track down this leak as it was an internal allocation
by glib and the backtraces did not give much away. The autofree was
freeing the allocation with g_free() but not taking care of the
individual strings. They should have been freed with g_strfreev()
instead.
Searching the glib source code for the correct string free function
led to:
G_DEFINE_AUTO_CLEANUP_FREE_FUNC(GStrv, g_strfreev, NULL)
and indeed if you read to the bottom of the documentation page you
will find:
typedef gchar** GStrv;
A typedef alias for gchar**. This is mostly useful when used together with g_auto().
So fix up all the g_autofree g_strsplit case that smugly thought they
had de-allocation covered.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-21-alex.bennee@linaro.org>
The lack of SVE memory instrumentation has been an omission in plugin
handling since it was introduced. Fortunately we can utilise the
probe_* functions to force all all memory access to follow the slow
path. We do this by checking the access type and presence of plugin
memory callbacks and if set return the TLB_MMIO flag.
We have to jump through a few hoops in user mode to re-use the flag
but it was the desired effect:
./qemu-system-aarch64 -display none -serial mon:stdio \
-M virt -cpu max -semihosting-config enable=on \
-kernel ./tests/tcg/aarch64-softmmu/memory-sve \
-plugin ./contrib/plugins/libexeclog.so,ifilter=st1w,afilter=0x40001808 -d plugin
gives (disas doesn't currently understand st1w):
0, 0x40001808, 0xe54342a0, ".byte 0xa0, 0x42, 0x43, 0xe5", store, 0x40213010, RAM, store, 0x40213014, RAM, store, 0x40213018, RAM
And for user-mode:
./qemu-aarch64 \
-plugin contrib/plugins/libexeclog.so,afilter=0x4007c0 \
-d plugin \
./tests/tcg/aarch64-linux-user/sha512-sve
gives:
1..10
ok 1 - do_test(&tests[i])
0, 0x4007c0, 0xa4004b80, ".byte 0x80, 0x4b, 0x00, 0xa4", load, 0x5500800370, load, 0x5500800371, load, 0x5500800372, load, 0x5500800373, load, 0x5500800374, load, 0x5500800375, load, 0x5500800376, load, 0x5500800377, load, 0x5500800378, load, 0x5500800379, load, 0x550080037a, load, 0x550080037b, load, 0x550080037c, load, 0x550080037d, load, 0x550080037e, load, 0x550080037f, load, 0x5500800380, load, 0x5500800381, load, 0x5500800382, load, 0x5500800383, load, 0x5500800384, load, 0x5500800385, load, 0x5500800386, lo
ad, 0x5500800387, load, 0x5500800388, load, 0x5500800389, load, 0x550080038a, load, 0x550080038b, load, 0x550080038c, load, 0x550080038d, load, 0x550080038e, load, 0x550080038f, load, 0x5500800390, load, 0x5500800391, load, 0x5500800392, load, 0x5500800393, load, 0x5500800394, load, 0x5500800395, load, 0x5500800396, load, 0x5500800397, load, 0x5500800398, load, 0x5500800399, load, 0x550080039a, load, 0x550080039b, load, 0x550080039c, load, 0x550080039d, load, 0x550080039e, load, 0x550080039f, load, 0x55008003a0, load, 0x55008003a1, load, 0x55008003a2, load, 0x55008003a3, load, 0x55008003a4, load, 0x55008003a5, load, 0x55008003a6, load, 0x55008003a7, load, 0x55008003a8, load, 0x55008003a9, load, 0x55008003aa, load, 0x55008003ab, load, 0x55008003ac, load, 0x55008003ad, load, 0x55008003ae, load, 0x55008003af
(4007c0 is the ld1b in the sha512-sve)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Robert Henry <robhenry@microsoft.com>
Cc: Aaron Lindsay <aaron@os.amperecomputing.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-20-alex.bennee@linaro.org>
The ptw code is accessed by non-TCG code (specifically arm_pamax and
arm_cpu_get_phys_page_attrs_debug) but most of it is really only for
TCG emulation. Seeing as we already assert for a non TARGET_AARCH64
build lets extend the test rather than further messing with the ifdef
ladder.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-19-alex.bennee@linaro.org>
The test_arm_bpim2u_gmac test sometimes fails (ca. 1 out of 20 runs
here) since the disk shows up as /dev/mmcblk1 instead of /dev/mmcblk0
in some runs. No matter of the name in /dev, the major:minor encoding
seems always to be the same, so we can fix this issue by using the
correct major:minor hex number in the "root=" parameter instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230630161604.446394-1-thuth@redhat.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-18-alex.bennee@linaro.org>
We still need to base this on Debian Sid until riscv64 is promoted to
a release architecture (or another distro provides a full cross
compile target). We use the new qemu-minimal project description to
avoid bringing in all the extra dependencies because every extra
package is another chance for sid to fail.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-16-alex.bennee@linaro.org>
Running the fuzzer requires some hoop jumping and some problems only
show up in containers. This basically replicates the build-oss-fuzz
job from our CI so we can run in the same containers we use in CI.
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-10-alex.bennee@linaro.org>
When updating to the latest fedora the santizer found more leaks
inside xkbmap:
FAILED: pc-bios/keymaps/ar
/builds/stsquad/qemu/build-oss-fuzz/qemu-keymap -f pc-bios/keymaps/ar -l ara
=================================================================
==3604==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1424 byte(s) in 1 object(s) allocated from:
#0 0x56316418ebec in __interceptor_calloc (/builds/stsquad/qemu/build-oss-fuzz/qemu-keymap+0x127bec) (BuildId: a2ad9da3190962acaa010fa8f44a9269f9081e1c)
#1 0x7f60d4dc067e (/lib64/libxkbcommon.so.0+0x1c67e) (BuildId: b243a34e4e58e6a30b93771c256268b114d34b80)
#2 0x7f60d4dc2137 in xkb_keymap_new_from_names (/lib64/libxkbcommon.so.0+0x1e137) (BuildId: b243a34e4e58e6a30b93771c256268b114d34b80)
#3 0x5631641ca50f in main /builds/stsquad/qemu/build-oss-fuzz/../qemu-keymap.c:215:11
and many more. As we can't do anything about the library add a
suppression to keep the CI going with what its meant to be doing.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-8-alex.bennee@linaro.org>
We can return XKB_MOD_INVALID for AltGr which rightly gets flagged by
sanitisers as an overly wide shift attempt. Properly check the return
type and leave the bitmap as zero in that case. Tested output before
and after is unchanged with the gb and ara keymaps.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-7-alex.bennee@linaro.org>
We recently missed a regression that should have been picked up by
check-tcg. This was because the libmem plugin is effectively a NOP if
the user doesn't specify the type to use.
Rather than changing the default behaviour add an additional expansion
so we can take this into account in future.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-6-alex.bennee@linaro.org>
When new dependencies and packages are added to containers, its important to
run CI container generation pipelines on gitlab to make sure that there are no
obvious conflicts between packages that are being added and those that are
already present. Running CI container pipelines will make sure that there are
no such breakages before we commit the change updating the containers. Add a
line in the documentation reminding developers to run the pipeline before
submitting the change. It will also ease the life of the maintainers.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230506072012.10350-1-anisinha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-5-alex.bennee@linaro.org>
If not set explicitly, gitlab assumes 'when: on_success" as the
publishing criteria for artifacts. This is reasonable if the
artifact is an output deliverable of the job. This is useless
if the artifact is a log file to be used for debugging job
failures.
This change makes the desired criteria explicit for every job
that publishes artifacts.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230503145535.91325-2-berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-2-alex.bennee@linaro.org>
dbus: Two hot fixes, per request of Marc-André Lureau
accel/tcg: Fix tb_invalidate_phys_range iteration
fpu: Add float64_to_int{32,64}_modulo
tcg: Reduce scope of tcg_assert_listed_vecop
target/nios2: Explicitly ask for target-endian loads
linux-user: Avoid mmap of the last byte of the reserved_va
# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmSfzXwdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+GMAgAicMA7dZEUNiKT1co
# pwQNF/aQehs3a+UYcHFZRQWjwNsXzDrPRTAyBkDFrzR2ILxKlpPw2JBRiqrr9pqj
# YWit0pHVv/OAYfSEzcqUaIeWyAh2xlAT4IbSz+sLcPBdPgUwm3z0Y7mTz3kUAkB2
# gXO/iuoD8ORwgSnFvH+FSws16kr1x/8cAaObY7BupUhS7hK8M9zsCehhk6ssxv7+
# EpR0kDIeoC2kjJLvQAoGW4DPzfmAvVmI/OiJKpqrAlTJIeAkngalSuaxj/t9Dte6
# zy4h8JW5VbHw3qLxTvg42/Pk4AiweBh38hpUfLQ2cprO7dy+T9qS2v8CGnMzrmeB
# kzlIMg==
# =a7vA
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 01 Jul 2023 08:53:48 AM CEST
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]
* tag 'pull-tcg-20230701' of https://gitlab.com/rth7680/qemu:
linux-user: Avoid mmap of the last byte of the reserved_va
target/nios2 : Explicitly ask for target-endian loads and stores
tcg: Reduce tcg_assert_listed_vecop() scope
target/arm: Use float64_to_int32_modulo for FJCVTZS
target/alpha: Use float64_to_int64_modulo for CVTTQ
tests/tcg/alpha: Add test for cvttq
fpu: Add float64_to_int{32,64}_modulo
accel/tcg: Assert one page in tb_invalidate_phys_page_range__locked
accel/tcg: Fix start page passed to tb_invalidate_phys_page_range__locked
audio: dbus requires pixman
ui/dbus: fix build errors in dbus_update_gl_cb and dbus_call_update_gl
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
vfio queue:
* migration: New switchover ack to reduce downtime
* VFIO migration pre-copy support
* Removal of the VFIO migration experimental flag
* Alternate offset for GPUDirect Cliques
* Misc fixes
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmSeVHYACgkQUaNDx8/7
# 7KHeZw/+LRe9QQpx8hU//vKBvLet2QvI3WUaXGHiHbblbRT6HhiHjWHB2/8j6jji
# QhAGJ6w9yoKODyY0kGpVFEnkmXOKyqwWssBheV219ntZs09pFGxZr/ldUhT22aBN
# kH8mHU9BZ3J+zF/kKphpcIC1sPxVu/DlrtnJu5vDGuRAOu8+3kFV217JC1yGs1Vh
# n+KOho8a8oP9qxtzfvQ9iZ4dpBOOKpE9vscS12wJAlen93AGB6esR7VaLxDjExRP
# yL1pguQ8ZZ1gEXXbXO62djKo3IViobtD08KmCXTzQ6TVquLleJzqgjp+A0THnYAe
# J9Rlja7LpsO9MYSxmRE9WcQccC+sAGn/t/ufB0tL8zR43FvfhbF5H0PzBBY0H7YA
# JlzN+fgrKEEHJwMhXANNvSddhWCwvrkjNxo/80u3ySYMQR1Hav/tsXYBlk16e5nS
# fmtrFGTwhsVdy1Q6ZqEOyTni1eiYt5stEQMZFODdUNj6b9FugSZ0BK+2WN/M0CzU
# 6mKmJQgZAG/nBoRJm/XCO5OKQ6wm/4tm6F4HSH5EJ6mDT+DqETAk4GRUWTbYa2/G
# yAAOlhTMu8Xc/NhMeJ7Z99dyq0SM8pi/XpVEIv7p9yBak8ix60iCWZtDE8vlDv3M
# UfMVMTAvTS30kbS6FDN2Yyl6l8/ETdcwVIN4l02ipGzpMCtn9EQ=
# =dKUj
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 30 Jun 2023 06:05:10 AM CEST
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-vfio-20230630' of https://github.com/legoater/qemu:
vfio/pci: Free leaked timer in vfio_realize error path
vfio/pci: Fix a segfault in vfio_realize
MAINTAINERS: Promote Cédric to VFIO co-maintainer
vfio/migration: Make VFIO migration non-experimental
vfio/migration: Reset bytes_transferred properly
vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry path
hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
vfio: Implement a common device info helper
vfio/migration: Add support for switchover ack capability
vfio/migration: Add VFIO migration pre-copy support
vfio/migration: Store VFIO migration flags in VFIOMigration
vfio/migration: Refactor vfio_save_block() to return saved data size
tests: Add migration switchover ack capability test
migration: Enable switchover ack capability
migration: Implement switchover ack logic
migration: Add switchover ack capability
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* Fix a compilation issue in the s390-ccw bios with Clang + binutils 2.40
* Create an initial stack frame for the main() function of the s390-ccw bios
* Clean up type definitions in the s390-ccw bios
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmSd1MwRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbUNAg//aO7pkzKPIUXG/g8PSzzgjYu9bDTketrQ
# P08wk1jj9CQMLN6dcnVnmzPhC4EqyrZqMYvRH4qFPLJmi0m+Jq3fEEkVzKbI3baO
# 0qQX6DNJVLn6qcgvZ8+ZjkLmuWn/lN4+MH92vdUgpkCcj5y7FB4FjoaG+Z0yZxsS
# YI6gG8D/i6fnq0zsKGMzmzHCswmN4s9qnY9a4nLV0YeMnrZJjUmUUKomWv0FP5jM
# qtLf6pRtgR4u/WD9ktwjISlOn7AKQeCYgZcMu1kBnrSWDjhLytUrv8h2JqRxGOap
# nRtdFzTvgeWKJbCX9v+XLb1bqzFj/LLgoCRzUOqV1CdBKf3JycIXyLMpTJ1+kV4J
# NnzCjnfq/LSDwwCjeg3cRBUFjGkuHBZwQzBh5m4xXBqae07UhMGpWBmhIh7qgPy2
# RXox0xK8Ot/vhYxtNojOiEW0Wp4KJElB9Wxn1Vz0kX4OXRcxHu9CDazZXTKBuBGA
# YWZ9HbsquvwNMV5pgCuXzVWW3FCzrhGgtVYREwYyBIInJaEGCWKCyMAuDXb4fkWL
# eS0Mryp3AMaJ6CidK2ELWygMkKA8xDF8pKm5jgQWRhs5jirydi1B4hPeGFsm1vUI
# TYs08XuC9p66O2Ffn2Sc/uAXbe/FQ7Ce6EbGUUetpafo9FxPhbP28hPUhkcHt68Y
# tmGzqAuwgxc=
# =oWSq
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 29 Jun 2023 09:00:28 PM CEST
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [undefined]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [undefined]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* tag 'pull-request-2023-06-29' of https://gitlab.com/thuth/qemu:
pc-bios: Update the s390 bios images with the recent changes
pc-bios/s390-ccw: Don't use __bss_start with the "larl" instruction
pc-bios/s390-ccw: Move the stack array into start.S
pc-bios/s390-ccw: Provide space for initial stack frame in start.S
pc-bios/s390-ccw: Fix indentation in start.S
pc-bios/s390-ccw/Makefile: Use -z noexecstack to silence linker warning
pc-bios/s390-ccw: Get rid of the the __u* types
s390-ccw: Getting rid of ulong
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When vfio_realize fails, the mmap_timer used for INTx optimization
isn't freed. As this timer isn't activated yet, the potential impact
is just a piece of leaked memory.
Fixes: ea486926b0 ("vfio-pci: Update slow path INTx algorithm timer related")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The kvm irqchip notifier is only registered if the device supports
INTx, however it's unconditionally removed in vfio realize error
path. If the assigned device does not support INTx, this will cause
QEMU to crash when vfio realize fails. Change it to conditionally
remove the notifier only if the notify hook is setup.
Before fix:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
Connection closed by foreign host.
After fix:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
Error: vfio 0000:81:11.1: xres and yres properties require display=on
(qemu)
Fixes: c5478fea27 ("vfio/pci: Respond to KVM irqchip change notifier")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Cédric has stepped up involvement in vfio, reviewing and managing
patches, as well as pull requests. This work deserves gratitude and
punishment with a promotion to co-maintainer ;)
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The major parts of VFIO migration are supported today in QEMU. This
includes basic VFIO migration, device dirty page tracking and precopy
support.
Thus, at this point in time, it seems appropriate to make VFIO migration
non-experimental: remove the x prefix from enable_migration property,
change it to ON_OFF_AUTO and let the default value be AUTO.
In addition, make the following adjustments:
1. When enable_migration is ON and migration is not supported, fail VFIO
device realization.
2. When enable_migration is AUTO (i.e., not explicitly enabled), require
device dirty tracking support. This is because device dirty tracking
is currently the only method to do dirty page tracking, which is
essential for migrating in a reasonable downtime. Setting
enable_migration to ON will not require device dirty tracking.
3. Make migration error and blocker messages more elaborate.
4. Remove error prints in vfio_migration_query_flags().
5. Rename trace_vfio_migration_probe() to
trace_vfio_migration_realize().
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Currently, VFIO bytes_transferred is not reset properly:
1. bytes_transferred is not reset after a VM snapshot (so a migration
following a snapshot will report incorrect value).
2. bytes_transferred is a single counter for all VFIO devices, however
upon migration failure it is reset multiple times, by each VFIO
device.
Fix it by introducing a new function vfio_reset_bytes_transferred() and
calling it during migration and snapshot start.
Remove existing bytes_transferred reset in VFIO migration state
notifier, which is not needed anymore.
Fixes: 3710586caa ("qapi: Add VFIO devices migration stats in Migration stats")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When vfio_enable_vectors() returns with less than requested nr_vectors
we retry with what kernel reported back. But the retry path doesn't
call vfio_prepare_kvm_msi_virq_batch() and this results in,
qemu-system-aarch64: vfio: Error: Failed to enable 4 MSI vectors, retry with 1
qemu-system-aarch64: ../hw/vfio/pci.c:602: vfio_commit_kvm_msi_virq_batch: Assertion `vdev->defer_kvm_irq_routing' failed
Fixes: dc580d51f7 ("vfio: defer to commit kvm irq routing when enable msi/msix")
Reviewed-by: Longpeng <longpeng2@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset
previously reserved for use by hypervisors to implement the GPUDirect
Cliques capability. A revised specification provides an alternate
location. Add a config space walk to the quirk to check for conflicts,
allowing us to fall back to the new location or generate an error at the
quirk setup rather than when the real conflicting capability is added
should there be no available location.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Loading of a VFIO device's data can take a substantial amount of time as
the device may need to allocate resources, prepare internal data
structures, etc. This can increase migration downtime, especially for
VFIO devices with a lot of resources.
To solve this, VFIO migration uAPI defines "initial bytes" as part of
its precopy data stream. Initial bytes can be used in various ways to
improve VFIO migration performance. For example, it can be used to
transfer device metadata to pre-allocate resources in the destination.
However, for this to work we need to make sure that all initial bytes
are sent and loaded in the destination before the source VM is stopped.
Use migration switchover ack capability to make sure a VFIO device's
initial bytes are sent and loaded in the destination before the source
stops the VM and attempts to complete the migration.
This can significantly reduce migration downtime for some devices.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Pre-copy support allows the VFIO device data to be transferred while the
VM is running. This helps to accommodate VFIO devices that have a large
amount of data that needs to be transferred, and it can reduce migration
downtime.
Pre-copy support is optional in VFIO migration protocol v2.
Implement pre-copy of VFIO migration protocol v2 and use it for devices
that support it. Full description of it can be found in the following
Linux commit: 4db52602a607 ("vfio: Extend the device migration protocol
with PRE_COPY").
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
VFIO migration flags are queried once in vfio_migration_init(). Store
them in VFIOMigration so they can be used later to check the device's
migration capabilities without re-querying them.
This will be used in the next patch to check if the device supports
precopy migration.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Refactor vfio_save_block() to return the size of saved data on success
and -errno on error.
This will be used in next patch to implement VFIO migration pre-copy
support.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add migration switchover ack capability test. The test runs without
devices that support this capability, but is still useful to make sure
it didn't break anything.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Implement switchover ack logic. This prevents the source from stopping
the VM and completing the migration until an ACK is received from the
destination that it's OK to do so.
To achieve this, a new SaveVMHandlers handler switchover_ack_needed()
and a new return path message MIG_RP_MSG_SWITCHOVER_ACK are added.
The switchover_ack_needed() handler is called during migration setup in
the destination to check if switchover ack is used by the migrated
device.
When switchover is approved by all migrated devices in the destination
that support this capability, the MIG_RP_MSG_SWITCHOVER_ACK return path
message is sent to the source to notify it that it's OK to do
switchover.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Migration downtime estimation is calculated based on bandwidth and
remaining migration data. This assumes that loading of migration data in
the destination takes a negligible amount of time and that downtime
depends only on network speed.
While this may be true for RAM, it's not necessarily true for other
migrated devices. For example, loading the data of a VFIO device in the
destination might require from the device to allocate resources, prepare
internal data structures and so on. These operations can take a
significant amount of time which can increase migration downtime.
This patch adds a new capability "switchover ack" that prevents the
source from stopping the VM and completing the migration until an ACK
is received from the destination that it's OK to do so.
This can be used by migrated devices in various ways to reduce downtime.
For example, a device can send initial precopy metadata to pre-allocate
resources in the destination and use this capability to make sure that
the pre-allocation is completed before the source VM is stopped, so it
will have full effect.
This new capability relies on the return path capability to communicate
from the destination back to the source.
The actual implementation of the capability will be added in the
following patches.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The startup code of the bios has slightly been changed, apart
from that, there should not be any functional changes this time.
Signed-off-by: Thomas Huth <thuth@redhat.com>
start.S currently cannot be compiled with Clang 16 and binutils 2.40:
ld: start.o(.text+0x8): misaligned symbol `__bss_start' (0xc1e5) for
relocation R_390_PC32DBL
According to the built-in linker script of ld, the symbol __bss_start
can actually point *before* the .bss section and does not need to have
any alignment, so in certain situations (like when using the internal
assembler of Clang), the __bss_start symbol can indeed be unaligned
and thus it is not suitable for being used with the "larl" instruction
that needs an address that is at least aligned to halfwords.
The problem went unnoticed so far since binutils <= 2.39 did not
check the alignment, but starting with binutils 2.40, such unaligned
addresses are now refused.
Fix it by loading the address indirectly instead.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2216662
Reported-by: Miroslav Rezanina <mrezanin@redhat.com>
Suggested-by: Andreas Krebbel <andreas.krebbel@de.ibm.com>
Message-Id: <20230629104821.194859-8-thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The stack array is only referenced from the start-up code (which is
shared between the s390-ccw.img and the s390-netboot.img), but it is
currently declared twice, once in main.c and once in netmain.c.
It makes more sense to declare this in start.S instead - which will
also be helpful in the next patch, since we need to mention the .bss
section in start.S in that patch.
While we're at it, let's also drop the huge alignment of the stack,
since there is no technical requirement for aligning it to page
boundaries.
Message-Id: <20230627074703.99608-4-thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Providing the space of a stack frame is the duty of the caller,
so we should reserve 160 bytes before jumping into the main function.
Otherwise the main() function might write past the stack array.
While we're at it, add a proper STACK_SIZE macro for the stack size
instead of using magic numbers (this is also required for the following
patch).
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20230627074703.99608-3-thuth@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
start.S is currently indented with a mixture of spaces and tabs, which
is quite ugly. QEMU coding style says indentation should be 4 spaces,
and this is also what we are using in the assembler files in the
tests/tcg/s390x/ folder already, so let's adjust start.S accordingly.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20230627074703.99608-2-thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Recent versions of ld complain when linking the s390-ccw bios:
/usr/bin/ld: warning: start.o: missing .note.GNU-stack section implies
executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in
a future version of the linker
We can silence the warning by telling the linker to mark the stack
as not executable.
Message-Id: <20230622130822.396793-1-thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* Make named CPU models usable for qemu-{i386,x86_64}
* Fix backwards time with -icount auto
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSdRiQUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroOqcwf9FGAqZ+0V34Y8XeXMu8Es3bFjEKG8
# t3BpVNhTBOYDPvpshnPVx2I29nRT2opc1C4YkjMAv5/1nivj1kDM7hDObOSJQvqy
# 5FgTsJYqRtGj+J7uVBrspWZsP8BYeykKmXR6deBOPvCuw5nnLdDQ3dLV2F26lKUu
# lsFyEVbi4dzf8+TVuNIXEg7mVBYytjBQwBmmHgeOofeikjq9WEudr49mwJMCHyzl
# iXCatnctXGKZYSnp+eHIBiFRdSzjqdgrDRa0ysSqABoBI1pmkhyQKSay6cSjfG4n
# gFlqPF/i9RqAWpsQrM1IMGgPK39SrT2dYlHDJV2P/NEQrS6kLh2HoW/ArQ==
# =oj3B
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 29 Jun 2023 10:51:48 AM CEST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
target/i386: emulate 64-bit ring 0 for linux-user if LM feature is set
target/i386: ignore CPL0-specific features in user mode emulation
target/i386: ignore ARCH_CAPABILITIES features in user mode emulation
target/i386: Export MSR_ARCH_CAPABILITIES bits to guests
icount: don't adjust virtual time backwards after warp
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
32-bit binaries can run on a long mode processor even if the kernel
is 64-bit, of course, and this can have slightly different behavior;
for example, SYSCALL is allowed on Intel processors.
Allow reporting LM to programs running under user mode emulation,
so that "-cpu" can be used with named CPU models even for qemu-i386
and even without disabling LM by hand.
Fortunately, most of the runtime code in QEMU has to depend on HF_LMA_MASK
or on HF_CS64_MASK (which is anyway false for qemu-i386's 32-bit code
segment) rather than TARGET_X86_64, therefore all that is needed is an
update of linux-user's ring 0 setup.
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1534
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Features such as PCID are only accessible through privileged operations,
and therefore have no impact on any user-mode operation. Allow reporting
them to programs running under user mode emulation, so that "-cpu" can be
used with more named CPU models.
XSAVES would be similar, but it doesn't make sense to provide it until
XSAVEC is implemented.
With this change, all CPUs up to Broadwell-v4 can be emulate. Skylake-Client
requires XSAVEC, while EPYC also requires SHA-NI, MISALIGNSSE and TOPOEXT.
MISALIGNSSE is not hard to implement, but I am not sure it is worth using
a precious hflags bit for it.
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1534
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ARCH_CAPABILITIES is only accessible through a read-only MSR, so it has
no impact on any user-mode operation (user-mode cannot read the MSR).
So do not bother printing warnings about it in user mode emulation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Block layer patches
- Re-enable the graph lock
- More fixes to coroutine_fn marking
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmScQCQRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9bNSA//WIzPT45rFhl2U9QgyOJu26ho6ahsgwgI
# Z3QM5kCDB1dAN9USRPxhGboLGo8CyY7eeSwSrR7RtwBGYrWrAoJfGp5gK/7d9s5Q
# o0AGgRPnJGhFkBhRRMytsDsewM6Kk4IRmk4HMK3cOH3rsSM8RHs6KmDSBKesllu0
# QVGf3qW4u8LHyZyGM5OlPVUbtuDuK6/52FGhpXBp+x4oyNegOhjwO4mGOvTG+xIk
# Q5zwWZaPfjxaEDkvW8iahB6/D7Tpt64BmMf1Ydhxcd5eKEp932CiBI36aAlNKoRD
# Al5wztRx1GEh12ekN39jIi7Ypp3JX26keJcieKU0q656pT551UFRYjU0Rk08/Cca
# qv2oiQDu6bHgQ9zCQ1nMfa9+K2MyBwx0b5qfYkvs2RzgCTl8ImgBQANHfw8tz6Bq
# HUo1zsFBXCaK0boUB5iFwdf3rlx3t9UTEuDej/RaHqZjZD5xeG/smCcOlSfHaKUa
# wXfYxvm8ZfefJn1D6io1A+7M956uvIQNtmh13cU44clgFX9Y/bBNMg/5lMRsJKo8
# xxjvqCAyxo/pPfUsVWx4pc8AXbfVa85gyoSiaLEYZnqP54sJ2lFccqykCsTy58Lo
# VDcoPnoSc+LNqBOvtzxXgQbEWFCXU6fe0+TZgVYUvExWFIAOImeDWg2GD1JVrwsX
# e9QrPhL3DXg=
# =ZQcP
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 Jun 2023 04:13:56 PM CEST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (23 commits)
block: use bdrv_co_debug_event in coroutine context
block: use bdrv_co_getlength in coroutine context
qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCK
vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCK
vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCK
dmg: mark more functions as coroutine_fns and GRAPH_RDLOCK
cloop: mark more functions as coroutine_fns and GRAPH_RDLOCK
block: mark another function as coroutine_fns and GRAPH_UNLOCKED
bochs: mark more functions as coroutine_fns and GRAPH_RDLOCK
vpc: mark more functions as coroutine_fns and GRAPH_RDLOCK
qed: mark more functions as coroutine_fns and GRAPH_RDLOCK
file-posix: remove incorrect coroutine_fn calls
Revert "graph-lock: Disable locking for now"
graph-lock: Unlock the AioContext while polling
blockjob: Fix AioContext locking in block_job_add_bdrv()
block: Fix AioContext locking in bdrv_open_backing_file()
block: Fix AioContext locking in bdrv_open_inherit()
block: Fix AioContext locking in bdrv_reopen_parse_file_or_backing()
block: Fix AioContext locking in bdrv_attach_child_common()
block: Fix AioContext locking in bdrv_open_child()
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
qemu-sparc queue
# -----BEGIN PGP SIGNATURE-----
#
# iQFSBAABCgA8FiEEzGIauY6CIA2RXMnEW8LFb64PMh8FAmScHBkeHG1hcmsuY2F2
# ZS1heWxhbmRAaWxhbmRlLmNvLnVrAAoJEFvCxW+uDzIfuZ8H/3KjLLCaGcO3jnus
# P/ky3wGYx9aah/iNfRDgaaGRkPX18Eabq0BidUt/DN28yQmKgnOcbCwHlIt4QdCt
# PeO9hRNLpCop63LwyQQTrSZEdVZP75CX6dRcN+6h5TsY66/ESZjBsivuJGVHIU6O
# L8zJv2KKg0SKtJHsPGkUppmfyM4btmGTerqSJHv1SJfy4DJdzRMF83/WOZtE5srm
# YvpgZsiztBpHbG/+jLn2mX7iaQiZQCCs+weU0ynszr5WENAnuJderjO+mo0DZkqD
# j+R6LMcHHj6I4uP68eJowdTezOpoZNROh/gdUozCweA1AC/8RotkJa9UcBeEplY/
# +wV8mts=
# =ga0/
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 Jun 2023 01:40:09 PM CEST
# gpg: using RSA key CC621AB98E82200D915CC9C45BC2C56FAE0F321F
# gpg: issuer "mark.cave-ayland@ilande.co.uk"
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F
* tag 'qemu-sparc-20230628' of https://github.com/mcayland/qemu:
escc: emulate dip switch language layout settings on SUN keyboard
target/sparc: Use tcg_gen_lookup_and_goto_ptr for v9 WRASI
target/sparc: Use DYNAMIC_PC_LOOKUP for v9 RETURN
target/sparc: Use DYNAMIC_PC_LOOKUP for JMPL
target/sparc: Use DYNAMIC_PC_LOOKUP for conditional branches
target/sparc: Introduce DYNAMIC_PC_LOOKUP
target/sparc: Drop inline markers from translate.c
target/sparc: Fix npc comparison in sparc_tr_insn_start
target/sparc: Use tcg_gen_lookup_and_goto_ptr in gen_goto_tb
Revert "hw/sparc64/niagara: Use blk_name() instead of open-coding it"
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
virtio: regression fix
A regression was introduced in the last pull request. Fix it up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmScH0QPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpEPUH/1s424Aerch82tdps+qIhuclf9Jq47oo7Q/Y
# JVeizUsFLtE0Wwmfyna1rIbaILM//Akcq8Y0Ny+GHtYA8NdIaAQfue87uy+k8qbc
# qFXbimZEzjZp7CAC+6tUiv8UDaYF7I9giImZnHkkbPDz22ACQQCzV6nTogoc1pzg
# BkLxbWjYUdSTT8l1h/H7XwGWKsKZ9RUGxxAOpKqdK3NElmy+1I1eeUvhnLZwAc3i
# 9HUMOg2JQBhky0jjkrDHQcyopxlHNBrz7D6/sZKOyua627DgRS1BOAM9h2u2F3rq
# +6Hv258g48764Hl0SYEKCBULI+CrgtpcS/aq8sLW6Tm7Cw2k/N0=
# =y9dL
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 Jun 2023 01:53:40 PM CEST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
net/vhost-net: do not assert on null pointer return from tap_get_vhost_net()
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
003f230e37 ("machine: Tweak the order of topology members in struct
CpuTopology") changes the meaning of MachineState.smp.cores from "the
number of cores in one package" to "the number of cores in one die"
and doesn't fix other uses of MachineState.smp.cores. And because of
the introduction of cluster, now smp.cores just means "the number of
cores in one cluster". This clearly does not fit the semantics here.
And before this error message, WHvSetPartitionProperty() is called to
set prop.ProcessorCount.
So the error message should show the prop.ProcessorCount other than
"cores per cluster" or "cores per package".
Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230529124331.412822-1-zhao1.liu@linux.intel.com>
[PMD: Use '%u' format for ProcessorCount]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
These fields shouldn't be accessed when KVM is not available.
Restrict the KVM timer migration state. Rename the KVM timer
post_load() handler accordingly, because cpu_post_load() is
too generic.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-Id: <20230626232007.8933-3-philmd@linaro.org>
The 'kvm_sw_tlb' and 'tlb_dirty' fields introduced in commit
93dd5e852c ("kvm: ppc: booke206: use MMU API") are specific
to KVM and shouldn't be accessed when it is not available.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20230624192645.13680-1-philmd@linaro.org>
"sysemu/kvm.h" is indirectly pulled in. Explicit its
inclusion to avoid when refactoring include/:
hw/arm/sbsa-ref.c:693:9: error: implicit declaration of function 'kvm_enabled' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (kvm_enabled()) {
^
Reviewed-by: Leif Lindholm <quic_llindhol@quicinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230405160454.97436-6-philmd@linaro.org>
"hw/core/cpu.h" defines 'first_cpu' as QTAILQ_FIRST_RCU(&cpus).
arm_gic_common_reset_irq_state() calls its second argument
'first_cpu', producing a build failure when "hw/core/cpu.h"
is included:
hw/intc/arm_gic_common.c:238:68: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
static inline void arm_gic_common_reset_irq_state(GICState *s, int first_cpu,
^
include/hw/core/cpu.h:451:26: note: expanded from macro 'first_cpu'
#define first_cpu QTAILQ_FIRST_RCU(&cpus)
^
KISS, rename the function argument.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230405160454.97436-5-philmd@linaro.org>
"kvm_arm.h" contains external and internal prototype declarations.
Files under the hw/ directory should only access the KVM external
API.
In order to avoid machine / device models to include "kvm_arm.h"
simply to get the QOM GIC/ITS class name, un-inline each class
name getter to the proper device model file.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230405160454.97436-4-philmd@linaro.org>
Avoid when calling kvm_direct_msi_enabled() from
arm_gicv3_its_common.c the next commit:
Undefined symbols for architecture arm64:
"_kvm_direct_msi_allowed", referenced from:
_its_class_name in hw_intc_arm_gicv3_its_common.c.o
ld: symbol(s) not found for architecture arm64
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230405160454.97436-3-philmd@linaro.org>
Commit 1e05888ab5 ("sysemu/kvm: Remove unused headers") was
a bit overzealous while cleaning "sysemu/kvm.h" headers:
kvm_arch_post_run() returns a MemTxAttrs type, so depends on
"exec/memattrs.h" for its definition.
Fixes: 1e05888ab5 ("sysemu/kvm: Remove unused headers")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230619074153.44268-5-philmd@linaro.org>
We want all accelerators to share the same opaque pointer in
CPUState.
Rename the 'hvf_vcpu_state' structure as 'AccelCPUState'.
Use the generic 'accel' field of CPUState instead of 'hvf'.
Replace g_malloc0() by g_new0() for readability.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230624174121.11508-17-philmd@linaro.org>
We want all accelerators to share the same opaque pointer in
CPUState. Start with the HAX context, renaming its forward
declarated structure 'hax_vcpu_state' as 'AccelCPUState'.
Document the CPUState field. Directly use the typedef.
Remove the amusing but now unnecessary casts in NVMM / WHPX.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230624174121.11508-8-philmd@linaro.org>
When the vCPU thread finished its processing, destroy
it and signal its destruction to generic vCPU management
layer.
Add a sanity check for the vCPU accelerator context.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230624174121.11508-6-philmd@linaro.org>
Since MinGW commit 395dcfdea ("rename hyper-v headers and def
files to lower case") [*], WinHvPlatform.h and WinHvEmulation.h
got respectively renamed as winhvplatform.h / winhvemulation.h.
The mingw64-headers package included in the Fedora version we
use for CI does include this commit; and meson fails to detect
these present-but-renamed headers while cross-building (on
case-sensitive filesystems).
Use the renamed header in order to detect and successfully
cross-build with the WHPX accelerator.
Note, on Windows hosts, the libraries are still named as
WinHvPlatform.dll and WinHvEmulation.dll, so we don't bother
renaming the definitions used by load_whp_dispatch_fns() in
target/i386/whpx/whpx-all.c.
[*] https://sourceforge.net/p/mingw-w64/mingw-w64/ci/395dcfdea
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230624142211.8888-3-philmd@linaro.org>
Since commit 93cc0506f6 ("tests/docker: Use Fedora containers
for MinGW cross-builds in the gitlab-CI") the MinGW toolchain
is packaged inside the fedora-win[32/64]-cross images.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230624142211.8888-2-philmd@linaro.org>
When 'vhost=off' or no vhost specific options at all are passed for the tap
net-device backend, tap_get_vhost_net() can return NULL. The function
net_init_tap_one() does not call vhost_net_init() on such cases and therefore
vhost_net pointer within the tap device state structure remains NULL. Hence,
assertion here on a NULL pointer return from tap_get_vhost_net() would not be
correct. Remove it and fix the crash generated by qemu upon initialization in
the following call chain :
qdev_realize() -> pci_qdev_realize() -> virtio_device_realize() ->
virtio_bus_device_plugged() -> virtio_net_get_features() -> get_vhost_net()
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Fixes: 0e994668d0 ("vhost_net: add an assertion for TAP client backends")
Reported-by: Cédric Le Goater <clg@redhat.com>
Report: <abab7a71-216d-b103-fa47-70bdf9dc0080@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230628112804.36676-1-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SUN Type 4, 5 and 5c keyboards have dip switches to choose the language layout
of the keyboard. Solaris makes an ioctl to query the value of the dipswitches
and uses that value to select keyboard layout. Also the SUN bios like the one
in the file ss5.bin uses this value to support at least some keyboard layouts.
However, the OpenBIOS provided with qemu is hardcoded to always use an US
keyboard layout.
Before this patch, qemu allways gave dip switch value 0x21 (US keyboard),
this patch uses a command line switch like
"-global escc.chnA-sunkbd-layout=de" to select dip switch value. A table is
used to lookup values from arguments like:
-global escc.chnA-sunkbd-layout=fr
-global escc.chnA-sunkbd-layout=es
But the patch also accepts numeric dip switch values directly:
-global escc.chnA-sunkbd-layout=0x2b
-global escc.chnA-sunkbd-layout=43
Both values above are the same and select swedish keyboard as explained in
table 3-15 at
https://docs.oracle.com/cd/E19683-01/806-6642/new-43/index.html
Unless you want to do a full Solaris installation but happen to have
access to a Sun bios file, the easiest way to test that the patch works
is to:
qemu-system-sparc -global escc.chnA-sunkbd-layout=sv -bios /path/to/ss5.bin
If you already happen to have a Solaris installation in a qemu disk image
file you can easily try different keyboard layouts after this patch is
applied.
Signed-off-by: Henrik Carlqvist <hc1245@poolhem.se>
Message-Id: <20230623203007.56d3d182.hc981@poolhem.se>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[MCA edit: update unsigned char to uint8_t, fix spacing issues]
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
We incorporate %asi into tb->flags so that we may generate
inline code for the many ASIs for which it is easy to do so.
Setting %asi is common for e.g. memcpy and memset performing
block copy and clear, so it is worth noticing this case.
We must end the TB but do not need to return to the main loop.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230628071202.230991-9-richard.henderson@linaro.org>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
hw/nvme updates
Small set of fixes and some updates for the FDP support.
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmSb/D4ACgkQTeGvMW1P
# DemziAf/eQfjnVr57A+Kglf8J15MCW0GiArbHCJfcl9vf0HPP/iY1c9V4cCZjTLG
# vkkkU6W+TFaYALGOVgAldHWC7OCpOi7GHrlqRJDuw86d2dyLDn/l+GQin/rVoocD
# fzF2gRVQU4x9qzmjRUikVhRzZbrB4F/AH6QQ8EV3wx2wrljyusItEGe53FEuCugx
# pwtKrG990188+UCT1ofr2JYhLq3OmYQi3o2fWgzMp9jP+NeROgKaevWG4UEhFonG
# CdeL9BMlSRAfrdR1gTvZpG2mFsrroeBCCjXcrKSwkAxBqpMJDSLvbGqoGJo6kDWm
# c9x82Zy2/wVuQaDk+atmcTF1+Pddgw==
# =//ks
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 Jun 2023 11:24:14 AM CEST
# gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9
# gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown]
# gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838
# Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9
* tag 'nvme-next-pull-request' of https://gitlab.com/birkelund/qemu:
docs: update hw/nvme documentation for TP4146
hw/nvme: add placement handle list ranges
hw/nvme: verify uniqueness of reclaim unit handle identifiers
hw/nvme: fix verification of number of ruhis
hw/nvme: check maximum copy length (MCL) for COPY
hw/nvme: consider COPY command in nvme_aio_err
hw/nvme: add comment for nvme-ns properties
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Allow the placement handles to be specified as ranges, i.e.
`fdp.ruhs=1:3-5` will attempt to assign ruh 1, 3, 4 and 5 to the
namespace.
Reviewed-by: Jesper Wendel Devantier <j.devantier@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Verify that a reclaim unit handle identifier is only specified once in
fdp.ruhs.
Fixes: 73064edfb8 ("hw/nvme: flexible data placement emulation")
Reviewed-by: Jesper Wendel Devantier <j.devantier@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Fix a off-by-one error when verifying the number of reclaim unit handle
identifiers specified in fdp.ruhs. To make the fix nicer, move the
verification of the fdp.nruh parameter to an earlier point.
Fixes: 73064edfb8 ("hw/nvme: flexible data placement emulation")
Reviewed-by: Jesper Wendel Devantier <j.devantier@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
MCL(Maximum Copy Length) in the Identify Namespace data structure limits
the number of LBAs to be copied inside of the controller. We've not
checked it at all, so added the check with returning the proper error
status.
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If we don't have NVME_CMD_COPY consideration in the switch statement in
nvme_aio_err(), it will go to have NVME_INTERNAL_DEV_ERROR and
`req->status` will be ovewritten to it. During the aio context, it
might set the NVMe status field like NVME_CMD_SIZE_LIMIT, but it's
overwritten in the nvme_aio_err().
Add consideration for the NVME_CMD_COPY not to overwrite the status at
the end of the function.
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
bdrv_co_debug_event was recently introduced, with bdrv_debug_event
becoming a wrapper for use in unknown context. Because most of the
time bdrv_debug_event is used on a BdrvChild via the wrapper macro
BLKDBG_EVENT, introduce a similar macro BLKDBG_CO_EVENT that calls
bdrv_co_debug_event, and switch whenever possible.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-13-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend. Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-11-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend. Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-10-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend. Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-9-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend. Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-8-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend. Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-7-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend. Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-5-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend. Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-4-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend. Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-3-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
raw_co_getlength is called by handle_aiocb_write_zeroes, which is not a coroutine
function. This is harmless because raw_co_getlength does not actually suspend,
but in the interest of clarity make it a non-coroutine_fn that is just wrapped
by the coroutine_fn raw_co_getlength. Likewise, check_cache_dropped was only
a coroutine_fn because it called raw_co_getlength, so it can be made non-coroutine
as well.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-2-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that bdrv_graph_wrlock() temporarily drops the AioContext lock that
its caller holds, it can poll without causing deadlocks. We can now
re-enable graph locking.
This reverts commit ad128dff0bf4b6f971d05eb4335a627883a19c1d.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-12-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the caller keeps the AioContext lock for a block node in an iothread,
polling in bdrv_graph_wrlock() deadlocks if the condition isn't
fulfilled immediately.
Now that all callers make sure to actually have the AioContext locked
when they call bdrv_replace_child_noperm() like they should, we can
change bdrv_graph_wrlock() to take a BlockDriverState whose AioContext
lock the caller holds (NULL if it doesn't) and unlock it temporarily
while polling.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-11-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_open_inherit() calls several functions for which it needs to hold
the AioContext lock, but currently doesn't. This includes calls in
bdrv_append_temp_snapshot(), for which bdrv_open_inherit() is the only
caller. Fix the locking in these places.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-8-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_set_file_or_backing_noperm() requires the caller to hold the
AioContext lock for the child node, but we hold the one for the parent
node in bdrv_reopen_parse_file_or_backing(). Take the other one
temporarily.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-7-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function can move the child node to a different AioContext. In this
case, it also must take the AioContext lock for the new context before
calling functions that require the caller to hold the AioContext for the
child node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_attach_child() requires that the caller holds the AioContext lock
for the new child node. Take it in bdrv_open_child() and document that
the caller must not hold any AioContext apart from the main AioContext.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-5-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a better regression test for the bugs hidden by commit 80fc5d26
('graph-lock: Disable locking for now'). With that commit reverted, it
hangs instantaneously and reliably for me.
It is important to have a reliable test like this, because the following
commits will set out to fix the actual root cause of the deadlocks and
then finally revert commit 80fc5d26, which was only a stopgap solution.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-2-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
wef
# -----BEGIN PGP SIGNATURE-----
#
# iQJQBAABCAA6FiEEh6m9kz+HxgbSdvYt2ujhCXWWnOUFAmSa+6McHG1hcmNhbmRy
# ZS5sdXJlYXVAcmVkaGF0LmNvbQAKCRDa6OEJdZac5YxfD/46nwpTCTWWdKdTeo5p
# OUdrF0rfHFqu4FN3OWHbfRxZCO/AdHL+UZ52owV4+5bJJI5uOnXwYKpkrwKYBrFd
# H8Bll7NitzA41rw4AQa0GeaQYCPJ99OOfnhbRI5Aep2NG2DfX5PK4RWnfqYw8LD1
# TiHtRv2lWnX9EyMjnEh93C+n17OfquP5Ew3ozZNQJ0+SiJ3CvsUn6hEqxOA8OdyX
# lj6l00CASQA2BxW+zjXjJKvRakCV4gfdvrL9eMf4eu0UopzET7ombBJGPnYVsrDU
# /4R7b0JgGM4iOpXFxK4Ng6myP28vPdOEJAU/OJLH+oMRz1caohS+0Ijl2KviUCex
# SGpb9plxqI7fI2QQt+1CxAlXADSW7oV1zV0/tLkKl/n5+MF3HJ/5qR3tefLhYu1p
# 2LpfbPMKGQ9V3+5Z/UvWx6GQYP1iBRm5THPLn+HSDMSqLmt6yp5cOTwP3KTx1Zlc
# JfpBtekT2Cgs54nnCcfnXa6/EPo4uR7cMFzrgXdSacPz/GssMVa1c2mNUYkgYEYU
# PeyDWZG2Rt/70y+CFDPBpKWEQVICnf7Ha43oj4BtGTqqUFeuZClMTTtZ5poSg3ir
# FcRNJ5zSWg2KhHIQ9TQKxIAwrxxVBY0AiQleNRyDzx+YGAuBBadO6i5eCqqpGgOa
# QRVBsP33Pg/QD1JdxN9GSSEh0w==
# =cR6x
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 27 Jun 2023 05:09:23 PM CEST
# gpg: using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5
# gpg: issuer "marcandre.lureau@redhat.com"
# gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full]
# gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full]
* tag 'ui-pull-request' of https://gitlab.com/marcandre.lureau/qemu: (33 commits)
ui/dbus: use shared D3D11 Texture2D when possible
virtio-gpu-virgl: use D3D11_SHARE_TEXTURE when available
ui: add optional d3d texture pointer to scanout texture
ui/egl: query ANGLE d3d device
virtio-gpu-virgl: teach it to get the QEMU EGL display
ui/dbus: add some GL traces
ui/dbus: add GL support on win32
ui: add egl_fb_read_rect()
ui/egl: default to GLES on windows
ui: add egl-headless support on win32
ui/dbus: use shared memory when possible on win32
virtio-gpu/win32: allocate shareable 2d resources/images
console/win32: allocate shareable display surface
ui/dbus: introduce "Interfaces" properties
tests: make dbus-display-test work on win32
qtest: add qtest_pid()
ui/dbus: win32 support
scripts: add a XML preprocessor script
ui/dbus: compile without gio/gunixfdlist.h
ui/egl: fix make_context_current() callback return value
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When the client implements "org.qemu.Display1.Listener.Win32.D3d11" and
we are running on ANGLE/win32, share the scanout texture with the peer
process, and draw with ScanoutTexture2d/UpdateTexture2d methods.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-22-marcandre.lureau@redhat.com>
Enable usage of dbus,gl= on win32. At this point, the scanout texture is
read on the DisplaySurface memory, and the client is then updated with
the "2D" API (with shared memory if possible).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-16-marcandre.lureau@redhat.com>
Windows GL drivers are notoriously not very good. Otoh, ANGLE provides
rock solid GLES implementation on top of direct3d. We should recommend
it and default to ES when using EGL (users can easily override this if
necessary)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-14-marcandre.lureau@redhat.com>
When the display surface has an associated HANDLE, we can duplicate it
to the client process and let it map the memory to avoid expensive copies.
Introduce two new win32-specific methods ScanoutMap and UpdateMap. The
first is used to inform the listener about the a shared map
availability, and the second for display updates.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-12-marcandre.lureau@redhat.com>
Allocate pixman bits for scanouts with qemu_win32_map_alloc() so we can
set a shareable handle on the associated display surface.
Note: when bits are provided to pixman_image_create_bits(), you must also give
the rowstride (the argument is ignored when bits is NULL)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-11-marcandre.lureau@redhat.com>
Introduce qemu_win32_map_alloc() and qemu_win32_map_free() to allocate
shared memory mapping. The handle can be used to share the mapping with
another process.
Teach qemu_create_displaysurface() to allocate shared memory. Following
patches will introduce other places for shared memory allocation.
Other patches for -display dbus will share the memory when possible with
the client, to avoid expensive memory copy between the processes.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-10-marcandre.lureau@redhat.com>
This property is similar to ``org.freedesktop.DBus.Interfaces`` property
on the bus interface: it's an array of strings listing the extra
interfaces and capabilities available, in a convenient way.
Most interfaces are implicit, as they are required. For
``org/qemu/Display1_$id``, we can list the Keyboard And Mouse
interfaces. Those could be optional.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-9-marcandre.lureau@redhat.com>
D-Bus doesn't support fd-passing on Windows (AF_UNIX doesn't have
SCM_RIGHTS yet, but there are other means to share objects. I have
proposed various solutions upstream, but none seem fitting enough atm).
To make the "-display dbus" work on Windows, implement an alternative
D-Bus interface where all the 'h' (FDs) arguments are replaced with
'ay' (WSASocketW data), and sockets are passed to the other end via
WSADuplicateSocket().
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-6-marcandre.lureau@redhat.com>
gdbus-codegen doesn't support conditions or pre-processing.
Rather than duplicating D-Bus interfaces for win32 adaptation, let's
have a preprocess step, so we can have platform-specific interfaces.
The python script is based on
https://github.com/peitaosu/XML-Preprocessor, with bug fixes, some
testing and replacing lxml dependency with the built-in xml module.
This preprocessing syntax style is not very common, but is similar to
the one provided by WiX (https://wixtoolset.org/docs/v3/overview/preprocessor/)
or wixl, that we adopted in QEMU for packaging the guest agent.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-5-marcandre.lureau@redhat.com>
eglMakeCurrent() returns 1/EGL_TRUE on success. This is not what the
callback expects, where 0 indicates success.
While at it, print the EGL error to ease debugging.
As with virgl_renderer_callbacks, the return value is now checked since
version >= 4:
7f09e6bf0c
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230606115658.677673-3-marcandre.lureau@redhat.com>
There were often cases where a scanout blob sometimes has just 1 entry
that is linked to many pages in it. So just checking whether iov_cnt is 1
is not enough for screening small, non-scanout blobs. Therefore adding
iov_len check as well to make sure it creates an udmabuf only for a scanout
blob, which is at least bigger than one page size.
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20230621222704.29932-1-dongwon.kim@intel.com>
If the monitor or the serial port use STDIO as backend on Windows 11 host,
e.g. -nographic options is used, the monitor or the guest Linux do not
response to arrow keys.
When Windows creates a console, ENABLE_VIRTUAL_PROCESS_INPUT is disabled
by default. Arrow keys cannot be retrieved by ReadFile or ReadConsoleInput
functions.
Add ENABLE_VIRTUAL_PROCESS_INPUT to the flag which is passed to SetConsoleMode,
when opening stdio console.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1674
Signed-off-by: Zhang Huasen <huasenzhang@foxmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <tencent_8DA57B405D427A560FD40F8FB0C0B1ADDE09@qq.com>
The following points sometimes can reduce much data
to copy:
1. When width matches, we can transfer data with one
call of iov_to_buf().
2. Only the required height need to transfer, not
whole image.
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230612021358.25068-1-zhukeqian1@huawei.com>
The icount-based QEMU_CLOCK_VIRTUAL runs ahead of the RT clock at times.
When warping, it is possible it is still ahead at the end of the warp,
which causes icount adaptive mode to adjust it backward. This can result
in the machine observing time going backwards.
Prevent this by clamping adaptive adjustment to 0 at minimum.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20230627061406.241847-1-npiggin@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-27 09:00:13 +02:00
616 changed files with 24032 additions and 5372 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.