A DRC with a pending unplug request releases its associated device at
machine reset time.
In the case of LMB, when all DRCs for a DIMM device have been reset,
the DIMM gets unplugged, causing guest memory to disappear. This may
be very confusing for anything still using this memory.
This is exactly what happens with vhost backends, and QEMU aborts
with:
qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion
`r >= 0' failed.
The issue is that each DRC registers a QEMU reset handler, and we
don't control the order in which these handlers are called (ie,
a LMB DRC will unplug a DIMM before the virtio device using the
memory on this DIMM could stop its vhost backend).
To avoid such situations, let's reset DRCs after all devices
have been reset.
Reported-by: Mallesh N. Koti <mallesh@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The device tree nodes ibm,arch-vec-5-platform-support and ibm,pa-features
are used to communicate features of the cpu to the guest operating
system. The properties of each of these are determined based on the
selected cpu model and the availability of hypervisor features.
Currently the compatibility mode of the cpu is not taken into account.
The ibm,arch-vec-5-platform-support node is used to communicate the
level of support for various ISAv3 processor features to the guest
before CAS to inform the guests' request. The available mmu mode should
only be hash unless the cpu is a POWER9 which is not in a prePOWER9
compat mode, in which case the available modes depend on the
accelerator and the hypervisor capabilities.
The ibm,pa-featues node is used to communicate the level of cpu support
for various features to the guest os. This should only contain features
relevant to the operating mode of the processor, that is the selected
cpu model taking into account any compat mode. This means that the
compat mode should be taken into account when choosing the properties of
ibm,pa-features and they should match the compat mode selected, or the
cpu model selected if no compat mode.
Update the setting of these cpu features in the device tree as described
above to properly take into account any compat mode. We use the
ppc_check_compat function which takes into account the current processor
model and the cpu compat mode.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Block layer patches for 2.11.0-rc2
# gpg: Signature made Fri 17 Nov 2017 17:58:36 GMT
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (25 commits)
iotests: Make 087 pass without AIO enabled
block: Make bdrv_next() keep strong references
qcow2: Fix overly broad madvise()
qcow2: Refuse to get unaligned offsets from cache
qcow2: Add bounds check to get_refblock_offset()
block: Guard against NULL bs->drv
qcow2: Unaligned zero cluster in handle_alloc()
qcow2: check_errors are fatal
qcow2: reject unaligned offsets in write compressed
iotests: Add test for failing qemu-img commit
tests: Add check-qobject for equality tests
iotests: Add test for non-string option reopening
block: qobject_is_equal() in bdrv_reopen_prepare()
qapi: Add qobject_is_equal()
qapi/qlist: Add qlist_append_null() macro
qapi/qnull: Add own header
qcow2: fix image corruption on commit with persistent bitmap
iotests: test clearing unknown autoclear_features by qcow2
block: Fix permissions in image activation
qcow2: fix image corruption after committing qcow2 image into base
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Block patches for 2.11.0-rc2
# gpg: Signature made Fri Nov 17 18:22:07 2017 CET
# gpg: using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2017-11-17:
iotests: Make 087 pass without AIO enabled
block: Make bdrv_next() keep strong references
qcow2: Fix overly broad madvise()
qcow2: Refuse to get unaligned offsets from cache
qcow2: Add bounds check to get_refblock_offset()
block: Guard against NULL bs->drv
qcow2: Unaligned zero cluster in handle_alloc()
qcow2: check_errors are fatal
qcow2: reject unaligned offsets in write compressed
iotests: Add test for failing qemu-img commit
tests: Add check-qobject for equality tests
iotests: Add test for non-string option reopening
block: qobject_is_equal() in bdrv_reopen_prepare()
qapi: Add qobject_is_equal()
qapi/qlist: Add qlist_append_null() macro
qapi/qnull: Add own header
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On one hand, it is a good idea for bdrv_next() to return a strong
reference because ideally nearly every pointer should be refcounted.
This fixes intermittent failure of iotest 194.
On the other, it is absolutely necessary for bdrv_next() itself to keep
a strong reference to both the BB (in its first phase) and the BDS (at
least in the second phase) because when called the next time, it will
dereference those objects to get a link to the next one. Therefore, it
needs these objects to stay around until then. Just storing the pointer
to the next in the iterator is not really viable because that pointer
might become invalid as well.
Both arguments taken together means we should probably just invoke
bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert
that bdrv_next() is always called from the main loop, but that was
probably necessary already before this patch and judging from the
callers, it also looks to actually be the case.
Keeping these strong references means however that callers need to give
them up if they decide to abort the iteration early. They can do so
through the new bdrv_next_cleanup() function.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110172545.32609-1-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
@mem_size and @offset are both size_t, thus subtracting them from one
another will just return a big size_t if mem_size < offset -- even more
obvious here because the result is stored in another size_t.
Checking that result to be positive is therefore not sufficient to
exclude the case that offset > mem_size. Thus, we currently sometimes
issue an madvise() over a very large address range.
This is triggered by iotest 163, but with -m64, this does not result in
tangible problems. But with -m32, this test produces three segfaults,
all of which are fixed by this patch.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171114184127.24238-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Instead of using an assertion, it is better to emit a corruption event
here. Checking all offsets for correct alignment can be tedious and it
is easily possible to forget to do so. qcow2_cache_do_get() is a
function every L2 and refblock access has to go through, so this is a
good central point to add such a check.
And for good measure, let us also add an assertion that the offset is
non-zero. Making this a corruption event is not feasible, because a
zero offset usually means something special (such as the cluster is
unused), so all callers should be checking this anyway. If they do not,
it is their fault, hence the assertion here.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203111.7666-6-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We currently do not guard everywhere against a NULL bs->drv where we
should be doing so. Most of the places fixed here just do not care
about that case at all.
Some care implicitly, e.g. through a prior function call to
bdrv_getlength() which would always fail for an ejected BDS. Add an
assert there to make it more obvious.
Other places seem to care, but do so insufficiently: Freeing clusters in
a qcow2 image is an error-free operation, but it may leave the image in
an unusable state anyway. Giving qcow2_free_clusters() an error code is
not really viable, it is much easier to note that bs->drv may be NULL
even after a successful driver call. This concerns bdrv_co_flush(), and
the way the check is added to bdrv_co_pdiscard() (in every iteration
instead of only once).
Finally, some places employ at least an assert(bs->drv); somewhere, that
may be reasonable (such as in the reopen code), but in
bdrv_has_zero_init(), it is definitely not. Returning 0 there in case
of an ejected BDS saves us much headache instead.
Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728660
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203111.7666-4-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Currently, bdrv_reopen_prepare() assumes that all BDS options are
strings. However, this is not the case if the BDS has been created
through the json: pseudo-protocol or blockdev-add.
Note that the user-invokable reopen command is an HMP command, so you
can only specify strings there. Therefore, specifying a non-string
option with the "same" value as it was when originally created will now
return an error because the values are supposedly similar (and there is
no way for the user to circumvent this but to just not specify the
option again -- however, this is still strictly better than just
crashing).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171114180128.17076-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
If an image contains persistent bitmaps, we cannot use the
fast path of bdrv_make_empty() to clear the image during
qemu-img commit, because that will lose the clusters related
to the bitmaps.
Also leave a comment in qcow2_read_extensions to remind future
feature additions to think about fast-path removal, since we
just barely fixed the same bug for LUKS encryption.
It's a pain that qemu-img has not yet been taught to manipulate,
or even at a very minimum display, information about persistent
bitmaps; instead, we have to use QMP commands. It's also a
pain that only qeury-block and x-debug-block-dirty-bitmap-sha256
will allow bitmap introspection; but the former requires the
node to be hooked to a block device, and the latter is experimental.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Test clearing unknown autoclear_features by qcow2 on incoming
migration.
[ kwolf: Fixed wait for destination VM startup ]
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Inactive images generally request less permissions for their image files
than they would if they were active (in particular, write permissions).
Activating the image involves extending the permissions, therefore.
drv->bdrv_invalidate_cache() can already require write access to the
image file, so we have to update the permissions earlier than that.
The current code does it only later, so we have to move up this part.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
nbd patches for 2017-11-17
Eric Blake - nbd: Don't crash when server reports NBD_CMD_READ failure
Eric Blake - nbd/client: Use error_prepend() correctly
Eric Blake - nbd/client: Don't hard-disconnect on ESHUTDOWN from server
Eric Blake - nbd/server: Fix error reporting for bad requests
# gpg: Signature made Fri 17 Nov 2017 14:53:30 GMT
# gpg: using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg: aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2017-11-17:
nbd/server: Fix error reporting for bad requests
nbd/client: Don't hard-disconnect on ESHUTDOWN from server
nbd/client: Use error_prepend() correctly
nbd: Don't crash when server reports NBD_CMD_READ failure
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The NBD spec says an attempt to NBD_CMD_TRIM on a read-only
export should fail with EPERM, as a trim has the potential
to change disk contents, but we were relying on the block
layer to catch that for us, which might not always give the
right error (and even if it does, it does not let us pass
back a sane message for structured replies).
The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of
bounds should fail with ENOSPC, not EINVAL.
Our check for u64 offset + u32 length wraparound up front is
pointless; nothing uses offset until after the second round
of sanity checks, and we can just as easily ensure there is
no wraparound by checking whether offset is in bounds (since
a disk size cannot exceed off_t which is 63 bits, adding a
32-bit number for a valid offset can't overflow). Bonus:
dropping the up-front check lets us keep the connection alive
after NBD_CMD_WRITE, whereas before we would drop the
connection (of course, any client sending a packet that would
trigger the failure is already buggy, so it's also okay to
drop the connection, but better quality-of-implementation
never hurts).
Solve all of these issues by some code motion and improved
request validation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171115213557.3548-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The NBD spec says that a server may fail any transmission request
with ESHUTDOWN when it is apparent that no further request from
the client can be successfully honored. The client is supposed
to then initiate a soft shutdown (wait for all remaining in-flight
requests to be answered, then send NBD_CMD_DISC). However, since
qemu's server never uses ESHUTDOWN errors, this code was mostly
untested since its introduction in commit b6f5d3b5.
More recently, I learned that nbdkit as the NBD server is able to
send ESHUTDOWN errors, so I finally tested this code, and noticed
that our client was special-casing ESHUTDOWN to cause a hard
shutdown (immediate disconnect, with no NBD_CMD_DISC), but only
if the server sends this error as a simple reply. Further
investigation found that commit d2febedb introduced a regression
where structured replies behave differently than simple replies -
but that the structured reply behavior is more in line with the
spec (even if we still lack code in nbd-client.c to properly quit
sending further requests). So this patch reverts the portion of
b6f5d3b5 that introduced an improper hard-disconnect special-case
at the lower level, and leaves the future enhancement of a nicer
soft-disconnect at the higher level for another day.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171113194857.13933-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
When using error prepend(), it is necessary to end with a space
in the format string; otherwise, messages come out incorrectly,
such as when connecting to a socket that hangs up immediately:
can't open device nbd://localhost:10809/: Failed to read dataUnexpected end-of-file before all bytes were read
Originally botched in commit e44ed99d, then several more instances
added in the meantime.
Pre-existing and not fixed here: we are inconsistent on capitalization;
some of our messages start with lower case, and others start with upper,
although the use of error_prepend() is much nicer to read when all
fragments consistently start with lower.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171113152424.25381-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
If a server fails a read, for example with EIO, but the connection
is still live, then we would crash trying to print a non-existent
error message in nbd_client_co_preadv(). For consistency, also
change the error printout in nbd_read_reply_entry(), although that
instance does not crash. Bug introduced in commit f140e300.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171112013936.5942-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
After committing the qcow2 image contents into the base image, qemu-img
will call bdrv_make_empty to drop the payload in the layered image.
When this is done for qcow2 images, it blows away the LUKS encryption
header, making the resulting image unusable. There are two codepaths
for emptying a qcow2 image, and the second (slower) codepath leaves
the LUKS header intact, so force use of that codepath.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_set_read_only() is used by some block drivers to override the
read-only option given by the user. This is not how read-only images
generally work in QEMU: Instead of second guessing what the user really
meant (which currently includes making an image read-only even if the
user didn't only use the default, but explicitly said read-only=off), we
should error out if we can't provide what the user requested.
This adds deprecation warnings to all callers of bdrv_set_read_only() so
that the behaviour can be corrected after the usual deprecation period.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently if trying to change encryption parameters on a qcow2 image, qemu-img
will abort. We already explicitly check for attempt to change encrypt.format
but missed other parameters like encrypt.key-secret. Rather than list each
parameter, just blacklist changing of all parameters with a 'encrypt.' prefix.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This avoids that random UI frontend error messages end up in the output.
In particular, we were seeing this line in CI error logs:
+Unable to init server: Could not connect: Connection refused
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Kashyap Chamarthy <kchamart@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
replication_child_perm request write
permissions for all child which will lead bdrv_check_perm fail.
replication_child_perm() should request write
permissions only if it is writable itself.
Signed-off-by: Wang Guang <wang.guang55@zte.com.cn>
Signed-off-by: Wang Yong <wang.yong155@zte.com.cn>
Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
sdl2 fixes for 2.11
# gpg: Signature made Fri 17 Nov 2017 10:06:27 GMT
# gpg: using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg: aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138
* remotes/kraxel/tags/ui-20171117-pull-request:
sdl2: Fix broken display updating after the window is hidden
sdl2: Do not leave grab when fullscreen
sdl2: Fix dead keyboard after fullsceen
sdl2: Use the same pointer show/hide logic for absolute and relative mode
sdl2: Do not quit the emulator when an auxilliary window is closed
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
pc, pci, virtio: fixes for rc1
A bunch of fixes all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Thu 16 Nov 2017 16:37:21 GMT
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
tests/bios-tables-test: Fix endianess problems when passing data to iasl
build-sys: restrict vmcoreinfo to fw_cfg+dma capable targets
vmcoreinfo: put it in the 'misc' device category
NUMA: Enable adding NUMA node implicitly
tests/acpi-test-data: update _CRS in DSDT
hw/pcie-pci-bridge: restrict to X86 and ARM
hw/pci-host: Fix x86 Host Bridges 64bit PCI hole
pci: Initialize pci_dev->name before use
fix: unrealize virtio device if we fail to hotplug it
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The bios-tables-test was writing out files that we pass to iasl in
with the wrong endianness in the header when running on a big endian
host. So instead of storing mixed endian information in our structures,
let's keep everything in little endian and byte-swap it only when we
need a value in the code.
Reported-by: Daniel P. Berrange <berrange@redhat.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1724570
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Tested-by: "Daniel P. Berrange" <berrange@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vmcoreinfo is built for all targets. However, it requires fw_cfg with
DMA operations support (write operation). Restrict vmcoreinfo exposure
to architectures that are supporting FW_CFG_DMA, that is arm-virt and
x86 only atm.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
however currently QEMU doesn't create SRAT table if numa options aren't present
on CLI.
Which breaks both linux and windows guests in certain conditions:
* Windows: won't enable memory hotplug without SRAT table at all
* Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
when memory is hotplugged and guest tries to use it with that drivers.
Fix above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on CLI.
(PS: auto-create numa node only for new machine types so not to break migration).
Which would provide SRAT table to guests without explicit -numa options on CLI
and would allow:
* Windows: to enable memory hotplug
* Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
buffers that legacy drivers/hw can handle.
[Rewritten by Igor]
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
Cc: Izumi Taku <izumi.taku@jp.fujitsu.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
commit dadf988e81b15065ac1d6dbaf4b87b5b80c7b670
hw/pci-host: Fix x86 Host Bridges 64bit PCI hole
Added a 64 bit hole to _CRS of PCI0.
Update the expected files accordingly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently there is no MMIO range over 4G
reserved for PCI hotplug. Since the 32bit PCI hole
depends on the number of cold-plugged PCI devices
and other factors, it is very possible is too small
to hotplug PCI devices with large BARs.
Fix it by reserving 2G for I4400FX chipset
in order to comply with older Win32 Guest OSes
and 32G for Q35 chipset.
Even if the new defaults of pci-hole64-size will appear in
"info qtree" also for older machines, the property was
not implemented so no changes will be visible to guests.
Note this is a regression since prev QEMU versions had
some range reserved for 64bit PCI hotplug.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This moves pci_dev->name initialization earlier so
pci_dev->bus_master_as could get a name instead of an empty string.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If we fail to hotplug virtio-blk device and then suspend
or shutdown VM, qemu is likely to crash.
Re-production steps:
1. Run VM named vm001
2. Create a virtio-blk.xml which contains wrong configurations:
<disk device="lun" rawio="yes" type="block">
<driver cache="none" io="native" name="qemu" type="raw" />
<source dev="/dev/mapper/11-dm" />
<target bus="virtio" dev="vdx" />
</disk>
3. Run command : virsh attach-device vm001 virtio-blk.xml
error: Failed to attach device from blk-scsi.xml
error: internal error: unable to execute QEMU command 'device_add': Please set scsi=off for virtio-blk devices in order to use virtio 1.0
it means hotplug virtio-blk device failed.
4. Suspend or shutdown VM will leads to qemu crash
Problem happens in virtio_vmstate_change which is called by
vm_state_notify:
vdev’s parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash.
virtio_vmstate_change is added to the list vm_change_state_head at virtio_blk_device_realize(virtio_init),
but after hotplug virtio-blk failed, virtio_vmstate_change will not be removed from vm_change_state_head.
Adding unrealize function of virtio-blk device can solve this problem.
Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
tg->any_timer_armed[] must be cleared when detaching pending timers from
the AioContext. Failure to do so leads to hung I/O because it looks
like there are still timers pending when in fact they have been removed.
Other ThrottleGroupMembers might have requests pending too so it's
necessary to schedule the next TGM so it can set a timer.
This patch fixes hung I/O when QEMU is launched with drives that are in
the same throttling group:
(guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct bs=512 &
(guest)$ dd if=/dev/zero of=/dev/vdc oflag=direct bs=512 &
(qemu) stop
(qemu) cont
...I/O is stuck...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20171116112150.27607-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Merge tpm 2017/11/15 v1
# gpg: Signature made Wed 15 Nov 2017 11:51:47 GMT
# gpg: using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2017-11-15-1:
tpm_tis: Return 0 for every register in case of failure mode
tpm_tis: Return TPM_VERSION_UNSPEC in case of BE failure
tpm-emulator: protect concurrent ctrl_chr access
specs: Extend TPM spec with TPM emulator description
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This fixes a crash caused by picking the wrong memory region in
address_space_lookup_region seen with client code accessing a device
model that uses alias memory regions. The expensive part of
address_space_lookup_region anyway is phys_page_find; performance-wise
it is okay to repeat the subsequent subpage lookup.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20171114225941.072707456B5@zero.eik.bme.hu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than returning ~0, return 0 for every register in case of failure
mode. The '0' is better to indicate that there's no device there. It avoids
SeaBIOS detecting a device and getting stuck on it trying to read and write
its registers.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
In case the backend has a failure, such as the tpm_emulator's CMD_INIT
failing, the TIS goes into failure mode and does not respond to reads
or writes to MMIO registers. In this case we need to prevent the ACPI
table from being added and the straight-forward way is to indicate that
there's no known TPM version being used.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The control chardev is being used from the data thread to set the
locality of the next request. Altough the chr has a write mutex, we
may potentially read the reply from another thread request.
Add a mutex to protect from concurrent control commands.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Following the recent extension of QEMU with a TPM emulator device,
update the specs describing for how to interact with the device.
The results of commands run inside a Linux VM are expected to be
similar to those when the TPM passthrough device is used, so we
just reuse that.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
We use raw memory primitives along the !parallel_cpus paths in order to
simplify the endianness handling. Because of that, we did not benefit
from the generic changes to cpu_ldst_user_only_template.h.
The simplest fix is to manipulate helper_retaddr here.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When we handle a signal from a fault within a user-only memory helper,
we cannot cpu_restore_state with the PC found within the signal frame.
Use a TLS variable, helper_retaddr, to record the unwind start point
to find the faulting guest insn.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Block patches for 2.11.0-rc1
# gpg: Signature made Tue 14 Nov 2017 17:22:17 GMT
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* remotes/maxreitz/tags/pull-block-2017-11-14:
qemu-iotests: update unsupported image formats in 194
block/parallels: add migration blocker
block/parallels: Do not update header or truncate image when INMIGRATE
block/vhdx.c: Don't blindly update the header
iotests: 077: Filter out 'resume' lines
block/snapshot: dirty all dirty bitmaps on snapshot-switch
qcow2: Check that corrupted images can be repaired in iotest 060
iotests: Use new-style NBD connections
iotests: Make 136 less flaky
iotests: Make 083 less flaky
iotests: Make 055 less flaky
iotests: Add missing 'blkdebug::' in 040
iotests: Make 030 less flaky
qcow2: Assert that the crypto header does not overlap other metadata
qcow2: Add iotest for an empty refcount table
qcow2: Add iotest for an image with header.refcount_table_offset == 0
qcow2: Don't open images with header.refcount_table_clusters == 0
qcow2: Prevent allocating compressed clusters at offset 0
qcow2: Prevent allocating L2 tables at offset 0
qcow2: Prevent allocating refcount blocks at offset 0
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The VHDX specification requires that before user data modification of
the vhdx image, the VHDX header file and data GUIDs need to be updated.
In vhdx_open(), if the image is set to RDWR, we go ahead and update the
header.
However, just because the image is set to RDWR does not mean we can go
ahead and write at this point - specifically, if the QEMU run state is
INMIGRATE, the underlying file BS may be set to inactive via the BDS
open flag of BDRV_O_INACTIVE. Attempting to write under this condition
will cause an assert in bdrv_co_pwritev().
We can alternatively latch the first time the image is written. And lo
and behold, we do just that, via vhdx_user_visible_write() in
vhdx_co_writev(). This means the call to vhdx_update_headers() in
vhdx_open() is likely just vestigial, and can be removed.
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 659e4cdba6ef4c651737852777c8c93d27b38040.1510059970.git.jcody@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
In the "Overlapping multiple requests" cases, the 3rd reqs (the break
point B) doesn't wait for the 2nd, and once resumed the I/O will just
continue. This is because the 2nd is already waiting for the 1st, and
in wait_serialising_requests() there is:
/* If the request is already (indirectly) waiting for us, or
* will wait for us as soon as it wakes up, then just go on
* (instead of producing a deadlock in the former case). */
if (!req->waiting_for) {
/* actually break */
...
}
Consequently, the following "sleep 100; resume A" command races with the
completion of that request, and sometimes results in an unexpected
order of output:
> @@ -56,9 +56,9 @@
> wrote XXX/XXX bytes at offset XXX
> XXX bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> blkdebug: Resuming request 'B'
> +blkdebug: Resuming request 'A'
> wrote XXX/XXX bytes at offset XXX
> XXX bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -blkdebug: Resuming request 'A'
> wrote XXX/XXX bytes at offset XXX
> XXX bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> wrote XXX/XXX bytes at offset XXX
Filter out the "Resuming request" lines to make the output
deterministic.
Reported-by: Patchew <no-reply@patchew.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20171113150026.4743-1-famz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Old-style NBD is deprecated upstream (it is documented, but no
longer implemented in the reference implementation), and it is
severely limited (it cannot support structured replies, which
means it cannot support efficient handling of zeroes), when
compared to new-style NBD. We are better off having our iotests
favor new-style everywhere (although some explicit tests,
particularly 83, still cover old-style for back-compat reasons);
this is as simple as supplying the empty string as the default
export name, as it does not change the URI needed to connect a
client to the server. This also gives us more coverage of the
just-added structured reply code, when not overriding $QEMU_NBD
to intentionally point to an older server.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20171109221216.10248-1-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
136 executes some AIO requests without a final aio_flush; then it
advances the virtual clock and thus expects the last access time of the
device to be less than the current time when queried (i.e. idle_time_ns
to be greater than 0). However, without the aio_flush, some requests
may be settled after the clock_step invocation. In that case,
idle_time_ns would be 0 and the test fails.
Fix this by adding an aio_flush if any AIO request other than some other
aio_flush has been executed.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20171109203025.27493-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
083 has (at least) two issues:
1. By launching the nbd-fault-injector in background, it may not be
scheduled until the first grep on its output file is executed.
However, until then, that file may not have been created yet -- so it
either does not exist yet (thus making the grep emit an error), or it
does exist but contains stale data (thus making the rest of the test
case work connect to a wrong address).
Fix this by explicitly overwriting the output file before executing
nbd-fault-injector.
2. The nbd-fault-injector prints things other than "Listening on...".
It also prints a "Closing connection" message from time to time. We
currently invoke sed on the whole file in the hope of it only
containing the "Listening on..." line yet. That hope is sometimes
shattered by the brutal reality of race conditions, so make the sed
script more robust.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171109203025.27493-5-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
First of all, test 055 does a valiant job of invoking pause_drive()
sometimes, but that is worth nothing without blkdebug. So the first
thing to do is to sprinkle a couple of "blkdebug::" in there -- with the
exception of the transaction tests, because the blkdebug break points
make the transaction QMP command hang (which is bad). In that case, we
can get away with throttling the block job that it effectively is
paused.
Then, 055 usually does not pause the drive before starting a block job
that should be cancelled. This means that the backup job might be
completed already before block-job-cancel is invoked; thus making the
test either fail (currently) or moot if cancel_and_wait() ignored this
condition. Fix this by pausing the drive before starting the job.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20171109203025.27493-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch fixes two race conditions in 030:
1. The first is in TestENOSPC.test_enospc(). After resuming the job,
querying it to confirm it is no longer paused may fail because in the
meantime it might have completed already. The same was fixed in
TestEIO.test_ignore() already (in commit
2c3b44da07).
2. The second is in TestSetSpeed.test_set_speed_invalid(): Here, a
stream job is started on a drive without any break points, with a
block-job-set-speed invoked subsequently. However, without any break
points, the job might have completed in the meantime (on tmpfs at
least); or it might complete before cancel_and_wait() which expects
the job to still exist. This can be fixed like everywhere else by
pausing the drive (installing break points) before starting the job
and letting cancel_and_wait() resume it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20171109203025.27493-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The crypto header is initialized only when QEMU is creating a new
image, so there's no chance of this happening on a corrupted image.
If QEMU is really trying to allocate the header overlapping other
existing metadata sections then this is a serious bug in QEMU itself
so let's add an assertion.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: ae3d77f312fc0c5e0ac2bbd71676c0112eebe2e5.1509718618.git.berto@igalia.com
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
qcow2_do_open() is checking that header.refcount_table_clusters is not
too large, but it doesn't check that it's greater than zero. Apart
from the fact that an image like that is obviously corrupted, trying
to use it crashes QEMU since we end up with a null s->refcount_table
after qcow2_refcount_init().
These images can however be repaired, so allow opening them if the
BDRV_O_CHECK flag is set.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: f9750f50c80359babba11062e88f5075a47e8e16.1509718618.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
If the refcount data is corrupted then we can end up trying to
allocate a new compressed cluster at offset 0 in the image, triggering
an assertion in qcow2_alloc_bytes() that would crash QEMU:
qcow2_alloc_bytes: Assertion `offset' failed.
This patch adds an explicit check for this scenario and a new test
case.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: fb53467cf48e95ff3330def1cf1003a5b862b7d9.1509718618.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
If the refcount data is corrupted then we can end up trying to
allocate a new L2 table at offset 0 in the image, triggering an
assertion in the qcow2 cache that would crash QEMU:
qcow2_cache_entry_mark_dirty: Assertion `c->entries[i].offset != 0' failed
This patch adds an explicit check for this scenario and a new test
case.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 92dac37191ae7844a2da22c122204eb493cc3133.1509718618.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Each entry in the qcow2 cache contains an offset field indicating the
location of the data in the qcow2 image. If the offset is 0 then it
means that the entry contains no data and is available to be used when
needed.
Because of that it is not possible to store in the cache the first
cluster of the qcow2 image (offset = 0). This is not a problem because
that cluster always contains the qcow2 header and we're not using this
cache for that.
However, if the qcow2 image is corrupted it can happen that we try to
allocate a new refcount block at offset 0, triggering this assertion
and crashing QEMU:
qcow2_cache_entry_mark_dirty: Assertion `c->entries[i].offset != 0' failed
This patch adds an explicit check for this scenario and a new test
case.
This problem was originally reported here:
https://bugs.launchpad.net/qemu/+bug/1728615
Reported-by: R.Nageswara Sastry <nasastry@in.ibm.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 92a2fadd10d58b423f269c1d1a309af161cdc73f.1509718618.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Pull request
The following disk I/O throttling fixes solve recent bugs.
# gpg: Signature made Tue 14 Nov 2017 10:37:12 GMT
# gpg: using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
qemu-iotests: Test I/O limits with removable media
block: Leave valid throttle timers when removing a BDS from a backend
block: Check for inserted BlockDriverState in blk_io_limits_disable()
throttle-groups: drain before detaching ThrottleState
block: all I/O should be completed before removing throttle timers.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Update our pre-release seabios snapshot to the final release.
git shortlog
============
Gerd Hoffmann (1):
sercon: Disable ScreenAndDebug in case both serial console and serial debug are active
Kevin O'Connor (2):
timer: Avoid integer overflows in usec and nsec calculations
docs: Note v1.11.0 release
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
# gpg: Signature made Tue 14 Nov 2017 02:05:34 GMT
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
net/socket: fix coverity issue
Add new PCI ID for i82559a
Fix eepro100 simple transmission mode
colo: Consolidate the duplicate code chunk into a routine
colo-compare: Fix comments
colo-compare: compare the packet in a specified Connection
colo-compare: Insert packet into the suitable position of packet queue directly
net: fix check for number of parameters to -netdev socket
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch ensures that icount_decr.u32.high is clear before calling
cpu_exec_nocache when exception is pending. Because the exception is
caused by the first instruction in the block and it cannot be executed
without resetting the flag.
There are two parts in the fix. First, clear icount_decr.u32.high in
cpu_handle_interrupt (just before processing the "dependent" request,
stored in cpu->interrupt_request or cpu->exit_request) rather than
cpu_loop_exec_tb; this ensures that cpu_handle_exception is always
reached with zero icount_decr.u32.high unless another interrupt has
happened in the meanwhile.
Second, try to cause the exception at the beginning of
cpu_handle_exception, and exit immediately if the TB cannot
execute. With this change, interrupts are processed and
cpu_exec_nocache can make process.
Signed-off-by: Maria Klimushenkova <maria.klimushenkova@ispras.ru>
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20171114081818.27640.33165.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch adds a condition before overwriting exception_index fiels.
It is needed when exception_index is already set to some meaningful value.
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20171114081812.27640.26372.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 5c0919d0 [1] introduced virtqueue_size parameter
for common virtio-scsi path, without updaing the vhost-user-scsi
code. vhost-user-scsi devices right now report size 0 for each vq.
This patch introduces virtqueue_size param to vhost-user-scsi,
that can now be set by the user. However, the most importantly, it
now has a default value of 128 (same as QEMU's virtio-scsi).
[1] 5c0919d0 ("virtio-scsi: Add virtqueue_size parameter
allowing virtqueue size to be set.")
Change-Id: I70e87eab702ebf1196c028dbf17d54fdc0c89a14
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Message-Id: <1510676916-76409-1-git-send-email-dariuszx.stojaczyk@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using obscure black magic introduced in eaa2ddbb76 :)
In an out-of-tree directory, running "../configure && make help" will generate
some required files (.mak), then clone some submodules, compile at least
the capstone submodule, generate QMP and Trace files, and finally display
the help.
On an outdated computer (Sun Blade workstation), running "make help" took
more than 5h :) With this patch it took roughly 37min.
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20171108032052.20029-1-f4bug@amsat.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
target-arm queue:
* translate-a64.c: silence gcc5 warning
* highbank: validate register offset before access
* MAINTAINERS: Add entries for Smartfusion2
* accel/tcg/translate-all: expand cpu_restore_state addr check
(so usermode insn aborts don't crash with an assertion failure)
* fix TCG initialization of some Arm boards by allowing them
to specify min/default number of CPUs to create
# gpg: Signature made Mon 13 Nov 2017 14:11:09 GMT
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20171113:
accel/tcg/translate-all: expand cpu_restore_state addr check
hw: add .min_cpus and .default_cpus fields to machine_class
xlnx-zcu102: Specify the max number of CPUs for the EP108
xlnx-zcu102: Add an info message deprecating the EP108
xlnx-zynqmp: Properly support the smp command line option
qom: move CPUClass.tcg_initialize to a global
MAINTAINERS: Add entries for Smartfusion2
highbank: validate register offset before access
arm/translate-a64: mark path as unreachable to eliminate warning
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When using the emulated XICS, the 'info pic' monitor command shows:
CPU 0 XIRR=ff000000 ((nil)) PP=ff MFRR=ff
ICS 1000..13ff 0x10040060340
1000 MSI 05 00
1001 MSI 05 00
1002 MSI 05 00
1003 MSI ff 00
1004 LSI ff 00
1005 LSI ff 00
1006 LSI ff 00
1007 LSI ff 00
1008 MSI 05 00
1009 MSI 05 00
100a MSI 05 00
100b MSI 05 00
100c MSI 05 00
but when using the in-kernel XICS with the very same guest, we get:
CPU 0 XIRR=00000000 ((nil)) PP=ff MFRR=ff
ICS 1000..13ff 0x10032e00340
1000 MSI ff 00
1001 MSI ff 00
1002 MSI ff 00
1003 MSI ff 00
1004 LSI ff 00
1005 LSI ff 00
1006 LSI ff 00
1007 LSI ff 00
1008 MSI ff 00
1009 MSI ff 00
100a MSI ff 00
100b MSI ff 00
100c MSI ff 00
ie, all irqs are masked and XIRR is null, while we should get the
same output as with the emulated XICS.
If the guest is then migrated, 'info pic' shows the expected values
on both source and destination.
The problem is that QEMU doesn't synchronize with KVM before printing
the XICS state. Migration happens to fix the output because it enforces
synchronization with KVM.
To fix the invalid output of 'info pic', this patch introduces a new
synchronize_state operation for both ICPStateClass and ICSStateClass.
The ICP operation relies on run_on_cpu() in order to kick the vCPU
and avoid sleeping on KVM_GET_ONE_REG.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM HV will soon support running a guest in hash mode on a POWER9 host
running in radix mode (see [1]), however the guest currently fails to
boot.
This is because the "htab_shift" value (the size of the MMU's hash
table) is added to the device tree before KVM has had a chance to
change it. If the host is in hash mode, KVM does not need to change it
and so the problem is not seen, but when the host is in radix mode a
change is required and we see a problem.
To fix this, move the call spapr_setup_hpt_and_vrma() (where
htab_shift could be changed) up a little so that it's called before
spapr_h_cas_compose_response() (where htab_shift is added to the
device tree).
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[1] See http://www.spinics.net/lists/kvm-ppc/msg13057.html
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This test hotplugs a CD drive to a VM and checks that I/O limits can
be set only when the drive has media inserted and that they are kept
when the media is replaced.
This also tests the removal of a device with valid I/O limits set but
no media inserted. This involves deleting and disabling the limits
of a BlockBackend without BlockDriverState, a scenario that has been
crashing until the fixes from the last couple of patches.
[Python PEP8 fixup: "Don't use spaces are the = sign when used to
indicate a keyword argument or a default parameter value"
--Stefan]
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 071eb397118ed207c5a7f01d58766e415ee18d6a.1510339534.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If a BlockBackend has I/O limits set then its ThrottleGroupMember
structure uses the AioContext from its attached BlockDriverState.
Those two contexts must be kept in sync manually. This is not
ideal and will be fixed in the future by removing the throttling
configuration from the BlockBackend and storing it in an implicit
filter node instead, but for now we have to live with this.
When you remove the BlockDriverState from the backend then the
throttle timers are destroyed. If a new BlockDriverState is later
inserted then they are created again using the new AioContext.
There are a couple of problems with this:
a) The code manipulates the timers directly, leaving the
ThrottleGroupMember.aio_context field in an inconsisent state.
b) If you remove the I/O limits (e.g by destroying the backend)
when the timers are gone then throttle_group_unregister_tgm()
will attempt to destroy them again, crashing QEMU.
While b) could be fixed easily by allowing the timers to be freed
twice, this would result in a situation in which we can no longer
guarantee that a valid ThrottleState has a valid AioContext and
timers.
This patch ensures that the timers and AioContext are always valid
when I/O limits are set, regardless of whether the BlockBackend has a
BlockDriverState inserted or not.
[Fixed "There'a" typo as suggested by Max Reitz <mreitz@redhat.com>
--Stefan]
Reported-by: sochin jiang <sochin.jiang@huawei.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: e089c66e7c20289b046d782cea4373b765c5bc1d.1510339534.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When you set I/O limits using block_set_io_throttle or the command
line throttling.* options they are kept in the BlockBackend regardless
of whether a BlockDriverState is attached to the backend or not.
Therefore when removing the limits using blk_io_limits_disable() we
need to check if there's a BDS before attempting to drain it, else it
will crash QEMU. This can be reproduced very easily using HMP:
(qemu) drive_add 0 if=none,throttling.iops-total=5000
(qemu) drive_del none0
Reported-by: sochin jiang <sochin.jiang@huawei.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 0d3a67ce8d948bb33e08672564714dcfb76a3d8c.1510339534.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
I/O requests hang after stop/cont commands at least since QEMU 2.10.0
with -drive iops=100:
(guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
(qemu) stop
(qemu) cont
...I/O is stuck...
This happens because blk_set_aio_context() detaches the ThrottleState
while requests may still be in flight:
if (tgm->throttle_state) {
throttle_group_detach_aio_context(tgm);
throttle_group_attach_aio_context(tgm, new_context);
}
This patch encloses the detach/attach calls in a drained region so no
I/O request is left hanging. Also add assertions so we don't make the
same mistake again in the future.
Reported-by: Yongxue Hong <yhong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20171110151934.16883-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In blk_remove_bs, all I/O should be completed before removing throttle
timers. If there has inflight I/O, removing throttle timers here will
cause the inflight I/O never return.
This patch add bdrv_drained_begin before throttle_timers_detach_aio_context
to let all I/O completed before removing throttle timers.
[Moved declaration of bs as suggested by Alberto Garcia
<berto@igalia.com>.
--Stefan]
Signed-off-by: Zhengui <lizhengui@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1508564040-120700-1-git-send-email-lizhengui@huawei.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We are still seeing signals during translation time when we walk over
a page protection boundary. This expands the check to ensure the host
PC is inside the code generation buffer. The original suggestion was
to check versus tcg_ctx.code_gen_ptr but as we now segment the
translation buffer we have to settle for just a general check for
being inside.
I've also fixed up the declaration to make it clear it can deal with
invalid addresses. A later patch will fix up the call sites.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20171108153245.20740-2-alex.bennee@linaro.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
max_cpus needs to be an upper bound on the number of vCPUs
initialized; otherwise TCG region initialization breaks.
Some boards initialize a hard-coded number of vCPUs, which is not
captured by the global max_cpus and therefore breaks TCG initialization.
Fix it by adding the .min_cpus field to machine_class.
This commit also changes some user-facing behaviour: we now die if
-smp is below this hard-coded vCPU minimum instead of silently
ignoring the passed -smp value (sometimes announcing this by printing
a warning). However, the introduction of .default_cpus lessens the
likelihood that users will notice this: if -smp isn't set, we now
assign the value in .default_cpus to both smp_cpus and max_cpus. IOW,
if a user does not set -smp, they always get a correct number of vCPUs.
This change fixes 3468b59 ("tcg: enable multiple TCG contexts in
softmmu", 2017-10-24), which broke TCG initialization for some
ARM boards.
Fixes: 3468b59e18
Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-id: 1510343626-25861-6-git-send-email-cota@braap.org
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The EP108 was an early access development board that is no longer used.
Add an info message to convert any users to the ZCU102 instead. On QEMU
they are both identical.
This patch also updated the qemu-doc.texi file to indicate that the
EP108 has been deprecated.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Message-id: 1510343626-25861-4-git-send-email-cota@braap.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
55c3cee ("qom: Introduce CPUClass.tcg_initialize", 2017-10-24)
introduces a per-CPUClass bool that we check so that the target CPU
is initialized for TCG only once. This works well except when
we end up creating more than one CPUClass, in which case we end
up incorrectly initializing TCG more than once, i.e. once for
each CPUClass.
This can be replicated with:
$ aarch64-softmmu/qemu-system-aarch64 -machine xlnx-zcu102 -smp 6 \
-global driver=xlnx,,zynqmp,property=has_rpu,value=on
In this case the class name of the "RPUs" is prefixed by "cortex-r5-",
whereas the "regular" CPUs are prefixed by "cortex-a53-". This
results in two CPUClass instances being created.
Fix it by introducing a static variable, so that only the first
target CPU being initialized will initialize the target-dependent
part of TCG, regardless of CPUClass instances.
Fixes: 55c3ceef61
Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1510343626-25861-2-git-send-email-cota@braap.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fixes the following warning when compiling with gcc 5.4.0 with -O1
optimizations and --enable-debug:
target/arm/translate-a64.c: In function ‘aarch64_tr_translate_insn’:
target/arm/translate-a64.c:2361:8: error: ‘post_index’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (!post_index) {
^
target/arm/translate-a64.c:2307:10: note: ‘post_index’ was declared here
bool post_index;
^
target/arm/translate-a64.c:2386:8: error: ‘writeback’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (writeback) {
^
target/arm/translate-a64.c:2308:10: note: ‘writeback’ was declared here
bool writeback;
^
Note that idx comes from selecting 2 bits, and therefore its value
can be at most 3.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1510087611-1851-1-git-send-email-cota@braap.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Adds a new PCI ID for the i82559a (0x8086 0x1030) interface. The
"x-use-alt-device-id" property controls whether this new ID is to be
used, and is true by default, and set to false in a compat entry.
Signed-off-by: Mike Nawrocki <michael.nawrocki@gtri.gatech.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The simple transmission mode was treating the area immediately after the
transmit command block (TCB) as if it were a transmit buffer descriptor,
when in reality it is simply the packet data. This change simply copies
the data following the TCB into the packet buffer.
Signed-off-by: Mike Nawrocki <michael.nawrocki@gtri.gatech.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
A package from pri_indev or sec_indev only belongs to a particular
Connection, so we only need to compare the package in the specified
Connection's primary_list and secondary_list, rather than for each
the whole Connection list to compare. This is time-consuming and
unnecessary.
Less checkpoint more efficiency.
Cc: Zhang Chen <zhangckid@gmail.com>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Currently, a packet from pri_dev or sec_dev is fristly pushed at the
tail of the primary or secondary packet queue then sorted by the tcp
sequence number.
Now, this patch use g_queue_insert_sorted to insert the packet directly
into the suitable position to avoid ordering all packets each time when
a new packet is comming, thereby increasing efficiency.
In addition, consolidate the code that add a packet to the list of
Connection (primary or secondary) into a separate routine colo_insert_packet()
since the same chunk of code is called from two place.
Cc: Zhang Chen <zhangckid@gmail.com>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Since commit 0f8c289ad "net: fix -netdev socket,fd= for UDP sockets"
we allow more than one parameter for -netdev socket. But now
we run into an assert when no parameter at all is specified
> qemu-system-x86_64 -netdev socket
socket.c:729: net_init_socket: Assertion `sock->has_udp' failed.
Fix this by reverting the change of the if condition done in 0f8c289ad.
Cc: Jason Wang <jasowang@redhat.com>
Cc: qemu-stable@nongnu.org
Fixes: 0f8c289ad5
Reported-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Merge qcrypto 2017/11/08 v1
# gpg: Signature made Wed 08 Nov 2017 11:06:38 GMT
# gpg: using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/pull-qcrypto-2017-11-08-1:
crypto: afalg: fix a NULL pointer dereference
tests: Run the luks tests in test-crypto-block only if encryption is available
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ppc patch queue 2017-11-08
Here's the current set of accumulated ppc patches for qemu-2.11.
Since we're now in hard freeze these are all bugfixes (although some
fix a bug by way of a cleanup).
# gpg: Signature made Wed 08 Nov 2017 08:10:38 GMT
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.11-20171108:
e500: ppce500_init_mpic() return device instead of IRQ array
hw/display/sm501: Fix comment in sm501_sysbus_class_init()
ppc: fix setting of compat mode
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
98c63057d2 ('slirp: Factorizing
tcpiphdr structure with an union') introduced a memset call to clear
possibly-undefined fields in ti. This however overwrites src/dst/pr which
are used below.
So let us clear only the unused fields.
This should fix some rare cases (some RST cases, keep alive probes)
where packets would be sent to 0.0.0.0.
Signed-off-by: Tao Wu <lepton@google.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
The NBD spec was recently clarified to state that a read of length 0
should not be attempted by a compliant client; but that a server must
still handle it correctly in an unspecified manner (that is, either
a successful no-op or an error reply, but not a crash) [1]. However,
it also implies that NBD_REPLY_TYPE_OFFSET_DATA must have a non-zero
payload length, but our existing code was replying with a chunk
that a picky client could reject as invalid because it was missing
a payload (our own client implementation was recently patched to be
that picky, after first fixing it to not send 0-length requests).
We are already doing successful no-ops for 0-length writes and for
non-structured reads; so for consistency, we want structured reply
reads to also be a no-op. The easiest way to do this is to return
a NBD_REPLY_TYPE_NONE chunk; this is best done via a new helper
function (especially since future patches for other structured
replies may benefit from using the same helper).
[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-8-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Ensure that the server is not sending unexpected chunk lengths
for either the NONE or the OFFSET_DATA chunk, nor unexpected
hole length for OFFSET_HOLE. This will flag any server as
broken that responds to a zero-length read with an OFFSET_DATA
(what our server currently does, but that's about to be fixed)
or with OFFSET_HOLE, even though we previously fixed our client
to never be able to send such a request over the wire.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The NBD spec was recently clarified to state that clients should
not send 0-length requests to the server, as the server behavior
is undefined [1]. We know that qemu-nbd's behavior is a successful
no-op (once it has filtered for read-only exports), but other NBD
implementations might return an error. To avoid any questionable
server implementations, it is better to just short-circuit such
requests on the client side (we are relying on the block layer to
already filter out requests such as invalid offset, write to a
read-only volume, and so forth); do the short-circuit as late as
possible to still benefit from protections from assertions that
the block layer is not violating our assumptions.
[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-6-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
A closer read of the NBD spec shows that a structured reply chunk
for a hole is not quite identical to the prefix of a data chunk,
because the hole has to also send a 32-bit size field. Although
we do not yet send holes, we should fix the misleading information
in our header and make it easier for a future patch to support
sparse reads. Messed up in commit bae245d1.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-5-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The NBD spec says that clients should not try to write/trim to
an export advertised as read-only by the server. But we failed
to check that, and would allow the block layer to use NBD with
BDRV_O_RDWR even when the server is read-only, which meant we
were depending on the server sending a proper EPERM failure for
various commands, and also exposes a leaky abstraction: using
qemu-io in read-write mode would succeed on 'w -z 0 0' because
of local short-circuiting logic, but 'w 0 0' would send a
request over the wire (where it then depends on the server, and
fails at least for qemu-nbd but might pass for other NBD
implementations).
With this patch, a client MUST request read-only mode to access
a server that is doing a read-only export, or else it will get
a message like:
can't open device nbd://localhost:10809/foo: request for write access conflicts with read-only export
It is no longer possible to even attempt writes over the wire
(including the corner case of 0-length writes), because the block
layer enforces the explicit read-only request; this matches the
behavior of qcow2 when backed by a read-only POSIX file.
Fix several iotests to comply with the new behavior (since
qemu-nbd of an internal snapshot, as well as nbd-server-add over QMP,
default to a read-only export, we must tell blockdev-add/qemu-io to
set up a read-only client).
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
When cross compiling QEMU for Windows we need to specify the cross
version of ranlib to avoid build errors when building capstone. This
patch ensures we use the same cross prefix on ranlib as other toolchain
components.
- Fedora23 mingw
- RHEL-7.2 with mingw packages from epel:
LINK qemu-img.exe
build-win64/capstone/capstone.lib: error adding symbols: Archive has no
index; run ranlib to add one
collect2: error: ld returned 1 exit status
$ x86_64-w64-mingw32-ar --version
GNU ar (GNU Binutils) 2.25
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <e457d4e906dceea4de6c3431813a06b137c1ab9c.1510103351.git.alistair.francis@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This feature is present for some targets in the bfd disassembler(s).
Implement it generically for all capstone users.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
namelen should be here, length is unrelated, and always 0 at this
point. Broken in introduction in commit f37708f6, but mostly
harmless (replying with '' as the name does not violate protocol,
and does not confuse qemu as the nbd client since our implementation
does not ask for the name; but might confuse some other client that
does ask for the name especially if the default export is different
than the export name being queried).
Adding an assert makes it obvious that we are not skipping any bytes
in the client's message, as well as making it obvious that we were
using the wrong variable.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: qemu-stable@nongnu.org
Message-Id: <20171101154204.27146-1-vsementsov@virtuozzo.com>
[eblake: improve commit message, squash in assert addition]
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit b7a745d added a qemu_bh_cancel call to the completion function
as an optimization to prevent it from unnecessarily rescheduling itself.
This completion function is scheduled from worker_thread, after setting
the state of a ThreadPoolElement to THREAD_DONE.
This was considered to be safe, as the completion function restarts the
loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW
memory barrier, the read of req->state may actually happen _before_ the
call, seeing it still as THREAD_QUEUED, and ending the completion
function without having processed a pending TPE linked at pool->head:
worker thread | I/O thread
------------------------------------------------------------------------
| speculatively read req->state
req->state = THREAD_DONE; |
qemu_bh_schedule(p->completion_bh) |
bh->scheduled = 1; |
| qemu_bh_cancel(p->completion_bh)
| bh->scheduled = 0;
| if (req->state == THREAD_DONE)
| // sees THREAD_QUEUED
The source of the misunderstanding was that qemu_bh_cancel is now being
used by the _consumer_ rather than the producer, and therefore now needs
to have acquire semantics just like e.g. aio_bh_poll.
In some situations, if there are no other independent requests in the
same aio context that could eventually trigger the scheduling of the
completion function, the omitted TPE and all operations pending on it
will get stuck forever.
[Added Sergio's updated wording about the HW memory barrier.
--Stefan]
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-id: 20171108063447.2842-1-slp@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Test-crypto-hash calls qcrypto_hash_bytesv/digest/base64 with
errp=NULL, this will cause a NULL pointer dereference if afalg_driver
doesn't support requested algos:
ret = qcrypto_hash_afalg_driver.hash_bytesv(alg, iov, niov,
result, resultlen,
errp);
if (ret == 0) {
return ret;
}
error_free(*errp); // <--- here
Because the error message is thrown away immediately, we should
just pass NULL to hash_bytesv(). There is also the same problem in
afalg-backend cipher & hmac, let's fix them together.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The test-crypto-block currently fails if encryption has not been
compiled into QEMU:
TEST: tests/test-crypto-block... (pid=22231)
/crypto/block/qcow: OK
/crypto/block/luks/default:
Unexpected error in qcrypto_pbkdf2() at qemu/crypto/pbkdf-stub.c:41:
FAIL
GTester: last random seed: R02Sbbb5b6f299c6727f41bb50ba4aa6ef5c
(pid=22237)
/crypto/block/luks/aes-256-cbc-plain64:
Unexpected error in qcrypto_pbkdf2() at qemu/crypto/pbkdf-stub.c:41:
FAIL
GTester: last random seed: R02S3e27992a5ab4cc95e141c4ed3c7f0d2e
(pid=22239)
/crypto/block/luks/aes-256-cbc-essiv:
Unexpected error in qcrypto_pbkdf2() at qemu/crypto/pbkdf-stub.c:41:
FAIL
GTester: last random seed: R02S51b52bb02a66c42d8b331fd305384f53
(pid=22241)
FAIL: tests/test-crypto-block
So run the luks test only if the required encryption support is available.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, to enable a pci device in the guest, the user has to issue
echo 1 > /sys/bus/pci/slots/00000000/power. This is not what people
expect. On an LPAR, the user can put a PCI device in configured or
deconfigured state via IOCDS. The "start in deconfigured state" can be
used for "sharing" a pci function across LPARs. This is not what we are
going to use in KVM, so always start configured.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Message-Id: <20171107175455.73793-2-borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
test_multi_co_schedule_entry() set to_schedule[id] in the final loop
iteration before terminating the coroutine. There is a race condition
where the main thread attempts to enter the terminating or terminated
coroutine when signalling coroutines to stop:
atomic_mb_set(&now_stopping, true);
for (i = 0; i < NUM_CONTEXTS; i++) {
ctx_run(i, finish_cb, NULL); <--- enters dead coroutine!
to_schedule[i] = NULL;
}
Make sure only to set to_schedule[id] if this coroutine really needs to
be scheduled!
Reported-by: "R.Nageswara Sastry" <nasastry@in.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20171106190233.1175-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In Makefiles the $ must be escaped as $$ in shell uses.
Since 8a2390a4f4:
$ make docker
[...]
NETWORK=1 Enable virtual network interface with default backend.
NETWORK=ACKEND Enable virtual network interface with ACKEND.
Once escaped:
$ make docker
[...]
NETWORK=1 Enable virtual network interface with default backend.
NETWORK=$BACKEND Enable virtual network interface with $BACKEND.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <20171108024719.8389-1-f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
When a base image locally defined by QEMU, such as in the debian images,
is updated, the dockerfile checksum mechanism in docker.py still skips
updating the derived image, because it only looks at the literal content
of the dockerfile, without considering changes to the base image.
For example we have a recent fix e58c1f9b35 that fixed
debian-win64-cross by updating its base image, debian8-mxe, but due to
above "feature" of docker.py the image in question is automatically NOT
rebuilt unless you add NOCACHE=1. It is noticed on Shippable:
https://app.shippable.com/github/qemu/qemu/runs/541/2/console
because after the fix is merged, the error still occurs, and the log
shows the container image is, as explained above, not updated.
This is because at the time docker.py was written, there wasn't any
dependencies between QEMU's docker images.
Now improve this to preprocess any "FROM qemu:*" directives in the
dockerfiles while doing checksum, and inline the base image's dockerfile
content, recursively. This ensures any changes on the depended _QEMU_
images are taken into account.
This means for external images that we expect to retrieve from docker
registries, we still do it as before. It is not perfect, because
registry images can get updated too. Technically we could substitute the
image name with its hex ID as obtained with $(docker images $IMAGE
--format="{{.Id}}"), but --format is not supported by RHEL 7, so leave
it for now.
Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20171103131229.4737-1-famz@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Actual number of interrupt pins isn't known
in ppce500_init_mpic() so a hardcoded number
was used, which causes a crash with older openpic.
Instead, return the DeviceState* and change ppce500_init()
to call qdev_get_gpio_in() to get only the irq pins
which are needed.
Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The "cannot_instantiate_with_device_add_yet" flag has been renamed
to "user_creatable" a while ago.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
While trying to make KVM PR usable again, commit 5dfaa532ae introduced a
regression: the current compat_pvr value is passed to KVM instead of the
new one. This means that we always pass 0 instead of the max-cpu-compat
PVR during the initial machine reset. And at CAS time, we either pass
the PVR from the command line or even don't call kvmppc_set_compat() at
all, ie, the PCR will not be set as expected.
For example if we start a big endian fedora26 guest in power7 compat
mode on a POWER8 host, we get this in the guest:
$ cat /proc/cpuinfo
processor : 0
cpu : POWER7 (architected), altivec supported
clock : 4024.000000MHz
revision : 2.0 (pvr 004d 0200)
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
MMU : Hash
but the guest can still execute POWER8 instructions, and the following
program succeeds:
int main()
{
asm("vncipher 0,0,0"); // ISA 2.07 instruction
}
Let's pass the new compat_pvr to kvmppc_set_compat() and the program fails
with SIGILL as expected.
Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If we iterate over the full port range without successfully binding+listening
on the socket, we'll try the next address, whereupon we overwrite the slisten
file descriptor variable without closing it.
Rather than having two places where we open + close socket FDs on different
iterations of nested for loops, re-arrange the code to always open+close
within the same loop iteration.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
target-arm queue:
* arm_gicv3_its: Don't abort on table save failure
* arm_gicv3_its: Fix the VM termination in vm_change_state_handler()
* translate.c: Fix usermode big-endian AArch32 LDREXD and STREXD
* hw/arm: Mark the "fsl,imx31/25/6" devices with user_creatable = false
* arm: implement cache/shareability attribute bits for PAR registers
# gpg: Signature made Tue 07 Nov 2017 13:33:58 GMT
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20171107:
hw/intc/arm_gicv3_its: Don't abort on table save failure
hw/intc/arm_gicv3_its: Fix the VM termination in vm_change_state_handler()
translate.c: Fix usermode big-endian AArch32 LDREXD and STREXD
hw/arm: Mark the "fsl,imx31" device with user_creatable = false
hw/arm: Mark the "fsl,imx25" device with user_creatable = false
hw/arm: Mark the "fsl,imx6" device with user_creatable = false
arm: implement cache/shareability attribute bits for PAR registers
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The ITS is not fully properly reset at the moment. Caches are
not emptied.
After a reset, in case we attempt to save the state before
the bound devices have registered their MSIs and after the
1st level table has been allocated by the ITS driver
(device BASER is valid), the first level entries are still
invalid. If the device cache is not empty (devices registered
before the reset), vgic_its_save_device_tables fails with -EINVAL.
This causes a QEMU abort().
Cc: qemu-stable@nongnu.org
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: wanghaibin <wanghaibin.wang@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The commit cddafd8f35 ("hw/intc/arm_gicv3_its: Implement state save
/restore") breaks the backward compatibility with the older kernels
where vITS save/restore support is not available. The vmstate function
vm_change_state_handler() should not be registered if the running kernel
doesn't support ITS save/restore feature. Otherwise VM instance will be
killed whenever vmstate callback function is invoked.
Observed a virtual machine shutdown with QEMU-2.10+linux-4.11 when testing
the reboot command "virsh reboot <domain> --mode acpi" instead of reboot.
KVM Error: 'KVM_SET_DEVICE_ATTR failed: Group 4 attr 0x00000000000001'
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1509712671-16299-1-git-send-email-shankerd@codeaurora.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For AArch32 LDREXD and STREXD, architecturally the 32-bit word at the
lowest address is always Rt and the one at addr+4 is Rt2, even if the
CPU is big-endian. Our implementation does these with a single
64-bit store, so if we're big-endian then we need to put the two
32-bit halves together in the opposite order to little-endian,
so that they end up in the right places. We were trying to do
this with the gen_aa32_frob64() function, but that is not correct
for the usermode emulator, because there there is a distinction
between "load a 64 bit value" (which does a BE 64-bit access
and doesn't need swapping) and "load two 32 bit values as one
64 bit access" (where we still need to do the swapping, like
system mode BE32).
Fixes: https://bugs.launchpad.net/qemu/+bug/1725267
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1509622400-13351-1-git-send-email-peter.maydell@linaro.org
QEMU currently crashes when the user tries to instantiate the fsl,imx31
device manually:
$ aarch64-softmmu/qemu-system-aarch64 -M kzm -device fsl,,imx31
**
ERROR:/home/thuth/devel/qemu/tcg/tcg.c:538:tcg_register_thread:
assertion failed: (n < max_cpus)
Aborted (core dumped)
The kzm board (which is the one that uses this CPU type) only supports
one CPU, and the realize function of the "fsl,imx31" device also uses
serial_hds[] directly, so this device clearly can not be instantiated
twice and thus we should mark it with user_creatable = false.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1509519537-6964-4-git-send-email-thuth@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QEMU currently crashes when the user tries to instantiate the fsl,imx25
device manually:
$ aarch64-softmmu/qemu-system-aarch64 -S -M imx25-pdk -device fsl,,imx25
**
ERROR:/home/thuth/devel/qemu/tcg/tcg.c:538:tcg_register_thread:
assertion failed: (n < max_cpus)
The imx25-pdk board (which is the one that uses this CPU type) only
supports one CPU, and the realize function of the "fsl,imx25" device
also uses serial_hds[] directly, so this device clearly can not be
instantiated twice and thus we should mark it with user_creatable = 0.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1509519537-6964-3-git-send-email-thuth@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This device causes QEMU to abort if the user tries to instantiate it:
$ qemu-system-aarch64 -M sabrelite -smp 1,maxcpus=2 -device fsl,,imx6
Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
qemu-system-aarch64: -device fsl,,imx6: Device 'serial0' is in use
Aborted (core dumped)
The device uses serial_hds[] directly in its realize function, so it
can not be instantiated again by the user.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1509519537-6964-2-git-send-email-thuth@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On a successful address translation instruction, PAR is supposed to
contain cacheability and shareability attributes determined by the
translation. We previously returned 0 for these bits (in line with the
general strategy of ignoring caches and memory attributes), but some
guest OSes may depend on them.
This patch collects the attribute bits in the page-table walk, and
updates PAR with the correct attributes for all LPAE translations.
Short descriptor formats still return 0 for these bits, as in the
prior implementation.
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 20171031223830.4608-1-Andrew.Baumann@microsoft.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
cocoa queue:
* make scrolling work in GUI monitor windows
* change ungrab to ctrl-alt-g (matching gtk)
* pass unused ctrl-alt combos to guest
# gpg: Signature made Tue 07 Nov 2017 10:15:00 GMT
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-cocoa-20171107:
ui/cocoa.m: Send ctrl-alt key combos to guest if QEMU isn't using them
ui/cocoa.m: move ungrab to ctrl-alt-g
ui/cocoa.m: Make scrolling work again in GUI monitor windows
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Merge build 2017/11/07 v1
# gpg: Signature made Tue 07 Nov 2017 10:14:49 GMT
# gpg: using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/pull-build-2017-11-07-1:
build: remove use of MAKELEVEL optimization in submodule handling
build: delay check for empty git submodule list
build: don't fail if given a git submodule which does not exist
build: allow automatic git submodule updates to be disabled
build: don't create temporary files in source dir
build: allow setting a custom GIT binary for transparent proxying
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This fixes a bad errno returned to the guest and a trivial coding style nit.
# gpg: Signature made Mon 06 Nov 2017 18:09:24 GMT
# gpg: using RSA key 0x71D4D5E5822F73D6
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Gregory Kurz <gregory.kurz@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# Primary key fingerprint: B482 8BAF 9431 40CE F2A3 4910 71D4 D5E5 822F 73D6
* remotes/gkurz/tags/for-upstream:
9pfs: fix v9fs_mark_fids_unreclaim() return value
9pfs: drop one user of struct V9fsFidState
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Send those ctrl-alt key combos that QEMU doesn't treat specially to
the guest rather than ignoring them.
All the case where we do special handling of ctrl-alt-X exit the
event handling using a "return" statement, so we can simply allow
the rest to fall through into the normal key handling by deleting
the now-spurious "else".
We take the opportunity to clean up some oddly-formatted and
now rather uninformative comments by removing them.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The return value of v9fs_mark_fids_unreclaim() is then propagated to
pdu_complete(). It should be a negative errno, not -1.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
MIPS patches 2017-11-06
Changes:
Update email addresses of Yongbok Kim, James Hogan and Paul Burton.
# gpg: Signature made Mon 06 Nov 2017 15:38:58 GMT
# gpg: using RSA key 0x2238EB86D5F797C2
# gpg: Good signature from "Yongbok Kim <yongbok.kim@mips.com>"
# gpg: aka "Yongbok Kim <yongbok.kim@imgtec.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 8600 4CF5 3415 A5D9 4CFA 2B5C 2238 EB86 D5F7 97C2
* remotes/yongbok/tags/mips-20171106:
MAINTAINERS: Update Paul Burton's email address
MAINTAINERS: Update James Hogan's email address
MAINTAINERS: Update Yongbok Kim's email address
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Makefile attempts to optimize the handling of submodules by using MAKELEVEL
to only check the submodule status when running from the top level make
invokation. This causes problems for people who are using a makefile of their
own to in turn invoke QEMU's makefile, as MAKELEVEL is already set to 1 (or
more) when QEMU's makefile runs.
This optimization should not really be needed, since the git-submodule.sh
script is already used to detect if a submodule update is required. This by
removing the MAKELEVEL check, we at most add an extra 'git-submodule.sh status'
call to each make level, the overhead of which is lost in noise of building
QEMU.
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We short circuit the git submodule update when passed an empty module list.
This accidentally causes the 'status' command to write to the status file. The
test needs to be delayed into the individual commands to avoid this premature
writing of the status file.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If going back in time in git history, across a commit that introduces a new
submodule, the 'git-submodule.sh' script will fail, causing rebuild to fail.
This is because config-host.mak contains a GIT_SUBMODULES variable that lists
a submodule that only exists in the later commit. config-host.mak won't get
repopulated until config.status is invoked, but make won't get this far due to
the submodule error.
This change makes 'git-submodule.sh' check whether each module is known to git
and drops any which are not present. A warning message will be printed when any
submodule is dropped in this manner.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some people building QEMU use VPATH builds where the source directory is on a
read-only volume. In such a case 'scripts/git-submodules.sh update' will always
fail and users are required to run it manually themselves on their original
writable source directory.
While this is already supported, it is nice to give users a command line flag
to configure to permanently disable automatic submodule updates, as it means
they won't get hard to diagnose failures from git-submodules.sh at an arbitrary
later date.
This patch thus introduces a flag '--disable-git-update' which will prevent
'make' from ever running 'scripts/git-submodules.sh update'. It will still run
the 'status' command to determine if a submodule update is needed, but when it
does this it'll simply stop and print a message instructing the developer what
todo. eg
$ ./configure --target-list=x86_64-softmmu --disable-git-update
...snip...
$ make
GEN config-host.h
GEN trace/generated-tcg-tracers.h
GEN trace/generated-helpers-wrappers.h
GEN trace/generated-helpers.h
GEN trace/generated-helpers.c
GEN module_block.h
GIT submodule checkout is out of date. Please run
scripts/git-submodule.sh update ui/keycodemapdb
from the source directory checkout /home/berrange/src/virt/qemu
make: *** [Makefile:31: git-submodule-update] Error 1
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are cases where users do VPATH builds with the source directory being on
a read-only volume. In such a case they have to manually run the command
'git-submodule.sh ...modules...' ahead of time. When checking for status we
should not then write into the source dir.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some users can't run a bare 'git' command, due to need for a transparent
proxying solution such as 'tsocks'. This adds an argument to configure to
let users specify such a thing:
./configure --with-git="tsocks git"
The submodule script is also updated to give the user a hint about using this
flag, if we fail to checkout modules.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
GCC 4.9 and newer stopped warning for missing braces around the
"universal" C zero initializer {0}. One such initializer sneaked
into scsi/qemu-pr-helper.c and is breaking the build with such
older GCC versions.
Detect the lack of support for the idiom, and disable the warning
in that case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Legacy PCI device assignment has been removed from Linux in 4.12,
and had been deprecated 2 years ago there. We can remove it from
QEMU as well.
The ROM loading code was shared with Xen PCI passthrough, so move
it to hw/xen.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Queued tcg patches
# gpg: Signature made Fri 03 Nov 2017 08:37:58 GMT
# gpg: using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20171103:
cpu-exec: Exit exclusive region on longjmp from step_atomic
tcg/s390x: Use constant pool for prologue
tcg: Allow constant pool entries in the prologue
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit ac03ee5331 narrowed the scope of the exclusive
region so it only covers when we're executing the TB, not when
we're generating it. However it missed that there is more than
one execution path out of cpu_tb_exec -- if the atomic insn
causes an exception then the code will longjmp out, skipping
the code to end the exclusive region. This causes QEMU to hang
the next time the CPU calls start_exclusive(), waiting for
itself to exit the region.
Move the "end the region" code out to the end of the
function so that it is run for both normal exit and also
for exit-via-longjmp. We have to use a volatile bool flag
to decide whether we need to end the region, because we
can longjump out of the codegen as well as the execution.
(For some reason this only reproduces for me with a clang
optimized build, not a gcc debug build.)
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Fixes: ac03ee5331
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1509640536-32160-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rather than have separate code only used for guest_base,
rely on a recent change to handle constant pool entries.
Cc: qemu-s390x@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Both ARMv6 and AArch64 currently may drop complex guest_base values
into the constant pool. But generic code wasn't expecting that, and
the pool is not emitted. Correct that.
Tested-by: Emilio G. Cota <cota@braap.org>
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since 927128222b QEMU depends of keycodemapdb, which uses the python 'csv'
module from stdlib to parse keymaps.csv.
Without this package the build fails:
GEN ui/input-keymap-linux-to-qcode.c
Traceback (most recent call last):
File "ui/keycodemapdb/tools/keymap-gen", line 15, in <module>
import csv
ImportError: No module named csv
GEN ui/input-keymap-qcode-to-qnum.c
Traceback (most recent call last):
File "ui/keycodemapdb/tools/keymap-gen", line 15, in <module>
import csv
ImportError: No module named csv
[...]
CC ui/input-keymap.o
ui/input-keymap.c:8:44: fatal error: ui/input-keymap-linux-to-qcode.c: No such file or directory
make: *** [ui/input-keymap.o] Error 1
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
qemu-sparc update
# gpg: Signature made Tue 31 Oct 2017 17:43:11 GMT
# gpg: using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F
* remotes/mcayland/tags/qemu-sparc-signed:
sun4m: change TYPE_SUN4M_IOMMU macro from "iommu" to "sun4m-iommu"
sun4m_iommu: remove legacy sparc_iommu_memory_rw() function
sparc32_dma: switch over to using IOMMU memory region and DMA API
sun4m: implement IOMMU translation using IOMMU memory region
sparc32_dma: add len to esp/le DMA memory tracing
sparc32_dma: remove is_ledma hack and replace with memory region alias
sparc32_dma: introduce new SPARC32_DMA type container object
sparc32_dma: make lance device child of ledma device
lance: move TYPE_LANCE and SysBusPCNetState from lance.c to lance.h
sparc32_dma: make esp device child of espdma device
esp: move TYPE_ESP and SysBusESPState from esp.c to esp.h
sparc32_dma: use object link instead of qdev property to pass IOMMU reference
sun4m_iommu: move TYPE_SUN4M_IOMMU declaration to sun4m.h
sun4m: move DMA device wiring from sparc32_dma_init() to sun4m_hw_init()
sparc32_dma: move type declarations from sparc32_dma.c to sparc32_dma.h
sparc32_dma: split esp and le into separate DMA devices
sparc32_dma: rename SPARC32_DMA type to SPARC32_DMA_DEVICE
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The allwinner code is only needed for the allwinner board (for which
we also have a separate CONFIG_ALLWINNER_A10 config switch), so it
does not make sense that we compile this for all the other boards
that need AHCI, too. Let's move it to a separate file that is only
compiled when CONFIG_ALLWINNER_A10 is set.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1508784509-29377-1-git-send-email-thuth@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This is a legacy artifact from when the sun4m IOMMU implementation was
the only IOMMU available within QEMU.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
This hack originated from before the memory region API was introduced, and
increased the size of the ledma DMA device to capture incorrect accesses
beyond the end of the ledma device. A full analysis can be found on Artyom's
blog at http://tyom.blogspot.co.uk/2010/10/bug-in-all-solaris-versions-after-57.html.
With the memory API we can now simply alias the incorrect access onto its
intended destination allowing us to remove the hack.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Create a new SPARC32_DMA container object (including an appropriate container
memory region) and add instances of the SPARC32_ESPDMA_DEVICE and
SPARC32_LEDMA_DEVICE as child objects. The benefit is that most of the gpio
wiring complexity between esp/espdma and lance/ledma is now hidden within the
SPARC32_DMA realize function.
Since the sun4m IOMMU is already QOMified we can find a reference to
it using object_resolve_path_type() allowing us to completely remove all external
references to the iommu pointer.
Finally we rework sun4m's sparc32_dma_init() to invoke the new SPARC32_DMA object
and wire up the remaining board memory regions/IRQs.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This enables them to be used outside of lance.c. We also update the comment to
refer to the SPARC32 lance device rather than the AMD PCNet-II device (of which
lance is a register-compatible subset).
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
CC: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This makes it possible to reference the esp device from the espdma device as
required, and by wiring up the device ourselves in sun4m.c we can drop use
of the esp_init() function.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This enables us to remove the last remaining (opaque) qdev property. Whilst we
are here, also update iommu_init() to use TYPE_SUN4M_IOMMU instead of a
hardcoded string.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
By using the sysbus interface it is possible to wire up the esp/le devices
to the sun4m DMA controller directly during sun4m_hw_init() instead of
passing qemu_irqs into the sparc32_dma_init() function.
This is an intermediate step to allow further reorganisation as more logic
is moved into the relevant SPARC32 DMA devices; there will be a final
refactoring of sparc32_dma_init() once this work is complete.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Due to slight differences in behaviour accessing the registers for the
esp and le devices, create two separate SPARC32_DMA_DEVICE types and
update the sun4m machine to use.
Note that by using different device types we already know the size of
the register block and the value of is_ledma at init time, allowing us to
drop the SPARC32_DMA_DEVICE realize function and the is_ledma device
property.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Also update the function names to match as appropriate. While we're
here rename the type from sparc32_dma to sparc32-dma in order to
match the current QOM convention.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
target-arm queue:
* fix instruction-length bit in syndrome for WFI/WFE traps
* xlnx-zcu102: Specify the max number of CPUs
* msf2: Remove dead code reported by Coverity
* msf2: Wire up SYSRESETREQ in SoC for system reset
* hw/pci-host/gpex: Improve INTX to gsi routing error checking
# gpg: Signature made Tue 31 Oct 2017 13:10:02 GMT
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20171031:
hw/pci-host/gpex: Improve INTX to gsi routing error checking
msf2: Wire up SYSRESETREQ in SoC for system reset
msf2: Remove dead code reported by Coverity
xlnx-zcu102: Specify the max number of CPUs
fix WFI/WFE length in syndrome register
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We exposed gpex_set_irq_num() for machines to set the INTx to
GSI routing. However if the machine forgets to call that
function we currently do not check the association was properly
done. Let's initialize gsi values to -1 and if this value is
found in gpex_route_intx_pin_to_irq, set the routing mode as
disabled.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1508776211-22175-1-git-send-email-eric.auger@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
WFI/E are often, but not always, 4 bytes long. When they are, we need to
set ARM_EL_IL_SHIFT in the syndrome register.
Pass the instruction length to HELPER(wfi), use it to decrement pc
appropriately and to pass an is_16bit flag to syn_wfx, which sets
ARM_EL_IL_SHIFT if needed.
Set dc->insn in both arm_tr_translate_insn and thumb_tr_translate_insn.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Message-id: alpine.DEB.2.10.1710241055160.574@sstabellini-ThinkPad-X260
[PMM: move setting of dc->insn for Thumb so it is correct for 32 bit insns]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Minimal implementation: for structured error only error_report error
message.
Note that test 83 is now more verbose, because the implementation
prints more warnings about unexpected communication errors; perhaps
future patches should tone things down by using trace messages
instead of traces, but the common case of successful communication
is no noisier than before.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-13-eblake@redhat.com>
An upcoming change to block/nbd-client.c will want to read the
tail of a structured reply chunk directly from the wire. Move
this function to make it easier.
Based on a patch from Vladimir Sementsov-Ogievskiy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-12-eblake@redhat.com>
In following patch nbd_receive_reply will be used both for simple
and structured reply header receiving.
NBDReply is altered into union of simple reply header and structured
reply chunk header, simple error translation moved to block/nbd-client
to be consistent with further structured reply error translation.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-11-eblake@redhat.com>
The NBD spec permits including a human-readable error string if
structured replies are in force, so we might as well send the
client the message that we logged on any error.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-9-eblake@redhat.com>
Minimal implementation of structured read: one structured reply chunk,
no segmentation.
Minimal structured error implementation: no text message.
Support DF flag, but just ignore it, as there is no segmentation any
way.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-8-eblake@redhat.com>
Consolidate the response for a non-zero-length option payload
into a new function, nbd_reject_length(). This check will
also be used when introducing support for structured replies.
Note that STARTTLS response differs based on time: if the connection
is still unencrypted, we set fatal to true (a client that can't
request TLS correctly may still think that we are ready to start
the TLS handshake, so we must disconnect); while if the connection
is already encrypted, the client is sending a bogus request but
is no longer at risk of being confused by continuing the connection.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-7-eblake@redhat.com>
[eblake: correct return value on STARTTLS]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Instead of making each caller check whether a transmission error
occurred, we can sink a common error check to the end of the loop.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-6-eblake@redhat.com>
[eblake: squash in compiler warning fix]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
When the server is read-only, we were already reporting an error
message for NBD_CMD_WRITE_ZEROES, but failed to set errp for a
similar NBD_CMD_WRITE. This will matter more once structured
replies allow the server to propagate the errp information back
to the client. While at it, use an error message that makes a
bit more sense if viewed on the client side.
Note that when using qemu-io to test qemu-nbd behavior, it is
rather difficult to convince qemu-io to send protocol violations
(such as a read beyond bounds), because we have a lot of active
checking on the client side that a qemu-io request makes sense
before it ever goes over the wire to the server. The case of a
client attempting a write when the server is started as
'qemu-nbd -r' is one of the few places where we can easily test
error path handling, without having to resort to hacking in known
temporary bugs to either the server or client. [Maybe we want a
future patch to the client to do up-front checking on writes to a
read-only export, the way it does up-front bounds checking; but I
don't see anything in the NBD spec that points to a protocol
violation in our current behavior.]
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-5-eblake@redhat.com>
Upcoming patches will implement the NBD structured reply
extension [1] for both client and server roles. Declare the
constants, structs, and lookup routines that will be valuable
whether the server or client code is backported in isolation.
This includes moving one constant from an internal header to
the public header, as part of the structured read processing
will be done in block/nbd-client.c rather than nbd/client.c.
[1]https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md
Based on patches from Vladimir Sementsov-Ogievskiy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-4-eblake@redhat.com>
This is needed in preparation for structured reply handling,
as we will be performing the translation from NBD error to
system errno value higher in the stack at block/nbd-client.c.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-3-eblake@redhat.com>
NBD errors were originally sent over the wire based on Linux errno
values; but not all the world is Linux, and not all platforms share
the same values. Since a number isn't very easy to decipher on all
platforms, update the trace messages to include the name of NBD
errors being sent/received over the wire. Tweak the trace messages
to be at the point where we are using the NBD error, not the
translation to the host errno values.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-2-eblake@redhat.com>
If a CPU selected with the "cpu" command is hot-unplugged then "info cpus"
causes QEMU to exit:
(qemu) device_del cpu1
(qemu) info cpus
qemu:qemu_cpu_kick_thread: No such process
This happens because "cpu" stores the pointer to the selected CPU into
the monitor structure. When the CPU is hot-unplugged, we end up with a
dangling pointer. The "info cpus" command then does:
hmp_info_cpus()
monitor_get_cpu_index()
mon_get_cpu()
cpu_synchronize_state() <--- called with dangling pointer
This could cause a QEMU crash as well.
This patch switches the monitor to store the QOM path instead of a
pointer to the current CPU. The path is then resolved when needed.
If the resolution fails, we assume that the CPU was removed and the
path is resetted to the default (ie, path of first_cpu).
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <150822818243.26242.12993827911736928961.stgit@bahia.lan>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
s390x: fixups for 2.11
- missing \r in the BIOS console output
- CPU type name is now "s390x-cpu"
- fixup for the host-model on z14 and older machine versions
# gpg: Signature made Mon 30 Oct 2017 08:34:15 GMT
# gpg: using RSA key 0x117BBC80B5A61C7C
# gpg: Good signature from "Christian Borntraeger (IBM) <borntraeger@de.ibm.com>"
# Primary key fingerprint: F922 9381 A334 08F9 DBAB FBCA 117B BC80 B5A6 1C7C
* remotes/borntraeger/tags/s390x-20171030:
s390-*.img: update s390 bios with latest fixes
s390-ccw: print carriage return with new lines
s390x/kvm: use cpu model for gscb on compat machines
target/s390x: change CPU type name to "s390x-cpu"
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
migration/next for 20171029
# gpg: Signature made Sun 29 Oct 2017 13:07:43 GMT
# gpg: using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg: aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* remotes/juanquintela/tags/migration/20171029:
tests: check that migration parameters are really assigned
tests: Don't abuse global_qtest
tests: Factorize out migrate_test_start/end
tests: Refactor setting of parameters/capabilities
tests: rename postcopy-test to migration-test
migration: Make xbzrle_cache_size a migration parameter
migration: No need to return the size of the cache
migration: Don't play games with the requested cache size
migration: Make sure that we pass the right cache size
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
x86/cpu/numa queue, 2017-10-27
# gpg: Signature made Fri 27 Oct 2017 15:17:12 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/x86-and-machine-pull-request: (39 commits)
x86: Skip check apic_id_limit for Xen
numa: fixup parsed NumaNodeOptions earlier
mips: r4k: replace cpu_model with cpu_type
mips: mipssim: replace cpu_model with cpu_type
mips: Magnum/Acer Pica 61: replace cpu_model with cpu_type
mips: fulong2e: replace cpu_model with cpu_type
mips: malta/boston: replace cpu_model with cpu_type
mips: use object_new() instead of gnew()+object_initialize()
sparc: leon3: use generic cpu_model parsing
sparc: sparc: use generic cpu_model parsing
sparc: sun4u/sun4v/niagara: use generic cpu_model parsing
sparc: cleanup cpu type name composition
tricore: use generic cpu_model parsing
tricore: cleanup cpu type name composition
unicore32: use generic cpu_model parsing
unicore32: cleanup cpu type name composition
xtensa: lx60/lx200/ml605/kc705: use generic cpu_model parsing
xtensa: sim: use generic cpu_model parsing
xtensa: cleanup cpu type name composition
sh4: remove SuperHCPUClass::name field
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
includes
7618c0aefe ("s390-ccw: print carriage return with new lines")
a8fbbf1db7 ("s390: set DHCP client architecure id for netboot")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The sclp console in the s390 bios writes raw data,
leading console emulators (such as virsh console) to
treat a new line ('\n') as just a new line instead
of as a Unix line feed. Because of this, output
appears in a "stair case" pattern.
Let's print \r\n on every occurrence of a new line
in the string passed to write to amend this issue.
This is in sync with the guest Linux code in
drivers/s390/char/sclp_vt220.c which also does a line feed
conversion in the console part of the driver.
This fixes the s390-ccw and s390-netboot output like
$ virsh start test --console
Domain test started
Connected to domain test
Escape character is ^]
Network boot starting...
Using MAC address: 02:01:02:03:04:05
Requesting information via DHCP: 010
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Message-Id: <1509120893-28054-1-git-send-email-walling@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Starting a guest with
<os>
<type arch='s390x' machine='s390-ccw-virtio-2.9'>hvm</type>
</os>
<cpu mode='host-model'/>
on an IBM z14 results in
"qemu-system-s390x: Some features requested in the CPU model are not
available in the configuration: gs"
This is because guarded storage is fenced for compat machines that did
not have guarded storage support. While this prevents future migration
abort (by not starting the guest at all), not being able to start a
"host-model" guest is very much unexpected. As it turns out, even if we
would modify libvirt to not expand the cpu model to contain "gs" for
compat machines, it cannot guarantee that a migration will succeed. For
example if the kernel changes its features (or the user has nested=1 on
one host but not on the other) the migration will fail nevertheless. So
instead of fencing "gs" for machines <= 2.9 lets allow it for all
machine types that support the CPU model. This will make "host-model"
runnable all the time, while relying on the CPU model to reject invalid
migration attempts. We also need to change the migration for guarded
storage.
Additional discussions about host-model are still pending but are out
of scope of this patch.
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
For now, e.g. host-s390-cpu wasn't exposed to the user. cpu-add, -cpu
and the CPU model qmp interfaces didn't care about the actual type,
as that information was hidden.
This changed with CPU hotplug via device_add. Now the type is visible to
the user. Before we get that supported in a stable version, this is our
last chance to change it.
So change it from "s390-cpu" to "s390x-cpu", to match the architecture
name. Example names are then e.g. z14-s390x-cpu or qemu-s390x-cpu.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20171020115803.14093-1-david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
As we have two guests running, just pass always who we want to send a
message to. Once there, refactor return_or_event() into wait_command.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Instead of repeating the code, we are going to bo more tests on this file
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Right now it is a variable in MigrationState instead of a
MigrationParameter. The change allows to set it as the rest of the
Migration parameters, from the command line, with
query_migration_paramters, set_migrate_parameters, etc.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
After the previous commits, we make sure that the value passed is
right, or we just drop an error. So now we return if there is one
error or we have setup correctly the value passed.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
Improve error messasge
Return 0 always for success
Now that we check that the value passed is a power of 2, we don't need
to play games when comparing what is the size that is going to take
the cache.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Instead of passing silently round down the number of pages, make it an
error that the cache size is not a power of 2.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
numa 'mem' option with suffix or without one is possible
only on CLI/HMP. Instead of fixing up special suffix less
CLI case deep in parse_numa_node() do it earlier right
after option is parsed into NumaNodeOptions with OptVisistor
so that the rest of the code would use valid values in
NumaNodeOptions and won't have to reparse QemuOpts.
It will help to isolate CLI/HMP parts in parse_numa() and
split out parsed NumaNodeOptions processing into separate
function that could be reused by QMP handler where we have
only NumaNodeOptions and don't need any fixups.
While at it reuse qemu_strtosz_MiB() instead of manually
checking for suffixes.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1507801198-98182-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
object_initialize() is intended for inplace initialization of
objects, but here it's first allocated with g_new0() and then
initialized with object_initialize(). QEMU already has API
to do this (object_new), so do object creation with suitable
for usecase API.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-36-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
introduce TRICORE_CPU_TYPE_NAME macro and use it to construct
cpu type names. While at it move cpu type_infos into one
array and register it directly with type_init_from_array()
instead of custom tricore_cpu_register_types()/cpu_register()
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-30-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
the field contains upper-cased cpu model name and is used
for printing supported cpu model names for '-cpu help'.
Considering that cpu model lookup in superh_cpu_class_by_name()
is case-insensitive, we can drop upper-casing when
printing supported cpus list and use cpu type directly
to do the same by cutting out SUPERH_CPU_TYPE_SUFFIX from
typename.
It allows to remove SuperHCPUClass::name, which practically
duplicates names defined by TYPE_SH*_CPU definitions and
simplify sh*_class_init()/SuperHCPUClass a bit.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-24-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
currently for sh4 cpu_model argument for '-cpu' option
could be either 'cpu model' name or cpu_typename.
however typically '-cpu' takes 'cpu model' name and
cpu type for sh4 target isn't advertised publicly
('-cpu help' prints only 'cpu model' names) so we
shouldn't care about this use case (it's more of a bug).
1. Drop '-cpu cpu_typename' to align with the rest of
targets.
2. Compose searched for typename from cpu model and use
it with object_class_by_name() directly instead of
over-complicated
object_class_get_list()
g_slist_find_custom() + superh_cpu_name_compare()
With #1 droped, #2 could be used for both lookups which
simplifies superh_cpu_class_by_name() quite a bit.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-23-git-send-email-imammedo@redhat.com>
[ehabkost: Include fixup sent by Igor]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
introduce SUPERH_CPU_TYPE_NAME macro and use it to construct
cpu type names. While at it move cpu type_infos into one
array and register it directly with type_init_from_array()
instead of custom superh_cpu_register_types()
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-22-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
It 'works' with default CPU only because of bug in
moxie_cpu_class_by_name() where it treats cpu_model
as type name and default cpu_model also happens to be
type name. But specifying explicitly cpu on CLI,
ex: '-cpu MoxieLite', makes QEMU fail since
moxie_cpu_class_by_name() doesn't traslate cpu_model
to cpu type and fails to find corresponding object class.
Fix moxie_cpu_class_by_name() to do proper
cpu_model -> cpu type
translation and fix default cpu_model to be cpu_model
instead of being typename.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-15-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Introduce ALPHA_CPU_TYPE_NAME macro to replace rather ununique
TYPE macro that alpha uses. With new macro it will follow
the same naming convention as other targets.
While at it put scattered TypeInfo into one array which places
type desriptions at one place and reduces code a bit.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-5-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Xen 2017/10/26
# gpg: Signature made Thu 26 Oct 2017 23:57:16 BST
# gpg: using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# gpg: aka "Stefano Stabellini <sstabellini@kernel.org>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini/tags/xen-20171026-tag:
xen: Log errno rather than return value
xen: dont try setting max grants multiple times
xen: add a global indicator for grant copy being available
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Block layer patches
# gpg: Signature made Thu 26 Oct 2017 14:02:54 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (35 commits)
iotests: Add cluster_size=64k to 125
qcow2: Always execute preallocate() in a coroutine
qcow2: Fix unaligned preallocated truncation
qcow2: Emit errp when truncating the image tail
iotests: Filter actual image size in 184 and 191
iotests: Pull _filter_actual_image_size from 67/87
iotests: Add test for dataplane mirroring
qcow2: Use BDRV_SECTOR_BITS instead of its literal value
qemu-img.1: Image invalidation on qemu-img commit
qemu-io: Relax 'alloc' now that block-status doesn't assert
qcow2: Reduce is_zero() rounding
block: Reduce bdrv_aligned_preadv() rounding
block: Align block status requests
qemu-img: Change img_compare() to be byte-based
qemu-img: Change img_rebase() to be byte-based
qemu-img: Change compare_sectors() to be byte-based
qemu-img: Change check_empty_sectors() to byte-based
qemu-img: Drop redundant error message in compare
qemu-img: Add find_nonzero()
qemu-img: Speed up compare on pre-allocated larger file
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In one case we misconstrue a BOOL return as an HRESULT, and in the
other case we don't check the BOOL return from LookupAccountSidW()
before extracting the HRESULT from GetLastError(). Both can lead to
getNameByStringSID() misreporting an error.
Reported-by: Chen Hanxiao <chenhanxiao@gmail.com>
Suggested-by: Tomáš Golembiovský <tgolembi@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
xen_modified_memory() sets errno to communicate what went wrong so log
this rather than the return value which is not interesting.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Trying to call xengnttab_set_max_grants() with the same file handle
might fail on some kernels, as this operation is allowed only once.
This is a problem for the qdisk backend as blk_connect() can be
called multiple times for a domain, e.g. in case grub-xen is being
used to boot it.
So instead of letting the generic backend code open the gnttab device
do it in blk_connect() and close it again in blk_disconnect.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
The Xen qdisk backend needs to test whether grant copy operations is
available in the kernel. Unfortunately this collides with using
xengnttab_set_max_grants() on some kernels as this operation has to
be the first one after opening the gnttab device.
In order to solve this problem test for the availability of grant copy
in xen_be_init() opening the gnttab device just for that purpose and
closing it again afterwards. Advertise the availability via a global
flag and use that flag in the qdisk backend.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Block patches
# gpg: Signature made Thu Oct 26 15:01:20 2017 CEST
# gpg: using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2017-10-26:
iotests: Add cluster_size=64k to 125
qcow2: Always execute preallocate() in a coroutine
qcow2: Fix unaligned preallocated truncation
qcow2: Emit errp when truncating the image tail
iotests: Filter actual image size in 184 and 191
iotests: Pull _filter_actual_image_size from 67/87
iotests: Add test for dataplane mirroring
qcow2: Use BDRV_SECTOR_BITS instead of its literal value
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tests 067 and 087 filter the actual image size because it depends on the
host filesystem (and is not part of the respective test). Since this is
generally true, we should have a common filter function for this, so
let's pull out the sed line from both tests into such a function.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171009163456.485-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
BDRV_SECTOR_BITS is defined to be 9 in block.h (and BDRV_SECTOR_SIZE
is calculated from that), but there are still a couple of places where
we are using the literal value instead of the macro.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20171009153856.20387-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
qemu-img commit invalidates all images between base and top. This
should be mentioned in the man page.
Suggested-by: Ping Li <pingl@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previously, the alloc command required that input parameters be
sector-aligned and clamped to 32 bits, because the underlying
bdrv_is_allocated used a 32-bit parameter and asserted aligned
inputs. But now that we have fixed block status to report a
64-bit bytes value, and to properly round requests on behalf of
guests, we can pass any values, and can use qemu-io to add
coverage that our rounding is correct regardless of the guest
alignment constraints.
Update iotest 177 to intentionally probe block status at
unaligned boundaries as well as with a bytes value that does not
map to 32-bit sectors, which also required tweaking the image
prep to leave an unallocated portion to the image under test.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that bdrv_is_allocated accepts non-aligned inputs, we can
remove the TODO added in earlier refactoring.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that bdrv_is_allocated accepts non-aligned inputs, we can
remove the TODO added in commit d6a644bb.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Any device that has request_alignment greater than 512 should be
unable to report status at a finer granularity; it may also be
simpler for such devices to be guaranteed that the block layer
has rounded things out to the granularity boundary (the way the
block layer already rounds all other I/O out). Besides, getting
the code correct for super-sector alignment also benefits us
for the fact that our public interface now has byte granularity,
even though none of our drivers have byte-level callbacks.
Add an assertion in blkdebug that proves that the block layer
never requests status of unaligned sections, similar to what it
does on other requests (while still keeping the generic helper
in place for when future patches add a throttle driver). Note
that iotest 177 already covers this (it would fail if you use
just the blkdebug.c hunk without the io.c changes). Meanwhile,
we can drop assertions in callers that no longer have to pass
in sector-aligned addresses.
There is a mid-function scope added for 'count' and 'longret',
for a couple of reasons: first, an upcoming patch will add an
'if' statement that checks whether a driver has an old- or
new-style callback, and can conveniently use the same scope for
less indentation churn at that time. Second, since we are
trying to get rid of sector-based computations, wrapping things
in a scope makes it easier to group and see what will be
deleted in a final cleanup patch once all drivers have been
converted to the new-style callback.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the continuing quest to make more things byte-based, change
the internal iteration of img_compare(). We can finally drop the
TODO assertions added earlier, now that the entire algorithm is
byte-based and no longer has to shift from bytes to sectors.
Most of the change is mechanical ('total_sectors' becomes
'total_size', 'sector_num' becomes 'offset', 'nb_sectors' becomes
'chunk', 'progress_base' goes from sectors to bytes); some of it
is also a cleanup (sectors_to_bytes() is now unused, loss of
variable 'count' added earlier in commit 51b0a488).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the continuing quest to make more things byte-based, change
the internal iteration of img_rebase(). We can finally drop the
TODO assertion added earlier, now that the entire algorithm is
byte-based and no longer has to shift from bytes to sectors.
Most of the change is mechanical ('num_sectors' becomes 'size',
'sector' becomes 'offset', 'n' goes from sectors to bytes); some
of it is also a cleanup (use of MIN() instead of open-coding,
loss of variable 'count' added earlier in commit d6a644bb).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the continuing quest to make more things byte-based, change
compare_sectors(), renaming it to compare_buffers() in the
process. Note that one caller (qemu-img compare) only cares
about the first difference, while the other (qemu-img rebase)
cares about how many consecutive sectors have the same
equal/different status; however, this patch does not bother to
micro-optimize the compare case to avoid the comparisons of
sectors beyond the first mismatch. Both callers are always
passing valid buffers in, so the initial check for buffer size
can be turned into an assertion.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Continue on the quest to make more things byte-based instead of
sector-based.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If a read error is encountered during 'qemu-img compare', we
were printing the "Error while reading offset ..." message twice;
this was because our helper function was awkward, printing output
on some but not all paths. Fix it to consistently report errors
on all paths, so that the callers do not risk a redundant message,
and update the testsuite for the improved output.
Further simplify the code by hoisting the conversion from an error
message to an exit code into the helper function, rather than
repeating that logic at all callers (yes, the helper function is
now less generic, but it's a net win in lines of code).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
During 'qemu-img compare', when we are checking that an allocated
portion of one file is all zeros, we don't need to waste time
computing how many additional sectors after the first non-zero
byte are also non-zero. Create a new helper find_nonzero() to do
the check for a first non-zero sector, and rebase
check_empty_sectors() to use it.
The new interface intentionally uses bytes in its interface, even
though it still crawls the buffer a sector at a time; it is robust
to a partial sector at the end of the buffer.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Compare the following images with all-zero contents:
$ truncate --size 1M A
$ qemu-img create -f qcow2 -o preallocation=off B 1G
$ qemu-img create -f qcow2 -o preallocation=metadata C 1G
On my machine, the difference is noticeable for pre-patch speeds,
with more than an order of magnitude in difference caused by the
choice of preallocation in the qcow2 file:
$ time ./qemu-img compare -f raw -F qcow2 A B
Warning: Image size mismatch!
Images are identical.
real 0m0.014s
user 0m0.007s
sys 0m0.007s
$ time ./qemu-img compare -f raw -F qcow2 A C
Warning: Image size mismatch!
Images are identical.
real 0m0.341s
user 0m0.144s
sys 0m0.188s
Why? Because bdrv_is_allocated() returns false for image B but
true for image C, throwing away the fact that both images know
via lseek(SEEK_HOLE) that the entire image still reads as zero.
From there, qemu-img ends up calling bdrv_pread() for every byte
of the tail, instead of quickly looking for the next allocation.
The solution: use block_status instead of is_allocated, giving:
$ time ./qemu-img compare -f raw -F qcow2 A C
Warning: Image size mismatch!
Images are identical.
real 0m0.014s
user 0m0.011s
sys 0m0.003s
which is on par with the speeds for no pre-allocation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As long as we are querying the status for a chunk smaller than
the known image size, we are guaranteed that a successful return
will have set pnum to a non-zero size (pnum is zero only for
queries beyond the end of the file). Use that to slightly
simplify the calculation of the current chunk size being compared.
Likewise, we don't have to shrink the amount of data operated on
until we know we have to read the file, and therefore have to fit
in the bounds of our buffer. Also, note that 'total_sectors_over'
is equivalent to 'progress_base'.
With these changes in place, sectors_to_process() is now dead code,
and can be removed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually moving away from sector-based interfaces, towards
byte-based. In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.
Changing the name of the function from bdrv_get_block_status_above()
to bdrv_block_status_above() ensures that the compiler enforces that
all callers are updated. Likewise, since it a byte interface allows
an offset mapping that might not be sector aligned, split the mapping
out of the return value and into a pass-by-reference parameter. For
now, the io.c layer still assert()s that all uses are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status in the drivers.
For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_block_status(), plus
updates for the new split return interface. But some code,
particularly bdrv_block_status(), gets a lot simpler because it no
longer has to mess with sectors. Likewise, mirror code no longer
computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop
an assertion about alignment because the loop no longer depends on
alignment (never mind that we don't really have a driver that
reports sub-sector alignments, so it's not really possible to test
the effect of sub-sector mirroring). Fix a neighboring assertion to
use is_power_of_2 while there.
For ease of review, bdrv_get_block_status() was tackled separately.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
function (no semantic change).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
function (no semantic change); and as with its public counterpart,
rename to bdrv_co_block_status() and split the offset return, to
make the compiler enforce that we catch all uses. For now, we
assert that callers and the return value still use aligned data,
but ultimately, this will be the function where we hand off to a
byte-based driver callback, and will eventually need to add logic
to ensure we round calls according to the driver's
request_alignment then touch up the result handed back to the
caller, to start permitting a caller to pass unaligned offsets.
Note that we are now prepared to accepts 'bytes' larger than INT_MAX;
this is okay as long as we clamp things internally before violating
any 32-bit limits, and makes no difference to how a client will
use the information (clients looping over the entire file must
already be prepared for consecutive calls to return the same status,
as drivers are already free to return shorter-than-maximal status
due to any other convenient split points, such as when the L2 table
crosses cluster boundaries in qcow2).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually moving away from sector-based interfaces, towards
byte-based. In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.
Changing the name of the function from bdrv_get_block_status() to
bdrv_block_status() ensures that the compiler enforces that all
callers are updated. For now, the io.c layer still assert()s that
all callers are sector-aligned, but that can be relaxed when a later
patch implements byte-based block status in the drivers.
There was an inherent limitation in returning the offset via the
return value: we only have room for BDRV_BLOCK_OFFSET_MASK bits, which
means an offset can only be mapped for sector-aligned queries (or,
if we declare that non-aligned input is at the same relative position
modulo 512 of the answer), so the new interface also changes things to
return the offset via output through a parameter by reference rather
than mashed into the return value. We'll have some glue code that
munges between the two styles until we finish converting all uses.
For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_block_status(), coupled
with the tweak in calling convention. But some code, particularly
bdrv_is_allocated(), gets a lot simpler because it no longer has to
mess with sectors.
For ease of review, bdrv_get_block_status_above() will be tackled
separately.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Continue by converting
an internal function (no semantic change), and simplifying its
caller accordingly.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Change the internal
loop iteration of zeroing a device to track by bytes instead of
sectors (although we are still guaranteed that we iterate by steps
that are sector-aligned).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
function (no semantic change), and rename it to is_zero() in the
process.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the process of converting sector-based interfaces to bytes,
I'm finding it easier to represent a byte count as a 64-bit
integer at the block layer (even if we are internally capped
by SIZE_MAX or even INT_MAX for individual transactions, it's
still nicer to not have to worry about truncation/overflow
issues on as many variables). Update the signature of
bdrv_round_to_clusters() to uniformly use int64_t, matching
the signature already chosen for bdrv_is_allocated and the
fact that off_t is also a signed type, then adjust clients
according to the required fallout (even where the result could
now exceed 32 bits, no client is directly assigning the result
into a 32-bit value without breaking things into a loop first).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Not all callers care about which BDS owns the mapping for a given
range of the file, or where the zeroes lie within that mapping. In
particular, bdrv_is_allocated() cares more about finding the
largest run of allocated data from the guest perspective, whether
or not that data is consecutive from the host perspective, and
whether or not the data reads as zero. Therefore, doing subsequent
refinements such as checking how much of the format-layer
allocation also satisfies BDRV_BLOCK_ZERO at the protocol layer is
wasted work - in the best case, it just costs extra CPU cycles
during a single bdrv_is_allocated(), but in the worst case, it
results in a smaller *pnum, and forces callers to iterate through
more status probes when visiting the entire file for even more
extra CPU cycles.
This patch only optimizes the block layer (no behavior change when
want_zero is true, but skip unnecessary effort when it is false).
Then when subsequent patches tweak the driver callback to be
byte-based, we can also pass this hint through to the driver.
Tweak BdrvCoGetBlockStatusData to declare arguments in parameter
order, rather than mixing things up (minimizing padding is not
necessary here).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Not all callers care about which BDS owns the mapping for a given
range of the file. This patch merely simplifies the callers by
consolidating the logic in the common call point, while guaranteeing
a non-NULL file to all the driver callbacks, for no semantic change.
The only caller that does not care about pnum is bdrv_is_allocated,
as invoked by vvfat; we can likewise add assertions that the rest
of the stack does not have to worry about a NULL pnum.
Furthermore, this will also set the stage for a future cleanup: when
a caller does not care about which BDS owns an offset, it would be
nice to allow the driver to optimize things to not have to return
BDRV_BLOCK_OFFSET_VALID in the first place. In the case of fragmented
allocation (for example, it's fairly easy to create a qcow2 image
where consecutive guest addresses are not at consecutive host
addresses), the current contract requires bdrv_get_block_status()
to clamp *pnum to the limit where host addresses are no longer
consecutive, but allowing a NULL file means that *pnum could be
set to the full length of known-allocated data.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This changes test case 191 to include a backing image that has
backing_fmt set in the image file, but is referenced by node name in the
qemu command line.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When referring to a backing file of an image via node name
bdrv_open_backing_file would add the 'driver' option to the option list
filling it with the backing format driver. This breaks construction of
the backing chain via -blockdev, as bdrv_open_inherit reports an error
if both 'reference' and 'options' are provided.
$ qemu-img create -f raw /tmp/backing.raw 64M
$ qemu-img create -f qcow2 -F raw -b /tmp/backing.raw /tmp/test.qcow2
$ qemu-system-x86_64 \
-blockdev driver=file,filename=/tmp/backing.raw,node-name=backing \
-blockdev driver=qcow2,file.driver=file,file.filename=/tmp/test.qcow2,node-name=root,backing=backing
qemu-system-x86_64: -blockdev driver=qcow2,file.driver=file,file.filename=/tmp/test.qcow2,node-name=root,backing=backing: Could not open backing file: Cannot reference an existing block device with additional options or a new filename
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Do not require the submodule, but use it if present. Allow the
command-line to override system or git submodule either way.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Merge tpm 2017/10/24 v1
# gpg: Signature made Wed 25 Oct 2017 06:06:55 BST
# gpg: using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2017-10-24-1:
tpm: print buffers received from TPM when debugging
vl: remove unnecessary #ifdef CONFIG_TPM
tpm: remove unnecessary #ifdef CONFIG_TPM
tpm: add stubs
tpm: add missing include
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
we can get the network interface statistics inside a virtual machine by
guest-network-get-interfaces command. it is very useful for us tomonitor
and analyze network traffic.
Signed-off-by: ZhiPeng Lu <lu.zhipeng@zte.com.cn>
* don't rely on sizeof(wchar[]) for wchar[] indexing
* avoid camelCase variable names
* fix up getline() usage
* condensed commit subject line
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When VM is in a heavy IO, if the command "guest-fsfreeze-freeze"
is executed, VSS may timeout when trying to hold writes.
Inside guest, Event ID 12298(VSS_ERROR_HOLD_WRITES_TIMEOUT)
is logged in the Event Viewer.
At that time, if we call AbortBackup, qga may hang forever.
This patch will solve this issue.
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: Tomoki Sekiyama <tomoki.sekiyama@gmail.com>
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
TCG patch queue
# gpg: Signature made Wed 25 Oct 2017 10:30:18 BST
# gpg: using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20171025: (51 commits)
translate-all: exit from tb_phys_invalidate if qht_remove fails
tcg: Initialize cpu_env generically
tcg: enable multiple TCG contexts in softmmu
tcg: introduce regions to split code_gen_buffer
translate-all: use qemu_protect_rwx/none helpers
osdep: introduce qemu_mprotect_rwx/none
tcg: allocate optimizer temps with tcg_malloc
tcg: distribute profiling counters across TCGContext's
tcg: introduce **tcg_ctxs to keep track of all TCGContext's
gen-icount: fold exitreq_label into TCGContext
tcg: define tcg_init_ctx and make tcg_ctx a pointer
tcg: take tb_ctx out of TCGContext
translate-all: report correct avg host TB size
exec-all: rename tb_free to tb_remove
translate-all: use a binary search tree to track TBs in TBContext
tcg: Remove CF_IGNORE_ICOUNT
tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK
cpu-exec: lookup/generate TB outside exclusive region during step_atomic
tcg: check CF_PARALLEL instead of parallel_cpus
target/sparc: check CF_PARALLEL instead of parallel_cpus
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Even though there is only one monitor, and thus no race on this
global data object, there is also no point in having it. We can
just as well record the decision in the read_memory_function that
we select.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
If configured, prefer this over our rather dated copy of the
GPLv2-only binutils. This will be especially apparent with
the proposed vector extensions to TCG, as disas/i386.c does
not handle AVX.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The Capstone disassembler has its own big-endian fixup.
Doing this twice does not work, of course. Move our current
fixup from target/arm/cpu.c to disas/arm.c.
This makes read_memory_inner_func unused and can be removed.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Two or more threads might race while invalidating the same TB. We currently
do not check for this at all despite taking tb_lock, which means we would
wrongly invalidate the same TB more than once. This bug has actually been
hit by users: I recently saw a report on IRC, although I have yet to see
the corresponding test case.
Fix this by using qht_remove as the synchronization point; if it fails,
that means the TB has already been invalidated, and therefore there
is nothing left to do in tb_phys_invalidate.
Note that this solution works now that we still have tb_lock, and will
continue working once we remove tb_lock.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1508445114-4717-1-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is identical for each target. So, move the initialization to
common code. Move the variable itself out of tcg_ctx and name it
cpu_env to minimize changes within targets.
This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
since there are no longer global-register temps created by targets.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This enables parallel TCG code generation. However, we do not take
advantage of it yet since tb_lock is still held during tb_gen_code.
In user-mode we use a single TCG context; see the documentation
added to tcg_region_init for the rationale.
Note that targets do not need any conversion: targets initialize a
TCGContext (e.g. defining TCG globals), and after this initialization
has finished, the context is cloned by the vCPU threads, each of
them keeping a separate copy.
TCG threads claim one entry in tcg_ctxs[] by atomically increasing
n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
of that variable and tcg_ctxs; they are there just to play nice with
analysis tools such as thread sanitizer.
Note that we do not allocate an array of contexts (we allocate
an array of pointers instead) because when tcg_context_init
is called, we do not know yet how many contexts we'll use since
the bool behind qemu_tcg_mttcg_enabled() isn't set yet.
Previous patches folded some TCG globals into TCGContext. The non-const
globals remaining are only set at init time, i.e. before the TCG
threads are spawned. Here is a list of these set-at-init-time globals
under tcg/:
Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
use_vis3_instructions
Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is groundwork for supporting multiple TCG contexts.
The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.
What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).
We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.
The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).
* -smp 1, 1 region (entire buffer):
qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367
That is, 8 flushes.
* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:
qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368
Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.
* -smp 8, static partitioning of 8 regions (10 MB per region):
qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364
That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The helpers require the address and size to be page-aligned, so
do that before calling them.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting multiple TCG contexts.
While at it, also allocate temps_used directly as a bitmap of the
required size, instead of using a bitmap of TCG_MAX_TEMPS via
TCGTempSet.
Performance-wise we lose about 1.12% in a translation-heavy workload
such as booting+shutting down debian-arm:
Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):
exec time (s) Relative slowdown wrt original (%)
---------------------------------------------------------------
original 20.213321616 0.
tcg_malloc 20.441130078 1.1270214
TCGContext 20.477846517 1.3086662
g_malloc 20.780527895 2.8061013
The other two alternatives shown in the table are:
- TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps
in TCGContext. This is simple enough but it isn't faster than using
tcg_malloc; moreover, it wastes memory.
- g_malloc: allocate/deallocate both temps and used_temps every time
tcg_optimize is executed.
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is groundwork for supporting multiple TCG contexts.
To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:
1) Consolidate profile info into its own struct, TCGProfile, which
TCGContext also includes. Note that tcg_table_op_count is brought
into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.
This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting multiple TCG contexts.
Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting multiple TCG contexts.
The core of this patch is this change to tcg/tcg.h:
> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;
Note that for now we set *tcg_ctx to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since commit 6e3b2bfd6 ("tcg: allocate TB structs before the
corresponding translated code") we are not fully utilizing
code_gen_buffer for translated code, and therefore are
incorrectly reporting the amount of translated code as well as
the average host TB size. Address this by:
- Making the conscious choice of misreporting the total translated code;
doing otherwise would mislead users into thinking "-tb-size" is not
honoured.
- Expanding tb_tree_stats to accurately count the bytes of translated code on
the host, and using this for reporting the average tb host size,
as well as the expansion ratio.
In the future we might want to consider reporting the accurate numbers for
the total translated code, together with a "bookkeeping/overhead" field to
account for the TB structs.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.
For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.
The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. However,
the code does not assume this to always be the case because this
behaviour is not guaranteed in the glib docs. However, we embed
this knowledge in the code as a branch hint for the compiler.
Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.
Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:
Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel img/arm/aarch32-current-linux-kernel-only.img \
-append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):
- Before:
8048.598422 task-clock (msec) # 0.931 CPUs utilized ( +- 0.28% )
16,974 context-switches # 0.002 M/sec ( +- 0.12% )
0 cpu-migrations # 0.000 K/sec
10,125 page-faults # 0.001 M/sec ( +- 1.23% )
35,144,901,879 cycles # 4.367 GHz ( +- 0.14% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
65,758,252,643 instructions # 1.87 insns per cycle ( +- 0.33% )
10,871,298,668 branches # 1350.707 M/sec ( +- 0.41% )
192,322,212 branch-misses # 1.77% of all branches ( +- 0.32% )
8.640869419 seconds time elapsed ( +- 0.57% )
- After:
8146.242027 task-clock (msec) # 0.923 CPUs utilized ( +- 1.23% )
17,016 context-switches # 0.002 M/sec ( +- 0.40% )
0 cpu-migrations # 0.000 K/sec
18,769 page-faults # 0.002 M/sec ( +- 0.45% )
35,660,956,120 cycles # 4.378 GHz ( +- 1.22% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
65,095,366,607 instructions # 1.83 insns per cycle ( +- 1.73% )
10,803,480,261 branches # 1326.192 M/sec ( +- 1.95% )
195,601,289 branch-misses # 1.81% of all branches ( +- 0.39% )
8.828660235 seconds time elapsed ( +- 0.38% )
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that we have curr_cflags, we can include CF_USE_ICOUNT
early and then remove it as necessary.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These flags are used by target/*/translate.c,
and affect code generation.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that all code generation has been converted to check CF_PARALLEL, we can
generate !CF_PARALLEL code without having yet set !parallel_cpus --
and therefore without having to be in the exclusive region during
cpu_exec_step_atomic.
While at it, merge cpu_exec_step into cpu_exec_step_atomic.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby decoupling the resulting translated code from the current state
of the system.
The tb->cflags field is not passed to tcg generation functions. So
we add a field to TCGContext, storing there a copy of tb->cflags.
Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the new field.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.
Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.
Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:
FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \
accel/tcg/translator.c | cut -f1 -d':' | sort | uniq)
perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES
Then manually fixed the few errors that checkpatch reported.
Compile-tested for all targets.
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We were generating code during tb_invalidate_phys_page_range,
check_watchpoint, cpu_io_recompile, and (seemingly) discarding
the TB, assuming that it would magically be picked up during
the next iteration through the cpu_exec loop.
Instead, record the desired cflags in CPUState so that we request
the proper TB so that there is no more magic.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.
Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Using the offset of a temporary, relative to TCGContext, rather than
its index means that we don't use 0. That leaves offset 0 free for
a NULL representation without having to leave index 0 unused.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When we used structures for TCGv_*, we needed a macro in order to
perform a comparison. Now that we use pointers, this is just clutter.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The GET and MAKE functions weren't really specific enough.
We now have a full complement of functions that convert exactly
between temporaries, arguments, tcgv pointers, and indices.
The target/sparc change is also a bug fix, which would have affected
a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Transform TCGv_* to an "argument" or a temporary.
For now, an argument is simply the temporary index.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
While we're touching many of the lines anyway, adjust the naming
of the functions to better distinguish when "TCGArg" vs "TCGTemp"
should be used.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Copy s->nb_globals or s->nb_temps to a local variable for the purposes
of iteration. This should allow the compiler to use low-overhead
looping constructs on some hosts.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This avoids having to allocate external memory for each temporary.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
At the same time, drop the TCGContext argument and use tcg_ctx instead.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This avoids needing to test the index of a temp against nb_globals.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries. The result is actually a bit
smaller and should have slightly more cache locality.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
else file including "sysemu/tpm.h" fails to compile:
In file included from qemu/stubs/tpm.c:2:0:
qemu/include/sysemu/tpm.h:36:19: error: implicit declaration of function ‘object_resolve_path_type’ [-Werror=implicit-function-declaration]
Object *obj = object_resolve_path_type("", TYPE_TPM_TIS, NULL);
^~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
input: fixes for ui input code and ps/2 keyboard (mostly sysrq key)
# gpg: Signature made Mon 23 Oct 2017 10:19:22 BST
# gpg: using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg: aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138
* remotes/kraxel/tags/input-20171023-pull-request:
ui: pull in latest keycodemapdb
ui: normalize the 'sysrq' key into the 'print' key
ps2: fix scancodes sent for Ctrl+Pause key combination
ps2: fix scancodess sent for Pause key in AT set 1
ps2: fix scancodes sent for Shift/Ctrl+Print key combination
ps2: fix scancodes sent for Alt-Print key combination (aka SysRq)
ui: use correct union field for key number
ui: fix crash with sendkey and raw key numbers
input: use hex in ps2 keycode trace events
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
OpenRISC SMP patchset 20171021
# gpg: Signature made Fri 20 Oct 2017 22:51:16 BST
# gpg: using RSA key 0xC3B31C2D5E6627E4
# gpg: Good signature from "Stafford Horne <shorne@gmail.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: D9C4 7354 AEF8 6C10 3A25 EFF1 C3B3 1C2D 5E66 27E4
* remotes/shorne/tags/openrisc-20171021-smp-pr:
openrisc: Only kick cpu on timeout, not on update
openrisc: Initial SMP support
openrisc/cputimer: Perparation for Multicore
target/openrisc: Make coreid and numcores variable
openrisc/ompic: Add OpenRISC Multicore PIC (OMPIC)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We now report errors also when we finish migration, not only on info
migrate. We plan to use this error from several places, and we want
the first error to happen to win, so we add an mutex to order it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This patch adds ability to track down already received
pages, it's necessary for calculation vCPU block time in
postcopy migration feature, and for recovery after
postcopy migration failure.
Also it's necessary to solve shared memory issue in
postcopy livemigration. Information about received pages
will be transferred to the software virtual bridge
(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
already received pages. fallocate syscall is required for
remmaped shared memory, due to remmaping itself blocks
ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT
error (struct page is exists after remmap).
Bitmap is placed into RAMBlock as another postcopy/precopy
related bitmaps.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Just for placing auxilary operations inside helper,
auxilary operations like: track received pages,
notify about copying operation in futher patches.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Rearrange the bitmap initialization and the first sync. Since at it,
make sure the locks are taken/released in correct order (I moved RCU
unlock upper - though it may not affect much).
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Let's further simplify ram_init_all() and ram_save_cleanup() by abstract
all the XBZRLE related codes into their own functions.
When allocating xbzrle cache, we are always very careful on -ENOMEM;
which makes sense. Replacing the last g_malloc0() with g_try_malloc0(),
then refactor the logic a bit.
This patch should be fixing some memory leaks when some memory
allocation failed for XBZRLE in the past.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
There are two Mutexes that are created but not yet destroyed for
RAMState. Fix that.
Since we are at it, provide helper function to clean up RAMState.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The old ram_state_init() is not really initializing the RAMState only,
but including lots of other stuff that is RAM-related. Renaming it to
ram_init_all(). Instead, provide a real ram_state_init().
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add pause-before-switchover support for postcopy.
After starting postcopy it will transition
active->pre-switchover->postcopy_active
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If a migration_cancel is issued during the new paused state,
kick the pause_sem to get to unpause so it can cancel.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Wait for a semaphore before completing the migration,
if the previously added capability was enabled.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add two statuses for use when the 'pause-before-switchover'
capability is enabled.
'pre-switchover' is the state that we wait in for management
to allow us to continue.
'device' is the state we enter while serialising the devices
after management gives us the OK.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
When 'pause-before-switchover' is enabled, the outgoing migration
will pause before invalidating the block devices and serializing
the device state.
At this point the management layer gets the chance to clean up any
device jobs or other device users before the migration completes.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Once there, take a total size instead of the size of the pages. We
move the check that the new_size is bigger than one page from
xbzrle_cache_resize().
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Fix typo spotted by Peter Xu
It was not used at all since commit:
27af7d6ea5
which replaced its use by the dirty sync count.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Latest keycodemapdb has a fix for Sun keyboard Pause mapping
and backcompat fix for QEMU's treatment of 0xb7 as an alternative
to 0x54 for triggering Print/SysRq
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20171019142848.572-10-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The 'sysrq' key was mistakenly added to QEMU to deal with incorrect handling
of the 'print' key in the ps2 device:
commit f2289cb692
Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
Date: Wed Jun 4 10:14:16 2008 +0000
Add sysrq to key names known by "sendkey".
Adding sysrq keycode to the table enabling running sysrq debugging in
the guest via the monitor sendkey command, like:
(qemu) sendkey alt-sysrq-t
Tested on x86-64 target and Linux guest.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
The ps2 device is now fixed wrt modifiers and the 'print' key. Further the
handling of the 'sysrq' key has some problems of its own, documented in the
previous commit. To cleanup this mess, we convert any use of 'sysrq' into
'print' prior to dispatching the event to device models.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20171019142848.572-9-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The 'Pause' key is special in the AT set 1 / set 2 scancode definitions.
An unmodified 'Pause' key is supposed to send
AT Set 1: e1 1d 45 91 9d c5 (Down) <nothing> (Up)
AT Set 2: e1 14 77 e1 f0 14 f0 77 (Down) <nothing> (Up)
which QEMU gets right. When combined with Ctrl (both left and right variants),
a different sequence is expected
AT Set 1: e0 46 e0 c6 (Down) <nothing> (Up)
AT Set 2: e0 7e e0 f0 73 (Down) <nothing> (Up)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20171019142848.572-8-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The ps2 device was previously fixed to send the special Pause/Print
scancode sequences in:
commit 8c10e0baf0
Author: Hervé Poussineau <hpoussin@reactos.org>
Date: Thu Sep 15 22:06:26 2016 +0200
ps2: use QEMU qcodes instead of scancodes
The sequence used for Pause had a small typo in the AT set 1, with a 0xe1
accidentally changed to 0x91. This is not immediately visible with Linux
guests since they run the ps2 device with AT set 2 scancodes.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20171019142848.572-7-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The 'Print' key is special in the AT set 1 / set 2 scancode definitions.
An unmodified 'Print' key is supposed to send
AT Set 1: e0 2a e0 37 (Down) e0 b7 e0 aa (Up)
AT Set 2: e0 12 e0 7c (Down) e0 f0 7c e0 f0 12 (Up)
which QEMU gets right. When combined with Shift/Ctrl (both left and right
variants), the leading two bytes should be dropped, resulting in
AT Set 1: e0 37 (Down) e0 b7 (Up)
AT Set 2: e0 7c (Down) e0 f0 7c (Up)
This difference is pretty benign, since of all the operating systems I have
checked (Linux, FreeBSD and OpenStack), none bother to check the leading two
bytes anyway. This change none the less makes the ps2 device better follow real
hardware behaviour.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20171019142848.572-6-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The 'Print' key is special in the AT set 1 / set 2 scancode definitions.
An unmodified 'Print' key is supposed to send
AT Set 1: e0 2a e0 37 (Down) e0 b7 e0 aa (Up)
AT Set 2: e0 12 e0 7c (Down) e0 f0 7c e0 f0 12 (Up)
which QEMU gets right. When pressed in combination with the 'Alt_L' or 'Alt_R'
keys (which signify SysRq), the scancodes are required to follow a different
scheme. With Alt_L, the expected sequences are
AT set 1: 38, 54 (Down) d4, b8 (Up)
AT set 2: 11, 84 (Down) f0 84, f0 11 (Up)
And with Alt_R
AT set 1: e0 38, 54 (Down) d4, e0 b8 (Up)
AT set 2: e0 11, 84 (Down) f0 84, f0 e0 11 (Up)
It is actually slightly more complicated than that, because (according results
of 'showkey -s', keyboards will in fact first release the currently pressed
modifier before sending the sequence above (which effectively re-presses &
then releases the modifier) and finally re-press the original modifier
afterwards. IOW, with Alt_L we need to send
AT set 1: b8, 38, 54 (Down) d4, b8, 38 (Up)
AT set 2: f0 11, 11, 84 (Down) f0 84, f0 11, 11 (Up)
And with Alt_R
AT set 1: e0 b8, e0 38, 54 (Down) d4, e0 b8, e0 38 (Up)
AT set 2: e0 f0 11, e0 11, 84 (Down) f0 84, e0 f0 11, e0 11 (Up)
The AT set 3 scancodes have no special handling for Alt-Print.
Rather than fixing the handling of the 'print' key in the ps2 driver to consider
the Alt modifiers, way back, a patch was commited that defined an extra 'sysrq'
key name:
commit f2289cb692
Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
Date: Wed Jun 4 10:14:16 2008 +0000
Add sysrq to key names known by "sendkey".
Adding sysrq keycode to the table enabling running sysrq debugging in
the guest via the monitor sendkey command, like:
(qemu) sendkey alt-sysrq-t
Tested on x86-64 target and Linux guest.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
With this patch QEMU would send
AT set 1: 38, 54 (Down) d4, b8 (Up)
AT set 2: 11, 84 (Down) f0 84, f0 11 (Up)
but this doesn't match what actual real keyboards send, as it is not releasing
the original modifier & pressing it again afterwards. In addition the original
problem remains, and a new problem was added:
- The sequence 'alt-print-t' is still broken, acting as if 'print-t' was
requested
- The sequence 'sysrq-t' is broken, injecting an undefine scancode sequence
tot he guest os (bare 0x54)
To deal with this mess we make these changes to the ps2 code, so that we track
the state of modifier keys (Alt, Shift, Ctrl - both left & right). Then we can
vary what scancodes are sent for Q_KEY_CODE_PRINT according to the Alt key
modifier state
Interestingly, it appears that of operating systems I've checked (Linux, FreeBSD
and OpenSolaris), none of them actually bother to validate the full sequences
for a unmodified 'Print' key. They all just ignore the leading "e0 2a" and
trigger based off "e0 37" alone. The latter two byte sequence is what keyboards
send with 'Print' is combined with 'Shift' or 'Ctrl' modifiers.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20171019142848.572-5-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The code converting key numbers to QKeyCode in the 'input-send-event'
command mistakenly accessed the key->u.qcode union field instead of
the key->u.number field. This is harmless because the fields use the
same size datatype in both cases, but none the less it should be fixed
to avoid confusion.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171019142848.572-4-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Previously we enforced that all key events are using QKeyCodes
at time they are sent:
commit af07e5ff02
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Fri Sep 29 11:12:00 2017 +0100
ui: convert key events to QKeyCodes immediately
This commit forget to fix the code for the legacy 'sendkey'
command which still accepts key numbers from the user, which
then need converting to QKeyCodes
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171019142848.572-3-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
We don't need qemu-keymap when we build only linux-user qemu.
When we compile in static mode, the libxkbcommon is detected
by configure if the shared one is available, but cannot
be linked if the static version is not available.
As we don't need it for qemu-linux-user, and we generally need
a static link to use it in a chroot, disable qemu-keymap in
this case.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-id: 20171019191606.14129-1-laurent@vivier.eu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Previously we were kicking the cpu on every update. This caused
problems noticeable in SMP configurations where one CPU got pinned
continuously servicing timer exceptions.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Wire in ompic and add basic support for SMP. The OpenRISC is special in
that interrupts for devices are routed to each core's PIC. This is
achieved using the qemu_irq_split utility, but this currently limits
OpenRISC to 2 cores.
This models the reference architecture described in the OpenRISC spec
1.2 proposal.
https://github.com/stffrdhrn/doc/raw/arch-1.2-proposal/openrisc-arch-1.2-rev0.pdf
The changes to the intialization of the sim include:
CPU Reset
o Reset each cpu to the bootstrap PC rather than only a single cpu as
done before.
o During Kernel loading the bootstrap PC is saved in a static global.
Network Initialization
o Connect the interrupt to each CPU
o Use more simple sysbus_mmio_map() rather than memory_region_add_subregion()
Sim Initialization
o Initialize the pic and tick timer per cpu
o Wire in the OMPIC if SMP is enabled
o Wire the serial irq to each CPU using qemu_irq_split()
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
In order to support multicore system we move some of the previously
static state variables into the state of each core.
On the other hand in order to allow timers to be synced between each
code the ttcr (tick timer count register) is moved out of the core.
This is not as per real hardware spec which has a separate timer counter
per core, but it seems the most simple way to keep each clock in sync.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Previously coreid and numcores were hard coded as 0 and 1 respectively
as OpenRISC QEMU did not have multicore support.
Multicore support is now being added so these registers need to have
configured values.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Add OpenRISC Multicore PIC which handles inter processor interrupts
(IPI) between cores. In OpenRISC all device interrupts are routed to
each core enabling this device to be simple.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
The last big chunk of s390x changes:
- (experimental) smp support under tcg
- provide the virtio-input devices for virtio-ccw
- improve error handling in the css code
- enable some simple virtio tests for s390x
- low-address protection in tcg
- some more cleanups and fixes
# gpg: Signature made Fri 20 Oct 2017 12:49:22 BST
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg: aka "Cornelia Huck <cohuck@kernel.org>"
# gpg: aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20171020: (46 commits)
s390x/tcg: low-address protection support
accel/tcg: allow to invalidate a write TLB entry immediately
tests: Enable the very simple virtio tests on s390x, too
libqtest: Add qtest_[v]startf()
s390x: refactor error handling for MSCH handler
s390x: refactor error handling for HSCH handler
s390x: refactor error handling for CSCH handler
s390x: refactor error handling for XSCH handler
s390x: improve error handling for SSCH and RSCH
s390x/css: IO instr handler ending control
s390x: move s390x_new_cpu() into board code
s390x: fix cpu object referrence leak in s390x_new_cpu()
s390x/event-facility: variable-length event masks
s390x/MAINTAINERS: add mailing list
virtio-ccw: Add the virtio-input devices for CCW bus
target/s390x: special handling when starting a CPU with WAIT PSW
s390x/tcg: refactor stfl(e) to use s390_get_feat_block()
s390x/tcg: unlock NMI
s390x/cpumodel: allow to enable SENSE RUNNING STATUS for qemu
s390x/tcg: switch to new SIGP handling code
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# gpg: Signature made Fri 20 Oct 2017 07:30:45 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/docker-pull-request:
docker: Fix PATH for ccache
docker: fix out-of-tree 'make docker-test-build@debian-powerpc-cross'
docker: allow running from srcdir != builddir build
docker: cleanup temp directory after test
docker: Don't allocate tty unless DEBUG=1
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This is a neat way to implement low address protection, whereby
only the first 512 bytes of the first two pages (each 4096 bytes) of
every address space are protected.
Store a tec of 0 for the access exception, this is what is defined by
Enhanced Suppression on Protection in case of a low address protection
(Bit 61 set to 0, rest undefined).
We have to make sure to to pass the access address, not the masked page
address into mmu_translate*().
Drop the check from testblock. So we can properly test this via
kvm-unit-tests.
This will check every access going through one of the MMUs.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20171016202358.3633-3-david@redhat.com>
[CH: restored error message for access register mode]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Background: s390x implements Low-Address Protection (LAP). If LAP is
enabled, writing to effective addresses (before any translation)
0-511 and 4096-4607 triggers a protection exception.
So we have subpage protection on the first two pages of every address
space (where the lowcore - the CPU private data resides).
By immediately invalidating the write entry but allowing the caller to
continue, we force every write access onto these first two pages into
the slow path. we will get a tlb fault with the specific accessed
addresses and can then evaluate if protection applies or not.
We have to make sure to ignore the invalid bit if tlb_fill() succeeds.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20171016202358.3633-2-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
We have several callers that were formatting the argument strings
themselves; consolidate this effort by adding new convenience
functions directly in libqtest, and update some call-sites that
can benefit from it.
Note that the new functions qtest_startf() and qtest_vstartf()
behave more like qtest_init() (the caller must assign global_qtest
after the fact, rather than getting it implicitly set). This helps
us prepare for future patches that get rid of the global variable,
by explicitly highlighting which tests still depend on it now.
Signed-off-by: Eric Blake <eblake@redhat.com>
[thuth: Dropped the hunks that do not apply cleanly to qemu master
yet and added the missing g_free(args) in qtest_vstartf()]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1508336428-20511-2-git-send-email-thuth@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Simplify the error handling of the SSCH and RSCH handler avoiding
arbitrary and cryptic error codes being used to tell how the instruction
is supposed to end. Let the code detecting the condition tell how it's
to be handled in a less ambiguous way. It's best to handle SSCH and RSCH
in one go as the emulation of the two shares a lot of code.
For passthrough this change isn't pure refactoring, but changes the way
kernel reported EFAULT is handled. After clarifying the kernel interface
we decided that EFAULT shall be mapped to unit exception. Same goes for
unexpected error codes and absence of required ORB flags.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20171017140453.51099-4-pasic@linux.vnet.ibm.com>
Tested-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
[CH: cosmetic changes]
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
CSS code needs to tell the IO instruction handlers located in ioinst.c
how the emulated instruction should be ended. Currently this is done by
returning generic (POSIX) error codes, and mapping them to outcomes like
condition codes. This makes bugs easy to create and hard to recognize.
As a preparation for moving away from (mis)using generic error codes for
flow control let us introduce a type which tells the instruction
handler function how to end the instruction, in a more straight-forward
and less ambiguous way.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20171017140453.51099-3-pasic@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
[CH: cosmetic changes]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
object_new() returns cpu with refcnt == 1 and after realize
refcnt == 2*. s390x_new_cpu() as an owner of the first refcnt
should have released it on exit in both cases (on error and
success) to avoid it leaking. Do so for both cases.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1508247680-98800-2-git-send-email-imammedo@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
When we try to start a CPU with a WAIT PSW, we have to take care that
TCG will actually try to continue executing instructions.
We must therefore really only unhalt the CPU if we don't have a WAIT
PSW. Also document the special order for restart interrupts, which
load a new PSW and change the state to operating.
To keep KVM working, simply don't have a look at the WAIT bit when
loading the PSW. Otherwise the behavior of a restart interrupt when
a CPU stopped would be changed.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-31-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Refactor it to use s390_get_feat_block(). Directly write into the mapped
lowcore with stfl and make sure it is really only compiled if needed.
While at it, add an alignment check for STFLE and avoid
potential_page_fault() by properly restoring the CPU state.
Due to s390_get_feat_block(), we will now also indicate the
"Configuration-z-architectural-mode", which is with new SIGP code the
right thing to do.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-30-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This effectively enables experimental SMP support. Floating interrupts are
still a mess, so allow it but print a big warning. There also seems
to be a problem with CPU hotplug (after the main loop started).
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-27-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[CH: changed insn-data.def as pointed out by Richard]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Implement them like KVM implements/handles them. Both can only be
triggered via SIGP instructions. RESET has (almost) the lowest priority if
the CPU is running, and the highest if the CPU is STOPPED. This is handled
in SIGP code already. On delivery, we only have to care about the
"CPU running" scenario.
STOP is defined to be delivered after all other interrupts have been
delivered. Therefore it has the actual lowest priority.
As both can wake up a CPU if sleeping, indicate them correctly to
external code (e.g. cpu_has_work()).
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-25-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
We want to use the same code base for TCG, so let's cleanly factor it
out.
The sigp mutex is currently not really needed, as everything is
protected by the iothread mutex. But this could change later, so leave
it in place and initialize it properly from common code.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-17-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
KVM handles the wait PSW itself and triggers a WAIT ICPT in case it
really wants to sleep (disabled wait).
This will later allow us to change the order of loading a restart
interrupt and setting a CPU to OPERATING on SIGP RESTART without
changing KVM behavior.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-11-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
If we encounter a WAIT PSW, we have to halt immediately. Using
cpu_loop_exit() at this point feels wrong. Simply leaving
cs->exception_index set doesn't result in an immediate stop.
This is also necessary to properly handle SIGP STOP interrupts later.
The CPU_INTERRUPT_HALT will be processed immediately and properly set
the CPU to halted (also resetting cs->exception_index to EXCP_HLT)
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-10-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Going to OPERATING here looks wrong. A CPU should even never be
!OPERATING at this point. Unhalting will already be done in
cpu_handle_halt() if there is work, so we can drop this statement
completely.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-8-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Currently, enabling/disabling of interrupts is not really supported.
Let's improve interrupt handling code by explicitly checking for
deliverable interrupts only. This is the first step. Checking for
external interrupt subclasses will be done next.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-5-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
There are still some leftovers from old virtio interrupts in there.
Most importantly, we don't have to queue service interrupts anymore.
Just like KVM, we can simply multiplex the SCLP service interrupts and
avoid the queue.
Also, now only valid parameters/cpu_addr will be stored on service
interrupts.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-3-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
External interrupts are currently all handled like floating external
interrupts, they are queued. Let's prepare for a split of floating
and local interrupts by turning INTERRUPT_EXT into a mask.
While we can have various floating external interrupts of one kind, there
is usually only one (or a fixed number) of the local external interrupts.
So turn INTERRUPT_EXT into a mask and properly indicate the kind of
external interrupt. Floating interrupts will have to moved out of
one CPU instance later once we have SMP support.
The only floating external interrupts used right now are SERVICE
interrupts, so let's use that name. Following patches will clean up
SERVICE interrupt injection.
This get's rid of the ugly special handling for cpu timer and clock
comparator interrupts. And we really only store the parameters as
defined by the PoP.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928203708.9376-2-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Calling do_subchannel_work with no function control flags set in SCSW is
a programming error. Currently we handle this differently in
do_subchannel_work_virtual and do_subchannel_work_passthrough. Let's be
consistent and guard with a common assert against this programming error.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20171004154144.88995-2-pasic@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Merge tpm 2017/10/19 v1
# gpg: Signature made Thu 19 Oct 2017 16:42:39 BST
# gpg: using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2017-10-19-1: (21 commits)
tpm: move recv_data_callback to TPM interface
tpm: add a QOM TPM interface
tpm-tis: fold TPMTISEmuState in TPMState
tpm-tis: remove tpm_tis.h header
tpm-tis: move TPMState to TIS header
tpm: remove locty_data from TPMState
tpm-emulator: fix error handling
tpm: add TPMBackendCmd to hold the request state
tpm: remove locty argument from receive_cb
tpm: remove needless cast
tpm: remove unused TPMBackendCmd
tpm: remove configure_tpm() hop
tpm: remove init() class method
tpm: remove TPMDriverOps
tpm: move TPMSizedBuffer to tpm_tis.h
tpm: remove tpm_register_driver()
tpm: replace tpm_get_backend_driver() to drop be_drivers
tpm: lookup tpm backend class in tpm_driver_find_by_type()
tpm: make tpm_get_backend_driver() static
tpm-tis: remove RAISE_STS_IRQ
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
gcc warning:
/qemu/util/oslib-posix.c:304:11: error:
variable ‘addr’ might be clobbered by ‘longjmp’ or ‘vfork’
[-Werror=clobbered]
Fix also some related data types:
numpages, hpagesize are used as pointer offset.
Always use size_t for them and also for the derived
numpages_per_thread and size_per_thread.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 20171016202912.1117-1-sw@weilnetz.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Before bcd7f06f57 we source /etc/profile
so the PATH included the right paths to ccache binaries. Now we need to
update $PATH explicitly from run script.
Keep the old /usr/lib around just so that in the future, ccache from 32
bit images will just work.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20171018073841.30062-1-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The existence of tty in the container seems to urge gcc into colorizing
the errors, but the escape chars will clutter the report once turned
into email replies on patchew. Move -t to debug mode.
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20171013011954.9975-1-famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
This was introduced by:
commit aef45d51d1
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Fri Sep 29 11:11:56 2017 +0100
build: automatically handle GIT submodule checkout for dtc
On my system, I see the following with a fresh clone:
% ./configure --disable-gtk --target-list=aarch64-softmmu
% make -j8
GEN aarch64-softmmu/config-devices.mak.tmp
GEN config-host.h
mkdir -p dtc/libfdt
GIT ui/keycodemapdb dtc
mkdir -p dtc/tests
GEN qemu-options.def
[snip]
GEN migration/trace.h
make: *** [git-submodule-update] Error 1
make: *** Waiting for unfinished jobs....
Upon closer inspection, the root cause of the error is:
% git submodule update --init ui/keycodemapdb dtc
fatal: destination path 'dtc' already exists and is not an empty directory.
Clone of 'git://git.qemu-project.org/dtc.git' into submodule path 'dtc' failed
This patch fixes this race condition by forcing the 'dtc/%' rule which caused
'dtc' to be non-empty to wait on '.git-submodule-status'.
Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1508352023-28591-1-git-send-email-alindsay@codeaurora.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This will simplify backend / interface objects relationship, so the
frontend interface will simply have to implement the TPM QOM interface.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
The previous patch cleaned up a bit error handling, and exposed an
existing bug: error_report_err() could be called with a NULL error.
Instead, make tpm_emulator_set_locality() set the error.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This is the seabios update for qemu 2.11. Well, almost, seabios is in
freeze for the upcoming 1.11 release. This updates seabios to current
git master snapshot, and it will be updated again to 1.11 final before
the 2.11 release.
With this two-step seabios gets some more wide testing before the actual
release and the update to 1.11 final (which will most likely happen
after qemu freeze) should have bugfix patches only.
git shortlog
============
Aleksandr Bezzubikov (3):
pci: refactor pci_find_capapibilty to get bdf as the first argument instead of the whole pci_device
pci: add QEMU-specific PCI capability structure
pci: enable RedHat PCI bridges to reserve additional resources on PCI init
Ben Warren (5):
QEMU DMA: Add DMA write capability
romfile-loader: Switch to using named structs
QEMU fw_cfg: Add command to write back address of file
QEMU fw_cfg: Add functions for accessing files by key
QEMU fw_cfg: Write fw_cfg back on S3 resume
Daniel Verkamp (5):
nvme: support NVMe 1.0 controllers
nvme: extend command timeout to 5 seconds
nvme: fix reversed loop condition in cmd_readwrite
nvme: fix extraction of status code bits
nvme: fix copy-paste mistake in comment
Filippo Sironi (1):
nvme: Use the Maximum Queue Entries Supported (MQES) to initialize I/O queues
Gerd Hoffmann (7):
usb: add hub portmap
usb-xhci: use hub portmap
std: add cp437 to unicode map
kbd: make enqueue_key public, add ascii_to_keycode
romfile: add support for constant files.
paravirt: serial console configuration.
add serial console support
Igor Mammedov (1):
drop "etc/boot-cpus" fw_cfg file and reuse legacy QEMU_CFG_NB_CPUS
Jason Wang (1):
virtio: IOMMU support
Julian Stecklina (2):
block: add NVMe boot support
nvme: fix out of memory behavior
Julius Werner (1):
coreboot: Adapt to upstream CBMEM console changes
Kevin O'Connor (26):
usb: Make usb_time_sigatt variable static
tpm: Add comment banners to tcg.c separating major parts of spec
tpm: Don't call tpm_set_failure() from tpm12_get_capability()
tpm: Move code around in tcgbios.c to keep like code together
acpi: Generalize find_fadt() and find_tcpa_by_rsdp() into find_acpi_table()
tpm: Don't call tpm_build_and_send_cmd() from tpm20_stirrandom()
tpm: Rework tpm_build_and_send_cmd() into tpm_simple_cmd()
ps2port: Disable keyboard/mouse prior to resetting ps2 controller
docs: Note release dates for 1.10.1 and 1.10.2
resume: Don't attempt to use generic reboot mechanisms on QEMU
boot: Increase description size in boot menu
src: Minor - remove tab characters that slipped into SeaBIOS C code
NVMe: Allow NVMe to be enabled on real hardware
smm: Backup and restore A20 on an SMI based mode switch
stacks: Make sure to initialize Call16Data
stacks: Don't update the A20 settings if they haven't changed
stacks: There is no need to disable NMI if it is already disabled
vga: Fix bug in stdvga_get_linesize()
docs: Fix typos in Memory_Model.md
tcgbios: Fix use of unitialized variable
boot: Rename drive_g to drive
disk: Don't require the 'struct drive_s' to be in the f-segment
block: Rename disk_op_s->drive_gf to drive_fl
virtio: Allocate drive_s storage in low memory
xhci: Build TRBs directly in xhci_trb_queue()
xhci: Verify the device is still present in xhci_cmd_submit()
Ladi Prosek (1):
ahci: Set upper 32-bit registers to zero
Patrick Rudolph (4):
SeaVGABios/cbvga: Advertise correct pixel format
SeaVGABIOS/vbe: Query driver for scanline pitch v2
SeaVGABios/cbvga: Use active mode to clear screen
SeaVGABios/cbvga: Advertise compatible VESA modes
Paul Menzel (1):
vgasrc: Increase debug level
Petr Berky (1):
config: Add function to check if fw_cfg exists
Ricardo Ribalda Delgado (1):
serialio: Support for mmap serial ports
Roman Kagan (11):
blockcmd: accept only disks and CD-ROMs
blockcmd: generic SCSI luns enumeration
virtio-scsi: enumerate luns with REPORT LUNS
esp-scsi: enumerate luns with REPORT LUNS
usb-uas: enumerate luns with REPORT LUNS
pvscsi: fix the comment about lun enumeration
mpt-scsi: try to enumerate luns with REPORT LUNS
lsi-scsi: reset in case of a serious problem
lsi-scsi: try to enumerate luns with REPORT LUNS
blockcmd: start REPORT_LUNS with the smallest buffer
Revert "lsi-scsi: reset in case of a serious problem"
Stefan Berger (1):
tpm: Log TPM 2 digest structure in little endian format
Youness Alaoui (1):
nvme: Enable NVMe support for non-qemu hardware
Zeh, Werner (1):
ahci: Disable Native Command Queueing
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Commit 8d93297 introduced a bug whereby non-inbuilt NICs are realized before
setting the default MAC address causing an assert. Switch NIC creation
over from pci_create_simple() to pci_create() which works exactly the
same except omitting the realize as originally intended.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
This patch updates the sun4u model to being much closer to a real Ultra 5
by moving devices behind the 2 simba PCI bridges (A and B) as found on real
hardware.
The most noticeable change introduced by this patchset is that in-built devices
are no longer attached to the PCI root bus, but instead behind PCI bridge A.
Along with this the interrupt routing is updated accordingly to match the
official documentation.
Since the existing code currently bypasses the PCI bridge interrupt
swizzling, the interrupt mapping functions are reorganised so that
pci_pbm_map_irq() is used by the PCI bridges and pci_apb_map_irq() is
used by the PCI host bridge.
Behind the sabre PCI host bridge, the PCI IO space now needs to be
split into two separate halves at 0x8000000. Therefore we also setup a new
PCI IO space region of increased size on the PCI host bridge and enable
32-bit PCI IO accesses to allow IO accesses to reach devices behind PCI
bridge B correctly.
As part of this change we also combine the onboard sunhme NIC and the ebus
into a single multi-function device as done on a real Ultra 5. For other
NICs the existing behaviour is preserved, i.e. we initialise them and
place them into the next free slot on PCI bus B.
Finally we mark the physically unavailable slots (plus slot 0 in busA) as
reserved to ensure that users can't plug devices into non-existent slots
which will break interrupt routing.
Note: since this commit changes PCI topology and interrupt routing, an
updated openbios-sparc64 binary is included with this commit containing the
associated changes to maintain bisectability.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Logical block size of a SCSI disk should never be larger than
physical block size. From an ATA/SCSI perspective, it makes no sense
to have the logical block size greater than the physical block size,
and it cannot even be effectively expressed in the command set. The
whole point of adding the physical block size to the ATA/SCSI command
set was to communicate a desire for a larger block size (than logical),
while maintaining backwards compatibility with legacy 512 byte block
size.
When setting logical_block_size > physical_block_size, QEMU cannot express
it in READ CAPACITY(16) output, and all it can do is set the physical
block exponent to 0 (i.e. logical_block_size == physical_block_size).
Reporting the error properly, however, is better.
Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Message-Id: <1508185024-5840-1-git-send-email-mark.kanda@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
DEVICE_DEL is currently emitted when a Device is unparented, as
opposed to when it is finalized. The main design motivation for this
seems to be that after unparent()/unrealize(), the Device is no
longer visible to the guest, and thus the operation is complete
from the perspective of management.
However, there are cases where remaining host-side cleanup is also
pertinent to management. The is generally handled by treating these
resources as aspects of the "backend", which can be managed via
separate interfaces/events, such as blockdev_add/del, netdev_add/del,
object_add/del, etc, but some devices do not have this level of
compartmentalization, namely vfio-pci, and possibly to lend themselves
well to it.
In the case of vfio-pci, the "backend" cleanup happens as part of
the finalization of the vfio-pci device itself, in particular the
cleanup of the VFIO group FD. Failing to wait for this cleanup can
result in tools like libvirt attempting to rebind the device to
the host while it's still being used by VFIO, which can result in
host crashes or other misbehavior depending on the host driver.
Deferring DEVICE_DEL still affords us the ability to manage backends
explicitly, while also addressing cases like vfio-pci's, so we
implement that approach here.
An alternative proposal involving having VFIO emit a separate event
to denote completion of host-side cleanup was discussed, but the
prevailing opinion seems to be that it is not worth the added
complexity, and leaves the issue open for other Device implementations
to solve in the future.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20171016222315.407-4-mdroth@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit abed886ec6.
This patch originally addressed an issue where a DEVICE_DELETED
event could be emitted (in device_unparent()) before a Device's
QemuOpts were cleaned up (in device_finalize()), leading to a
"duplicate ID" error if management attempted to immediately add
a device with the same ID in response to the DEVICE_DELETED event.
An alternative will be implemented in a subsequent patch where we
defer the DEVICE_DELETED event until device_finalize(), which would
also prevent the race, so we revert the original fix in preparation.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20171016222315.407-3-mdroth@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
device_unparent(dev, ...) is called when a device is unparented,
either directly, or as a result of a parent device being
finalized, and handles some final cleanup for the device. Part
of this includes emiting a DEVICE_DELETED QMP event to notify
management, which includes the device's path in the composition
tree as provided by object_get_canonical_path().
object_get_canonical_path() assumes the device is still connected
to the machine/root container, and will assert otherwise, but
in some situations this isn't the case:
If the parent is finalized as a result of object_unparent(), it
will still be attached to the composition tree at the time any
children are unparented as a result of that same call to
object_unparent(). However, in some cases, object_unparent()
will complete without finalizing the parent device, due to
lingering references that won't be released till some time later.
One such example is if the parent has MemoryRegion children (which
take a ref on their parent), who in turn have AddressSpace's (which
take a ref on their regions), since those AddressSpaces get cleaned
up asynchronously by the RCU thread.
In this case qdev:device_unparent() may be called for a child Device
that no longer has a path to the root/machine container, causing
object_get_canonical_path() to assert.
Fix this by storing the canonical path during realize() so the
information will still be available for device_unparent() in such
cases.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20171016222315.407-2-mdroth@linux.vnet.ibm.com>
[Clear dev->canonical_path at the post_realize_fail label, which is
cleaner. Suggested by David Gibson. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
libmultipath has recently changed its API. The new API supports multi-threaded
clients better. Unfortunately there is no backwards-compatibility, so we just
switch to the new one. Running QEMU compiled with the new library on the old
library will likely crash, while doing the opposite will cause QEMU not to
start at all (because udev, get_multipath_config and put_multipath_config
are undefined).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aligned 8-byte memory writes by a 64-bit target on a 64-bit host should
always turn into atomic 8-byte writes on the host, however a write
write watchpoint would end up tearing the 8-byte write into two 4-byte
writes in access_with_adjusted_size().
Reported-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aligned 8-byte memory writes by a 64-bit target on a 64-bit host should
always turn into atomic 8-byte writes on the host, however if we missed
in the softmmu, and the TLB line was marked as not dirty, then we
would end up tearing the 8-byte write into two 4-byte writes in
access_with_adjusted_size().
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-Id: <20171013181913.7556-1-Andrew.Baumann@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Attributes are not updated via region_add()/region_del(). Attribute changes
lead to a delete first, followed by a new add.
If this would ever not be the case, we would get an error when trying to
register the new slot.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20171016144302.24284-6-david@redhat.com>
Tested-by: Joe Clifford <joeclifford@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It might be confusing for some listener implementations that implement
both, region_add and log_start (e.g. KVM) if we call log_start before an
actual region was added using region_add.
This makes current KVM code trigger an assertion
("kvm_section_update_flags: error finding slot"). So let's just reverse
the order instead of tolerating log_start on yet unknown regions.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20171016144302.24284-2-david@redhat.com>
Tested-by: Joe Clifford <joeclifford@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The TARGET_MTIOCTOP/TARGET_MTIOCGET/TARGET_MTIOCPOS values
were being defined in terms of host struct types, but
these structures are such that their size might differ
on different hosts. Switch to using a target struct
definition instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
This adds the -dfilter support to linux-user. There is a minor
checkpatch complaint about formatting which I've ignored for aesthetic
reasons.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
ppc patch queue 2017-10-17
Here's the currently accumulated set of ppc patches for qemu.
* The biggest set here is the ppc parts of Igor Mammedov's cleanups
to cpu model handling
* The above also includes a generic patches which are required as
prerequisites for the ppc parts. They don't seem to have been
merged by Eduardo yet, so I hope they're ok to include here.
* Apart from that it's basically just assorted bug fixes and cleanups
# gpg: Signature made Tue 17 Oct 2017 05:20:03 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.11-20171017: (34 commits)
spapr_cpu_core: rewrite machine type sanity check
spapr_pci: fail gracefully with non-pseries machine types
spapr: Correct RAM size calculation for HPT resizing
ppc: pnv: consolidate type definitions and batch register them
ppc: pnv: drop PnvChipClass::cpu_model field
ppc: pnv: define core types statically
ppc: pnv: drop PnvCoreClass::cpu_oc field
ppc: pnv: normalize core/chip type names
ppc: pnv: use generic cpu_model parsing
ppc: spapr: use generic cpu_model parsing
ppc: move ppc_cpu_lookup_alias() before its first user
ppc: spapr: use cpu model names as tcg defaults instead of aliases
ppc: spapr: register 'host' core type along with the rest of core types
ppc: spapr: use cpu type name directly
ppc: spapr: define core types statically
ppc: move '-cpu foo,compat=xxx' parsing into ppc_cpu_parse_featurestr()
ppc: spapr: replace ppc_cpu_parse_features() with cpu_parse_cpu_model()
ppc: 40p/prep: replace cpu_model with cpu_type
ppc: virtex-ml507: replace cpu_model with cpu_type
ppc: replace cpu_model with cpu_type on ref405ep,taihu boards
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Merge QIO 2017/10/16 v1
# gpg: Signature made Mon 16 Oct 2017 17:10:54 BST
# gpg: using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/pull-qio-2017-10-16-1:
io: fix mem leak in websock error path
io: add trace points for websocket HTTP protocol headers
io: cope with websock 'Connection' header having multiple values
io: get rid of bounce buffering in websock write path
io: pass a struct iovec into qio_channel_websock_encode
io: get rid of qio_channel_websock_encode helper method
io: simplify websocket ping reply handling
io: monitor encoutput buffer size from websocket GSource
sockets: Handle race condition between binds to the same port
sockets: factor out create_fast_reuse_socket
sockets: factor out a new try_bind() function
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This fixes a potential data leak to the guest.
# gpg: Signature made Mon 16 Oct 2017 16:08:25 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: use g_malloc0 to allocate space for xattr
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
egl_texture_blit() blits a texture, simliar to egl_fb_blit() but by
rendering the texture to the screen instead of using a framebuffer blit.
egl_texture_blend() renders a texture with alpha blending, will be used
to render the cursor to the screen.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171010135453.6704-6-kraxel@redhat.com
Add vertex shader which flips the texture upside down while blitting it.
Add argument to qemu_gl_run_texture_blit() to enable flipping.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171010135453.6704-4-kraxel@redhat.com
With the upcoming dmabuf support in qemu there will be more users of the
shaders than just console-gl.c. So rename ConsoleGLState to
QemuGLShader, rename some functions too, move code from console-gl.c to
shaders.c.
No functional change.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171010135453.6704-3-kraxel@redhat.com
This patch adds support for dma-bufs to the qemu console interfaces.
It adds a new "struct QemuDmaBuf" to represent a dmabuf with accociated
metatdata (size, format). It adds three functions (and
DisplayChangeListenerOps operations) to set a dma-buf as display
scanout, as cursor and to release a dmabuf.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171010135453.6704-2-kraxel@redhat.com
Commit "3d90c62548 vga: stop passing pointers to vga_draw_line*
functions" is incomplete. It doesn't handle the case that the vga
rendering code tries to create a shared surface, i.e. a pixman image
backed by vga video memory. That can not work in case the guest display
wraps from end of video memory to the start. So force shadowing in that
case. Also adjust the snapshot region calculation.
Can trigger with cirrus only, when programming vbe modes using the bochs
api (stdvga, also qxl and virtio-vga in vga compat mode) wrap arounds
can't happen.
Fixes: CVE-2017-13672
Fixes: 3d90c62548
Cc: P J P <ppandit@redhat.com>
Reported-by: David Buchanan <d@vidbuchanan.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171010141323.14049-3-kraxel@redhat.com
This makes the code easier to understand and it is consistent with what
we already do for PHBs.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU currently crashes when the user tries to add an spapr-pci-host-bridge
on a non-pseries machine:
$ qemu-system-ppc64 -M ppce500 -device spapr-pci-host-bridge,index=1
hw/ppc/spapr_pci.c:1535:spapr_phb_realize:
Object 0x1003dacae60 is not an instance of type spapr-machine
Aborted (core dumped)
The same thing happens with the deprecated but still available child type
spapr-pci-vfio-host-bridge.
Fix both by checking the machine type with object_dynamic_cast().
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In order to prevent the guest from forcing the allocation of large amounts
of qemu memory (or host kernel memory, in the case of KVM HV), we limit
the size of Hashed Page Table (HPT) it is allowed to allocated, based on
its RAM size.
However, the current calculation is not correct: it only adds up the size
of plugged memory, ignoring the base memory size. This patch corrects it.
While we're there, use get_plugged_memory_size() instead of directly
calling pc_existing_dimms_capacity(). The only difference is that it
will abort on failure, which is right: a failure here indicates something
wrong within qemu.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
deduce core type directly from chip type instead of
maintaining type mapping in PnvChipClass::cpu_model.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
pnv core type definition doesn't have any fields that
require it to be defined at runtime. So replace code
that fills in TypeInfo at runtime with static TypeInfo
array that does the same at complie time.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
deduce cpu type directly from core type instead of
maintaining type mapping in PnvCoreClass::cpu_oc and doing
extra cpu_model parsing in pnv_core_class_init()
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
typically for cpus/core type names following convention is used
new_type_prefix-superclass_typename
make PNV core/chip to follow common convention.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
use common cpu_model prasing in vl.c and set default cpu_model
using generic MachineClass::default_cpu_type.
Beside of switching to generic infrastructure it solves several
issues.
* ppc_cpu_class_by_name() is used to deal with lower/upper case
and alias translations into actual cpu type, which fixes
'-M powernv -cpu power8' and '-M powernv -cpu power9_v1.0'
usecases which error out with:
'invalid CPU model 'FOO' for powernv machine'
* allows to switch to lower-case typenames in pnv chip/core name
(by convention typnames should be lower-case)
* replace aliased names /power8, power9, .../ with exact cpu model
names (i.e. typenames should be stable but aliases might decide to
point to other cpu model withi family or changed by kvm). It will
also help to simplify pnv_chip/core code and get rid of dependency
on cpu_model parsing.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Updated to make DD2.0 as default POWER9 chip]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
use generic cpu_model parsing introduced by
(6063d4c0f vl.c: convert cpu_model to cpu type and set of global properties before machine_init())
it allows to:
* replace sPAPRMachineClass::tcg_default_cpu with
MachineClass::default_cpu_type
* drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse
one in vl.c
* simplify spapr_get_cpu_core_type() by removing
not needed anymore recurrsion since alias look up
happens earlier at vl.c and spapr_get_cpu_core_type()
works only with resulted from that cpu type.
* spapr no more needs to parse/depend on being phased out
MachineState::cpu_model, all tha parsing done by generic
code and target specific callback.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Correct minor compile error]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
next commit will drop ppc_cpu_lookup_alias() declaration from header
and make it static which will break its last user ppc_cpu_class_by_name()
since ppc_cpu_class_by_name() defined before ppc_cpu_lookup_alias().
To avoid this move ppc_cpu_lookup_alias() right before
ppc_cpu_class_by_name().
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr core type definition doesn't have any fields that
require it to be defined at runtime. So replace code
that fills in TypeInfo at runtime with static TypeInfo
array that does the same at complie time.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
there is a dedicated callback CPUClass::parse_features
which purpose is to convert -cpu features into a set of
global properties AND deal with compat/legacy features
that couldn't be directly translated into CPU's properties.
Create ppc variant of it (ppc_cpu_parse_featurestr) and
move 'compat=val' handling from spapr_cpu_core.c into it.
That removes a dependency of board/core code on cpu_model
parsing and would let to reuse common -cpu parsing
introduced by 6063d4c0
Set "max-cpu-compat" property only if it exists, in practice
it should limit 'compat' hack to spapr machine and allow
to avoid including machine/spapr headers in target/ppc/cpu.c
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
DEFINE_TYPES() will help to simplify following routine patterns:
static void foo_register_types(void)
{
type_register_static(&foo1_type_info);
type_register_static(&foo2_type_info);
...
}
type_init(foo_register_types)
or
static void foo_register_types(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(type_infos); i++) {
type_register_static(&type_infos[i]);
}
}
type_init(foo_register_types)
with a single line
DEFINE_TYPES(type_infos)
where types have static definition which could be consolidated in
a single array of TypeInfo structures.
It saves us ~6-10LOC per use case and would help to replace
imperative foo_register_types() there with declarative style of
type registration.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
LMB removal is completed only when the spapr_lmb_release callback
is called after all DRCs of the dimm are detached. During this
time, it is possible that a unplug request for the same dimm
arrives, trying to detach DRCs that were detached by the guest
in the first unplug_request.
BQL doesn't help in this case - the lock will prevent any concurrent
removal from happening until the end of spapr_memory_unplug_request
only. What happens is that the second unplug_request ends up calling
spapr_drc_detach in a DRC that were detached already, causing an
assert error in spapr_drc_detach (e.g
https://bugs.launchpad.net/qemu/+bug/1718118).
spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the
spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left
to be detached by the guest. When there are no more DRCs left, this
structure is deleted and the pc-dimm unplug handler is called to
finish the process.
This patch reuses the sPAPRDIMMState to allow unplug_request to know
if there is an ongoing unplug process for a given dimm, aborting the
unplug request in this case, by doing the following changes:
- in spapr_lmb_release callback, move the dimm state removal to the
end, after pc-dimm unplug handler. With this change we can check for
the existence of the dimm state to see if the unplug process is
done.
- use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request
to check if the dimm state exists. If positive, there is an unplug
operation already in progress for this dimm, meaning that we should
abort it and warn the user about it.
Fixes: https://bugs.launchpad.net/qemu/+bug/1718118
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For POWER ISA v3.0, the XER bit CA32 needs to be set by the shift
right algebraic instructions whenever the CA bit is to be set. This
change affects the following instructions:
* Shift Right Algebraic Word (sraw[.])
* Shift Right Algebraic Word Immediate (srawi[.])
* Shift Right Algebraic Doubleword (srad[.])
* Shift Right Algebraic Doubleword Immediate (sradi[.])
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At the moment the only POWER9 model which is listed in qemu is v1.0 (aka
"DD1"). This is a very early (read, buggy) version which will never be
released to the public - it was included in qemu only for the convenience
of those doing bringup on the early silicon. For bonus points, we actually
had its PVR incorrect in the table (0x004e0000 instead of 0x004e0100). We
also never actually implemented the differences in behaviour (read, bugs)
that marked DD1 in qemu.
Now that we know the PVR for the substantially better v2.0 (DD2) chip,
include it and make it the default POWER9 in qemu. For the time being we
leave the DD1 definition in place for the poor souls (read, me) who still
need to work with DD1 hardware.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The CAS buffer is provided by SLOF. A broken SLOF could pass a silly
size: either smaller than the diff header, in which case the current
code will try to allocate 16 Exabytes of memory and g_malloc0() will
abort, or bigger than the maximum memory provisioned for SLOF (ie,
40 Megabytes), which doesn't make sense. Both cases indicate that
SLOF has a bug.
Let's print out an explicit error message and exit since rebooting as
we do with other errors would only result in a reset loop.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fix format specifier that broke 32-bit builds]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We don't have any 460 or 460F CPUs in QEMU, so the init functions
are just dead code. Let's simply remove them (translate_init.c
is already big enough without them).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The offset of the root node is guaranteed to be 0.
This doesn't fix anything, it's just trivial cleanup of the two
remaining places where this was done under hw/ppc.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 4f7265f "ppc/ide/macio: Add missing registers" added two extra macio
registers but forgot to add them to the corresponding VMStateDescription.
The version number is bumped accordingly, although this will have little
effect given that the Mac machines are practically unmigratable.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When using filter-mirror like the example below where the interface
'ndev0' does not exist on the host, QEMU crashes into segmentation
fault.
$ qemu-system-x86_64 -S -machine pc -netdev user,id=ndev0 -object filter-mirror,id=test-object,netdev=ndev0
This happens because the function filter_mirror_setup() does not check
if the device actually exists and still keep on processing calling
qemu_chr_find(). This patch fixes this issue.
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The struct OrIRQState has an unused member field in_irqs.
This is a legacy of earlier versions of the patch; the
code that used it was dropped from the final version of
the code that went into master, but we forgot to delete
the no-longer-used struct field. Do so now.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This fixes a compiler warning:
/qemu/io/channel-websock.c:163:5: error:
function might be possible candidate for ‘gnu_printf’ format attribute
[-Werror=suggest-attribute=format]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Comments explaining why we include a header tend to go bad. This
one's almost comical: not only doesn't qemu-options.hx use
MAP_POPULATE anymore (since commit ef36fa1, v2.0.0, 2013), even the
include it applies to got moved away in commit 02d0e09 (v2.7.0).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The header file was introduced by fbcc3e5 ("qemu-thread: optimize QemuLockCnt
with futexes on Linux", 2017-01-16) without header guards. Add them.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Remove trailing whitespace in qemu-doc.texi, as it causes
reproducibility issues depending on the echo implementation
used by the Makefile.
Reported-By: Vagrant Cascadian <vagrant@debian.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Some m68k, qtest and config improvements
# gpg: Signature made Mon 16 Oct 2017 13:38:03 BST
# gpg: using RSA key 0x2ED9D774FE702DB5
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>"
# gpg: aka "Thomas Huth <thuth@redhat.com>"
# gpg: aka "Thomas Huth <huth@tuxfamily.org>"
# gpg: aka "Thomas Huth <th.huth@posteo.de>"
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* remotes/huth/tags/pull-request-2017-10-16:
default-configs: Enable CONFIG_VMXNET3_PCI only on x86
tests/prom-env: Bump the timeout, and test pseries only in slow mode
tests: use g_new() family of functions
M68K: use g_new() family of functions
hw/m68k: Replace fprintf(stderr, "*\n" with error_report()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
pc, pci, virtio: fixes, features
A bunch of fixes all over the place.
A new vmcore device - the user interface around it is still somewhat
controversial, but I feel most of the code is fine, suggestions can be
addressed by adding patches on top.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 15 Oct 2017 04:02:23 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (26 commits)
tests/pxe: Test more NICs when running in SPEED=slow mode
pc: remove useless hot_add_cpu initialisation
isapc: Remove unnecessary migration compatibility code
virtio-pci: Replace modern_as with direct access to modern_bar
virtio: fix descriptor counting in virtqueue_pop
hw/gen_pcie_root_port: make IO RO 0 on IO disabled
pci: Validate interfaces on base_class_init
xen/pt: Mark TYPE_XEN_PT_DEVICE as hybrid
pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devices
pci: Add INTERFACE_PCIE_DEVICE to all PCIe devices
pci: Add interface names to hybrid PCI devices
pci: conventional-pci-device and pci-express-device interfaces
PCI: PCIe access should always be little endian
virtio/pci/migration: Convert to VMState
hw/pci-bridge/pcie_pci_bridge: properly handle MSI unavailability case
pci: allow 32-bit PCI IO accesses to pass through the PCI bridge
virtio/vhost: reset dev->log after syncing
MAINTAINERS: add Dump maintainers
scripts/dump-guest-memory.py: add vmcoreinfo
kdump: set vmcoreinfo location
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Besides being more correct, arbitrarily long instruction allow the
generation of a translation block that spans three pages. This
confuses the generator and even allows ring 3 code to poison the
translation block cache and inject code into other processes that are
in guest ring 3.
This is an improved (and more invasive) fix for commit 30663fd ("tcg/i386:
Check the size of instruction being translated", 2017-03-24). In addition
to being more precise (and generating the right exception, which is #GP
rather than #UD), it distinguishes better between page faults and too long
instructions, as shown by this test case:
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
int main()
{
char *x = mmap(NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0);
memset(x, 0x66, 4096);
x[4096] = 0x90;
x[4097] = 0xc3;
char *i = x + 4096 - 15;
mprotect(x + 4096, 4096, PROT_READ|PROT_WRITE);
((void(*)(void)) i) ();
}
... which produces a #GP without the mprotect, and a #PF with it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These take care of advancing s->pc, and will provide a unified point
where to check for the 15-byte instruction length limit.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This should be done by all target and, since commit 53f6672bcf
("gen-icount: use tcg_ctx.tcg_env instead of cpu_env", 2017-06-30),
is causing the NIOS2 target to hang.
This is because the test for "should I exit to the main loop"
was being done with the correct offset to the icount decrementer,
but using TCG temporary 0 (the frame pointer) rather than the
env pointer.
Cc: qemu-stable@nongnu.org
Cc: Marek Vasut <marex@denx.de>
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is used by all PPC targets; we can give the directory its own
Makefile.objs file, and include it directly from target/ppc.
target/s390 can do the same when it starts using it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Coverity pointed out the 'date' is not free()d in the error
path
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The noVNC server sends a header "Connection: keep-alive, Upgrade" which
fails our simple equality test. Split the header on ',', trim whitespace
and then check for 'upgrade' token.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently most outbound I/O on the websock channel gets copied into the
rawoutput buffer, and then immediately copied again into the encoutput
buffer, with a header prepended. Now that qio_channel_websock_encode
accepts a struct iovec, we can trivially remove this bounce buffering
and write directly to encoutput.
In doing so, we also now correctly validate the encoutput size against
the QIO_CHANNEL_WEBSOCK_MAX_BUFFER limit.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of requiring use of another Buffer, pass a struct iovec
into qio_channel_websock_encode, which gives callers more
flexibility in how they process data.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qio_channel_websock_encode method is only used in one place,
everything else calls qio_channel_websock_encode_buffer directly.
It can also be pushed up a level into the qio_channel_websock_writev
method, since every other caller of qio_channel_websock_write_wire
has already filled encoutput.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We must ensure we don't get flooded with ping replies if the outbound
channel is slow. Currently we do this by keeping the ping reply in a
separate temporary buffer and only writing it if the encoutput buffer
is completely empty. This is overly pessimistic, as it is reasonable
to add a ping reply to the encoutput buffer even if it has previous
data in it, as long as that previous data doesn't include a ping
reply.
To track this better, put the ping reply directly into the encoutput
buffer, and then record the size of encoutput at this time in
pong_remain. As we write encoutput to the underlying channel, we
can decrement the pong_remain counter. Once it hits zero, we can
accept further ping replies for transmission.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The websocket GSource is monitoring the size of the rawoutput
buffer to determine if the channel can accepts more writes.
The rawoutput buffer, however, is merely a temporary staging
buffer before data is copied into the encoutput buffer. Thus
its size will always be zero when the GSource runs.
This flaw causes the encoutput buffer to grow without bound
if the other end of the underlying data channel doesn't
read data being sent. This can be seen with VNC if a client
is on a slow WAN link and the guest OS is sending many screen
updates. A malicious VNC client can act like it is on a slow
link by playing a video in the guest and then reading data
very slowly, causing QEMU host memory to expand arbitrarily.
This issue is assigned CVE-2017-15268, publically reported in
https://bugs.launchpad.net/qemu/+bug/1718964
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If an offset of ports is specified to the inet_listen_saddr function(),
and two or more processes tries to bind from these ports at the same time,
occasionally more than one process may be able to bind to the same
port. The condition is detected by listen() but too late to avoid a failure.
This function is called by socket_listen() and used
by all socket listening code in QEMU, so all cases where any form of dynamic
port selection is used should be subject to this issue.
Add code to close and re-establish the socket when this
condition is observed, hiding the race condition from the user.
Also clean up some issues with error handling to allow more
accurate reporting of the cause of an error.
This has been developed and tested by means of the
test-listen unit test in the previous commit.
Enable the test for make check now that it passes.
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Another refactoring step to prepare for fixing the problem
exposed with the test-listen test in the previous commit
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A refactoring step to prepare for the problem
exposed by the test-listen test in the previous commit.
Simplify and reorganize the IPv6 specific extra
measures and move it out of the for loop to increase
code readability. No semantic changes.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
nbd patches for 2017-10-14
- Marc-André Lureau - NBD: use g_new() family of functions
- Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read
# gpg: Signature made Sun 15 Oct 2017 01:38:47 BST
# gpg: using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg: aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2017-10-14:
nbd: header constants indenting
nbd/server: simplify reply transmission
nbd/server: refactor nbd_co_send_simple_reply parameters
nbd/server: do not use NBDReply structure
nbd/server: structurize simple reply header sending
nbd: rename some simple-request related objects to be _simple_
block/nbd-client: refactor nbd_co_receive_reply
block/nbd-client: assert qiov len once in nbd_co_request
NBD: use g_new() family of functions
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We were defining TARGET_FS_IOC_GETFLAGS and TARGET_FS_IOC_SETFLAGS
using the host 'long' type in the size field, which meant that
they had the wrong values if the host and guest had different
sized longs. Switch to abi_long instead.
This fixes a bug where these ioctls don't work on 32-bit guests
on 64-bit hosts (and makes the LTP test 'setxattr03' pass
where it did not previously.)
Reported-by: pgndev <pgnet.dev@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
We had a check using TARGET_VIRT_ADDR_SPACE_BITS to make sure
that the allocation coming in from the command-line option was
not too large, but that didn't include target-specific knowledge
about other restrictions on user-space.
Remove several target-specific hacks in linux-user/main.c.
For MIPS and Nios, we can replace them with proper adjustments
to the respective target's TARGET_VIRT_ADDR_SPACE_BITS definition.
For ARM, we had no existing ifdef but I suspect that the current
default value of 0xf7000000 was chosen with this in mind. Define
a workable value in linux-user/arm/, and also document why the
special case is required.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20170708025030.15845-3-rth@twiddle.net>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Most of the users of page_set_flags offset (page, page + len) as
the end points. One might consider this an error, since the other
users do supply an endpoint as the last byte of the region.
However, the first thing that page_set_flags does is round end UP
to the start of the next page. Which means computing page + len - 1
is in the end pointless. Therefore, accept this usage and do not
assert when given the exact size of the vm as the endpoint.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170708025030.15845-2-rth@twiddle.net>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
The 32-bit ARM validate_guest_space() check tests whether the
specified -R value leaves enough space for us to put the
commpage in at 0xffff0f00. However it was incorrectly doing
a <= check for the check against (guest_base + guest_size),
which meant that it wasn't permitting the guest space to
butt right up against the commpage.
Fix the comparison, so that -R values all the way up to 0xffff0000
work correctly.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Since O_TMPFILE might differ between guest and host,
add it to the bitmask_transtbl. While at it, fix the definitions
of O_DIRECTORY etc which should arm32 according to kernel sources.
This fixes open14 and openat03 ltp testcases. Fixes:
https://bugs.launchpad.net/qemu/+bug/1709170
The gd_gl_area_scanout_texture must destroy framebuffer if there is
no texture id instead of no framebuffer id.
The effect was a black screen with "-vga virtio -display gtk,gl=on"
options.
The bug was introduce by a4f113fd "gtk: use framebuffer helper functions."
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Message-id: 20171002124052.13829-1-anthoine.bourgeois@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Always use QKeyCode in the InputKeyEvent struct, by converting key
numbers to QKeyCode at the time the event is created. This allows
the code processing / consuming key events to assume QKeyCode is
used. The only place we accept a key number in the InputKeyEvent
struct is with QMP commands sent by the user.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170929101201.21039-6-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Replace the number_to_qcode, qcode_to_number and linux_to_qcode
tables with automatically generated tables.
Missing entries in linux_to_qcode now fixed:
KEY_LINEFEED -> Q_KEY_CODE_LF
KEY_KPEQUAL -> Q_KEY_CODE_KP_EQUALS
KEY_COMPOSE -> Q_KEY_CODE_COMPOSE
KEY_AGAIN -> Q_KEY_CODE_AGAIN
KEY_PROPS -> Q_KEY_CODE_PROPS
KEY_UNDO -> Q_KEY_CODE_UNDO
KEY_FRONT -> Q_KEY_CODE_FRONT
KEY_COPY -> Q_KEY_CODE_COPY
KEY_OPEN -> Q_KEY_CODE_OPEN
KEY_PASTE -> Q_KEY_CODE_PASTE
KEY_CUT -> Q_KEY_CODE_CUT
KEY_HELP -> Q_KEY_CODE_HELP
KEY_MEDIA -> Q_KEY_CODE_MEDIASELECT
In addition, some fixes:
- KEY_PLAYPAUSE now maps to Q_KEY_CODE_AUDIOPLAY, instead of
KEY_PLAYCD. KEY_PLAYPAUSE is defined across almost all scancodes
sets, while KEY_PLAYCD only appears in AT set1, so the former is
a more useful mapping.
Missing entries in qcode_to_number now fixed:
Q_KEY_CODE_AGAIN -> 0x85
Q_KEY_CODE_PROPS -> 0x86
Q_KEY_CODE_UNDO -> 0x87
Q_KEY_CODE_FRONT -> 0x8c
Q_KEY_CODE_COPY -> 0xf8
Q_KEY_CODE_OPEN -> 0x64
Q_KEY_CODE_PASTE -> 0x65
Q_KEY_CODE_CUT -> 0xbc
Q_KEY_CODE_LF -> 0x5b
Q_KEY_CODE_HELP -> 0xf5
Q_KEY_CODE_COMPOSE -> 0xdd
Q_KEY_CODE_KP_EQUALS -> 0x59
Q_KEY_CODE_MEDIASELECT -> 0xed
In addition, some fixes:
- Q_KEY_CODE_MENU was incorrectly mapped to the compose
scancode (0xdd) and is now mapped to 0x9e
- Q_KEY_CODE_FIND was mapped to 0xe065 (Search) instead
of to 0xe041 (Find)
- Q_KEY_CODE_HIRAGANA was mapped to 0x70 (Katakanahiragana)
instead of of 0x77 (Hirigana)
- Q_KEY_CODE_PRINT was mapped to 0xb7 which is not a defined
scan code in AT set 1, it is now mapped to 0x54 (sysrq)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170929101201.21039-5-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The https://gitlab.com/keycodemap/keycodemapdb/ repo contains a
data file mapping between all the different scancode/keycode/keysym
sets that are known, and a tool to auto-generate lookup tables for
different combinations.
It is used by GTK-VNC, SPICE-GTK and libvirt for mapping keys.
Using it in QEMU will let us replace many hand written lookup
tables with auto-generated tables from a master data source,
reducing bugs. Adding new QKeyCodes will now only require the
master table to be updated, all ~20 other tables will be
automatically updated to follow.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170929101201.21039-4-berrange@redhat.com
[ kraxel: fix build ]
[ kraxel: switch repo to qemu.git mirror ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When building the tarball to pass into the docker/vm test image,
the code relies on the git submodules being checked out in the
main checkout.
ie if the developer has not run 'git submodule update --init dtc'
many of the docker tests will fail due to the libfdt package not
being present in the test images. Patchew manually checks out the
dtc submodule in the main git checkout, but this is a bad idea.
When running tests we want to have a predictable set of submodules
included in the source that's tested. The build environment is
completely independent of the developers host OS, so the submodules
the developer has checked out should not be considered relevant for
the tests.
This changes the archive-source.sh script so that it clones the
current git checkout into a temporary directory, checks out a
fixed set of submodules, builds the tarball and finally removes
the temporary git clone.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170929101201.21039-3-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Currently if DTC is required by configure and not available in the host
OS install, we exit with an error message telling the user to checkout a
git submodule or install the library.
This introduces automatic handling of the git submodule checkout process
and enables it for dtc. This only runs if building from GIT, so users of
release tarballs still need the system library install. The current state
of the git checkout is stashed in .git-submodule-status, and a helper
program is used to determine if this state matches the desired submodule
state. A dependency against 'Makefile' ensures that the submodule state
is refreshed at the start of the build process
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170929101201.21039-2-berrange@redhat.com
[ kraxel: use /bin/sh not bash for scripts/git-submodule.sh ]
[ kraxel: fix Makefile dependencies ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
[fixup] Makefile dep
9p back-end first queries the size of an extended attribute,
allocates space for it via g_malloc() and then retrieves its
value into allocated buffer. Race between querying attribute
size and retrieving its could lead to memory bytes disclosure.
Use g_malloc0() to avoid it.
Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Merge tpm 2017/10/04 v3
# gpg: Signature made Fri 13 Oct 2017 12:37:07 BST
# gpg: using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2017-10-04-3:
specs: Describe the TPM support in QEMU
tpm: Move tpm_cleanup() to right place
tpm: Added support for TPM emulator
tpm-passthrough: move reusable code to utils
tpm-backend: Move realloc_buffer() implementation to tpm-tis model
tpm-backend: Add new API to read backend TpmInfo
tpm-backend: Made few interface methods optional
tpm-backend: Initialize and free data members in it's own methods
tpm-backend: Move thread handling inside TPMBackend
tpm-backend: Remove unneeded member variable from backend class
tpm: Use EMSGSIZE instead of EBADMSG to compile on OpenBSD
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The device can not be instantiated on many non-x86 and just prints
some error messages, e.g.:
$ qemu-system-ppc64 -device vmxnet3 -M g3beige
[vmxnet3][WR][vmxnet3_init_msix]: Failed to initialize MSI-X, error -95
[vmxnet3][WR][vmxnet3_pci_realize]: Failed to initialize MSI-X, configuration is inconsistent.
Since vmxnet3 is a para-virtualized device that is only useful on x86,
it should also only be enabled on the x86 targets.
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
If QEMU has been compiled with the flags --enable-tcg-interpreter and
--enable-debug, the guest is running incredibly slow. The prom-env
test is approximately 10 times slower than normal in this case, and
it takes up to 500 seconds until the test with the pseries machine
finishs. While we should still look for ways to speed up the test
on the pseries machine here, let's bump the timeout to 600 seconds to
allow the test to pass with this unusal configuration already now.
Also move the pseries test into the "slow" category - since it is
really a very slow test.
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Replace a large number of the fprintf(stderr, "*\n" calls with
error_report(). The functions were renamed with these commands and then
compiler issues where manually fixed.
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
Some lines where then manually tweaked to pass checkpatch.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Thomas Huth <huth@tuxfamily.org>
[thuth: Remove "qemu:" prefix from strings]
Signed-off-by: Thomas Huth <thuth@redhat.com>
# gpg: Signature made Thu 12 Oct 2017 21:52:28 BST
# gpg: using RSA key 0xDAE8E10975969CE5
# gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>"
# gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5
* remotes/elmarco/tags/vu-pull-request:
libvhost-user: Support VHOST_USER_SET_SLAVE_REQ_FD
libvhost-user: Update and fix feature and request lists
vhost-user-bridge: Only process received packets on started queues
libvhost-user: vu_queue_started
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The pxe-test is a very good test to excercise NICs, thus we should use
it to test all NICs that can be used by the BIOS for booting via network.
However, to avoid that the default testing time increases too much, the
additional NICs are only tested in the "make check SPEED=slow" mode.
The virtio-net NIC on ppc64 is now also only tested in slow mode, since
the test on ppc64 is really quite slow and we've got test coverage for
virtio-net in big endian mode now on s390x, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since 4458fb3a79 (pc: Eliminate pc_default_machine_options()),
hot_add_cpu is set in pc_machine_class_init(), so we don't
need to set it in pc_q35_machine_options(), pc_i440fx_machine_options()
and xenfv_machine_options(), except to clear it in
pc_i440fx_1_4_machine_opt().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't touch isapc when we change guest ABI and add new entries
to PC_COMPAT_* or new PCMachineClass compat flags. This means
isapc never guaranteed guest ABI and cross-QEMU-version live
migration compatibility. There's no point in keeping code for
kvm-pv-eoi and APIC ID compatibility in pc_init_isa().
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The modern bar is accessed now via yet another address space created just
for that purpose and it does not really need FlatView and dispatch tree
as it has a single memory region so it is just a waste of memory. Things
get even worse when there are dozens or hundreds of virtio-pci devices -
since these address spaces are global, changing any of them triggers
rebuilding all address spaces.
This replaces indirect accesses to the modern BAR with a simple lookup
and direct calls to memory_region_dispatch_read/write.
This is expected to save lots of memory at boot time after applying:
[Qemu-devel] [PULL 00/32] Misc changes for 2017-09-22
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
While changing the s/g list allocation, commit 3b3b0628
also changed the descriptor counting to count iovec entries
as split by cpu_physical_memory_map(). Previously only the
actual descriptor entries were counted and the split into
the iovec happened afterwards in virtqueue_map().
Count the entries again instead to avoid erroneous
"Looped descriptor" errors.
Reported-by: Hans Middelhoek <h.middelhoek@ospito.nl>
Link: https://forum.proxmox.com/threads/vm-crash-with-memory-hotplug.35904/
Fixes: 3b3b062821 ("virtio: slim down allocation of VirtQueueElements")
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
IO_LIMIT and IO_BASE registers should not be writable if
gen_pcie_root_port's io-reserve property is set to 0.
The COMMAND register should have the IO flag read only.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Make sure we don't forget to add the Conventional PCI or PCI
Express interface names on PCI device classes in the future.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Revieed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
xen-pt doesn't set the is_express field, but is supposed to be
able to handle PCI Express devices too. Mark it as hybrid.
Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of
TYPE_PCI_DEVICE, except:
1) The ones that already have INTERFACE_PCIE_DEVICE set:
* base-xhci
* e1000e
* nvme
* pvscsi
* vfio-pci
* virtio-pci
* vmxnet3
2) base-pci-bridge
Not all PCI bridges are Conventional PCI devices, so
INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes
that are actually Conventional PCI:
* dec-21154-p2p-bridge
* i82801b11-bridge
* pbm-bridge
* pci-bridge
The direct subtypes of base-pci-bridge not touched by this patch
are:
* xilinx-pcie-root: Already marked as PCIe-only.
* pcie-pci-bridge: Already marked as PCIe-only.
* pcie-port: all non-abstract subtypes of pcie-port are already
marked as PCIe-only devices.
3) megasas-base
Not all megasas devices are Conventional PCI devices, so the
interface names are added to the subclasses registered by
megasas_register_types(), according to information in the
megasas_devices[] array.
"megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add
INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas".
Acked-by: Alberto Garcia <berto@igalia.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The following devices support both PCI Express and Conventional
PCI, by including special code to handle the QEMU_PCI_CAP_EXPRESS
flag and/or conditional pcie_endpoint_cap_init() calls:
* vfio-pci (is_express=1, but legacy PCI handled by
vfio_populate_device())
* vmxnet3 (is_express=0, but PCIe handled by vmxnet3_realize())
* pvscsi (is_express=0, but PCIe handled by pvscsi_realize())
* virtio-pci (is_express=0, but PCIe handled by
virtio_pci_dc_realize(), and additional legacy PCI code at
virtio_pci_realize())
* base-xhci (is_express=1, but pcie_endpoint_cap_init() call
is conditional on pci_bus_is_express(dev->bus)
* Note that xhci does not clear QEMU_PCI_CAP_EXPRESS like the
other hybrid devices
Cc: Dmitry Fleytman <dmitry@daynix.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Those two interfaces will be used to indicate which device types
support Conventional PCI or PCI Express buses. Management
software will be able to use the qom-list-types QMP command to
query that information.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCIe busses are always little endian, so set the endianness of the
memory region to little endian rather than native such that operations
work as expected on big endian targets.
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Convert the 'modern_state' part of virtio-pci to modern migration
macros.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
QEMU with the pcie-pci-bridge device crashes if the guest board doesn't support MSI,
e.g. 'qemu-system-ppc64 -M prep -device pcie-pci-bridge'.
This is caused by wrong pcie-pci-bridge instantiation error handling. This patch fixes this issue
by falling back to legacy INTx if MSI is not available.
Also set the bridge's 'msi' property default value to 'auto' in order to trigger errors
only when user explicitly set msi=on.
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Whilst the underlying PCI bridge implementation supports 32-bit PCI IO
accesses, unfortunately they are truncated at the legacy 64K limit.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_log_put() is called to decomission the dirty log between qemu and
a vhost device when stopping the device. Such a call can happen from
migration_completion().
Present code sets dev->log_size to zero too early in vhost_log_put(),
causing the sync check to always return false. As a consequence, the
last pass on the dirty bitmap never happens at the end of migration.
If a vhost device was busy (writing to guest memory) until the last
moments before vhost_virtqueue_stop(), this error will result in guest
memory corruption (at least) following migrations.
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a vmcoreinfo ELF note in the dump if vmcoreinfo device has the
memory location details.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
kdump header provides offset and size of the vmcoreinfo content,
append it if available (skip the ELF note header).
crash-7.1.9 was the first version that started looking in the
vmcoreinfo data for phys_base instead of in the kdump_sub_header.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If the guest note is VMCOREINFO, try to get phys_base from it.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Read the guest ELF PT_NOTE from guest memory when fw_cfg
etc/vmcoreinfo entry provides the location, and write it as an
additional note in the dump.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
See docs/specs/vmcoreinfo.txt for details.
"etc/vmcoreinfo" fw_cfg entry is added when using "-device vmcoreinfo".
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reintroduce the write callback that was removed when write support was
removed in commit 023e314856.
Contrary to the previous callback implementation, the write_cb
callback is called whenever a write happened, so handlers must be
ready to handle partial write as necessary.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ioh3420_interrupts_init() pass error message to local_err, then
propagate it to errp by error_propagate(), which is not necessary.
So eliminate it and pass errp directly instead of local_err.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On commit f8cd1b02 ("pci: Convert to realize"), no error_set*()
call was added for the pcie_chassis_add_slot() error case.
pcie_chassis_add_slot() errors get ignored, making QEMU crash
later. e.g.:
$ qemu-system-x86_64 -device ioh3420 -device xio3130-downstream
qemu-system-x86_64: memory.c:2166: memory_region_del_subregion: Assertion `subregion->container == mr' failed.
Aborted (core dumped)
Fix it by reporting the error using error_setg().
Fixes: f8cd1b0201
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
NBDReply structure will be upgraded in future patches to handle both
simple and structured replies and will be used only in the client
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171012095319.136610-6-vsementsov@virtuozzo.com>
[eblake: rebase to tweaks earlier in series]
Signed-off-by: Eric Blake <eblake@redhat.com>
BlockDriverState has a bdrv_co_drain() callback but no equivalent for
the end of the drain. The throttle driver (block/throttle.c) needs a way
to mark the end of the drain in order to toggle io_limits_disabled
correctly, thus bdrv_co_drain_end is needed.
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds a description of the current TPM support in QEMU
to the specs.
Several public specs are referenced via their landing page on the
trustedcomputinggroup.org website.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
This change introduces a new TPM backend driver that can communicate with
swtpm(software TPM emulator) using unix domain socket interface. QEMU talks to
the TPM emulator using QEMU's socket-based chardev backend device.
Swtpm uses two Unix sockets for communications, one for plain TPM commands and
responses, and one for out-of-band control messages. QEMU passes the data
socket to be used over the control channel.
The swtpm and associated tools can be found here:
https://github.com/stefanberger/swtpm
The swtpm's control channel protocol specification can be found here:
https://github.com/stefanberger/swtpm/wiki/Control-Channel-Specification
Usage:
# setup TPM state directory
mkdir /tmp/mytpm
chown -R tss:root /tmp/mytpm
/usr/bin/swtpm_setup --tpm-state /tmp/mytpm --createek
# Ask qemu to use TPM emulator with given tpm state directory
qemu-system-x86_64 \
[...] \
-chardev socket,id=chrtpm,path=/tmp/swtpm-sock \
-tpmdev emulator,id=tpm0,chardev=chrtpm \
-device tpm-tis,tpmdev=tpm0 \
[...]
Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
TPM configuration options are backend implementation details and shall not be
part of base TPMBackend object, and these shall not be accessed directly outside
of the class, hence added a new interface method, get_tpm_options() to
TPMDriverOps., which shall be implemented by the derived classes to return
configured tpm options.
A new tpm backend api - tpm_backend_query_tpm() which uses _get_tpm_options() to
prepare TpmInfo.
Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This allows backend implementations left optional interface methods.
For mandatory methods assertion checks added.
Took the opportunity to remove unused methods:
- tpm_backend_get_desc()
- TPMDriverOps->handle_startup_error
Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger<stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Initialize and free TPMBackend data members in it's own instance_init() and
instance_finalize methods.
Took the opportunity to remove unneeded destroy() method from TpmDriverOps
interface as TPMBackend is a Qemu Object, we can use object_unref() inplace of
tpm_backend_destroy() to free the backend object, hence removed destroy() from
TPMDriverOps interface.
Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Move thread handling inside TPMBackend, this way backend implementations need
not to maintain their own thread life cycle, instead they needs to implement
'handle_request()' class method that always been called from a thread.
This change made tpm_backend_int.h kind of useless, hence removed it.
Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
TPMDriverOps inside TPMBackend is not required, as it is supposed to be a class
member. The only possible reason for keeping in TPMBackend was, to get the
backend type in tpm.c where dedicated backend api, tpm_backend_get_type() is
present.
Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
EBADMSG was only added to OpenBSD very recently. To make QEMU compilable
on older OpenBSD versions use EMSGSIZE instead when a mismatch between
number of received bytes and message size indicated in the header was
found.
Return -EMSGSIZE and convert all other errnos in the same functions to
return the negative errno.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
target-arm queue:
* v8M: SG, BLXNS, secure-return
* v8M: fixes for coverity issues in previous patches
* arm: fix armv7m_init() declaration to match definition
* watchdog/aspeed: fix variable type to store reload value
# gpg: Signature made Thu 12 Oct 2017 17:02:49 BST
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20171012:
nvic: Fix miscalculation of offsets into ITNS array
nvic: Add missing 'break'
target/arm: Implement SG instruction corner cases
target/arm: Support some Thumb insns being always unconditional
target-arm: Simplify insn_crosses_page()
target/arm: Pull Thumb insn word loads up to top level
target-arm: Don't check for "Thumb2 or M profile" for not-Thumb1
target/arm: Implement secure function return
target/arm: Implement BLXNS
target/arm: Implement SG instruction
target/arm: Add M profile secure MMU index values to get_a32_user_mem_index()
arm: fix armv7m_init() declaration to match definition
watchdog/aspeed: fix variable type to store reload value
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This calculation of the first exception vector in
the ITNS<n> register being accessed:
int startvec = 32 * (offset - 0x380) + NVIC_FIRST_IRQ;
is incorrect, because offset is in bytes, so we only want
to multiply by 8.
Spotted by Coverity (CID 1381484, CID 1381488), though it is
not correct that it actually overflows the buffer, because
we have a 'startvec + i < s->num_irq' guard.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1507650856-11718-1-git-send-email-peter.maydell@linaro.org
The common situation of the SG instruction is that it is
executed from S&NSC memory by a CPU in NS state. That case
is handled by v7m_handle_execute_nsc(). However the instruction
also has defined behaviour in a couple of other cases:
* SG instruction in NS memory (behaves as a NOP)
* SG in S memory but CPU already secure (clears IT bits and
does nothing else)
* SG instruction in v8M without Security Extension (NOP)
These can be implemented in translate.c.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1507556919-24992-10-git-send-email-peter.maydell@linaro.org
A few Thumb instructions are always unconditional even inside an
IT block (as opposed to being UNPREDICTABLE if used inside an
IT block): BKPT, the v8M SG instruction, and the A profile
HLT (debug halt) instruction.
This means we need to suppress the jump-over-instruction-on-condfail
code generation (though the IT state still advances as usual and
subsequent insns in the IT block may be conditional).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1507556919-24992-9-git-send-email-peter.maydell@linaro.org
Recent changes have left insn_crosses_page() more complicated
than it needed to be:
* it's only called from thumb_tr_translate_insn() so we know
for certain that we're looking at a Thumb insn
* the caller's check for dc->pc >= dc->next_page_start - 3
means that dc->pc can't possibly be 4 aligned, so there's
no need to check that (the check was partly there to ensure
that we didn't treat an ARM insn as Thumb, I think)
* we now have thumb_insn_is_16bit() which lets us do a precise
check of the length of the next insn, rather than opencoding
an inaccurate check
Simplify it down to just loading the first half of the insn
and calling thumb_insn_is_16bit() on it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1507556919-24992-8-git-send-email-peter.maydell@linaro.org
Refactor the Thumb decode to do the loads of the instruction words at
the top level rather than only loading the second half of a 32-bit
Thumb insn in the middle of the decode.
This is simple apart from the awkward case of Thumb1, where the
BL/BLX prefix and suffix instructions live in what in Thumb2 is the
32-bit insn space. To handle these we decode enough to identify
whether we're looking at a prefix/suffix that we handle as a 16 bit
insn, or a prefix that we're going to merge with the following suffix
to consider as a 32 bit insn. The translation of the 16 bit cases
then moves from disas_thumb2_insn() to disas_thumb_insn().
The refactoring has the benefit that we don't need to pass the
CPUARMState* down into the decoder code any more, but the major
reason for doing this is that some Thumb instructions must be always
unconditional regardless of the IT state bits, so we need to know the
whole insn before we emit the "skip this insn if the IT bits and cond
state tell us to" code. (The always unconditional insns are BKPT,
HLT and SG; the last of these is 32 bits.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1507556919-24992-7-git-send-email-peter.maydell@linaro.org
The code which implements the Thumb1 split BL/BLX instructions
is guarded by a check on "not M or THUMB2". All we really need
to check here is "not THUMB2" (and we assume that elsewhere too,
eg in the ARCH(6T2) test that UNDEFs the Thumb2 insns).
This doesn't change behaviour because all M profile cores
have Thumb2 and so ARM_FEATURE_M implies ARM_FEATURE_THUMB2.
(v6M implements a very restricted subset of Thumb2, but we
can cross that bridge when we get to it with appropriate
feature bits.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1507556919-24992-6-git-send-email-peter.maydell@linaro.org
Secure function return happens when a non-secure function has been
called using BLXNS and so has a particular magic LR value (either
0xfefffffe or 0xfeffffff). The function return via BX behaves
specially when the new PC value is this magic value, in the same
way that exception returns are handled.
Adjust our BX excret guards so that they recognize the function
return magic number as well, and perform the function-return
unstacking in do_v7m_exception_exit().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1507556919-24992-5-git-send-email-peter.maydell@linaro.org
I've recently seen this with valgrind while running the HMP tester:
==22373== Conditional jump or move depends on uninitialised value(s)
==22373== at 0x4A41FD: arm_disas_set_info (cpu.c:504)
==22373== by 0x3867A7: monitor_disas (disas.c:390)
==22373== by 0x38E80E: memory_dump (monitor.c:1339)
==22373== by 0x38FA43: handle_hmp_command (monitor.c:3123)
==22373== by 0x38FB9E: qmp_human_monitor_command (monitor.c:613)
==22373== by 0x4E3124: qmp_marshal_human_monitor_command (qmp-marshal.c:1736)
==22373== by 0x769678: do_qmp_dispatch (qmp-dispatch.c:104)
==22373== by 0x769678: qmp_dispatch (qmp-dispatch.c:131)
==22373== by 0x38B734: handle_qmp_command (monitor.c:3853)
==22373== by 0x76ED07: json_message_process_token (json-streamer.c:105)
==22373== by 0x78D40A: json_lexer_feed_char (json-lexer.c:323)
==22373== by 0x78D4CD: json_lexer_feed (json-lexer.c:373)
==22373== by 0x38A08D: monitor_qmp_read (monitor.c:3895)
And indeed, in monitor_disas, the read_memory_inner_func variable was
not initialized, but arm_disas_set_info() expects this to be NULL
or a valid pointer. Let's properly set this to NULL in the
INIT_DISASSEMBLE_INFO to fix it in all functions that use the
disassemble_info struct.
Fixes: f7478a92dd ("Fix Thumb-1 BE32 execution")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1506524313-20037-1-git-send-email-thuth@redhat.com>
heterogeneous cpus are not supported and hotplugging different
cpu model crashes QEMU:
qemu-system-x86_64 -cpu qemu64 -smp 1,maxcpus=2
(qemu) device_add host-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=foo
(qemu) info cpus
error: failed to get MSR 0x38d
qemu-system-x86_64: target/i386/kvm.c:2121: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)
Gracefully fail hotplug process in case of user mistake.
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1507638879-200718-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch let address_space_get_iotlb_entry() to use the newly
introduced page_mask parameter in flatview_do_translate(). Then we
will be sure the IOTLB can be aligned to page mask, also we should
nicely support huge pages now when introducing a764040.
Fixes: a764040 ("exec: abstract address_space_do_translate()")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20171010094247.10173-3-maxime.coquelin@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function is originally used for flatview_space_translate() and what
we care about most is (xlat, plen) range. However for iotlb requests, we
don't really care about "plen", but the size of the page that "xlat" is
located on. While, plen cannot really contain this information.
A simple example to show why "plen" is not good for IOTLB translations:
E.g., for huge pages, it is possible that guest mapped 1G huge page on
device side that used this GPA range:
0x100000000 - 0x13fffffff
Then let's say we want to translate one IOVA that finally mapped to GPA
0x13ffffe00 (which is located on this 1G huge page). Then here we'll
get:
(xlat, plen) = (0x13fffe00, 0x200)
So the IOTLB would be only covering a very small range since from
"plen" (which is 0x200 bytes) we cannot tell the size of the page.
Actually we can really know that this is a huge page - we just throw the
information away in flatview_do_translate().
This patch introduced "page_mask" optional parameter to capture that
page mask info. Also, I made "plen" an optional parameter as well, with
some comments for the whole function.
No functional change yet.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20171010094247.10173-2-maxime.coquelin@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The tcp_chr_free_connection & tcp_chr_disconnect methods both
skip all of their cleanup work unless the 's->connected' flag
is set. This flag is set when the incoming client connection
is ready to use. Crucially this is *after* the TLS handshake
has been completed. So if the TLS handshake fails and we try
to cleanup the failed client, all the cleanup is skipped as
's->connected' is still false.
The only important thing that should be skipped in this case
is sending of the CHR_EVENT_CLOSED, because we never got as
far as sending the corresponding CHR_EVENT_OPENED. Every other
bit of cleanup can be robust against being called even when
s->connected is false.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20171005155057.7664-1-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Linux kernel will query the ATA IDENTITY DEVICE data, word 217
to determine the rotations per minute of the disk. If this has
the value 1, it is taken to be an SSD and so Linux sets the
'rotational' flag to 0 for the I/O queue and will stop using that
disk as a source of random entropy. Other operating systems may
also take into account rotation rate when setting up default
behaviour.
Mgmt apps should be able to set the rotation rate for virtualized
block devices, based on characteristics of the host storage in use,
so that the guest OS gets sensible behaviour out of the box. This
patch thus adds a 'rotation-rate' parameter for 'ide-hd' device
types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20171004114008.14849-3-berrange@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Linux kernel will query the SCSI "Block device characteristics"
VPD to determine the rotations per minute of the disk. If this has
the value 1, it is taken to be an SSD and so Linux sets the
'rotational' flag to 0 for the I/O queue and will stop using that
disk as a source of random entropy. Other operating systems may
also take into account rotation rate when setting up default
behaviour.
Mgmt apps should be able to set the rotation rate for virtualized
block devices, based on characteristics of the host storage in use,
so that the guest OS gets sensible behaviour out of the box. This
patch thus adds a 'rotation-rate' parameter for 'scsi-hd' and
'scsi-block' device types. For the latter, this parameter will be
ignored unless the host device has TYPE_DISK.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20171004114008.14849-2-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
stgit produces patch files that lack the ".patch" extensions. Others
might be using ".diff" too. But since we are already limiting source files
to only a handful of extensions, we can reuse that in the mode selection
code.
While at it, do not match "../foo" as a branch name.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All scripts that use the QEMUMachine and QEMUQtestMachine classes
(device-crash-test, tests/migration/*, iotests.py, basevm.py)
already configure logging.
The basicConfig() call inside QEMUMachine.__init__() is being
kept just to make sure a script would still work if it didn't
configure logging.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171005172013.3098-4-ehabkost@redhat.com>
Reviewed-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Use logging module for the QMP debug messages. The only scripts
that set debug=True are iotests.py and guestperf/engine.py, and
they already call logging.basicConfig() to set up logging.
Scripts that don't configure logging are safe as long as they
don't need debugging output, because debug messages don't trigger
the "No handlers could be found for logger" message from the
Python logging module.
Scripts that already configure logging but don't use debug=True
(e.g. scripts/vm/basevm.py) will get QMP debugging enabled for
free.
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Fam Zheng <famz@redhat.com>
Cc: "Philippe Mathieu-Daudé" <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171005172013.3098-3-ehabkost@redhat.com>
Reviewed-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Just setting level=DEBUG when debug is enabled is not enough: we
need to set up a log handler if we want debug messages generated
using logging.getLogger(...).debug() to be printed.
This was not a problem before because logging.debug() calls
logging.basicConfig() implicitly, but it's safer to not rely on
that.
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Fam Zheng <famz@redhat.com>
Cc: "Philippe Mathieu-Daudé" <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170927130339.21444-4-ehabkost@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Queued TCG patches
# gpg: Signature made Tue 10 Oct 2017 20:23:12 BST
# gpg: using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20171010:
tcg/mips: delete commented out extern keyword.
tcg: define TCG_HIGHWATER
util: move qemu_real_host_page_size/mask to osdep.h
tcg: take .helpers out of TCGContext
tci: move tci_regs to tcg_qemu_tb_exec's stack
exec-all: extract tb->tc_* into a separate struct tc_tb
translate-all: define and use DEBUG_TB_CHECK_GATE
translate-all: define and use DEBUG_TB_INVALIDATE_GATE
exec-all: introduce TB_PAGE_ADDR_FMT
translate-all: define and use DEBUG_TB_FLUSH_GATE
exec-all: bring tb->invalid into tb->cflags
tcg: consolidate TB lookups in tb_lookup__cpu_state
tcg: remove addr argument from lookup_tb_ptr
tcg/mips: constify tcg_target_callee_save_regs
tcg/i386: constify tcg_target_callee_save_regs
cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find
translate-all: make have_tb_lock static
exec-all: fix typos in TranslationBlock's documentation
tcg: fix corruption of code_time profiling counter upon tb_flush
cputlb: bring back tlb_flush_count under !TLB_DEBUG
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
It is unneeded in the VusDev device structure, and also simplify a bit
the code.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This file implements a bridge from the vu_init API of libvhost-user to
GSource, so that libvhost-user can be used inside a GLib main loop.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
- PLOG is unused
- code is compiled out unless debug is enabled
- logging is too verbose
- you can pipe to ts to have timestamp if needed, or use structured
logging with more recent glib
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Use the one from the source with casting, like any other glib source.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
There is no need to include hw/virtio/virtio-scsi.h, then the conflict
with SCSI_XFER enum goes away.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
It is confusing and could easily conflict with future versions.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
- use Vus prefix consistently
- use CamelCase, since that's glib & libvhost-user style
- avoid _t postfix, usually for system headers
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
There is no code to support more than 1 yet, no need for that today.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Instead of a preliminary check, add an assert to the function that has
the pre-condition.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Always remove the unix path when leaving the program (instead of when
freeing scsi_dev). Note that unix_sock_new() also unlink() exisiting
path before creating the socket.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Use g_new/g_free instead of plain malloc. This simplify a bit memory
handling since glib will abort if it cannot allocate.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This simplify a little bit memory management in the following patches.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
libvhost-user is meant to be free of glib dependency. Make sure it is
by droping qemu/osdep.h (which included glib.h)
This fixes a bad malloc()/g_free() pair.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
These only depend on the host and therefore belong in the common
osdep, not in a target-dependent object.
While at it, query the host during an init constructor, which guarantees
the page size will be well-defined throughout the execution of the program.
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting multiple TCG contexts.
The hash table becomes read-only after it is filled in,
so we can save space by keeping just a global pointer to it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting multiple TCG contexts.
Compile-tested for all targets on an x86_64 host.
Suggested-by: Richard Henderson <rth@twiddle.net>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This prevents bit rot by ensuring the debug code is compiled when
building a user-mode target.
Unfortunately the helpers are user-mode-only so we cannot fully
get rid of the ifdef checks. Add a comment to explain this.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
And fix the following warning when DEBUG_TB_INVALIDATE is enabled
in translate-all.c:
CC mipsn32-linux-user/accel/tcg/translate-all.o
/data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’:
/data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=]
printf("protecting code page: 0x" TARGET_FMT_lx "\n",
^
cc1: all warnings being treated as errors
/data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed
make[1]: *** [accel/tcg/translate-all.o] Error 1
Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed
make: *** [subdir-mipsn32-linux-user] Error 2
cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This gets rid of some ifdef checks while ensuring that the debug code
is compiled, which prevents bit rot.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
It is unlikely that we will ever want to call this helper passing
an argument other than the current PC. So just remove the argument,
and use the pc we already get from cpu_get_tb_cpu_state.
This change paves the way to having a common "tb_lookup" function.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reusing the have_tb_lock name, which is also defined in translate-all.c,
makes code reviewing unnecessarily harder.
Avoid potential confusion by renaming the local have_tb_lock variable
to something else.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Whenever there is an overflow in code_gen_buffer (e.g. we run out
of space in it and have to flush it), the code_time profiling counter
ends up with an invalid value (that is, code_time -= profile_getclock(),
without later on getting += profile_getclock() due to the goto).
Fix it by using the ti variable, so that we only update code_time
when there is no overflow. Note that in case there is an overflow
we fail to account for the elapsed coding time, but this is quite rare
so we can probably live with it.
"info jit" before/after, roughly at the same time during debian-arm bootup:
- before:
Statistics:
TB flush count 1
TB invalidate count 4665
TLB flush count 998
JIT cycles -615191529184601 (-256329.804 s at 2.4 GHz)
translated TBs 302310 (aborted=0 0.0%)
avg ops/TB 48.4 max=438
deleted ops/TB 8.54
avg temps/TB 32.31 max=38
avg host code/TB 361.5
avg search data/TB 24.5
cycles/op -42014693.0
cycles/in byte -121444900.2
cycles/out byte -5629031.1
cycles/search byte -83114481.0
gen_interm time -0.0%
gen_code time 100.0%
optim./code time -0.0%
liveness/code time -0.0%
cpu_restore count 6236
avg cycles 110.4
- after:
Statistics:
TB flush count 1
TB invalidate count 4665
TLB flush count 1010
JIT cycles 1996899624 (0.832 s at 2.4 GHz)
translated TBs 297961 (aborted=0 0.0%)
avg ops/TB 48.5 max=438
deleted ops/TB 8.56
avg temps/TB 32.31 max=38
avg host code/TB 361.8
avg search data/TB 24.5
cycles/op 138.2
cycles/in byte 398.4
cycles/out byte 18.5
cycles/search byte 273.1
gen_interm time 14.0%
gen_code time 86.0%
optim./code time 19.4%
liveness/code time 10.3%
cpu_restore count 6372
avg cycles 111.0
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried
the increment of tlb_flush_count under TLB_DEBUG. This results in
"info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG.
Besides, under MTTCG tlb_flush_count is updated by several threads,
so in order not to lose counts we'd either have to use atomic ops
or distribute the counter, which is more scalable.
This patch does the latter by embedding tlb_flush_count in CPUArchState.
The global count is then easily obtained by iterating over the CPU list.
Note that this change also requires updating the accessors to
tlb_flush_count to use atomic_read/set whenever there may be conflicting
accesses (as defined in C11) to it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Change qemu_config_parse() to return the number of config groups
in success and -EINVAL on error. This will allow callers of
qemu_config_parse() to check if something was really loaded from
the config file.
All existing callers of qemu_config_parse() and
qemu_read_config_file() only check if the return value was
negative, so the change shouldn't affect them.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004025043.3788-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch add a MachineClass element that can be set in the machine C
code to specify a list of supported CPU types. If the supported CPU
types are specified the user enter CPU (by -cpu at runtime) is checked
against the supported types and QEMU exits if they aren't supported.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-Id: <b8474e9d2e0a219d9bac901342f983b13d009301.1507059418.git.alistair.francis@xilinx.com>
[ehabkost: removed assert(), rewrote comment]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Block layer patches
# gpg: Signature made Fri 06 Oct 2017 16:52:59 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (54 commits)
block/mirror: check backing in bdrv_mirror_top_flush
qcow2: truncate the tail of the image file after shrinking the image
qcow2: fix return error code in qcow2_truncate()
iotests: Fix 195 if IMGFMT is part of TEST_DIR
block/mirror: check backing in bdrv_mirror_top_refresh_filename
block: support passthrough of BDRV_REQ_FUA in crypto driver
block: convert qcrypto_block_encrypt|decrypt to take bytes offset
block: convert crypto driver to bdrv_co_preadv|pwritev
block: fix data type casting for crypto payload offset
crypto: expose encryption sector size in APIs
block: use 1 MB bounce buffers for crypto instead of 16KB
iotests: Add test 197 for covering copy-on-read
block: Perform copy-on-read in loop
block: Add blkdebug hook for copy-on-read
iotests: Restore stty settings on completion
block: Uniform handling of 0-length bdrv_get_block_status()
qemu-io: Add -C for opening with copy-on-read
commit: Remove overlay_bs
qemu-iotests: Test commit block job where top has two parents
qemu-iotests: Allow QMP pretty printing in common.qemu
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
target-arm:
* v8M: more preparatory work
* nvic: reset properly rather than leaving the nvic in a weird state
* xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
* sd: fix out-of-bounds check for multi block reads
* arm: Fix SMC reporting to EL2 when QEMU provides PSCI
# gpg: Signature made Fri 06 Oct 2017 16:58:15 BST
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20171006:
nvic: Add missing code for writing SHCSR.HARDFAULTPENDED bit
target/arm: Factor out "get mmuidx for specified security state"
target/arm: Fix calculation of secure mm_idx values
target/arm: Implement security attribute lookups for memory accesses
nvic: Implement Security Attribution Unit registers
target/arm: Add v8M support to exception entry code
target/arm: Add support for restoring v8M additional state context
target/arm: Update excret sanity checks for v8M
target/arm: Add new-in-v8M SFSR and SFAR
target/arm: Don't warn about exception return with PC low bit set for v8M
target/arm: Warn about restoring to unaligned stack
target/arm: Check for xPSR mismatch usage faults earlier for v8M
target/arm: Restore SPSEL to correct CONTROL register on exception return
target/arm: Restore security state on exception return
target/arm: Prepare for CONTROL.SPSEL being nonzero in Handler mode
target/arm: Don't switch to target stack early in v7M exception return
nvic: Clear the vector arrays and prigroup on reset
hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
hw/sd: fix out-of-bounds check for multi block reads
arm: Fix SMC reporting to EL2 when QEMU provides PSCI
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For the SG instruction and secure function return we are going
to want to do memory accesses using the MMU index of the CPU
in secure state, even though the CPU is currently in non-secure
state. Write arm_v7m_mmu_idx_for_secstate() to do this job,
and use it in cpu_mmu_index().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-17-git-send-email-peter.maydell@linaro.org
In cpu_mmu_index() we try to do this:
if (env->v7m.secure) {
mmu_idx += ARMMMUIdx_MSUser;
}
but it will give the wrong answer, because ARMMMUIdx_MSUser
includes the 0x40 ARM_MMU_IDX_M field, and so does the
mmu_idx we're adding to, and we'll end up with 0x8n rather
than 0x4n. This error is then nullified by the call to
arm_to_core_mmu_idx() which masks out the high part, but
we're about to factor out the code that calculates the
ARMMMUIdx values so it can be used without passing it through
arm_to_core_mmu_idx(), so fix this bug first.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-16-git-send-email-peter.maydell@linaro.org
Implement the security attribute lookups for memory accesses
in the get_phys_addr() functions, causing these to generate
various kinds of SecureFault for bad accesses.
The major subtlety in this code relates to handling of the
case when the security attributes the SAU assigns to the
address don't match the current security state of the CPU.
In the ARM ARM pseudocode for validating instruction
accesses, the security attributes of the address determine
whether the Secure or NonSecure MPU state is used. At face
value, handling this would require us to encode the relevant
bits of state into mmu_idx for both S and NS at once, which
would result in our needing 16 mmu indexes. Fortunately we
don't actually need to do this because a mismatch between
address attributes and CPU state means either:
* some kind of fault (usually a SecureFault, but in theory
perhaps a UserFault for unaligned access to Device memory)
* execution of the SG instruction in NS state from a
Secure & NonSecure code region
The purpose of SG is simply to flip the CPU into Secure
state, so we can handle it by emulating execution of that
instruction directly in arm_v7m_cpu_do_interrupt(), which
means we can treat all the mismatch cases as "throw an
exception" and we don't need to encode the state of the
other MPU bank into our mmu_idx values.
This commit doesn't include the actual emulation of SG;
it also doesn't include implementation of the IDAU, which
is a per-board way to specify hard-coded memory attributes
for addresses, which override the CPU-internal SAU if they
specify a more secure setting than the SAU is programmed to.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-15-git-send-email-peter.maydell@linaro.org
Implement the register interface for the SAU: SAU_CTRL,
SAU_TYPE, SAU_RNR, SAU_RBAR and SAU_RLAR. None of the
actual behaviour is implemented here; registers just
read back as written.
When the CPU definition for Cortex-M33 is eventually
added, its initfn will set cpu->sau_sregion, in the same
way that we currently set cpu->pmsav7_dregion for the
M3 and M4.
Number of SAU regions is typically a configurable
CPU parameter, but this patch doesn't provide a
QEMU CPU property for it. We can easily add one when
we have a board that requires it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-14-git-send-email-peter.maydell@linaro.org
Add support for v8M and in particular the security extension
to the exception entry code. This requires changes to:
* calculation of the exception-return magic LR value
* push the callee-saves registers in certain cases
* clear registers when taking non-secure exceptions to avoid
leaking information from the interrupted secure code
* switch to the correct security state on entry
* use the vector table for the security state we're targeting
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-13-git-send-email-peter.maydell@linaro.org
Attempting to do an exception return with an exception frame that
is not 8-aligned is UNPREDICTABLE in v8M; warn about this.
(It is not UNPREDICTABLE in v7M, and our implementation can
handle the merely-4-aligned case fine, so we don't need to
do anything except warn.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-8-git-send-email-peter.maydell@linaro.org
ARM v8M specifies that the INVPC usage fault for mismatched
xPSR exception field and handler mode bit should be checked
before updating the PSR and SP, so that the fault is taken
with the existing stack frame rather than by pushing a new one.
Perform this check in the right place for v8M.
Since v7M specifies in its pseudocode that this usage fault
check should happen later, we have to retain the original
code for that check rather than being able to merge the two.
(The distinction is architecturally visible but only in
very obscure corner cases like attempting an invalid exception
return with an exception frame in read only memory.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-7-git-send-email-peter.maydell@linaro.org
On exception return for v8M, the SPSEL bit in the EXC_RETURN magic
value should be restored to the SPSEL bit in the CONTROL register
banked specified by the EXC_RETURN.ES bit.
Add write_v7m_control_spsel_for_secstate() which behaves like
write_v7m_control_spsel() but allows the caller to specify which
CONTROL bank to use, reimplement write_v7m_control_spsel() in
terms of it, and use it in exception return.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-6-git-send-email-peter.maydell@linaro.org
Now that we can handle the CONTROL.SPSEL bit not necessarily being
in sync with the current stack pointer, we can restore the correct
security state on exception return. This happens before we start
to read registers off the stack frame, but after we have taken
possible usage faults for bad exception return magic values and
updated CONTROL.SPSEL.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-5-git-send-email-peter.maydell@linaro.org
In the v7M architecture, there is an invariant that if the CPU is
in Handler mode then the CONTROL.SPSEL bit cannot be nonzero.
This in turn means that the current stack pointer is always
indicated by CONTROL.SPSEL, even though Handler mode always uses
the Main stack pointer.
In v8M, this invariant is removed, and CONTROL.SPSEL may now
be nonzero in Handler mode (though Handler mode still always
uses the Main stack pointer). In preparation for this change,
change how we handle this bit: rename switch_v7m_sp() to
the now more accurate write_v7m_control_spsel(), and make it
check both the handler mode state and the SPSEL bit.
Note that this implicitly changes the point at which we switch
active SP on exception exit from before we pop the exception
frame to after it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-4-git-send-email-peter.maydell@linaro.org
Currently our M profile exception return code switches to the
target stack pointer relatively early in the process, before
it tries to pop the exception frame off the stack. This is
awkward for v8M for two reasons:
* in v8M the process vs main stack pointer is not selected
purely by the value of CONTROL.SPSEL, so updating SPSEL
and relying on that to switch to the right stack pointer
won't work
* the stack we should be reading the stack frame from and
the stack we will eventually switch to might not be the
same if the guest is doing strange things
Change our exception return code to use a 'frame pointer'
to read the exception frame rather than assuming that we
can switch the live stack pointer this early.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-3-git-send-email-peter.maydell@linaro.org
Reset for devices does not include an automatic clear of the
device state (unlike CPU state, where most of the state
structure is cleared to zero). Add some missing initialization
of NVIC state that meant that the device was left in the wrong
state if the guest did a warm reset.
(In particular, since we were resetting the computed state like
s->exception_prio but not all the state it was computed
from like s->vectors[x].active, the NVIC wound up in an
inconsistent state that could later trigger assertion failures.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1506092407-26985-2-git-send-email-peter.maydell@linaro.org
The device uses serial_hds in its realize function and thus can't be
used twice. Apart from that, the comma in its name makes it quite hard
to use for the user anyway, since a comma is normally used to separate
the device name from its properties when using the "-device" parameter
or the "device_add" HMP command.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1506441116-16627-1-git-send-email-thuth@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Block patches
# gpg: Signature made Fri Oct 6 16:30:57 2017 CEST
# gpg: using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2017-10-06:
block/mirror: check backing in bdrv_mirror_top_flush
qcow2: truncate the tail of the image file after shrinking the image
qcow2: fix return error code in qcow2_truncate()
iotests: Fix 195 if IMGFMT is part of TEST_DIR
block/mirror: check backing in bdrv_mirror_top_refresh_filename
block: support passthrough of BDRV_REQ_FUA in crypto driver
block: convert qcrypto_block_encrypt|decrypt to take bytes offset
block: convert crypto driver to bdrv_co_preadv|pwritev
block: fix data type casting for crypto payload offset
crypto: expose encryption sector size in APIs
block: use 1 MB bounce buffers for crypto instead of 16KB
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
do_run_qemu() in iotest 195 first applies _filter_imgfmt when printing
qemu's command line and _filter_testdir only afterwards. Therefore, if
the image format is part of the test directory path, _filter_testdir
will no longer apply and the actual output will differ from the
reference output even in case of success.
For example, TEST_DIR might be "/tmp/test-qcow2", in which case
_filter_imgfmt first transforms this to "/tmp/test-IMGFMT" which is no
longer recognized as the TEST_DIR by _filter_testdir.
Fix this by not applying _filter_imgfmt in do_run_qemu() but in
run_qemu() instead, and only after _filter_testdir.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170927211334.3988-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make the crypto driver implement the bdrv_co_preadv|pwritev
callbacks, and also use bdrv_co_preadv|pwritev for I/O
with the protocol driver beneath. This replaces sector based
I/O with byte based I/O, and allows us to stop assuming the
physical sector size matches the encryption sector size.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-5-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The crypto APIs report the offset of the data payload as an uint64_t
type, but the block driver is casting to size_t or ssize_t which will
potentially truncate.
Most of the block APIs use int64_t for offsets meanwhile, so even if
using uint64_t in the crypto block driver we are still at risk of
truncation.
Change the block crypto driver to use uint64_t, but add asserts that
the value is less than INT64_MAX.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-4-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
While current encryption schemes all have a fixed sector size of
512 bytes, this is not guaranteed to be the case in future. Expose
the sector size in the APIs so the block layer can remove assumptions
about fixed 512 byte sectors.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-3-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage which high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.
On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver. With
other cache modes the in-kernel driver is still notably
faster because it is able to report completion of the
I/O request before any encryption is done, while the
in-QEMU driver must encrypt the data before completion.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-2-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add a test for qcow2 copy-on-read behavior, including exposure
for the just-fixed bugs.
The copy-on-read behavior is always to a qcow2 image, but the
test is careful to allow running with most image protocol/format
combos as the backing file being copied from (luks being the
exception, as it is harder to pass the right secret to all the
right places). In fact, for './check nbd', this appears to be
the first time we've had a qcow2 image wrapping NBD, requiring
an additional line in _filter_img_create to match the similar
line in _filter_img_info.
Invoking blkdebug to prove we don't write too much took some
effort to get working; and it requires that $TEST_WRAP (based
on $TEST_DIR) not be subject to word splitting. We may decide
later to have the entire iotests suite use relative rather than
absolute names, to avoid problems inherited by the absolute
name of $PWD or $TEST_DIR, at which point the sanity check in
this commit could be simplified.
This test requires at least 2G of consecutive memory to succeed;
as such, it is prone to spurious failures, particularly on
32-bit machines under load. This situation is detected and
triggers an early exit to skip the test, rather than a failure.
To manually provoke this setup on a beefier machine, I used:
$ (ulimit -S -v 1000000; ./check -qcow2 197)
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Improve our braindead copy-on-read implementation. Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G. While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read
Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits. Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.
Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Executing qemu with a terminal as stdin will temporarily alter stty
settings on that terminal (for example, disabling echo), because of
how we run both the monitor and any multiplexing with guest input.
Normally, qemu restores the original settings on exit; but if an
iotest triggers qemu to abort in the middle, we can be left with
the altered terminal setup. This can make life very annoying when
debugging an iotest failure (not everyone remembers the trick of
blind-typing 'stty sane' without echo, and some people prefer
terminal settings that are slightly different than the defaults
picked by 'stty sane').
It is possible to avoid qemu corrupting the terminal by not passing
a terminal to qemu's stdin in the first place (as in, use
'./check ... </dev/null'), but that's extra typing to have to
remember. But running 'exec </dev/null' in the harness seems like
it might be too heavy of a hammer. So I instead went the the
solution of saving and restoring the stty settings, only when the
harness detects that it is run interactively.
I tested this patch by forcing an allocation failure (I can't
guarantee that this particular limit will work on all setups, but
it shows the idea):
$ (ulimit -S -v 500000; ./check -qcow2 1)
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Handle a 0-length block status request up front, with a uniform
return value claiming the area is not allocated.
Most callers don't pass a length of 0 to bdrv_get_block_status()
and friends; but it definitely happens with a 0-length read when
copy-on-read is enabled. While we could audit all callers to
ensure that they never make a 0-length request, and then assert
that fact, it was just as easy to fix things to always report
success (as long as the callers are careful to not go into an
infinite loop). However, we had inconsistent behavior on whether
the status is reported as allocated or defers to the backing
layer, depending on what callbacks the driver implements, and
possibly wasting quite a few CPU cycles to get to that answer.
Consistently reporting unallocated up front doesn't really hurt
anything, and makes it easier both for callers (0-length requests
now have well-defined behavior) and for drivers (drivers don't
have to deal with 0-length requests).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We don't need to make any assumptions about the graph layout above the
top node of the commit operation any more. Remove the use of
bdrv_find_overlay() and related variables from the commit job code.
bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so
we can just drop it.
The overlay node was previously added to the block job to get a
BLK_PERM_GRAPH_MOD. We really need to respect those permissions in
bdrv_drop_intermediate() now, but as long as we haven't figured out yet
how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO
comment there.
With this change, it is now possible to perform another block job on an
overlay node without conflicts. qemu-iotests 030 is changed accordingly.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
QMP responses to certain commands can become quite long, which doesn't
only make reading them hard, but also means that the maximum line length
in patch emails can be exceeded. Allow tests to switch to QMP pretty
printing, which results in more, but shorter lines.
We also need to make sure to keep indentation in the response for this
to work as expected.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This changes the commit block job to support operation in a graph where
there is more than a single active layer that references the top node.
This involves inserting the commit filter node not only on the path
between the given active node and the top node, but between the top node
and all of its parents.
On completion, bdrv_drop_intermediate() must consider all parents for
updating the backing file link. These parents may be backing files
themselves and as such read-only; reopen them temporarily if necessary.
Previously this was achieved by the bdrv_reopen() calls in the commit
block job that made overlay_bs read-write for the whole duration of the
block job, even though write access is only needed on completion.
Now that we consider all parents, overlay_bs is meaningless. It is left
in place in this commit, but we'll remove it soon.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no good reason for bdrv_drop_intermediate() to know the active
layer above the subchain it is operating on - even more so, because
the assumption that there is a single active layer above it is not
generally true.
In order to prepare removal of the active parameter, use a BdrvChildRole
callback to update the backing file string in the overlay image instead
of directly calling bdrv_change_backing_file().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
"check" is full of qemu-iotests--specific details. Separating it
from "common" does not make much sense anymore.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The variable is almost unused, and one of the two uses is actually
uninitialized.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The variable is used in "common" but defined only after the file
is sourced.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split "check" parts from tests part.
For the directory setup, the actual computation of directories goes
in "check", while the sanity checks go in the tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These are never used by "check", with one exception that does not need
$QEMU_OPTIONS. Keep them in common.rc, which will be soon included only
by the tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of ./check failing when a binary is missing, we try each test
case now and each one fails with tons of test case diffs. Also, all the
variables were initialized by "check" prior to "common" being sourced,
and then (uselessly) checked for emptiness again in "check".
Centralize the search for programs in "common" (which will soon be
one with "check"), including the "realpath" invocation which can be done
just once in "check" rather than in the tests.
For qnio_server, move the detection to "common", simplifying
set_prog_path to stop handling the unused second argument, and
embedding the "realpath" pass.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some functions in common.rc are never used by the tests. Move
them out of that file and into common, which is already included
only by "check".
Code that actually *is* common to "check" and tests can be placed in
common.config.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This includes shell function, shell variables and command line options
(randomize.awk does not exist).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity. It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Both callers already had bytes available, but were scaling to
sectors. Move the scaling to internal code. In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as dirty bitmap widens start/bytes to granularity
boundaries.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.
iotests 165 was rather weak - on a default 64k-cluster image, where
bitmap granularity also defaults to 64k bytes, a single cluster of
the bitmap table thus covers (64*1024*8) bits which each cover 64k
bytes, or 32G of image space. But the test only uses a 1G image,
so it cannot trigger any more than one loop of the code in
store_bitmap_data(); and it was writing to the first cluster. In
order to test that we are properly aligning which portions of the
bitmap are being written to the file, we really want to test a case
where the first dirty bit returned by bdrv_dirty_iter_next() is not
aligned to the start of a cluster, which we can do by modifying the
test to write data that doesn't happen to fall in the first cluster
of the image.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is new code, but it is easier to read if it makes passes over
the image using bytes rather than sectors (and will get easier in
the future when bdrv_get_block_status is converted to byte-based).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere. Making the change
will also make it easier to write the hold-out callers to use byte
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap. Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration. Both
callers were already using the result as a bool, so make that
explicit. Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.
Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Thanks to recent cleanups, most callers were scaling a return value
of sectors into bytes (the exception, in qcow2-bitmap, will be
converted to byte-based iteration later). Update the interface to
do the scaling internally instead.
In qcow2-bitmap, the code was specifically checking for an error
return of -1. To avoid a regression, we either have to make sure
we continue to return -1 (rather than a scaled -512) on error, or
we have to fix the caller to treat all negative values as error
rather than just one magic value. It's easy enough to make both
changes at the same time, even though either one in isolation
would work.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, drop that parameter.
Most callers to bdrv_set_dirty_iter() were scaling a byte offset to
a sector number; the exception qcow2-bitmap will be converted later
to use byte rather than sector iteration. Move the scaling to occur
internally to dirty bitmap code instead, so that callers now pass
in bytes.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Change the qcow2 bitmap
helper function sectors_covered_by_bitmap_cluster(), renaming it
to bytes_covered_by_bitmap_cluster() in the process.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, the dirty-bitmap code exposes the fact that we use
a scale of sector granularity in the underlying hbitmap to anything
that wants to serialize a dirty bitmap. It's nicer to uniformly
expose bytes as our dirty-bitmap interface, matching the previous
change to bitmap size. The only caller to serialization is currently
qcow2-cluster.c, which becomes a bit more verbose because it is still
tracking sectors for other reasons, but a later patch will fix that
to more uniformly use byte offsets everywhere. Likewise, within
dirty-bitmap, we have to add more assertions that we are not
truncating incorrectly, which can go away once the internal hbitmap
is byte-based rather than sector-based.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is (mostly) an internal-only variable (remember, the size
is how many bytes are covered by the bitmap, not how many bytes the
bitmap occupies). A later cleanup will convert dirty bitmap
internals to be entirely byte-based, eliminating the intermediate
sector rounding added here; and technically, since bdrv_getlength()
already rounds up to sectors, our use of DIV_ROUND_UP is more for
theoretical completeness than for any actual rounding.
Use is_power_of_2() while at it, instead of open-coding that.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We're already reporting bytes for bdrv_dirty_bitmap_granularity();
mixing bytes and sectors in our return values is a recipe for
confusion. A later cleanup will convert dirty bitmap internals
to be entirely byte-based, but in the meantime, we should report
the bitmap size in bytes.
The only external caller in qcow2-bitmap.c is temporarily more verbose
(because it is still using sector-based math), but will later be
switched to track progress by bytes instead of sectors.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We've previously fixed several places where we failed to account
for possible errors from bdrv_nb_sectors(). Fix another one by
making bdrv_dirty_bitmap_truncate() take the new size from the
caller instead of querying itself; then adjust the sole caller
bdrv_truncate() to pass the size just determined by a successful
resize, or to reuse the size given to the original truncate
operation when refresh_total_sectors() was not able to confirm the
actual size (the two sizes can potentially differ according to
rounding constraints), thus avoiding sizing the bitmaps to -1.
This also fixes a bug where not all failure paths in
bdrv_truncate() would set errp.
Note that bdrv_truncate() is still a bit awkward. We may want
to revisit it later and clean up things to better guarantee that
a resize attempt either fails cleanly up front, or cannot fail
after guest-visible changes have been made (if temporary changes
are made, then they need to be cleanly rolled back). But that
is a task for another day; for now, the goal is the bare minimum
fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We had several functions that no one is currently using, and which
use sector-based interfaces. I'm trying to convert towards byte-based
interfaces, so it's easier to just drop the unused functions:
bdrv_dirty_bitmap_get_meta
bdrv_dirty_bitmap_get_meta_locked
bdrv_dirty_bitmap_reset_meta
bdrv_dirty_bitmap_meta_granularity
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When subdividing a bitmap serialization, the code in hbitmap.c
enforces that start/count parameters are aligned (except that
count can end early at end-of-bitmap). We exposed this required
alignment through bdrv_dirty_bitmap_serialization_align(), but
forgot to actually check that we comply with it.
Fortunately, qcow2 is never dividing bitmap serialization smaller
than one cluster (which is a minimum of 512 bytes); so we are
always compliant with the serialization alignment (which insists
that we partition at least 64 bits per chunk) because we are doing
at least 4k bits per chunk.
Still, it's safer to add an assertion (for the unlikely case that
we'd ever support a cluster smaller than 512 bytes, or if the
hbitmap implementation changes what it considers to be aligned),
rather than leaving bdrv_dirty_bitmap_serialization_align()
without a caller.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only client of hbitmap_serialization_granularity() is dirty-bitmap's
bdrv_dirty_bitmap_serialization_align(). Keeping the two names consistent
is worthwhile, and the shorter name is more representative of what the
function returns (the required alignment to be used for start/count of
other serialization functions, where violating the alignment causes
assertion failures).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers of bdrv_img_create() pass in a size, or -1 to read the
size from the backing file. We then set that size as the QemuOpt
default, which means we will reuse that default rather than the
final parameter to qemu_opt_get_size() several lines later. But
it is rather confusing to read subsequent checks of 'size == -1'
when it looks (without seeing the full context) like size defaults
to 0; it also doesn't help that a size of 0 is valid (for some
formats).
Rework the logic to make things more legible.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
s390x changes:
- support for IDA (indirect addressing in ccws) via ccw data stream
- support for extended TOD-Clock (z14 feature)
- various fixes and improvements all over the place
# gpg: Signature made Fri 06 Oct 2017 10:52:22 BST
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg: aka "Cornelia Huck <cohuck@kernel.org>"
# gpg: aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20171006: (33 commits)
hw/s390x: Mark the "sclpquiesce" device with user_creatable = false
s390x/tcg: initialize machine check queue
s390x/sclp: mark sclp-cpu-hotplug as non-usercreatable
s390x/sclp: Mark the sclp device with user_creatable = false
s390/kvm: make TOD setting failures fatal for migration
s390/kvm: Support for get/set of extended TOD-Clock for guest
s390x/css: fix css migration compat handling
s390x: sort some devices into categories
s390x/tcg: make STFL store into the lowcore
s390x: introduce and use S390_MAX_CPUS
target/s390x: get rid of next_core_id
s390x/cpumodel: fix max STFL(E) bit number
s390x: raise CPU hotplug irq after really hotplugged
MAINTAINERS: use KVM s390x maintainers for kvm-stubs.c and kvm_s390x.h
s390x/3270: handle writes of arbitrary length
s390x/3270: IDA support for 3270 via CcwDataStream
Revert "s390x/ccw: create s390 phb conditionally"
s390x/tcg: make idte/ipte use the new _real mmu
s390x/tcg: make testblock use the new _real mmu
s390x/tcg: make stora(g) use the new _real mmu
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The "sclpquiesce" device is just an internal device that should not be
created by the user directly. Though it currently does not seem to cause
any obvious trouble when the user instantiates an additional device, let's
better mark it with user_creatable = false to avoid unexpected behavior,
e.g. because the quiesce notifier gets registered multiple times.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1507193105-15627-1-git-send-email-thuth@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Just as for external interrupts and I/O interrupts, we need to
initialize mchk_index during cpu reset.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
A TYPE_SCLP_CPU_HOTPLUG device for handling cpu hotplug events
is already created by the sclp event facility. Adding a second
TYPE_SCLP_CPU_HOTPLUG device via -device sclp-cpu-hotplug creates
an ambiguity in raise_irq_cpu_hotplug(), leading to a crash once
a cpu is hotplugged.
To fix this, disallow creating a sclp-cpu-hotplug device manually.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The "sclp" device is just an internal device that can not be instantiated
by the users. If they try to use it, they only get a simple error message:
$ qemu-system-s390x -nographic -device sclp
qemu-system-s390x: Option '-device s390-sclp-event-facility' cannot be
handled by this machine
Since sclp_init() tries to create a TYPE_SCLP_EVENT_FACILITY which is
a non-pluggable sysbus device, there is really no way that the "sclp"
device can be used by the user, so let's set the user_creatable = false
accordingly.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1507125199-22562-1-git-send-email-thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
If we fail to set a proper TOD clock on the target system, this can
already result in some problematic cases. We print several warn messages
on source and target in that case.
If kvm fails to set a nonzero epoch index, then we must ultimately fail
the migration as this will result in a giant time leap backwards. This
patch lets the migration fail if we can not set the guest time on the
target.
On failure the guest will resume normally on the original host machine.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split failure change from epoch index change, minor fixups]
Message-Id: <20171004105751.24655-3-borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Commit e996583eb3 ("s390x/css: activate ChannelSubSys migration",
2017-07-11) was supposed to enable css migration for virtio-ccw
machines starting 2.10, but it ended up effectively enabling it
only for 2.10 as the registration of the appropriate VMStateDescription
happens in ccw_machine_2_10_instance_options which does not get
called for machines more recent than 2_10.
Let us move the corresponding chunk of code (which conditionally enables
the migration based on the value of the corresponding class property) to
ccw_init, which is called for each virtio-ccw machine instance.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20171004110109.16525-1-pasic@linux.vnet.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Using virtual memory access is wrong and will soon include low-address
protection checks, which is to be bypassed for STFL.
STFL is a privileged instruction and using LowCore requires
!CONFIG_USER_ONLY, so add the ifdef and move the declaration to the
right place.
This was originally part of a bigger STFL(E) refactoring.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170927170027.8539-4-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
core_id is not needed by linux-user, as the core_id a.k.a. CPU address
is only accessible from kernel space.
Therefore, drop next_core_id and make cpu_index get autoassigned again
for linux-user.
While at it, shield core_id and cpuid completely from linux-user. cpuid
can also only be queried from kernel space.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-5-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Let's move it into the machine, so we trigger the IRQ after setting
ms->possible_cpus (which SCLP uses to construct the list of
online CPUs).
This also fixes a problem reported by Thomas Huth, whereby qemu can be
crashed using the none machine
qemu-s390x-softmmu -M none -monitor stdio
-> device_add qemu-s390-cpu
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-3-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The problem is, that the current implementation places unrealistic and
arbitrary constraints on the length of writes to the device (that is the
outbound requests), by asserting ccw.count being such that that even the
worst case escaped payload will fit an more or less arbitrary sized
buffer. Actually on protocol level there is nothing to justify such
a limitation.
Another strange thing is the return value which more or less reflects
the size (written) after escaping instead of before escaping. This
is strange, because this return value is used to calculate SCSW.count.
Let us teach 3270 how to deal with arbitrary long writes.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reported-by: Jason J . Herne <jjherne@linux.vnet.ibm.com>
Tested-by: Jason J . Herne <jjherne@linux.vnet.ibm.com>
Message-Id: <20170920172314.102710-3-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Let us convert the 3270 code so it uses the recently introduced
CcwDataStream abstraction instead of blindly assuming direct data access.
This patch does not change behavior beyond introducing IDA support: for
direct data access CCWs everything stays as-is. (If there are bugs, they
are also preserved).
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20170920172314.102710-2-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This reverts commit d32bd032d8.
Turns out that old QEMUs always created a pci host bridge
and for many CPU models the migration from old QEMUs to new
QEMUs will fail with
qemu-system-s390x: Unknown savevm section or instance 'PCIBUS' 0
qemu-system-s390x: load of migration failed: Invalid argument
As a quick fix we will revert the commit and always create the
pci host bridge.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[fixed revert to keep the comment fixup, added a comment in the code]
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Message-Id: <20170928131831.81393-1-borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This makes it easy to access real addresses (prefix) and in addition
checks for valid memory addresses, which is missing when using e.g.
stl_phys().
We can later reuse it to implement low address protection checks (then
we might even decide to introduce yet another MMU for absolute
addresses, just for handling storage keys and low address protection).
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170926183318.12995-3-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The architecture mandates the addresses to be accessed on the first
indirection level (that is, the data addresses without IDA, and the
(M)IDAW addresses with (M)IDA) to be checked against an CCW format
dependent limit maximum address. If a violation is detected, the storage
access is not to be performed and a channel program check needs to be
generated. As of today, we fail to do this check.
Let us stick even closer to the architecture specification.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170921180841.24490-5-pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
env->psa is a 64bit value, while we copy 4 bytes into the save area,
resulting always in 0 getting stored.
Let's try to reduce such errors by using a proper structure. While at
it, use correct cpu->be conversion (and get_psw_mask()), as we will be
reusing this code for TCG soon.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170922140338.6068-1-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The STFLE bits for the MSA (extension) facilities simply indicate that
the respective instructions can be executed. The QUERY subfunction can then
be used to identify which features exactly are available.
Availability of subfunctions can also vary on real hardware. For now, we
simply implement a CPU model without any available subfunctions except
QUERY (which is always around).
As all MSA functions behave quite similarly, we can use one translation
handler for now. Prepare the code for implementation of actual subfunctions.
At least MSA is helpful for now, as older Linux kernels require this
facility when compiled for a z9 model. Allow to enable the facilities
for the qemu cpu model.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170920153016.3858-4-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
HMP pull 2017-10-05
# gpg: Signature made Thu 05 Oct 2017 11:50:13 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-hmp-20171005:
hmp-commands-info: Change "@findex FOO" to "@findex info FOO"
hmp-commands-info: Move Texinfo stanzas to conventional place
hmp-commands-info: Fix "info rocker-FOO" misspellings
hmp: Fix unknown command for subtable
hmp: Missing handle_errors
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Merge qio 2017/10/04 v1
# gpg: Signature made Wed 04 Oct 2017 13:23:04 BST
# gpg: using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/pull-qio-2017-10-04-1:
io: add trace events for websockets frame handling
io: Attempt to send websocket close messages to client
io: Reply to ping frames
io: Ignore websocket PING and PONG frames
io: Allow empty websocket payload
io: Add support for fragmented websocket binary frames
io: Small updates in preparation for websocket changes
ui: Always remove an old VNC channel watch before adding a new one
io: use case insensitive check for Connection & Upgrade websock headers
io: include full error message in websocket handshake trace
io: send proper HTTP response for websocket errors
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
It is useful to trace websockets frame encoding/decoding when debugging
problems.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Make a best effort attempt to close websocket connections according to
the RFC. Sends the close message, as room permits in the socket buffer,
and immediately closes the socket.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add an immediate ping reply (pong) to the outgoing stream when a ping
is received. Unsolicited pongs are ignored.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keep pings and gratuitous pongs generated by web browsers from killing
websocket connections.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some browsers send pings/pongs with no payload, so allow empty payloads
instead of closing the connection.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Allows fragmented binary frames by saving the previous opcode. Handles
the case where an intermediary (i.e., web proxy) fragments frames
originally sent unfragmented by the client.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Gets rid of unnecessary bit shifting and performs proper EOF checking to
avoid a large number of repeated calls to recvmsg() when a client
abruptly terminates a connection (bug fix).
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When checking the value of the Connection and Upgrade HTTP headers
the websock RFC (6455) requires the comparison to be case insensitive.
The Connection value should be an exact match not a substring.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When the websocket handshake fails it is useful to log the real
error message via the trace points for debugging purposes.
Fixes bug: #1715186
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When any error occurs while processing the websockets handshake,
QEMU just terminates the connection abruptly. This is in violation
of the HTTP specs and does not help the client understand what they
did wrong. This is particularly bad when the client gives the wrong
path, as a "404 Not Found" would be very helpful.
Refactor the handshake code so that it always sends a response to
the client unless there was an I/O error.
Fixes bug: #1715186
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
NVIDIA has defined a specification for creating GPUDirect "cliques",
where devices with the same clique ID support direct peer-to-peer DMA.
When running on bare-metal, tools like NVIDIA's p2pBandwidthLatencyTest
(part of cuda-samples) determine which GPUs can support peer-to-peer
based on chipset and topology. When running in a VM, these tools have
no visibility to the physical hardware support or topology. This
option allows the user to specify hints via a vendor defined
capability. For instance:
<qemu:commandline>
<qemu:arg value='-set'/>
<qemu:arg value='device.hostdev0.x-nv-gpudirect-clique=0'/>
<qemu:arg value='-set'/>
<qemu:arg value='device.hostdev1.x-nv-gpudirect-clique=1'/>
<qemu:arg value='-set'/>
<qemu:arg value='device.hostdev2.x-nv-gpudirect-clique=1'/>
</qemu:commandline>
This enables two cliques. The first is a singleton clique with ID 0,
for the first hostdev defined in the XML (note that since cliques
define peer-to-peer sets, singleton clique offer no benefit). The
subsequent two hostdevs are both added to clique ID 1, indicating
peer-to-peer is possible between these devices.
QEMU only provides validation that the clique ID is valid and applied
to an NVIDIA graphics device, any validation that the resulting
cliques are functional and valid is the user's responsibility. The
NVIDIA specification allows a 4-bit clique ID, thus valid values are
0-15.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If the hypervisor needs to add purely virtual capabilties, give us a
hook through quirks to do that. Note that we determine the maximum
size for a capability based on the physical device, if we insert a
virtual capability, that can change. Therefore if maximum size is
smaller after added virt capabilities, use that.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If vfio_add_std_cap() errors then going to out prepends irrelevant
errors for capabilities we haven't attempted to add as we unwind our
recursive stack. Just return error.
Fixes: 7ef165b9a8 ("vfio/pci: Pass an error object to vfio_add_capabilities")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
After iothread is enabled internally inside QEMU with GMainContext, we
may encounter this warning when destroying the iothread:
(qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll:
assertion '!SOURCE_DESTROYED (source)' failed
The problem is that g_source_remove_poll() does not allow to remove one
source from array if the source is detached from its owner
context. (peterx: which IMHO does not make much sense)
Fix it on QEMU side by avoid calling g_source_remove_poll() if we know
the object is during destruction, and we won't leak anything after all
since the array will be gone soon cleanly even with that fd.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-6-peterx@redhat.com
[peterx: write the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When gcontext is used with iothread, the context will be destroyed
during iothread_stop(). That's not good since sometimes we would like
to keep the resources until iothread is destroyed, but we may want to
stop the thread before that point.
Delay the destruction of gcontext to iothread finalize. Then we can do:
iothread_stop(thread);
some_cleanup_on_resources();
iothread_destroy(thread);
We may need this patch if we want to run chardev IOs in iothreads and
hopefully clean them up correctly. For more specific information,
please see 2b316774f6 ("qemu-char: do not operate on sources from
finalize callbacks").
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-5-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
So that internal iothread users can explicitly stop one iothread without
destroying it.
Since at it, fix iothread_stop() to allow it to be called multiple
times. Before this patch we may call iothread_stop() more than once on
single iothread, while that may not be correct since qemu_thread_join()
is not allowed to run twice. From manual of pthread_join():
Joining with a thread that has previously been joined results in
undefined behavior.
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-4-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
IOThread is a general framework that contains IO loop environment and a
real thread behind. It's also good to be used internally inside qemu.
Provide some helpers for it to create iothreads to be used internally.
Put all the internal used iothreads into the internal object container.
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-3-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
# gpg: Signature made Fri 29 Sep 2017 04:17:37 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/docker-testing-pull-request:
docker: Don't mount ccache db if NOUSER=1
docker: test-block: Don't continue if build fails
tests/docker/run: don't source /etc/profile
docker: Fix test-mingw
docker: add installation to build tests
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On a modern server-class ppc host with the following CPU topology:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0,8,16,24
Off-line CPU(s) list: 1-7,9-15,17-23,25-31
Thread(s) per core: 1
If both KVM PR and KVM HV loaded and we pass:
-machine pseries,accel=kvm,kvm-type=PR -smp 8
We expect QEMU to warn that this exceeds the number of online CPUs:
Warning: Number of SMP cpus requested (8) exceeds the recommended
cpus supported by KVM (4)
Warning: Number of hotpluggable cpus requested (8) exceeds the
recommended cpus supported by KVM (4)
but nothing is printed...
This happens because on ppc the KVM_CAP_NR_VCPUS capability is VM
specific ndreally depends on the KVM type, but we currently use it
as a global capability. And KVM returns a fallback value based on
KVM HV being present. Maybe KVM on POWER shouldn't presume anything
as long as it doesn't have a VM, but in all cases, we should call
KVM_CREATE_VM first and use KVM_CAP_NR_VCPUS as a VM capability.
This patch hence changes kvm_recommended_vcpus() accordingly and
moves the sanity checking of smp_cpus after the VM creation.
It is okay for the other archs that also implement KVM_CAP_NR_VCPUS,
ie, mips, s390, x86 and arm, because they don't depend on the VM
being created or not.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <150600966286.30533.10909862523552370889.stgit@bahia.lan>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On a server-class ppc host, this capability depends on the KVM type,
ie, HV or PR. If both KVM are present in the kernel, we will always
get the HV specific value, even if we explicitely requested PR on
the command line.
This can have an impact if we're using hugepages or a balloon device.
Since we've already created the VM at the time any user calls
kvm_has_sync_mmu(), switching to kvm_vm_check_extension() is
enough to fix any potential issue.
It is okay for the other archs that also implement KVM_CAP_SYNC_MMU,
ie, mips, s390, x86 and arm, because they don't depend on the VM being
created or not.
While here, let's cache the state of this extension in a bool variable,
since it has several users in the code, as suggested by Thomas Huth.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <150600965332.30533.14702405809647835716.stgit@bahia.lan>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a library header, so angle brackets are more appropriate; also
move the line to before QEMU headers, as is recommended in HACKING.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20170920085952.3872-1-famz@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Valgrind detects an invalid read operation when hot-plugging of an
USB device fails:
$ valgrind x86_64-softmmu/qemu-system-x86_64 -device usb-ehci -nographic -S
==30598== Memcheck, a memory error detector
==30598== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==30598== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==30598== Command: x86_64-softmmu/qemu-system-x86_64 -device usb-ehci -nographic -S
==30598==
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
==30598== Invalid read of size 8
==30598== at 0x60EF50: object_unparent (object.c:445)
==30598== by 0x580F0D: usb_try_create_simple (bus.c:346)
==30598== by 0x581BEB: usb_claim_port (bus.c:451)
==30598== by 0x582310: usb_qdev_realize (bus.c:257)
==30598== by 0x4CB399: device_set_realized (qdev.c:914)
==30598== by 0x60E26D: property_set_bool (object.c:1886)
==30598== by 0x61235E: object_property_set_qobject (qom-qobject.c:27)
==30598== by 0x61000F: object_property_set_bool (object.c:1162)
==30598== by 0x4567C3: qdev_device_add (qdev-monitor.c:630)
==30598== by 0x456D52: qmp_device_add (qdev-monitor.c:807)
==30598== by 0x470A99: hmp_device_add (hmp.c:1933)
==30598== by 0x3679C3: handle_hmp_command (monitor.c:3123)
The object_unparent() here is not necessary anymore since commit
69382d8b3e ("qdev: Fix object reference leak in case device.realize()
fails"), so let's remove it now.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1506526106-30971-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Currently, iothread_stop_all() makes all iothread objects unsafe
to be destroyed, because qemu_thread_join() ends up being called
twice.
To fix this, make iothread_stop() idempotent by checking
thread->stopped.
Fixes the following crash:
qemu-system-x86_64 -object iothread,id=iothread0 -monitor stdio -display none
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) quit
qemu: qemu_thread_join: No such process
Aborted (core dumped)
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170926130028.12471-1-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu uses wheel-up/down button events for mouse wheel input, however
linux applications typically want REL_WHEEL events.
This fixes wheel with linux guests. Tested with X11/wayland, and
windows virtio-input driver.
Based on a patch from Marc.
Added property to enable/disable wheel axis.
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170926113243.26081-1-kraxel@redhat.com
Rename the functions to to say "setup" instead of "create" because they
support being called multiple times on the same egl framebuffer.
Properly delete unused textures, update function interfaces to support
this.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170927115031.12063-1-kraxel@redhat.com
Handle the translation from vga chars to curses chars in curses_update()
instead of console_write_ch(). Purge any curses support bits from
ui/console.h include file.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170927103811.19249-1-kraxel@redhat.com
The usual behaviour of /etc/profile is to set the default PATH for
users. This runs into problems when we have updated PATH in our
dockerfile e.g. to access a cross-compiler in a non-standard
location. It shouldn't be needed anyway as we inherit the env from the
image when it was setup.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170926133622.14991-1-alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Feature "dtc" is explicitly required by test-mingw, but is not detected
by the run script since we switched to archive-source.sh in b7f404201e.
Since it isn't available in the Fedora image which runs this test on
patchew, the way we get dtc is still from submodule.
archive-source.sh takes care of bundling the submodule files already, so
what we need to do is just checking if files are there. Makefile is
chosen because it is one that is unlikely to get renamed in the future.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170925082913.22089-1-famz@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Migration pull 2017-09-27
# gpg: Signature made Wed 27 Sep 2017 14:56:23 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20170927a:
migration: Route more error paths
migration: Route errors up through vmstate_save
migration: wire vmstate_save_state errors up to vmstate_subsection_save
migration: Check field save returns
migration: check pre_save return in vmstate_save_state
migration: pre_save return int
migration: disable auto-converge during bulk block migration
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ppc patch queue 2017-09-27
Contains
* a number of Mac machine type fixes
* a number of embedded machine type fixes (preliminary to adding the
Sam460ex board)
* a important fix for handling of migration with KVM PR
* assorted other minor fixes and cleanups
# gpg: Signature made Wed 27 Sep 2017 08:40:48 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.11-20170927: (26 commits)
macio: use object link between MACIO_IDE and MAC_DBDMA object
macio: pass channel into MACIOIDEState via qdev property
mac_dbdma: remove DBDMA_init() function
mac_dbdma: QOMify
mac_dbdma: remove unused IO fields from DBDMAState
spapr: fix the value of SDR1 in kvmppc_put_books_sregs()
ppc/pnv: check for OPAL firmware file presence
ppc: remove all unused CPU definitions
ppc: remove unused CPU definitions
spapr_pci: make index property mandatory
macio: convert pmac_ide_ops from old_mmio
ppc/pnv: Improve macro parenthesization
spapr: introduce helpers to migrate HPT chunks and the end marker
ppc/kvm: generalize the use of kvmppc_get_htab_fd()
ppc/kvm: change kvmppc_get_htab_fd() to return -errno on error
ppc: Fix OpenPIC model
ppc/ide/macio: Add missing registers
ppc/mac: More rework of the DBDMA emulation
ppc/mac: Advertise a high clock frequency for NewWorld Macs
ppc: QOMify g3beige machine
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Block layer patches
# gpg: Signature made Tue 26 Sep 2017 14:52:32 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (24 commits)
block/qcow2-bitmap: fix use of uninitialized pointer
qemu-iotests: add shrinking image test
qcow2: add shrink image support
qcow2: add qcow2_cache_discard
qemu-img: add --shrink flag for resize
iotests: fix 181: enable postcopy-ram capability on target
qemu-iotests: Test change-backing-file command
block: Fix permissions after bdrv_reopen()
block: reopen: Queue children after their parents
block: Base permissions on rw state after reopen
block: Add reopen queue to bdrv_check_perm()
block: Add reopen_queue to bdrv_child_perm()
qemu-io: Drop write permissions before read-only reopen
block: Clean up some bad code in the vvfat driver
block/throttle-groups.c: allocate RestartData on the heap
throttle: Assert that bkt->max is valid in throttle_compute_wait()
iotests: Print full path of bad output if mismatch
iotests: use virtio aliases for 067
iotests: use -ccw on s390x for 051
iotests: use -ccw on s390x for 040, 139, and 182
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Modify the pre_save method on VMStateDescription to return an int
rather than void so that it potentially can fail.
Changed zillions of devices to make them return 0; the only
case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already
had an error_report/return case.
Note: If you add an error exit in your pre_save you must emit
an error_report to say why.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
auto-converge and block migration currently do not play well together.
During block migration the auto-converge logic detects that ram
migration makes no progress and thus throttles down the vm until
it nearly stalls completely. Avoid this by disabling the throttling
logic during the bulk phase of the block migration.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1506421996-12513-1-git-send-email-pl@kamp.de>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Using a standard QOM object link we can pass a reference to the MAC_DBDMA
controller to the MACIO_IDE object which removes the last external parameter
to macio_ide_register_dma().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
One of the reasons macio_ide_register_dma() needs to exist is because the
channel id isn't passed into the MACIO_IDE object. Pass in the channel id
using a qdev property to remove this requirement.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead we can now instantiate the MAC_DBDMA object directly within the
macio device. We also add the DBDMA device as a child property so that
it is possible to retrieve later.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These fields were used to manually handle IO requests that weren't aligned
to a sector boundary before this feature was supported by the block API.
Once the block API changed to support byte-aligned IO requests, the macio
controller was switched over to use it in commit be1e343 but these fields
were accidentally left behind. Remove them, including the initialisation
in DBDMA_init().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When running with KVM PR, if a new HPT is allocated we need to inform
KVM about the HPT address and size. This is currently done by hacking
the value of SDR1 and pushing it to KVM in several places.
Also, migration breaks the guest since it is very unlikely the HPT has
the same address in source and destination, but we push the incoming
value of SDR1 to KVM anyway.
This patch introduces a new virtual hypervisor hook so that the spapr
code can provide the correct value of SDR1 to be pushed to KVM each
time kvmppc_put_books_sregs() is called.
It allows to get rid of all the hacking in the spapr/kvmppc code and
it fixes migration of nested KVM PR.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
and exit before uselessly trying to load it if the file does not
exists.
Issue discovered by Coverity Scan.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Remove *all* unused CPU definitions as indicated by compile-time
`#if 0` constructs.
Signed-off-by: John Snow <jsnow@redhat.com>
[dwg: Removed some additional now-useless comments]
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PHBs can be created with an index property, in which case the machine
code automatically sets all the MMIO windows at addresses derived from
the index. Alternatively, they can be manually created without index,
but the user has to provide addresses for all MMIO windows.
The non-index way happens to be more trouble than it's worth: it's
difficult to use, keeps requiring (potentially incompatible) changes
when some new parameter needs adding, and is awkward to check for
collisions. It currently even has a bug that prevents to use two
non-index PHBs because their child DRCs are all derived from the
same index == -1 value, and, thus, collide.
This patch hence makes the index property mandatory. As a consequence,
the PHB's memory regions and BUID are now always configured according
to the index, and it is no longer possible to set them from the command
line.
This DOES BREAK backwards compat, but we don't think the non-index
PHB feature was used in practice (at least libvirt doesn't) and the
simplification is worth it.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Although none of the existing macro call-sites were broken,
it's always better to write macros that properly parenthesize
arguments that can be complex expressions, so that the intended
order of operations is not broken.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes()
and kvmppc_write_hpte().
This patch modifies kvmppc_get_htab_fd() so that it can be used
everywhere we need to access the in-kernel htab:
- add an index argument
=> only kvmppc_read_hptes() passes an actual index, all other users
pass 0
- add an errp argument to propagate error messages to the caller.
=> spapr migration code prints the error
=> hpte helpers pass &error_abort to keep the current behavior
of hw_error()
While here, this also fixes a bug in kvmppc_write_hpte() so that it
opens the htab fd for writing instead of reading as it currently does.
This never broke anything because we currently never call this code,
as explained in the changelog of commit c138593380:
"This support updating htab managed by the hypervisor. Currently
we don't have any user for this feature. This actually bring the
store_hpte interface in-line with the load_hpte one. We may want
to use this when we want to emulate henter hcall in qemu for HV
kvm."
The above is still true today.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When kvmppc_get_htab_fd() fails, its return value is propagated up to
qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy().
All savevm handlers expect to receive a negative errno on error.
Let's patch kvmppc_get_htab_fd() accordingly.
While here, let's change htab_load() in the spapr code to also
propagate the error, since it doesn't make sense to abort() if
we couldn't get the htab fd from KVM.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The timing register exists on all variants of MacIO IDE, we just
store and return its value.
The interrupts register only exists on KeyLargo but it doesn't
hurt to have it. The lack of this register causes MacOS X to
hangs under some circumstances.
Both are 32-bit only. The HW might support smaller access sizes
but no known OS uses them.
Because the core IDE subsystem doesn't provide us with a way
to query the main (level) interrupt state, nor do we have a way
to know that DBDMA issued a (edge) interrupt, we reflect both
through a private pair of qirq's in order to maintain the
register state.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This completely reworks the handling of the control register
according to my understanding of the HW and the spec.
It should (hopefully ... still testing) fix a number of issues
most notably cases of MacOS hanging.
Also update dbdma_unassigned_rw() and dbdma_unassigned_flush() to
have the expected behaviour now that flush is handled slightly
differently.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These registers are present in 440 SoCs (and maybe in others too) and
U-Boot accesses them when printing register info. We don't emulate
these but add them to avoid crashing when they are read or written.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The following capabilities are VM specific:
- KVM_CAP_PPC_SMT_POSSIBLE
- KVM_CAP_PPC_HTAB_FD
- KVM_CAP_PPC_ALLOC_HTAB
If both KVM HV and KVM PR are present, checking them always return
the HV value, even if we explicitely requested to use PR.
This has no visible effect for KVM_CAP_PPC_ALLOC_HTAB, because we also
try the KVM_PPC_ALLOCATE_HTAB ioctl which is only suppored by HV. As
a consequence, the spapr code doesn't even check KVM_CAP_PPC_HTAB_FD.
However, this will cause kvmppc_hint_smt_possible(), introduced by
commit fa98fbfcdf, to report several VSMT modes (eg, Available
VSMT modes: 8 4 2 1) whereas PR only support mode 1.
This patch fixes all three anyway to use kvm_vm_check_extension(). It
is okay since the VM is already created at the time kvm_arch_init() or
kvmppc_reset_htab() is called.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
BQL bug fix
# gpg: Signature made Mon 25 Sep 2017 23:14:48 BST
# gpg: using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20170925:
accel/tcg/cputlb: avoid recursive BQL (fixes#1706296)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch add shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. Image can be fragmented and shrink is done by punching holes
in the image file.
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-4-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Whenever l2/refcount table clusters are discarded from the file we can
automatically drop unnecessary content of the cache tables. This reduces
the chance of eviction useful cache data and eliminates inconsistent data
in the cache with the data in the file.
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The flag is additional precaution against data loss. Perhaps in the future the
operation shrink without this flag will be blocked for all formats, but for now
we need to maintain compatibility with raw.
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-2-pbutsykin@virtuozzo.com
[mreitz: Added a missing space to a warning]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Migration capabilities should be enabled on both source and
destination qemu processes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This involves a temporary read-write reopen if the backing file link in
the middle of a backing file chain should be changed and is therefore a
good test for the latest bdrv_reopen() vs. op blockers fixes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If we switch between read-only and read-write, the permissions that
image format drivers need on bs->file change, too. Make sure to update
the permissions during bdrv_reopen().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
We will calculate the required new permissions in the prepare stage of a
reopen. Required permissions of children can be influenced by the
changes made to their parents, but parents are independent from their
children. This means that permissions need to be calculated top-down. In
order to achieve this, queue parents before their children rather than
queuing the children first.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When new permissions are calculated during bdrv_reopen(), they need to
be based on the state of the graph as it will be after the reopen has
completed, not on the current state of the involved nodes.
This patch makes bdrv_is_writable() optionally accept a BlockReopenQueue
from which the new flags are taken. This is then used for determining
the new bs->file permissions of format drivers as soon as we add the
code to actually pass a non-NULL reopen queue to the .bdrv_child_perm
callbacks.
While moving bdrv_is_writable(), make it static. It isn't used outside
block.c.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qemu-io provides a 'reopen' command that allows switching from writable
to read-only access. We need to make sure that we don't try to keep
write permissions to a BlockBackend that becomes read-only, otherwise
things are going to fail.
This requires a bdrv_drain() call because otherwise in-flight AIO
write requests could issue new internal requests while the permission
has already gone away, which would cause assertion failures. Draining
the queue doesn't break AIO requests in any new way, bdrv_reopen() would
drain it anyway only a few lines later.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Remove the unnecessary home-grown redefinition of the assert() macro here,
and remove the unusable debug code at the end of the checkpoint() function.
The code there uses assert() with side-effects (assignment to the "mapping"
variable), which should be avoided. Looking more closely, it seems as it is
apparently also only usable for one certain directory layout (with a file
named USB.H in it) and thus is of no use for the rest of the world.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
RestartData is the opaque data of the throttle_group_restart_queue_entry
coroutine. By being stack allocated, it isn't available anymore if
aio_co_enter schedules the coroutine with a bottom half and runs after
throttle_group_restart_queue returns.
Cc: qemu-stable@nongnu.org
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If bkt->max == 0 and bkt->burst_length > 1 then we could have a
division by 0 in throttle_do_compute_wait(). That configuration is
however not permitted and is already detected by throttle_is_valid(),
but let's assert it in throttle_compute_wait() to make it explicit.
Found by Coverity (CID: 1381016).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x.
Using virtio-scsi will implicitly pick the right device, so just
switch to that for simplicity.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x, so use the -ccw instead of the -pci versions of virtio
devices on s390x.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x, so use the -ccw instead of the -pci versions of virtio
devices on s390x.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block driver documentation is available in qemu-doc.html. It would be
convenient to have documentation for formats, protocols, and filter
drivers in a man page.
Extract the relevant part of qemu-doc.html into a new file called
docs/qemu-block-drivers.texi. This file can also be built as a
stand-alone document (man, html, etc).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
People get surprised when, after "qemu-img create -f raw /dev/sdX", they
still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
header. While this is natural because raw doesn't need to write any
magic bytes during creation, hdev_create is free to clear out the first
sector to make sure the stale qcow2 header doesn't cause such confusion.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's not too surprising when a user specifies the backing file relative
to the current working directory instead of the top layer image. This
causes error when they differ. Though the error message has enough
information to infer the fact about the misunderstanding, it is better
if we document this explicitly, so that users don't have to learn from
mistakes.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A basic set of qemu options is initialised in ./common:
export QEMU_OPTIONS="-nodefaults -machine accel=qtest"
However, two test cases (172 and 186) overwrite QEMU_OPTIONS and neglect
to manually set '-machine accel=qtest'. Add the missing option for 172.
186 probably only copied the code from 172, it doesn't actually need to
overwrite QEMU_OPTIONS, so remove that in 186.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Add a firmware path config option to configure. Multiple directories
are accepted, with the usual colon as separator. Default value is
${prefix}/share/qemu-firmware. The path is searched in addition to the
current search path (typically ${prefix}/share/qemu).
This prepares qemu for the planned split of the prebuilt firmware blobs
into a separate project.
Distributions can also use this to get rid of the firmware symlink farm
and add -- for example -- /usr/share/seabios to the firmware path
instead.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170914114236.25343-3-kraxel@redhat.com
Add helper function to add a directory to the qemu search path, so we
don't duplicate the checks. Add a check for duplicate entries, so we
stop trying to open files twice.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170914114236.25343-2-kraxel@redhat.com
QEMU currently aborts if you try to use the device at the command
line:
$ ppc64-softmmu/qemu-system-ppc64 -S -machine prep -device pc87312
Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
qemu-system-ppc64: -device pc87312: Device 'parallel0' is in use
Aborted (core dumped)
It uses parallel_hds in its realize function, so I can not be
instantiated by the user again.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This is required to be removed on SmartOS (Illumos).
As of now there are no alternative supported SunOS distributions.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If QEMU has been compiled with the flags --enable-tcg-interpreter and
--enable-debug, the guest is running incredibly slow. The pxe boot test
can take up to 400 seconds when testing the pseries ppc64 machine. While
we should still look for ways to speed up the test on the pseries machine,
it's better to increase the timeout in this test to 600 seconds anyway to
allow the test to pass successfully now with this unusal configuration
already.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If 'bs' is a complex expression, we were only casting the front half
rather than the full expression. Luckily, none of the callers were
passing bad arguments, but it's better to be robust up front.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The virtio-gpu-pci device is already in the display category, so the
virtio-gpu-device should be there, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
When using bit-wise operations that exploit the power-of-two
nature of the second argument of ROUND_UP(), we still need to
ensure that the mask is as wide as the first argument (done
by using a ternary to force proper arithmetic promotion).
Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512U) produces 0,
instead of the intended 2TiB, because negation of an unsigned
32-bit quantity followed by widening to 64-bits does not
sign-extend the mask.
Broken since its introduction in commit 292c8e50 (v1.5.0).
Callers that passed the same width type to both macro parameters,
or that had other code to ensure the first parameter's maximum
runtime value did not exceed the second parameter's width, are
unaffected, but I did not audit to see which (if any) existing
clients of the macro could trigger incorrect behavior (I found
the bug while adding a new use of the macro).
While preparing the patch, checkpatch complained about poor
spacing, so I also fixed that here and in the nearby DIV_ROUND_UP.
CC: qemu-trivial@nongnu.org
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Instead of using the hardcoded (MemTxAttrs){0} for no memory attributes
let's use the already defined MEMTXATTRS_UNSPECIFIED macro instead.
This is technically a change of behaviour as MEMTXATTRS_UNSPECIFIED sets
the unspecified field to 1, but it doesn't look like anything is
checking this field.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Error process of baum_chr_open needs to set brlapi null, so it won't
get released twice in char_braille_finalize, which will cause
"/usr/bin/qemu-system-x86_64: double free or corruption (!prev)"
Signed-off-by: Liang Yan <lyan@suse.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Remove trailing whitespace in qemu-options documentation, as it causes
reproducibility issues depending on the echo implementation used by
the Makefile.
Reported-By: Vagrant Cascadian <vagrant@debian.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
It may be better to add a trace event to monitor the last moment of
a key event from QEMU to guest VM
Signed-off-by: Liang Yan <lyan@suse.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This device is private and is created once per aux-bus.
So don't allow the user to create one from command-line.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
In qemu-thread-posix.c we have two implementations of the
various qemu_sem_* functions, one of which uses native POSIX
sem_* and the other of which emulates them with pthread conditions.
This is necessary because not all our host OSes support
sem_timedwait().
Instead of a hard-coded list of OSes which don't implement
sem_timedwait(), which gets out of date, make configure
test for the presence of the function and set a new
CONFIG_HAVE_SEM_TIMEDWAIT appropriately.
In particular, newer NetBSDs have sem_timedwait(), so this
commit will switch them over to using it. OSX still does
not have an implementation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This change fixes conflict with the DragonFly BSD headers.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If we are woken up from while() loop in nbd_read_reply_entry
handles must be equal. If we are woken up from
nbd_recv_coroutines_wake_all s->quit must be true, so we do
not need checking handles equality.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170920124507.18841-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
If 'bs' is a complex expression, we were only casting the front half
rather than the full expression. Luckily, none of the callers were
passing bad arguments, but it's better to be robust up front.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170918214649.17550-1-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
NULL sockets are used for NDP, BOOTP, and other critical operations.
If the topmost mbuf in a NULL session is blocked pending resolution,
it may cause problems if it blocks other packets with a NULL socket.
So do not add mbufs with a NULL socket field to the same session.
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
if_output() originally sent one mbuf per call and used the slirp->next_m
variable to keep track of where it left off. But nowadays it tries to
send all of the mbufs from the fastq, and one mbuf from each session on
the batchq. The next_m variable is both redundant and harmful: there is
a case[0] involving delayed packets in which next_m ends up pointing
to &slirp->if_batchq when an active session still exists, and this
blocks all traffic for that session until qemu is restarted.
The test case was created to reproduce a problem that was seen on
long-running Chromium OS VM tests[1] which rapidly create and
destroy ssh connections through hostfwd.
[0] https://pastebin.com/NNy6LreF
[1] https://bugs.chromium.org/p/chromium/issues/detail?id=766323
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
e.g.
./x86_64-softmmu/qemu-system-x86_64 -nographic -netdev 'user,id=vnet,hostfwd=:555.0.0.0:0-:22'
qemu-system-x86_64: -netdev user,id=vnet,hostfwd=:555.0.0.0:0-:22: Invalid host forwarding rule ':555.0.0.0:0-:22' (Bad host address)
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
We had a per-chardev cache for context, then we don't need this
parameter to be passed in every time when chr_update_read_handler()
called. As long as we are calling chr_update_read_handler() using
qemu_chr_be_update_read_handlers() we'll be fine.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-5-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was only passed in by chr_update_read_handlers(). However when
reconnect, we'll lose that context information. So if a chardev was
running on another context (rather than the default context, the NULL
pointer), it'll switch back to the default context if reconnection
happens. But, it should really stick to the old context.
Convert all the callers of io_add_watch_poll() to use the internally
cached gcontext. Then the context should be able to survive even after
reconnections.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-4-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It caches the gcontext that is used to poll the chardev IO. Before this
patch, we only passed it in via chr_update_read_handlers(). However
that may not be enough if the char backend is disconnected and
reconnected afterward. There are chardev codes that still assumed the
context be NULL (which is the main context). Will fix that up in
following up patches.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-3-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Proper support of persistent reservation for multipath devices requires
communication with the multipath daemon, so that the reservation is
registered and applied when a path comes up. The device mapper
utilities provide a library to do so; this patch makes qemu-pr-helper.c
detect multipath devices and, when one is found, delegate the operation
to libmpathpersist.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a privileged helper to run persistent reservation commands.
This lets virtual machines send persistent reservations without using
CAP_SYS_RAWIO or out-of-tree patches. The helper uses Unix permissions
and SCM_RIGHTS to restrict access to processes that can access its socket
and prove that they have an open file descriptor for a raw SCSI device.
The next patch will also correct the usage of persistent reservations
with multipath devices.
It would also be possible to support for Linux's IOC_PR_* ioctls in
the future, to support NVMe devices. For now, however, only SCSI is
supported.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This modification is necessary for userfault fd features which are
required to be requested from userspace.
UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will
be introduced in the next patch.
QEMU have to use separate userfault file descriptor, due to
userfault context has internal state, and after first call of
ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of
success), but kernel while handling ioctl UFFD_API expects UFFD_STATE_WAIT_API.
So only one ioctl with UFFD_API is possible per ufd.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
That tiny refactoring is necessary to be able to set
UFFD_FEATURE_THREAD_ID while requesting features, and then
to create downtime context in case when kernel supports it.
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Fill postcopy-able pending only if ram postcopy is enabled.
It is necessary because of there will be other postcopy-able states and
when ram postcopy is disabled, it should not spoil common postcopy
related pending.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Now postcopy-able states are recognized by not NULL
save_live_complete_postcopy handler. But when we have several different
postcopy-able states, it is not convenient. Ram postcopy may be
disabled, while some other postcopy enabled, in this case Ram state
should behave as it is not postcopy-able.
This patch add separate has_postcopy handler to specify behaviour of
savevm state.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Provide helpers to convert bitmaps to little endian format. It can be
used when we want to send one bitmap via network to some other hosts.
One thing to mention is that, these helpers only solve the problem of
endianess, but it does not solve the problem of different word size on
machines (the bitmaps managing same count of bits may contains different
size when malloced). So we need to take care of the size alignment issue
on the callers for now.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Creation of the threads, nothing inside yet.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
Use pointers instead of long array names
Move to use semaphores instead of conditions as paolo suggestion
Put all the state inside one struct.
Use a counter for the number of threads created. Needed during cancellation.
Add error return to thread creation
Add id field
Rename functions to multifd_save/load_setup/cleanup
Change recv parameters to a pointer to struct
Change back to a struct
Use Error * for _cleanup
Indicates how many pages we are going to send in each batch to a multifd
thread.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Be consistent with defaults and documentation
Use new DEFINE_PROP_*
Rename x-multifd-group to x-multifd-page-count
Indicates the number of channels that we will create. By default we
create 2 channels.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Catch inconsistent defaults (eric).
Improve comment stating that number of threads is the same than number
of sockets
Use new DEFIN_PROP_*
Rename x-multifd-threads to x-multifd-threads
This function allows us to decide when to close the listener socket.
For now, we only need one connection.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
As this is defined on glib 2.32, add compatibility macros for older glibs.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We pass the ioc instead of the fd. This will allow us to have more
than one channel open. We also make sure that we set the
from_src_file sooner, so we don't need to pass it as a parameter.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
--
Do not assing mis->from_src_file (peterxu)
# gpg: Signature made Fri 22 Sep 2017 08:28:38 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/build-and-test-automation-pull-request: (36 commits)
docker: Drop 'set -e' from run script
docker: Use archive-source.py
tests: Add README for vm tests
MAINTAINERS: Add tests/vm entry
Makefile: Add rules to run vm tests
tests: Add OpenBSD image
tests: Add NetBSD image
tests: Add FreeBSD image
tests: Add ubuntu.i386 image
tests: Add vm test lib
tests: Add a test key pair
scripts: Add archive-source.sh
qemu.py: Add "wait()" method
gitignore: Ignore vm test images
MAINTAINERS: Fix subsystem name for "Build and test automation"
buildsys: Move rdma libs to per object
buildsys: Move brlapi libs to per object
buildsys: Move usb redir cflags/libs to per object
buildsys: Move libusb cflags/libs to per object
buildsys: Move libcacard cflags/libs to per object
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The only prototype doesn't need anything from the lib header, and not
including it here allows files that include this header, for example
vl.c, to compile without the libseccomp cflags.
The breakage is since c3883e1f93 for environments where `pkg-config
--cflags libseccomp" is non-empty.
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
Message-id: 20170920083647.14599-1-famz@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The migration interface for ais was introduced with kernel 4.13
but the capability itself had been active since 4.12. As migration
support is considered necessary lets disable ais in the 2.10
stable version. A proper fix and re-enablement will be done
for qemu 2.11.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20170921140834.14233-2-borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This is the common code to implement a "VM test" to
1) Download and initialize a pre-defined VM that has necessary
dependencies to build QEMU and SSH access.
2) Archive $SRC_PATH to a .tar file.
3) Boot the VM, and pass the source tar file to the guest.
4) SSH into the VM, untar the source tarball, build from the source.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The subsystem name for the "Build test automation" section is
"-------------------------", because an actual subsystem name
line is missing:
$ ./scripts/get_maintainer.pl -f tests/docker/docker.py
"Alex Bennée" <alex.bennee@linaro.org> (maintainer:-----------------...)
Fam Zheng <famz@redhat.com> (maintainer:-----------------...)
"Philippe Mathieu-Daudé" <f4bug@amsat.org> (reviewer:-----------------...)
qemu-devel@nongnu.org (open list:-----------------...)
Fix the issue by inserting a subsystem name line where
get_maintainer.pl expects it.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170921170209.9101-1-ehabkost@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The 'run' script already creats src, build and install directories under
$TEST_DIR, use it in common.rc.
Also the tests always run from $QEMU_SRC/tests/docker, so use a relative
$CMD string.
Message-Id: <20170817035721.11064-1-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
If you invoke with NOCACHE=1 we pass --no-cache in the argv to
docker.py but may still not force a rebuild if the dockerfile checksum
hasn't changed. By testing for its presence we can force builds
without having to manually remove the docker image.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20170725133425.436-5-alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
It is a common requirement for virtual machine to send persistent
reservations, but this currently requires either running QEMU with
CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged
QEMU bypass Linux's filter on SG_IO commands.
As an alternative mechanism, the next patches will introduce a
privileged helper to run persistent reservation commands without
expanding QEMU's attack surface unnecessarily.
The helper is invoked through a "pr-manager" QOM object, to which
file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and
PERSISTENT RESERVE IN commands. For example:
$ qemu-system-x86_64
-device virtio-scsi \
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
-drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0
-device scsi-block,drive=hd
or:
$ qemu-system-x86_64
-device virtio-scsi \
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
-blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0
-device scsi-block,drive=hd
Multiple pr-manager implementations are conceivable and possible, though
only one is implemented right now. For example, a pr-manager could:
- talk directly to the multipath daemon from a privileged QEMU
(i.e. QEMU links to libmpathpersist); this makes reservation work
properly with multipath, but still requires CAP_SYS_RAWIO
- use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though)
- more interestingly, implement reservations directly in QEMU
through file system locks or a shared database (e.g. sqlite)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This shares an cached empty FlatView among address spaces. The empty
FV is used every time when a root MR renders into a FV without memory
sections which happens when MR or its children are not enabled or
zero-sized. The empty_view is not NULL to keep the rest of memory
API intact; it also has a dispatch tree for the same reason.
On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this halves
the amount of FlatView's in use (557 -> 260) and dispatch tables
(~800000 -> ~370000). In an unrelated experiment with 112 non-virtio
devices on x86 ("-M pc"), only 4 FlatViews are alive, and about ~2000
are created at startup.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-16-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A container can be used instead of an alias to allow switching between
multiple subregions. In this case we cannot directly share the
subregions (since they only belong to a single parent), but if the
subregions are aliases we can in turn walk those.
This is not enough to remove all source of quadratic FlatView creation,
but it enables sharing of the PCI bus master FlatViews (and their
AddressSpaceDispatch structures) across all PCI devices. For 112
virtio-net-pci devices, boot time is reduced from 25 to 10 seconds and
memory consumption from 1.4 to 1 G.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This avoids usual memory_region_transaction_commit() which rebuilds
all FVs.
On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this brings
down the boot time from 25s to 20s and reduces the amount of temporary FVs
allocated during machine constructon (~800000 -> ~640000) and amount of
temporary dispatch trees (~370000 -> ~300000), the total memory footprint
goes down (18G -> 17G).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-18-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since FlatViews are shared now and ASes not, this gets rid of
address_space_init_shareable().
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-17-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds a new "-d" switch to "info mtree" to print dispatch tree
internals.
This changes the way "-f" is handled - it prints now flat views and
associated address spaces.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-15-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This allows sharing flat views between address spaces (AS) when
the same root memory region is used when creating a new address space.
This is done by walking through all ASes and caching one FlatView per
a physical root MR (i.e. not aliased).
This removes search for duplicates from address_space_init_shareable() as
FlatViews are shared elsewhere and keeping as::ref_count correct seems
an unnecessary and useless complication.
This should cause no change and memory use or boot time yet.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-13-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Address spaces get to keep a root MR (alias or not) but FlatView stores
the actual MR as this is going to be used later on to decide whether to
share a particular FlatView or not.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-10-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We store AddressSpaceDispatch* in FlatView anyway so there is no need
to carry it from mem_add() to register_subpage/register_multipage.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-8-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AS in ASD is only used to pass AS from mem_begin() to register_subpage()
to store it in MemoryRegionSection, we can do this directly now.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-6-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As we are going to share FlatView's between AddressSpace's,
and AddressSpaceDispatch is a structure to perform quick lookup
in FlatView, this moves ASD to FlatView.
After previosly open coded ASD rendering, we can also remove
as->next_dispatch as the new FlatView pointer is stored
on a stack and set to an AS atomically.
flatview_destroy() is executed under RCU instead of
address_space_dispatch_free() now.
This makes mem_begin/mem_commit to work with ASD and mem_add with FV
as later on mem_add will be taking FV as an argument anyway.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-5-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This moves a FlatView allocation and initialization to a helper.
While we are nere, replace g_new with g_new0 to not to bother if we add
new fields in the future.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-4-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are going to share FlatView's between AddressSpace's and per-AS
memory listeners won't suit the purpose anymore so open code
the dispatch tree rendering.
Since there is a good chance that dispatch_listener was the only
listener, this avoids address_space_update_topology_pass() if there is
no registered listeners; this should improve starting time.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-3-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds an AS** parameter to address_space_do_translate()
to make it easier for the next patch to share FlatViews.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-2-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It's possible for address_space_get_flatview() as it currently stands
to cause a use-after-free for the returned FlatView, if the reference
count is incremented after the FlatView has been replaced by a writer:
thread 1 thread 2 RCU thread
-------------------------------------------------------------
rcu_read_lock
read as->current_map
set as->current_map
flatview_unref
'--> call_rcu
flatview_ref
[ref=1]
rcu_read_unlock
flatview_destroy
<badness>
Since FlatViews are not updated very often, we can just detect the
situation using a new atomic op atomic_fetch_inc_nonzero, similar to
Linux's atomic_inc_not_zero, which performs the refcount increment only if
it hasn't already hit zero. This is similar to Linux commit de09a9771a53
("CRED: Fix get_task_cred() and task_state() to not resurrect dead
credentials", 2010-07-29).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
target-arm queue:
* more preparatory work for v8M support
* convert some omap devices away from old_mmio
* remove out of date ARM ARM section references in comments
* add the Smartfusion2 board
# gpg: Signature made Thu 21 Sep 2017 17:40:40 BST
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20170921: (31 commits)
msf2: Add Emcraft's Smartfusion2 SOM kit
msf2: Add Smartfusion2 SoC
msf2: Add Smartfusion2 SPI controller
msf2: Microsemi Smartfusion2 System Register block
msf2: Add Smartfusion2 System timer
hw/arm/omap2.c: Don't use old_mmio
hw/i2c/omap_i2c.c: Don't use old_mmio
hw/timer/omap_gptimer: Don't use old_mmio
hw/timer/omap_synctimer.c: Don't use old_mmio
hw/gpio/omap_gpio.c: Don't use old_mmio
hw/arm/palm.c: Don't use old_mmio for static_ops
target/arm: Remove out of date ARM ARM section references in A64 decoder
nvic: Support banked exceptions in acknowledge and complete
nvic: Make SHCSR banked for v8M
nvic: Make ICSR banked for v8M
target/arm: Handle banking in negative-execution-priority check in cpu_mmu_index()
nvic: Handle v8M changes in nvic_exec_prio()
nvic: Disable the non-secure HardFault if AIRCR.BFHFNMINS is clear
nvic: Implement v8M changes to fixed priority exceptions
nvic: In escalation to HardFault, support HF not being priority -1
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In the A64 decoder, we have a lot of references to section numbers
from version A.a of the v8A ARM ARM (DDI0487). This version of the
document is now long obsolete (we are currently on revision B.a),
and various intervening versions renumbered all the sections.
The most recent B.a version of the document doesn't assign
section numbers at all to the individual instruction classes
in the way that the various A.x versions did. The simplest thing
to do is just to delete all the out of date C.x.x references.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20170915150849.23557-1-peter.maydell@linaro.org
Update armv7m_nvic_acknowledge_irq() and armv7m_nvic_complete_irq()
to handle banked exceptions:
* acknowledge needs to use the correct vector, which may be
in sec_vectors[]
* acknowledge needs to return to its caller whether the
exception should be taken to secure or non-secure state
* complete needs its caller to tell it whether the exception
being completed is a secure one or not
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-20-git-send-email-peter.maydell@linaro.org
Now that we have a banked FAULTMASK register and banked exceptions,
we can implement the correct check in cpu_mmu_index() for whether
the MPU_CTRL.HFNMIENA bit's effect should apply. This bit causes
handlers which have requested a negative execution priority to run
with the MPU disabled. In v8M the test has to check this for the
current security state and so takes account of banking.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-17-git-send-email-peter.maydell@linaro.org
Update nvic_exec_prio() to support the v8M changes:
* BASEPRI, FAULTMASK and PRIMASK are all banked
* AIRCR.PRIS can affect NS priorities
* AIRCR.BFHFNMINS affects FAULTMASK behaviour
These changes mean that it's no longer possible to
definitely say that if FAULTMASK is set it overrides
PRIMASK, and if PRIMASK is set it overrides BASEPRI
(since if PRIMASK_NS is set and AIRCR.PRIS is set then
whether that 0x80 priority should take effect or the
priority in BASEPRI_S depends on the value of BASEPRI_S,
for instance). So we switch to the same approach used
by the pseudocode of working through BASEPRI, PRIMASK
and FAULTMASK and overriding the previous values if
needed.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-16-git-send-email-peter.maydell@linaro.org
If AIRCR.BFHFNMINS is clear, then although NonSecure HardFault
can still be pended via SHCSR.HARDFAULTPENDED it mustn't actually
preempt execution. The simple way to achieve this is to clear the
enable bit for it, since the enable bit isn't guest visible.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-15-git-send-email-peter.maydell@linaro.org
In v7M, the fixed-priority exceptions are:
Reset: -3
NMI: -2
HardFault: -1
In v8M, this changes because Secure HardFault may need
to be prioritised above NMI:
Reset: -4
Secure HardFault if AIRCR.BFHFNMINS == 1: -3
NMI: -2
Secure HardFault if AIRCR.BFHFNMINS == 0: -1
NonSecure HardFault: -1
Make these changes, including support for changing the
priority of Secure HardFault as AIRCR.BFHFNMINS changes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-14-git-send-email-peter.maydell@linaro.org
When escalating to HardFault, we must go into Lockup if we
can't take the synchronous HardFault because the current
execution priority is already at or below the priority of
HardFault. In v7M HF is always priority -1 so a simple < 0
comparison sufficed; in v8M the priority of HardFault can
vary depending on whether it is a Secure or NonSecure
HardFault, so we must check against the priority of the
HardFault exception vector we're about to use.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-13-git-send-email-peter.maydell@linaro.org
In armv7m_nvic_set_pending() we have to compare the
priority of an exception against the execution priority
to decide whether it needs to be escalated to HardFault.
In the specification this is a comparison against the
exception's group priority; for v7M we implemented it
as a comparison against the raw exception priority
because the two comparisons will always give the
same answer. For v8M the existence of AIRCR.PRIS and
the possibility of different PRIGROUP values for secure
and nonsecure exceptions means we need to explicitly
calculate the vector's group priority for this check.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-12-git-send-email-peter.maydell@linaro.org
Make the armv7m_nvic_set_pending() and armv7m_nvic_clear_pending()
functions take a bool indicating whether to pend the secure
or non-secure version of a banked interrupt, and update the
callsites accordingly.
In most callsites we can simply pass the correct security
state in; in a couple of cases we use TODO comments to indicate
that we will return the code in a subsequent commit.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-10-git-send-email-peter.maydell@linaro.org
For v8M, the NVIC has a new set of registers per interrupt,
NVIC_ITNS<n>. These determine whether the interrupt targets Secure
or Non-secure state. Implement the register read/write code for
these, and make them cause NVIC_IABR, NVIC_ICER, NVIC_ISER,
NVIC_ICPR, NVIC_IPR and NVIC_ISPR to RAZ/WI for non-secure
accesses to fields corresponding to interrupts which are
configured to target secure state.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-8-git-send-email-peter.maydell@linaro.org
The Application Interrupt and Reset Control Register has some changes
for v8M:
* new bits SYSRESETREQS, BFHFNMINS and PRIS: these all have
real state if the security extension is implemented and otherwise
are constant
* the PRIGROUP field is banked between security states
* non-secure code can be blocked from using the SYSRESET bit
to reset the system if SYSRESETREQS is set
Implement the new state and the changes to register read and write.
For the moment we ignore the effects of the secure PRIGROUP.
We will implement the effects of PRIS and BFHFNMIS later.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-6-git-send-email-peter.maydell@linaro.org
Instead of looking up the pending priority
in nvic_pending_prio(), cache it in a new state struct
field. The calculation of the pending priority given
the interrupt number is more complicated in v8M with
the security extension, so the caching will be worthwhile.
This changes nvic_pending_prio() from returning a full
(group + subpriority) priority value to returning a group
priority. This doesn't require changes to its callsites
because we use it only in comparisons of the form
execution_prio > nvic_pending_prio()
and execution priority is always a group priority, so
a test (exec prio > full prio) is true if and only if
(execprio > group_prio).
(Architecturally the expected comparison is with the
group priority for this sort of "would we preempt" test;
we were only doing a test with a full priority as an
optimisation to avoid the mask, which is possible
precisely because the two comparisons always give the
same answer.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-5-git-send-email-peter.maydell@linaro.org
With banked exceptions, just the exception number in
s->vectpending is no longer sufficient to uniquely identify
the pending exception. Add a vectpending_is_s_banked bool
which is true if the exception is using the sec_vectors[]
array.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1505240046-11454-4-git-send-email-peter.maydell@linaro.org
For the v8M security extension, some exceptions must be banked
between security states. Add the new vecinfo array which holds
the state for the banked exceptions and migrate it if the
CPU the NVIC is attached to implements the security extension.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
In v8M the MSR and MRS instructions have extra register value
encodings to allow secure code to access the non-secure banked
version of various special registers.
(We don't implement the MSPLIM_NS or PSPLIM_NS aliases, because
we don't currently implement the stack limit registers at all.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-2-git-send-email-peter.maydell@linaro.org
This avoids a name clash with the access macro on windows 64:
make
CHK version_gen.h
CC aarch64-softmmu/memory.o
/home/konrad/qemu/memory.c: In function 'access_with_adjusted_size':
/home/konrad/qemu/memory.c:591:73: error: macro "access" passed 7 arguments, \
but takes just 2
(size - access_size - i) * 8, access_mask, attrs);
^
Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Message-Id: <1505988260-8483-1-git-send-email-frederic.konrad@adacore.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
pflash toggles mr->romd_mode. So this assert does not always hold.
1) a device was added with !mr->romd_mode, therefore effectively not
creating a kvm slot as we want to trap every access (add = false).
2) mr->romd_mode was toggled on before remove it. There is now
actually no slot to remove and the assert is wrong.
So let's just drop the assert.
Reported-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170920145025.19403-1-david@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We should guarantee that RAM will not be modified while VM has a stopped
state, otherwise it can lead to negative consequences during post-copy
migration. In RUN_STATE_FINISH_MIGRATE step, it's expected that RAM on
source side will not be modified as this could lead to non-consistent vm state
on the destination side. Also RAM access during postcopy-ram migration with
enabled release-ram capability can lead to sad consequences.
Let's add enable_backend() callback to avoid undesirable virtioqueue changes
in the guest memory.
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Message-Id: <20170919120733.22020-1-pbutsykin@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 11:51:49 +02:00
964 changed files with 57702 additions and 18065 deletions
@@ -366,17 +366,9 @@ bus=PCI-BUS,addr=DEVFN to control the PCI device address, as usual.
=== Host Device Assignment ===
QEMU supports assigning host PCI devices (qemu-kvm only at this time)
and host USB devices.
and host USB devices. PCI devices can only be assigned with -device:
The old way to assign a host PCI device is
-pcidevice host=ADDR,dma=none,id=ID
The new way is
-device pci-assign,host=ADDR,iommu=IOMMU,id=ID
The old dma=none becomes iommu=off with -device.
-device vfio-pci,host=ADDR,id=ID
The old way to assign a host USB device is
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.