HMP pull
# gpg: Signature made Wed 17 May 2017 07:03:39 PM BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* dgilbert/tags/pull-hmp-20170517:
ramblock: add new hmp command "info ramblock"
utils: provide size_to_str()
ramblock: add RAMBLOCK_FOREACH()
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
pci, virtio, vhost: fixes
A bunch of fixes that missed the release.
Most notably we are reverting shpc back to enabled by default state
as guests uses that as an indicator that hotplug is supported
(even though it's unused). Unfortunately we can't fix this
on the stable branch since that would break migration.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Wed 17 May 2017 10:42:06 PM BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* mst/tags/for_upstream:
exec: abstract address_space_do_translate()
pci: deassert intx when pci device unrealize
virtio: allow broken device to notify guest
Revert "hw/pci: disable pci-bridge's shpc by default"
acpi-defs: clean up open brace usage
ACPI: don't call acpi_pcihp_device_plug_cb on xen
iommu: Don't crash if machine is not PC_MACHINE
pc: add 2.10 machine type
pc/fwcfg: unbreak migration from qemu-2.5 and qemu-2.6 during firmware boot
libvhost-user: fix crash when rings aren't ready
hw/virtio: fix vhost user fails to startup when MQ
hw/arm/virt: generate 64-bit addressable ACPI objects
hw/acpi-defs: replace leading X with x_ in FADT field names
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This function is an abstraction helper for address_space_translate() and
address_space_get_iotlb_entry(). It does the lookup of address into
memory region section, then does proper IOMMU translation if necessary.
Refactor the two existing functions to use it.
This fixes vhost when IOMMU is disabled by guest.
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If a pci device is not reset by VM (by writing into config space)
and unplugged by VM, after that when VM reboots, qemu may assert:
pcibus_reset: Assertion `bus->irq_count[i] == 0' failed
Cc: qemu-stable@nongnu.org
Signed-off-by: herongguang <herongguang.he@huawei.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to section 2.1.2 of the virtio-1 specification:
"The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state that
a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET,
the device MUST send a device configuration change notification to the
driver."
Commit "f5ed36635d8f virtio: stop virtqueue processing if device is broken"
introduced a virtio_error() call that just does that:
- internally mark the device as broken
- set the DEVICE_NEEDS_RESET bit in the status
- send a configuration change notification
Unfortunately, virtio_notify_vector(), called by virtio_notify_config(),
returns right away when the device is marked as broken and the notification
isn't sent in this case.
The spec doesn't say whether a broken device can send notifications
in other situations or not. But since the driver isn't supposed to do
anything but to reset the device, it makes sense to keep the check in
virtio_notify_config().
Marking the device as broken AFTER the configuration change notification was
sent is enough to fix the issue.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This reverts commit dc0ae76770.
Disabling the shpc controller has an undesired side effect.
The PCI bridge remains with no attached devices at boot time,
and the guest operating systems do not allocate any resources
for it, leaving the bridge unusable. Note that the behaviour
is dictated by the pci bridge specification.
Revert the commit and leave the shpc controller even if is not
actually used by any architecture. Slot 0 remains unusable at boot time.
Keep shpc off for QEMU 2.9 machines.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To dump information about ramblocks. It looks like:
(qemu) info ramblock
Block Name PSize Offset Used Total
/objects/mem 2 MiB 0x0000000000000000 0x0000000080000000 0x0000000080000000
vga.vram 4 KiB 0x0000000080060000 0x0000000001000000 0x0000000001000000
/rom@etc/acpi/tables 4 KiB 0x00000000810b0000 0x0000000000020000 0x0000000000200000
pc.bios 4 KiB 0x0000000080000000 0x0000000000040000 0x0000000000040000
0000:00:03.0/e1000.rom 4 KiB 0x0000000081070000 0x0000000000040000 0x0000000000040000
pc.rom 4 KiB 0x0000000080040000 0x0000000000020000 0x0000000000020000
0000:00:02.0/vga.rom 4 KiB 0x0000000081060000 0x0000000000010000 0x0000000000010000
/rom@etc/table-loader 4 KiB 0x00000000812b0000 0x0000000000001000 0x0000000000001000
/rom@etc/acpi/rsdp 4 KiB 0x00000000812b1000 0x0000000000001000 0x0000000000001000
Ramblock is something hidden internally in QEMU implementation, and this
command should only be used by mostly QEMU developers on RAM stuff. It
is not a command suitable for QMP interface. So only HMP interface is
provided for it.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1494562661-9063-4-git-send-email-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Moving the algorithm from print_type_size() into size_to_str() so that
other component can also leverage it. With that, refactor
print_type_size().
The assert() in that logic is removed though, since even UINT64_MAX
would not overflow.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1494562661-9063-3-git-send-email-peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
x86 and machine queue, 2017-05-17
# gpg: Signature made Wed 17 May 2017 02:37:54 PM BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* ehabkost/tags/x86-and-machine-pull-request: (22 commits)
tests: Add [+-]feature and feature=on|off test cases
s390-pcibus: No need to set user_creatable=false explicitly
xen-sysdev: Remove user_creatable flag
virtio-mmio: Remove user_creatable flag
sysbus-ohci: Remove user_creatable flag
hpet: Remove user_creatable flag
generic-sdhci: Remove user_creatable flag
esp: Remove user_creatable flag
fw_cfg: Remove user_creatable flag
unimplemented-device: Remove user_creatable flag
isabus-bridge: Remove user_creatable flag
allwinner-ahci: Remove user_creatable flag
sysbus-ahci: Remove user_creatable flag
kvmvapic: Remove user_creatable flag
ioapic: Remove user_creatable flag
kvmclock: Remove user_creatable flag
pflash_cfi01: Remove user_creatable flag
fdc: Remove user_creatable flag from sysbus-fdc & SUNW,fdtwo
iommu: Remove FIXME comment about user_creatable=true
xen-backend: Remove FIXME comment about user_creatable flag
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add test code to ensure features are enabled/disabled correctly in the
command-line. The test case use the "feature-words" and
"filtered-features" properties to check if the features were
enabled/disabled correctly.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170508183205.10884-1-ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
commit 33cd52b5d7 unset
cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all
sysbus devices appear on "-device help" and lack the "no-user"
flag in "info qdm".
To fix this, we can set user_creatable=false by default on
TYPE_SYS_BUS_DEVICE, but this requires setting
user_creatable=true explicitly on the sysbus devices that
actually work with -device.
Fortunately today we have just a few has_dynamic_sysbus=1
machines: virt, pc-q35-*, ppce500, and spapr.
virt, ppce500, and spapr have extra checks to ensure just a few
device types can be instantiated:
* virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE.
* ppce500 supports only TYPE_ETSEC_COMMON.
* spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE.
This patch sets user_creatable=true explicitly on those 4 device
classes.
Now, the more complex cases:
pc-q35-*: q35 has no sysbus device whitelist yet (which is a
separate bug). We are in the process of fixing it and building a
sysbus whitelist on q35, but in the meantime we can fix the
"-device help" and "info qdm" bugs mentioned above. Also, despite
not being strictly necessary for fixing the q35 bug, reducing the
list of user_creatable=true devices will help us be more
confident when building the q35 whitelist.
xen: We also have a hack at xen_set_dynamic_sysbus(), that sets
has_dynamic_sysbus=true at runtime when using the Xen
accelerator. This hack is only used to allow xen-backend devices
to be dynamically plugged/unplugged.
This means today we can use -device with the following 22 device
types, that are the ones compiled into the qemu-system-x86_64 and
qemu-system-i386 binaries:
* allwinner-ahci
* amd-iommu
* cfi.pflash01
* esp
* fw_cfg_io
* fw_cfg_mem
* generic-sdhci
* hpet
* intel-iommu
* ioapic
* isabus-bridge
* kvmclock
* kvm-ioapic
* kvmvapic
* SUNW,fdtwo
* sysbus-ahci
* sysbus-fdc
* sysbus-ohci
* unimplemented-device
* virtio-mmio
* xen-backend
* xen-sysdev
This patch adds user_creatable=true explicitly to those devices,
temporarily, just to keep 100% compatibility with existing
behavior of q35. Subsequent patches will remove
user_creatable=true from the devices that are really not meant to
user-creatable on any machine, and remove the FIXME comment from
the ones that are really supposed to be user-creatable. This is
being done in separate patches because we still don't have an
obvious list of devices that will be whitelisted by q35, and I
would like to get each device reviewed individually.
Cc: Alexander Graf <agraf@suse.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Beniamino Galvani <b.galvani@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: Gabriel L. Somlo <somlo@cmu.edu>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Pierre Morel <pmorel@linux.vnet.ibm.com>
Cc: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-arm@nongnu.org
Cc: qemu-block@nongnu.org
Cc: qemu-ppc@nongnu.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Shannon Zhao <zhaoshenglong@huawei.com>
Cc: sstabellini@kernel.org
Cc: Thomas Huth <thuth@redhat.com>
Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-3-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[ehabkost: Small changes at sysbus_device_class_init() comments]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
cannot_instantiate_with_device_add_yet was introduced by commit
efec3dd631 to replace no_user. It was
supposed to be a temporary measure.
When it was introduced, we had 54
cannot_instantiate_with_device_add_yet=true lines in the code.
Today (3 years later) this number has not shrunk: we now have
57 cannot_instantiate_with_device_add_yet=true lines. I think it
is safe to say it is not a temporary measure, and we won't see
the flag go away soon.
Instead of a long field name that misleads people to believe it
is temporary, replace it a shorter and less misleading field:
user_creatable.
Except for code comments, changes were generated using the
following Coccinelle patch:
@@
expression DC;
@@
(
-DC->cannot_instantiate_with_device_add_yet = false;
+DC->user_creatable = true;
|
-DC->cannot_instantiate_with_device_add_yet = true;
+DC->user_creatable = false;
)
@@
typedef ObjectClass;
expression dc;
identifier class, data;
@@
static void device_class_init(ObjectClass *class, void *data)
{
...
dc->hotpluggable = true;
+dc->user_creatable = true;
...
}
@@
@@
struct DeviceClass {
...
-bool cannot_instantiate_with_device_add_yet;
+bool user_creatable;
...
}
@@
expression DC;
@@
(
-!DC->cannot_instantiate_with_device_add_yet
+DC->user_creatable
|
-DC->cannot_instantiate_with_device_add_yet
+!DC->user_creatable
)
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Thomas Huth <thuth@redhat.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-2-ehabkost@redhat.com>
[ehabkost: kept "TODO remove once we're there" comment]
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The function is only used once, and nothing else in migration knows
about objects. Create the function vmstate_device_is_migratable() in
savem.c that really do the bit that is related with migration.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Yes, we don't have a good place to put that stuff.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
It is only used by migration, so move it there.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reflects better what it does now, and avoid confussions with
RAM_SAVE_FLAG_COMPRESS_PAGE.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Compression threads got broken on commit
commit 2479569466
Author: Juan Quintela <quintela@redhat.com>
Date: Tue Mar 21 11:45:01 2017 +0100
ram: reorganize last_sent_block
On do_compress_ram_page() we use a different QEMUFile than the
migration one. We need to pass it there. The failure can be seen as:
(qemu) qemu-system-x86_64: Unknown combination of migration flags: 0
qemu-system-x86_64: error while loading state section id 3(ram)
qemu-system-x86_64: load of migration failed: Invalid argument
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
QEMU does not depends on libxencall, it was added because it was a
missing link dependency of libxendevicemodel, but now the later should
be built properly.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
The Xen mapcache is able to create long term mappings, they are called
"locked" mappings. The third parameter of the xen_map_cache call
specifies if a mapping is a "locked" mapping.
>From the QEMU point of view there are two kinds of long term mappings:
[a] device memory mappings, such as option roms and video memory
[b] dma mappings, created by dma_memory_map & friends
After certain operations, ballooning a VM in particular, Xen asks QEMU
kindly to destroy all mappings. However, certainly [a] mappings are
present and cannot be removed. That's not a problem as they are not
affected by balloonning. The *real* problem is that if there are any
mappings of type [b], any outstanding dma operations could fail. This is
a known shortcoming. In other words, when Xen asks QEMU to destroy all
mappings, it is an error if any [b] mappings exist.
However today we have no way of distinguishing [a] from [b]. Because of
that, we cannot even print a decent warning.
This patch introduces a new "dma" bool field to MapCacheRev entires, to
remember if a given mapping is for dma or is a long term device memory
mapping. When xen_invalidate_map_cache is called, we print a warning if
any [b] mappings exist. We ignore [a] mappings.
Mappings created by qemu_map_ram_ptr are assumed to be [a], while
mappings created by address_space_map->qemu_ram_ptr_length are assumed
to be [b].
The goal of the patch is to make debugging and system understanding
easier.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Instead, put the CURLAIOCB on a wait list and yield; curl_clean_state will
wake the corresponding coroutine.
Because of CURL's callback-based structure, we cannot easily convert
everything to CoMutex/CoQueue; keeping the QemuMutex is simpler. However,
CoQueue is a simple wrapper around a linked list, so we can easily
use QSIMPLEQ and open-code a CoQueue, protected by the BDRVCURLState
QemuMutex instead of a CoMutex.
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-8-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This is pretty simple. The bottom half goes away because, unlike
bdrv_aio_readv, coroutine-based read can return immediately without
yielding. However, for simplicity I kept the former bottom half
handler in a separate function.
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-7-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This is in preparation for the conversion from bdrv_aio_readv to
bdrv_co_preadv, and it also requires changing some of the size_t values
to uint64_t. This was broken before for disks > 2TB, but now it would
break at 4GB.
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-6-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
If curl_easy_init fails, a CURLState is left with s->in_use = 1. Split
curl_init_state in two, so that we can distinguish the two failures and
call curl_clean_state if needed.
While at it, simplify curl_find_state, removing a dummy loop. The
aio_poll loop is moved to the sole caller that needs it.
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-5-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
The curl driver has a ugly hack where, if it cannot find an empty CURLState,
it just uses aio_poll to wait for one to be empty. This is probably
buggy when used together with dataplane, and the simplest way to fix it
is to use coroutines instead.
A more immediate effect of the bug however is that it can cause a
recursive call to curl_readv_bh_cb and recursively taking the
BDRVCURLState mutex. This causes a deadlock.
The fix is to unlock the mutex around aio_poll, but for cleanliness we
should also take the mutex around all calls to curl_init_state, even if
reaching the unlock/lock pair is impossible. The same is true for
curl_clean_state.
Reported-by: Kun Wei <kuwei@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170515100059.15795-4-pbonzini@redhat.com
Cc: qemu-stable@nongnu.org
Cc: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
I volunteer to review NetBSD patches.
Adding myself will help to not miss some of them.
Restore NetBSD as a maintained host.
All patches to make qemu/pkgsrc building have been emitted to review.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Message-id: 20170513022143.2838-1-n54@gmx.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Fix for CVE-2017-7493.
# gpg: Signature made Mon 15 May 2017 07:48:20 PM BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@fr.ibm.com>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>"
# gpg: aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* gkurz/tags/security-fix-for-2.10:
9pfs: local: forbid client access to metadata (CVE-2017-7493)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Queued target/sh4 patches
# gpg: Signature made Sat 13 May 2017 10:25:41 AM BST
# gpg: using RSA key 0xBA9C78061DDD8C9B
# gpg: Good signature from "Aurelien Jarno <aurelien@aurel32.net>"
# gpg: aka "Aurelien Jarno <aurelien@jarno.fr>"
# gpg: aka "Aurelien Jarno <aurel32@debian.org>"
# Primary key fingerprint: 7746 2642 A9EF 94FD 0F77 196D BA9C 7806 1DDD 8C9B
* aurel32/tags/pull-target-sh4-20170513:
target/sh4: use cpu_loop_exit_restore
target/sh4: trap unaligned accesses
target/sh4: movua.l is an SH4-A only instruction
target/sh4: implement tas.b using atomic helper
target/sh4: generate fences for SH4
target/sh4: optimize gen_write_sr using extract op
target/sh4: optimize gen_store_fpr64
target/sh4: fold ctx->bstate = BS_BRANCH into gen_conditional_jump
target/sh4: only save flags state at the end of the TB
target/sh4: fix BS_EXCP exit
target/sh4: fix BS_STOP exit
target/sh4: move DELAY_SLOT_TRUE flag into a separate global
target/sh4: do not include DELAY_SLOT_TRUE in the TB state
target/sh4: get rid of DELAY_SLOT_CLEARME
target/sh4: split ctx->flags into ctx->tbflags and ctx->envflags
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
usb: bugfixes, doc update
# gpg: Signature made Fri 12 May 2017 01:20:29 PM BST
# gpg: using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg: aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138
* kraxel/tags/pull-usb-20170512-1:
hw/usb/dev-serial: Do not try to set vendorid or productid properties
xhci: relax link check
usb-hub: clear PORT_STAT_SUSPEND on wakeup
xhci: fix logging
usb-redir: fix stack overflow in usbredir_log_data
qemu-doc: Update to use the new way of attaching USB devices
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When using the mapped-file security mode, we shouldn't let the client mess
with the metadata. The current code already tries to hide the metadata dir
from the client by skipping it in local_readdir(). But the client can still
access or modify it through several other operations. This can be used to
escalate privileges in the guest.
Affected backend operations are:
- local_mknod()
- local_mkdir()
- local_open2()
- local_symlink()
- local_link()
- local_unlinkat()
- local_renameat()
- local_rename()
- local_name_to_path()
Other operations are safe because they are only passed a fid path, which
is computed internally in local_name_to_path().
This patch converts all the functions listed above to fail and return
EINVAL when being passed the name of the metadata dir. This may look
like a poor choice for errno, but there's no such thing as an illegal
path name on Linux and I could not think of anything better.
This fixes CVE-2017-7493.
Reported-by: Leo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
x86 and machine queue, 2017-05-11
Highlights:
* New "-numa cpu" option
* NUMA distance configuration
* migration/i386 vmstatification
# gpg: Signature made Thu 11 May 2017 08:16:07 PM BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: Note: This key has expired!
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* ehabkost/tags/x86-and-machine-pull-request: (29 commits)
migration/i386: Remove support for pre-0.12 formats
vmstatification: i386 FPReg
migration/i386: Remove old non-softfloat 64bit FP support
tests: check -numa node,cpu=props_list usecase
numa: add '-numa cpu,...' option for property based node mapping
numa: remove node_cpu bitmaps as they are no longer used
numa: use possible_cpus for not mapped CPUs check
machine: call machine init from wrapper
numa: remove no longer need numa_post_machine_init()
tests: numa: add case for QMP command query-cpus
QMP: include CpuInstanceProperties into query_cpus output output
virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
numa: mirror cpu to node mapping in MachineState::possible_cpus
numa: add check that board supports cpu_index to node mapping
virt-arm: add node-id property to CPU
pc: add node-id property to CPU
spapr: add node-id property to sPAPR core
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
ppc patch queue for 2017-05-11
This pull request supersedes the one from yesterday (20170510), fixing
an important style bug in one patch, and adding an extra couple of
simple patches.
Highlights of this set:
* Some fixes for POWER9
* TCG support for POWER9 radix MMU
* VGA rom for Mac machine types
* Fixes for the XICS interrupt controller
* MTTCG support for ppc targets
As suggested by Paolo, I've tried to add the Docker tests to my
standard pre-pull-request tests. I haven't wholly suceeded; this has
been tested with some of the Docker images, but others I haven't
managed due to problems that as best I can tell are not due to
problems in this patch series. I'll continue working on this for
future pull requests. Specifically, 'travis', 'fedora', and 'centos6'
seem to work. 'min-glib' jammed while gtesting moxie, which seems
very unlikely to be caused by this series. 'ubuntu', 'debian' and
'debian-bootstrap' hit build errors almost immediately that look like
problems with the container configuration, and 'debian-*-cross' hit
build errors later on which also look like missing dependencies from
the container.
# gpg: Signature made Thu 11 May 2017 05:13:46 AM BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* dgibson/tags/ppc-for-2.10-20170511: (23 commits)
target/ppc: Avoid printing wrong aliases in CPU help text
pnv: Fix build failures on some host platforms
target/ppc: Allow workarounds for POWER9 DD1
spapr: Don't accidentally advertise HTM support on POWER9
ppc: xics: fix compilation with CentOS 6
target/ppc: Enable RADIX mmu mode for pseries TCG guest
target/ppc: Implement ISA V3.00 radix page fault handler
target/ppc: Change tlbie invalid fields for POWER9 support
target/ppc: Update tlbie to check privilege level based on GTSE
target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE
ppc: add qemu_vga.ndrv ROM to fw_cfg interface for NewWorld Macs
ppc: add qemu_vga.ndrv ROM to fw_cfg interface for OldWorld Macs
Add QemuMacDrivers qemu_vga.ndrv revision d4e7d7a built as submodule
Add QemuMacDrivers as submodule
ppc/xics: preserve P and Q bits for KVM IRQs
ppc/xics: Fix stale irq->status bits after get
target/ppc: do not reset reserve_addr in exec_enter
tcg: enable MTTCG by default for PPC64 on x86
cpus: Fix CPU unplug for MTTCG
target/ppc: Generate fence operations
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use cpu_loop_exit_restore when using cpu_restore_state and cpu_loop_exit
together.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
SH4 requires that memory accesses are naturally aligned, except for the
SH4-A movua.l instructions which can do unaligned loads.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
At the same time change the comment describing the instruction the same
way than other instruction, so that the code is easier to read and search.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We only emulate UP SH4, however as the tas.b instruction is used in the GNU
libc, this improve linux-user emulation.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This doesn't change the generated code on x86, but optimizes it on most
RISC architectures and makes the code simpler to read.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There is no need to save flags when entering and exiting the delay slot.
They can be saved only when reaching the end of the TB. If the TB is
interrupted before by an exception, they will be restored using
restore_state_to_opc.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
In case of exception, there is no need to call tcg_gen_exit_tb as the
exception helper won't return.
Also fix a few cases where BS_BRANCH is called instead of BS_EXCP.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When stopping the translation because the state has changed, goto_tb
should not be used as it might link TB with different flags.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Instead of using one bit of the env flags to store the condition of the
next delay slot, use a separate global. It simplifies reading and
writing the flags variable and also removes some confusion between
ctx->envflags and env->flags.
Note that the global is first transfered to a temp in order to be
able to discard the global before the brcond.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
DELAY_SLOT_TRUE is used as a dynamic condition for the branch after the
delay slot instruction. It is not used in code generation, so there is
no need to including in the TB state.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that ctx->flags has been split, it becomes clear that
DELAY_SLOT_CLEARME has not impact on the code generation: in both case
ctx->envflags is cleared, either by clearing all the flags, or by
setting it to 0. This is left-over from pre-TCG era.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There is a confusion (and not only in the SH4 target) between tb->flags,
env->flags and ctx->flags. To avoid it, split ctx->flags into
ctx->tbflags and ctx->envflags. ctx->tbflags stays unchanged during the
whole TB translation, while ctx->envflags evolves and is kept in sync
with env->flags using TCG instructions. ctx->envflags now only contains
the part that of env->flags that is contained in the TB state, i.e. the
DELAY_SLOT* flags.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
All of the interlocked access facility instructions raise a
specification exception for unaligned accesses. Do this by
using the (previously unused) unaligned_access hook.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
If trace backend is set to TRACE_NOP, trace_get_vcpu_event_count
returns 0, cause bitmap_new call abort.
The abort can be triggered as follows:
$ ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu
$ gdb ./x86_64-softmmu/qemu-system-x86_64 -M q35,accel=kvm -m 1G
(gdb) bt
#0 0x00007ffff04e25f7 in raise () from /lib64/libc.so.6
#1 0x00007ffff04e3ce8 in abort () from /lib64/libc.so.6
#2 0x00005555559de905 in bitmap_new (nbits=<optimized out>)
at /home/root/git/qemu2.git/include/qemu/bitmap.h:96
#3 cpu_common_initfn (obj=0x555556621d30) at qom/cpu.c:399
#4 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bbb0) at qom/object.c:341
#5 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bd30) at qom/object.c:341
#6 0x0000555555a11efc in object_initialize_with_type (data=data@entry=0x555556621d30, size=76560,
type=type@entry=0x55555656bd30) at qom/object.c:376
#7 0x0000555555a12061 in object_new_with_type (type=0x55555656bd30) at qom/object.c:484
#8 0x0000555555a121c5 in object_new (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu")
at qom/object.c:494
#9 0x00005555557f6e3d in pc_new_cpu (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu", apic_id=0,
errp=errp@entry=0x5555565391b0 <error_fatal>) at /home/root/git/qemu2.git/hw/i386/pc.c:1101
#10 0x00005555557fa33e in pc_cpus_init (pcms=pcms@entry=0x5555565f9690)
at /home/root/git/qemu2.git/hw/i386/pc.c:1184
#11 0x00005555557fe0f6 in pc_q35_init (machine=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121
#12 0x000055555574fbad in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4562
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Message-id: 1494369432-15418-1-git-send-email-anthony.xu@intel.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The main loop uses aio_disable_external()/aio_enable_external() to
temporarily disable processing of external AioContext clients like
device emulation.
This allows monitor commands to quiesce I/O and prevent the guest from
submitting new requests while a monitor command is in progress.
The aio_enable_external() API is currently broken when an IOThread is in
aio_poll() waiting for fd activity when the main loop re-enables
external clients. Incrementing ctx->external_disable_cnt does not wake
the IOThread from ppoll(2) so fd processing remains suspended and leads
to unresponsive emulated devices.
This patch adds an aio_notify() call to aio_enable_external() so the
IOThread is kicked out of ppoll(2) and will re-arm the file descriptors.
The bug can be reproduced as follows:
$ qemu -M accel=kvm -m 1024 \
-object iothread,id=iothread0 \
-device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \
-drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \
-device scsi-hd,id=scsi-hd0,drive=drive0 \
-qmp tcp::5555,server,nowait
$ scripts/qmp/qmp-shell localhost:5555
(qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2
mode=absolute-paths format=qcow2
After blockdev-snapshot-sync completes the SCSI disk will be
unresponsive. This leads to request timeouts inside the guest.
Reported-by: Qianqian Zhu <qizhu@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170508180705.20609-1-stefanha@redhat.com
Suggested-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Since we are already in coroutine context during the body of
bdrv_co_get_block_status(), we can shave off a few layers of
wrappers when recursing to query the protocol when a format driver
returned BDRV_BLOCK_RAW.
Note that we are already using the correct recursion later on in
the same function, when probing whether the protocol layer is sparse
in order to find out if we can add BDRV_BLOCK_ZERO to an existing
BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170504173745.27414-1-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The GThread implementation is not functional enough to actually
run QEMU reliably. While it was potentially useful for debugging,
we have a scripts/qemugdb/coroutine.py to enable tracing of
ucontext coroutines in GDB, so that removes the only reason for
GThread to exist.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is unnecessary to assign 'packed_bytes' to 'estimated_bytes', because 'estimated_bytes' unused after assignment.
Signed-off-by: Wei Qi <weiqi4@huawei.com>
Reviewed-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When starting QEMU with the legacy USB serial device like this:
qemu-system-x86_64 -usbdevice serial:vendorid=0x1234:stdio
it currently aborts since the vendorid property does not exist
anymore (it has been removed by commit f29783f72e):
Unexpected error in object_property_find() at qemu/qom/object.c:1008:
qemu-system-x86_64: -usbdevice serial:vendorid=0x1234:stdio: Property
'.vendorid' not found
Aborted (core dumped)
Fix this crash by issuing a more friendly error message instead
(and simplify the code also a little bit this way).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1493883704-27604-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The strict td link limit added by commit "05f43d4 xhci: limit the
number of link trbs we are willing to process" causes problems with
Windows guests. Let's raise the limit.
This change is analogous to:
commit ab6b1105a2
Author: Gerd Hoffmann <kraxel@redhat.com>
Date: Tue Mar 7 09:40:18 2017 +0100
ohci: relax link check
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 20170512102100.22675-1-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The spec says:
Suspend: (PORT_SUSPEND) This field indicates whether or not the device
on this port is suspended. Setting this field causes the device to
suspend by not propagating bus traffic downstream. This field may be
reset by a request or by resume signaling from the device attached to
the port.
I can't find any specific statement like "the PORT_SUSPEND field is reset
automatically on remote wakeup", but without this patch, the only way to
reset it is via the ClearPortFeature request so the ".. or by resume
signaling from the device" clause is clearly not implemented on the remote
wakeup path.
The default xhci Windows driver does not issue the ClearPortFeature request
and suspended devices attached to a hub don't properly get out of the
suspended state. Interestingly, the default uhci Windows driver *does*
issue the ClearPortFeature request and does not exhibit this problem.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 20170511125314.24549-3-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The preferred way of adding USB devices is via "-device" and
"device_add" nowadays, so let's start to get rid of "-usbdevice"
and "usb_add" in the documentation. While we're at it, also
add the new USB devices there which have been added to QEMU
during the last years, and get rid of the old "vendorid" and
"productid" parameters of "-usbdevice serial" which have been
removed in QEMU version 0.14.0 already.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1494256429-31720-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add egl-headless user interface. It doesn't provide a real user
interface, it only provides opengl support using drm render nodes.
It will copy back the bits rendered by the guest using virgl back
to a DisplaySurface and kick the usual display update code paths,
so spice and vnc and screendump can pick it up.
Use it this way:
qemu -display egl-headless -vnc $display
qemu -display egl-headless -spice gl=off,$args
Note that you should prefer native spice opengl support (-spice
gl=on) if possible because that delivers better performance.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170505104101.30589-7-kraxel@redhat.com
When running on gtk we need X11 platform not mesa platform.
Create separate functions for mesa and x11 so we can keep
the egl #ifdef mess local to egl-helpers.c
Fixes: 0ea1523fb6
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170505104101.30589-4-kraxel@redhat.com
Remove support for versions of the CPU state prior to 11
which is the version used in qemu 0.12 - you'd be pretty
lucky if you got a migration stream to work from anything
that old anyway. This doesn't affect the machine type
definition in any way.
My main reason for doing this is the hack for sysenter_esp/eip
that uses .get/.put's in state versions less than 7 (that's
prior to somewhere before 0.10).
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170405190024.27581-4-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Long long ago, we used to support storing the x86 FP registers in
a 64bit format.
Then c31da136a0 in v0.14-rc0 removed
the last support for writing that in the migration format.
Even before that, it was only used if you had softfloat disabled
(i.e. !USE_X86LDOUBLE) so in practice use of it in even earlier
qemu is unlikely for most users.
Kill it off, it's complicated, and possibly broken.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170405190024.27581-2-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
legacy cpu to node mapping is using cpu index values to map
VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]'
option. However cpu index is internal concept and QEMU users
have to guess /reimplement qemu's logic/ to map it to
a concrete cpu socket/core/thread to make sane CPUs
placement across numa nodes.
This patch allows to map cpu objects to numa nodes using
the same properties as used for cpus with -device/device_add
(socket-id/core-id/thread-id/node-id).
At present valid properties/values to address CPUs could be
fetched using hotpluggable-cpus monitor/qmp command, it will
require user to start qemu twice when creating domain to fetch
possible CPUs for a machine type/-smp layout first and
then the second time with numa explicit mapping for actual
usage. The first step results could be saved and reused to
set/change mapping later as far as machine type/-smp stays
the same.
Proposed impl. supports exact and wildcard matching to
simplify CLI and allow to set mapping for a specific cpu
or group of cpu objects specified by matched properties.
For example:
# exact mapping x86
-numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n
# exact mapping SPAPR
-numa cpu,node-id=x,core-id=y
# wildcard mapping, all cpu objects that match socket-id=y
# are mapped to node-id=x
-numa cpu,node-id=x,socket-id=y
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1494415802-227633-18-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Postfactum "CPU(s) present in multiple NUMA nodes" check
was the last user of node_cpu bitmaps, but it's not need
as machine_set_cpu_numa_node() does the similar check at
the time mapping is set for cpus (i.e. when -numa cpus=
is parsed) and ensures that cpu can be mapped only to
one node.
Remove duplicate check based on node_cpu bitmaps and
since the last user is gone remove node_cpu as well,
which completes internal transition from legacy bitmap
based mapping storage to possible_cpus storage.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-17-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
if board supports CpuInstanceProperties, report them for
each CPU thread listed. Main motivation for this is to
provide these properties introspection via QMP interface
for using in test cases to verify numa node to cpu mapping,
which includes not only boards that support cpu hotplug
and have this info in query-hotpluggable-cpus (pc/spapr)
but also for boards that don't not support hotpluggable-cpus
but support numa mapping (virt-arm).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1494415802-227633-12-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Introduce machine_set_cpu_numa_node() helper that stores
node mapping for CPU in MachineState::possible_cpus.
CPU and node it belongs to is specified by 'props' argument.
Patch doesn't remove old way of storing mapping in
numa_info[X].node_cpu as removing it at the same time
makes patch rather big. Instead it just mirrors mapping
in possible_cpus and follow up per target patches will
switch to possible_cpus and numa_info[X].node_cpu will
be removed once there isn't any users left.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-7-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
it will allow switching from cpu_index to property based
numa mapping in follow up patches.
PS:
patch changes default value of CPUState::numa_node from 0
to CPU_UNSET_NUMA_NODE_ID. The only place for x86 that
would affected is monitor's 'infor numa' command which
uses that field. However legacy 0 value is still preserved
by pc_cpu_pre_plug() in this patch if user/numa.c hasn't
set it explicitly, so there is no change in behavior.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1494415802-227633-4-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.
As result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.
In effort of moving NUMA mapping for CPUs into possible_cpus,
generalize default mapping in numa.c by making boards decide
on default mapping and let them explicitly tell generic
numa code to which node a CPU thread belongs to by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props()
which provides default node_id assigned by board to specified
cpu_index.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Currently cpu_index is implicitly auto assigned during
cpu.realize() time cpu_exec_realizefn()->cpu_list_add().
It happens to match index in possible_cpus so take
control over it and make board initialize cpu_index
to possible_cpus index explicitly. It will at least
document that board is in control of it and when
'-device cpu' support comes it will keep cpu_index
stable regardless of order cpus are created so it won't
break migration.
Within this series it will be used for internal
conversion from storing cpu_index based NUMA node
bitmaps to property based mapping with possible_cpus,
And will allow map cpu_index to a CPU entry in
possible_cpus array.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1493816238-33120-5-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
for now precalculate and store mp_afinity in possible_cpus
as ARM cpus don't have socket/core/thread-id properties yet.
In follow patches possible_cpus will be used for storing
and setting NUMA node mapping and replace legacy bitmap
based numa_info[node_id].node_cpu/numa_get_node_for_cpu()
For the lack of better idea, this patch cannibalizes
possible_cpus.cpus[x].props.thread_id so that
*_cpu_index_to_props() callback could return addressable
by props CPU which will be used by machine_set_cpu_numa_node()
in follow up patches to assign a CPU to node. But
cannibalizing is fine for now as that thread_id isn't exposed
to users (no hotpluggable_cpus callback support for ARM yet)
and it will be used only internally until 'device_add cpu'
is supported where we can decide on which properties to use.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1493816238-33120-4-git-send-email-imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
When there are more nodes than available memory to put the minimum
allowed memory by node, all the memory is put on the last node.
This is because we put (ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this
case the value is 0. This is particularly true with pseries,
as the memory must be aligned to 256MB.
To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute equally the memory on nodes.
We introduce numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and previous), the new one with all others.
Example:
qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \
-numa node -numa node -numa node \
-numa node -numa node -numa node
Before:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB
After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB
[1] https://en.wikipedia.org/wiki/Error_diffusion
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170502162955.1610-2-lvivier@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch is going to add SLIT table support in QEMU, and provides
additional option `dist` for command `-numa` to allow user set vNUMA
distance by QEMU command.
With this patch, when a user wants to create a guest that contains
several vNUMA nodes and also wants to set distance among those nodes,
the QEMU command would like:
```
-numa node,nodeid=0,cpus=0 \
-numa node,nodeid=1,cpus=1 \
-numa node,nodeid=2,cpus=2 \
-numa node,nodeid=3,cpus=3 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=0,dst=2,val=31 \
-numa dist,src=0,dst=3,val=41 \
-numa dist,src=1,dst=2,val=21 \
-numa dist,src=1,dst=3,val=31 \
-numa dist,src=2,dst=3,val=21 \
```
Signed-off-by: He Chen <he.chen@linux.intel.com>
Message-Id: <1493260558-20728-1-git-send-email-he.chen@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Change the nested if statements into a flat format, to make
it clearer what validation / capping is being performed on
different CPUID index values.
NB this changes behaviour when "index > env->cpuid_xlevel2".
This won't have any guest-visible effect because no there is
no CPUID[0xC0000001] feature supported by TCG, and KVM code
will never call cpu_x86_cpuid() with such an index value.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20170509132736.10071-2-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Block patches for the block queue.
# gpg: Signature made Thu May 11 14:28:41 2017 CEST
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2017-05-11: (22 commits)
MAINTAINERS: Add qemu-progress to the block layer
qcow2: Discard/zero clusters by byte count
qcow2: Assert that cluster operations are aligned
qcow2: Optimize write zero of unaligned tail cluster
iotests: Add test 179 to cover write zeroes with unmap
iotests: Improve _filter_qemu_img_map
qcow2: Optimize zero_single_l2() to minimize L2 churn
qcow2: Make distinction between zero cluster types obvious
qcow2: Name typedef for cluster type
qcow2: Correctly report status of preallocated zero clusters
block: Update comments on BDRV_BLOCK_* meanings
qcow2: Use consistent switch indentation
qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
tests: Add coverage for recent block geometry fixes
blkdebug: Add ability to override unmap geometries
blkdebug: Simplify override logic
blkdebug: Add pass-through write_zero and discard support
blkdebug: Refactor error injection
blkdebug: Sanity check block layer guarantees
qemu-io: Switch 'map' output to byte-based reporting
...
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness. Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster. Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once. All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-13-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
is only passed cluster-aligned start values; but we can further
tighten the assertion that the only unaligned end value is at EOF.
Recent commits have taken advantage of an unaligned tail cluster,
for both discard and write zeroes.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-12-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
We've already improved discards to operate efficiently on the tail
of an unaligned qcow2 image; it's time to make a similar improvement
to write zeroes. The special case is only valid at the tail
cluster of a file, where we must recognize that any sectors beyond
the image end would implicitly read as zero, and therefore should
not penalize our logic for widening a partial cluster into writing
the whole cluster as zero.
However, note that for now, the special case of end-of-file is only
recognized if there is no backing file, or if the backing file has
the same length; that's because when the backing file is shorter
than the active layer, we don't have code in place to recognize
that reads of a sector unallocated at the top and beyond the backing
end-of-file are implicitly zero. It's not much of a real loss,
because most people don't use images that aren't cluster-aligned,
or where the active layer is a different size than the backing
layer (especially where the difference falls within a single cluster).
Update test 154 to cover the new scenarios, using two images of
intentionally differing length.
While at it, fix the test to gracefully skip when run as
./check -qcow2 -o compat=0.10 154
since the older format lacks zero clusters already required earlier
in the test.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-11-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
No tests were covering write zeroes with unmap. Additionally,
I needed to prove that my previous patches for correct status
reporting and write zeroes optimizations actually had an impact.
The test works for cluster_size between 8k and 2M (for smaller
sizes, it fails because our allocation patterns are not contiguous
with small clusters - in part, the largest consecutive allocation
we tend to get is often bounded by the size covered by one L2
table).
Note that testing for zero clusters is tricky: 'qemu-io map'
reports whether data comes from the current layer of the image
(useful for sniffing out which regions of the file have
QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings;
while 'qemu-img map' sees "zero":true for both unallocated and
zero clusters for any qcow2 with no backing layer (so less useful
at detecting true zero clusters), but reliably shows mappings.
So we have to rely on both queries side-by-side at each point of
the test.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-10-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Although _filter_qemu_img_map documents that it scrubs offsets, it
was only doing so for human mode. Of the existing tests using the
filter (97, 122, 150, 154, 176), two of them are affected, but it
does not hurt the validity of the tests to not require particular
mappings (another test, 66, uses offsets but intentionally does not
pass through _filter_qemu_img_map, because it checks that offsets
are unchanged before and after an operation).
Another justification for this patch is that it will allow a future
patch to utilize 'qemu-img map --output=json' to check the status of
preallocated zero clusters without regards to the mapping (since
the qcow2 mapping can be very sensitive to the chosen cluster size,
when preallocation is not in use).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-9-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Similar to discard_single_l2(), we should try to avoid dirtying
the L2 cache when the cluster we are changing already has the
right characteristics.
Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
is a requirement to unallocate a cluster (this is because the block
layer clears that flag if discard.* flags during open requested that
we never punch holes - see the conversation around commit 170f4b2e,
https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
Therefore, this patch can only reuse a zero cluster as-is if either
unmapping is not requested, or if the zero cluster was not associated
with an allocation.
Technically, there are some cases where an unallocated cluster
already reads as all zeroes (namely, when there is no backing file
[easy: check bs->backing], or when the backing file also reads as
zeroes [harder: we can't check bdrv_get_block_status since we are
already holding the lock]), where the guest would not immediately see
a difference if we left that cluster unallocated. But if the user
did not request unmapping, leaving an unallocated cluster is wrong;
and even if the user DID request unmapping, keeping a cluster
unallocated risks a subtle semantic change of guest-visible contents
if a backing file is later added, and it is not worth auditing
whether all internal uses such as mirror properly avoid an unmap
request. Thus, this patch is intentionally limited to just clusters
that are already marked as zero.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-8-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Treat plain zero clusters differently from allocated ones, so that
we can simplify the logic of checking whether an offset is present.
Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.
I tried to arrange the enum so that we could use
'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
I didn't actually end up taking advantage of the layout.
In many cases, this leads to simpler code, by properly combining
cases (sometimes, both zero types pair together, other times,
plain zero is more like unallocated while allocated zero is more
like normal).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-7-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Although it doesn't add all that much type safety (this is C, after
all), it does add a bit of legibility to use the name QCow2ClusterType
instead of a plain int.
In particular, qcow2_get_cluster_offset() has an overloaded return
type; a QCow2ClusterType on success, and -errno on failure; keeping
the cluster type in a separate variable makes it slightly easier for
the next patch to make further computations based on the type.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-6-eblake@redhat.com
[mreitz: Use the new type in two more places (one of them pulled from
the next patch)]
Signed-off-by: Max Reitz <mreitz@redhat.com>
We were throwing away the preallocation information associated with
zero clusters. But we should be matching the well-defined semantics
in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
while still reminding the user that reading from that offset is
likely to read garbage.
count_contiguous_clusters_by_type() is now used only for unallocated
cluster runs, hence it gets renamed and tightened.
Making this change lets us see which portions of an image are zero
but preallocated, when using qemu-img map --output=json. The
--output=human side intentionally ignores all zero clusters, whether
or not they are preallocated.
The fact that there is no change to qemu-iotests './check -qcow2'
merely means that we aren't yet testing this aspect of qemu-img;
a later patch will add a test.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-5-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
We had some conflicting documentation: a nice 8-way table that
described all possible combinations of DATA, ZERO, and
OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
always meant raw data could be read directly. Furthermore, the
text refers a lot to bs->file, even though the interface was
updated back in 67a0fd2a to let the driver pass back a specific
BDS (not necessarily bs->file). As the 8-way table is the
intended semantics, simplify the rest of the text to get rid of
the confusion.
ALLOCATED is always set by the block layer for convenience (drivers
do not have to worry about it). RAW is used only internally, but
by more than the raw driver. Document these additional items on
the driver callback.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-4-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
In order to keep checkpatch happy when the next patch changes
indentation, we first have to shorten some long lines. The easiest
approach is to use a new variable in place of
'offset & L2E_OFFSET_MASK', except that 'offset' is the best name
for that variable. Change '[old_]offset' to '[old_]entry' to
make room.
While touching things, also fix checkpatch warnings about unusual
'for' statements.
Suggested by Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-2-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Use blkdebug's new geometry constraints to emulate setups that
have needed past regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries. Also, add
coverage that the block layer is honoring max transfer limits.
For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.
For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:
qemu-io: discard 30M at 80000001, passed to blkdebug
blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
blkdebug's 512 align)
blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
qcow2's 1M align)
qcow2: discard 13M bytes at 77M, succeeds
blkdebug: discard 15M bytes at 90M, passed to qcow2
qcow2: discard 15M bytes at 90M, succeeds
blkdebug: discard 1356800 bytes at 105M, passed to qcow2
qcow2: discard 1M at 105M, succeeds
qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-10-eblake@redhat.com
[mreitz: For cooperation with image locking, add -r to the qemu-io
invocation which verifies the image content]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9b and b8d0a98 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fe deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-9-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Rather than store into a local variable, then copy to the struct
if the value is valid, then reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid. This however requires that the struct store
a 64-bit number, rather than a narrower type. Likewise, setting
a sane errno value in ret prior to the sequence of parsing and
jumping to out: on error makes it easier for the next patch to
add a chain of similar checks.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-8-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions. It also allows us to inject errors on
those operations, just like we can for read/write/flush.
We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).
This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment. If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.
Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M
Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire. Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-7-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Rather than repeat the logic at each caller of checking if a Rule
exists that warrants an error injection, fold that logic into
inject_error(); and rename it to rule_check() for legibility.
This will help the next patch, which adds two more callers that
need to check rules for the potential of injecting errors.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-6-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
any I/O to fit within device boundaries. Additionally, when using a
minimum alignment of 4k, we want to ensure the block layer does proper
read-modify-write rather than requesting I/O on a slice of a sector.
Let's enforce that the contract is obeyed when using blkdebug. For
now, blkdebug only allows alignment overrides, and just inherits other
limits from whatever device it is wrapping, but a future patch will
further enhance things.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-5-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Mixing byte offset and sector allocation counts is a bit
confusing. Also, reporting n/m sectors, where m decreases
according to the remaining size of the file, isn't really
adding any useful information; and reporting an offset at
both the front and end of the line, with large amounts of
whitespace, is pointless. Update the output to use byte
counts and shorter lines, then adjust the affected tests
(./check -qcow2 102, ./check -vpc 146).
Note that 'qemu-io map' is MUCH weaker than 'qemu-img map';
the former only shows which regions of the active layer are
allocated, without regards to where the allocation comes from
or whether the allocated portion is known to read as zero
(because it is using the weaker bdrv_is_allocated()); while the
latter (especially in --output=json mode) reports more details
from bdrv_get_block_status().
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-4-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
For the 'alloc' command, accepting an offset in bytes but a length
in sectors, and reporting output in sectors, is confusing. Do
everything in bytes, and adjust the expected output accordingly.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-3-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Several copy-and-pasted alignment checks exist in qemu-io, which
could use some minor improvements:
- Manual comparison against 0x1ff is not as clean as using our
alignment macros (QEMU_IS_ALIGNED) from osdep.h.
- The error messages aren't quite grammatically correct.
Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-2-eblake@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
On error path (like i/o error in one of the coroutines), it's required to
- wait for coroutines completion before cleaning the common structures
- reenter dependent coroutines so they ever finish
Introduced in 2d9187bc65.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that the block layer takes care to request a lot less permissions
for inactive nodes, the special-casing in file-posix isn't necessary any
more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Format drivers for inactive nodes don't need write/resize permissions on
their bs->file and can share write/resize with another VM (in fact, this
is the whole point of keeping images inactive). Represent this fact in
the op blocker system, so that image locking does the right thing
without special-casing inactive images.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The proper order for inactivating block nodes is that first the parents
get inactivated and then the children. If we do things in this order, we
can assert that we didn't accidentally leave a parent activated when one
of its child nodes is inactive.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
With image locking, permissions affect other qemu processes as well. We
want to be sure that the destination can run, so let's drop permissions
on the source when migration completes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Instead of manually calling blk_resume_after_migration() in migration
code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend
activation with cache invalidation into a single function. This is
achieved with a new callback in BdrvChildRole that is called by
bdrv_invalidate_cache_all().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Migration code activates all block driver nodes on the destination when
the migration completes. It does so by calling
bdrv_invalidate_cache_all() and blk_resume_after_migration(). There is
one code path for precopy and one for postcopy migration, resulting in
four function calls, which used to have three different failure modes.
This patch unifies the behaviour so that failure to activate all block
nodes is non-fatal, but the error message is logged and the VM isn't
automatically started. 'cont' will retry activating the block nodes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
066 was supposed to be a test "for discarding preallocated zero
clusters", but it did so incompletely: While it did check the image
file's integrity after the operation, it did not confirm that the
clusters are indeed freed. This patch adds this test.
In addition, new cases for writing to preallocated zero clusters are
added.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In discard_single_l2(), we completely discard normal clusters instead of
simply turning them into preallocated zero clusters. That means we
should probably do the same with such preallocated zero clusters:
Discard them instead of keeping them allocated.
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of just freeing preallocated zero clusters and completely
allocating them from scratch, reuse them.
We cannot do this in handle_copied(), however, since this is a COW
operation. Therefore, we have to add the new logic to handle_alloc() and
simply return the existing offset if it exists. The only catch is that
we have to convince qcow2_alloc_cluster_link_l2() not to free the old
clusters (because we have reused them).
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When calculating the number of reftable entries, we should actually use
the number of refblocks and not (wrongly[1]) re-calculate it.
[1] "Wrongly" means: Dividing the number of clusters by the number of
entries per refblock and rounding down instead of up.
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This extends the permission bits of op blocker API to external using
Linux OFD locks.
Each permission in @perm and @shared_perm is represented by a locked
byte in the image file. Requesting a permission in @perm is translated
to a shared lock of the corresponding byte; rejecting to share the same
permission is translated to a shared lock of a separate byte. With that,
we use 2x number of bytes of distinct permission types.
virtlockd in libvirt locks the first byte, so we do locking from a
higher offset.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
They are wrappers of POSIX fcntl "file private locking", with a
convenient "try lock" wrapper implemented with F_OFD_GETLK.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Opening the backing image for the second time is bad, especially here
when it is also in use as the active image as the source. The
drive-backup job itself doesn't read from target->backing for COW,
instead it gets data from the write notifier, so it's not a big problem.
However, exporting the target to NBD etc. won't work, because of the
likely stale metadata cache.
Use BDRV_O_NO_BACKING in this case and manually set up the backing
BdrvChild.
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The COLO block replication architecture requires one disk to be shared
between primary and secondary, in the test both processes use posix file
protocol (instead of over NBD) so it is affected by image locking.
Disable the lock.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We share the same set of QAPI options with file-posix, but locking is
not supported here. So error out if it is specified as 'on' for now.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Making this option available even before implementing it will let
converting tests easier: in coming patches they can specify the option
already when necessary, before we actually write code to lock the
images.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The test scenario doesn't require the same image, instead it focuses on
the duplicated node-name, so use null-co to avoid locking conflict.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the case where we test the expected error when a blockdev-snapshot
target already has a backing image, the backing chain is opened multiple
times. This will be a problem when we use image locking, so use a
different backing file that is not already open.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Double attach is not a valid usage of the target image, drive-backup
will open the blockdev itself so skip the add_drive call in this case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The qemu-img info command is executed while VM is running, add -U option
to avoid the image locking error.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu-img and qemu-io commands when guest is running need "-U" option,
add it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add --force-share/-U to program options and -U to open subcommand.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will force the opened images to allow sharing all permissions with other
programs.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It can be used outside of block.c for making user friendly messages.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds support for absolute pointer events to the input-linux
subsystem. This support was omitted from the original input-linux patch,
however most of the code required for it is already in place.
Support for absolute events is especially useful for guests with vga
passthrough. Since they have a physical monitor, none of normal channels
for sending video output (vnc, etc) are used, meaning they also can't be
used to send absolute input events. This leaves QMP as the only option
to send absolute input into vga passthrough guests, which is not its
intended use and is not efficient.
This patch allows, for example, uinput to be used to create virtual
absolute input devices. This lets you build external systems which share
physical input devices between guests. Without absolute input
capability, such external systems can't seamlessly share pointer devices
between guests.
Signed-off-by: Philippe Voinov <philippevoinov@gmail.com>
Message-id: 20170505134231.30210-1-philippevoinov@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This patch refactors ui/input.c to support absolute axis
minimum values other than 0. All dependent calls to qemu_input_queue_abs
have been updated to explicitly supply 0 as the axis minimum value.
Signed-off-by: Philippe Voinov <philippevoinov@gmail.com>
Message-id: 20170505133952.29885-1-philippevoinov@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When running with KVM, we update the "family" CPU alias to point
to the right host CPU type, so that it for example possible to
use "-cpu POWER8" on a POWER8NVL host. However, the function for
printing the list of available CPU models is called earlier than
the KVM setup code, so the output of "-cpu help" is wrong in that
case. Since it would be somewhat ugly anyway to have different
help texts depending on whether "-enable-kvm" has been specified
or not, we should better always print the same text, so fix this
issue by printing "alias for preferred XXX CPU" instead.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This makes some changes to fix build failures on the 'min-glib' docker
image, and maybe other platforms with a buildchain that's less tolerant
about duplicated typedefs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
POWER9 DD1 silicon has some bugs which mean it a) isn't really compliant
with the ISA v3.00 and b) require a number of special workarounds in the
kernel.
At the moment, qemu isn't aware of DD1. For TCG we don't really want it to
be (why bother emulating buggy silicon). But with KVM, the guest does need
to be aware of DD1 so it can apply the necessary workarounds.
Meanwhile, the feature negotiation between qemu and the guest strongly
favours architected compatibility modes to "raw" CPU modes. In combination
with the above, this means the guest sees architected POWER9 mode, and
doesn't apply the DD1 workarounds. Well, unless it has yet another
workaround to partially ignore what qemu tells it.
This patch addresses this by disabling support for compatibility modes when
using KVM on a POWER9 DD1 host.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Logic in spapr_populate_pa_features() enables the bit advertising
Hardware Transactional Memory (HTM) in the guest's device tree only when
KVM advertises its availability with the KVM_CAP_PPC_HTM feature.
However, this assumes that the HTM bit is off in the base template used for
the device tree value. That is true for POWER8, but not for POWER9.
It looks like that was accidentally changed in 9fb4541 "spapr: Enable ISA
3.0 MMU mode selection via CAS".
Fixes: 9fb4541f58
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
The PowerPCCPU typedef is included twice if a file includes
both hw/ppc/xics.h and target/ppc/cpu-qom.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that we have added all the infrastructure we can enable a pseries TCG
guest to use radix.
In order to do this we have to add the appropriate bits to the
ibm,arch-vec-5-platform-support vector to represent that we support both
hash and radix mmu models.
A radix guest can now be booted in pseries tcg mode by specifying:
-cpu POWER9
Note that we assume hash, that is we allocate a hpt, until a guest tells
us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in
which case we free the hpt. If we were right and the guest is hash then
there's nothing for us to do.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA V3.00 introduced a new radix mmu model. Implement the page fault
handler for this so we can run a tcg guest in radix mode and perform
address translation correctly.
In real mode (mmu turned off) addresses are masked to remove the top
4 bits and then are subject to partition scoped translation, since we only
support pseries at this stage it is only necessary to perform the masking
and then we're done.
In virtual mode (mmu turned on) address translation if performed as
follows:
1. Use the quadrant to determine the fully qualified address.
The fully qualified address is defined as the combination of the effective
address, the effective logical partition id (LPID) and the effective
process id (PID). Based on the quadrant (EA63:62) we set the pid and lpid
like so:
quadrant 0: lpid = LPIDR, pid = PIDR
quadrant 1: HV only (not allowed in pseries)
quadrant 2: HV only (not allowed in pseries)
quadrant 3: lpid = LPIDR, pid = 0
If we can't get the fully qualified address we raise a segment interrupt.
2. Find the guest radix tree
We ask the virtual hypervisor for the partition table which was registered
with H_REGISTER_PROC_TBL which points us to the process table in guest
memory. We then index this table by pid to get the process table entry
which points us to the appropriate radix tree to translate the address.
If the process table isn't big enough to contain an entry for the current
pid then we raise a storage interrupt.
3. Walk the radix tree
Next we walk the radix tree where each level is a table of page directory
entries indexed by some number of bits from the effective address, where
the number of bits is determined by the table size. We continue to walk
the tree (while entries are valid and the table is of minimum size) until
we reach a table of page table entries, indicated by having the leaf bit
set. The appropriate pte is then checked for sufficient access permissions,
the reference and change bits are updated and the real address is
calculated from the real page number bits of the pte and the low bits of
the effective address.
If we can't find an entry or can't access the entry bacause of permissions
then we raise a storage interrupt.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Add missing parentheses to macro]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The tlbie[l] instructions are used to invalidate TLB entries used to cache
address translations.
In ISAv3.00 (POWER9) more fields were added to the tblie[l] instructions
which were previously invalid. We don't care about any of these new fields
since we just invalidate the whole world anyway but we need to not
cause an illegal instruction exception when the instructions are called.
We also don't want to allow an older processor to have these fields set
since that would be invalid.
Add a new GEN_HANDLER for the ISAv3 instructions with the correct invalid
mask. These will only be generated to a POWER9 processor for now based on
the instruction flag. Also remove the PPC_MEM_TLBIE instruction flag from
the POWER9 processor definition to ensure the old tlbie isn't generated.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Guest Translation Shootdown Enable (GTSE) bit in the Logical Partition
Control Register (LPCR) can be set to enable a guest to use the tlbie
instruction directly to invalidate translations.
When the GTSE bit is set then the tlbie instruction is supervisor
privileged, otherwise it is hypervisor privileged.
Add a guest translation shootdown enable (gtse) field to the diassembly
context and use this to check the correct privilege level at code
generation time.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The UPRT and GTSE bits are set when a guest calls H_REGISTER_PROCESS_TABLE
to choose determine how address translation is performed. Currently these
bits in the LPCR are only set for the cpu which handles the H_CALL, however
they need to be set for all cpus for that guest as address translation
cannot be performed differently on a per cpu basis.
Update the H_CALL handler to set these bits in the LPCR correctly for all
cpus of the guest.
Note it is the reponsibility of the guest to ensure that any secondary cpus
are suspended when the H_CALL is made and thus we can safely update these
values here.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Kernel commit 17d48610ae0f ("KVM: PPC: Book 3S: XICS: Implement ICS
P/Q states") added new bits to the state used by KVM IRQs. Currently,
QEMU does not preserve these bits, so migrating (or otherwise saving
and restoring) the guest state causes the P and Q bits to be cleared.
Clearing the P bit has no effect, because the kernel will set it based
on other data, but the loss of a set Q bit will cause a lost
interrupt.
This patch preserves the P and Q bits, correcting the problem.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ics_get_kvm_state() "or"s set bits into irq->status but does not mask
out clear bits.
Correct this by initializing the IRQ status to zero before adding bits
to it.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In case when atomic operation is not supported, exit_atomic is called
and we stop the world and execute the atomic operation. This results
in a following call chain:
tcg_gen_atomic_cmpxchg_tl()
-> gen_helper_exit_atomic()
-> HELPER(exit_atomic)
-> cpu_loop_exit_atomic() -> EXCP_ATOMIC
-> qemu_tcg_cpu_thread_fn() => case EXCP_ATOMIC
-> cpu_exec_step_atomic()
-> cpu_step_atomic()
-> cc->cpu_exec_enter() = ppc_cpu_exec_enter()
Sets env->reserve_addr = -1;
But by the time it return back, the reservation is erased and the code
fails, this continues forever and the lock is never taken.
Instead set this in powerpc_excp()
Now that ppc_cpu_exec_enter() doesn't have anything meaningful to do,
let us get rid of the function.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Ensure that the unplugged CPU thread is destroyed and the waiting
thread is notified about it. This is needed for CPU unplug to work
correctly in MTTCG mode.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In case where the conditional write is the first write to the page,
TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as
a special case and set the dirty bit. After that fall through to the
actual atomic instruction below.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Emulating LL/SC with cmpxchg is not correct, since it can suffer from
the ABA problem. However, portable parallel code is written assuming
only cmpxchg which means that in practice this is a viable alternative.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, when a PowerNV guest runs, it uses the sensor definitions of
the BMC simulator to populate the device tree. But an external IPMI
BMC could also be used and, in that case, it is not (yet) possible to
retrieve the sensor list. Generating the OEM SEL event for shutdown or
reboot also does not make sense as it should be generated on the BMC
side.
This change allows a guest to use an 'ipmi-bmc-extern' backend to the
'isa-ipmi-bt' device and a 'chardev' for transport such as :
-chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \
-device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \
-device isa-ipmi-bt,bmc=bmc0,irq=10
and connect to a BMC simulator, the OpenIPMI ipmi_sim simulator for
instance.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
patchew has been saying:
ERROR: open brace '{' following struct go on the same line
Fix up acpi-defs.h to follow this rule.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit f0c9d64a exposed the issue that with a xenfv machine using
pci passthrough, acpi pci hotplug code was being executed by mistake.
Guard calls to acpi_pcihp_device_plug_cb (and corresponding
acpi_pcihp_device_unplug_cb) with a check for xen_enabled(). Without
this check I am seeing an error that the bus doesn't have the
acpi-pcihp-bsel property set.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently it's possible to crash QEMU using "-device *-iommu" and
"-machine none":
$ qemu-system-x86_64 -machine none -device amd-iommu
qemu/hw/i386/amd_iommu.c:1140:amdvi_realize: Object 0x55627dafbc90 is not an instance of type generic-pc-machine
Aborted (core dumped)
$ qemu-system-x86_64 -machine none -device intel-iommu
qemu/hw/i386/intel_iommu.c:2972:vtd_realize: Object 0x56292ec0bc90 is not an instance of type generic-pc-machine
Aborted (core dumped)
Fix amd-iommu and intel-iommu to ensure the current machine is really a
TYPE_PC_MACHINE instance at their realize methods.
Resulting error messages:
$ qemu-system-x86_64 -machine none -device amd-iommu
qemu-system-x86_64: -device amd-iommu: Machine-type 'none' not supported by amd-iommu
$ qemu-system-x86_64 -machine none -device intel-iommu
qemu-system-x86_64: -device intel-iommu: Machine-type 'none' not supported by intel-iommu
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since 2.7 commit (b2a575a Add optionrom compatible with fw_cfg DMA version)
regressed migration during firmware exection time by
abusing fwcfg.dma_enabled property to decide loading
dma version of option rom AND by mistake disabling DMA
for 2.6 and earlier globally instead of only for option rom.
so 2.6 machine type guest is broken when it already runs
firmware in DMA mode but migrated to qemu-2.7(pc-2.6)
at that time;
a) qemu-2.6:pc2.6 (fwcfg.dma=on,firmware=dma,oprom=ioport)
b) qemu-2.7:pc2.6 (fwcfg.dma=off,firmware=ioport,oprom=ioport)
to: a b
from
a OK FAIL
b OK OK
So we currently have broken forward migration from
qemu-2.6 to qemu-2.[789] that however could be fixed
for 2.10 by re-enabling DMA for 2.[56] machine types
and allowing dma capable option rom only since 2.7.
As result qemu should end up with:
c) qemu-2.10:pc2.6 (fwcfg.dma=on,firmware=dma,oprom=ioport)
to: a b c
from
a OK FAIL OK
b OK OK OK
c OK FAIL OK
where forward migration from qemu-2.6 to qemu-2.10 should
work again leaving only qemu-2.[789]:pc-2.6 broken.
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Calling libvhost-user functions like vu_queue_get_avail_bytes() when the
queue doesn't yet have addresses will result in the crashes like the
following:
Program received signal SIGSEGV, Segmentation fault.
0x000055c414112ce4 in vring_avail_idx (vq=0x55c41582fd68, vq=0x55c41582fd68)
at /home/dgilbert/git/qemu/contrib/libvhost-user/libvhost-user.c:940
940 vq->shadow_avail_idx = vq->vring.avail->idx;
(gdb) p vq
$1 = (VuVirtq *) 0x55c41582fd68
(gdb) p vq->vring
$2 = {num = 0, desc = 0x0, avail = 0x0, used = 0x0, log_guest_addr = 0, flags = 0}
at /home/dgilbert/git/qemu/contrib/libvhost-user/libvhost-user.c:940
No locals.
at /home/dgilbert/git/qemu/contrib/libvhost-user/libvhost-user.c:960
num_heads = <optimized out>
out_bytes=out_bytes@entry=0x7fffd035d7c4, max_in_bytes=max_in_bytes@entry=0,
max_out_bytes=max_out_bytes@entry=0) at /home/dgilbert/git/qemu/contrib/libvhost-user/libvhost-user.c:1034
Add a pre-condition checks on vring.avail before accessing it.
Fix documentation and return type of vu_queue_empty() while at it.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Qemu2.7~2.9 and vhost user for dpdk 17.02 release work together
to cause failures of new connection when negotiating to set MQ.
(one queue pair works well).
Because there exist some bugs in qemu code when introducing
VHOST_USER_PROTOCOL_F_REPLY_ACK to qemu. When vhost_user_set_mem_table
is invoked to deal with the vhost message VHOST_USER_SET_MEM_TABLE
for the second time, qemu indeed doesn't send the messge (The message
needs to be sent only once)but still will be waiting for dpdk's reply
ack, then, qemu is always freezing, while DPDK is always waiting for
next vhost message from qemu.
The patch aims to fix the bug, MQ can work well.
The same bug is found in function vhost_user_net_set_mtu, it is fixed
at the same time.
DPDK related patch is as following:
http://www.dpdk.org/dev/patchwork/patch/23955/
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Cc: qemu-stable@nongnu.org
Fixes: ca525ce561 ("vhost-user: Introduce a new protocol feature REPLY_ACK.")
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jens Freimann <jfreiman@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Our current ACPI table generation code limits the placement of ACPI
tables to 32-bit addressable memory, in order to be able to emit the
root pointer (RSDP) and root table (RSDT) using table types from the
ACPI 1.0 days.
Since ARM was not supported by ACPI before version 5.0, it makes sense
to lift this restriction. This is not crucial for mach-virt, which is
guaranteed to have some memory available below the 4 GB mark, but it
is a nice to have for QEMU machines that do not have any 32-bit
addressable memory, which is not uncommon for real world 64-bit ARM
systems.
Since we already emit a version of the RSDP root pointer that has a
secondary 64-bit wide address field for the 64-bit root table (XSDT),
all we need to do is replace the RSDT generation with the generation
of an XSDT table, and use a different slot in the FADT table to refer
to the DSDT.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
At the request of Michael, replace the leading capital X in the FADT
field name Xfacs and Xdsdt with lower case x + underscore.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
trivial patches for 2017-05-10
# gpg: Signature made Wed 10 May 2017 03:19:30 AM EDT
# gpg: using RSA key 0x701B4F6B1A693E59
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg: aka "Michael Tokarev <mjt@corpit.ru>"
# gpg: aka "Michael Tokarev <mjt@debian.org>"
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
# Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59
* mjt/tags/trivial-patches-fetch: (23 commits)
tests: Remove redundant assignment
MAINTAINERS: Update paths for AioContext implementation
MAINTAINERS: Update paths for main loop
jazz_led: fix bad snprintf
tests: Ignore another built executable (test-hmp)
scripts: Switch to more portable Perl shebang
scripts/qemu-binfmt-conf.sh: Fix shell portability issue
virtfs: allow a device id to be specified in the -virtfs option
hw/core/generic-loader: Fix crash when running without CPU
virtio-blk: Remove useless condition around g_free()
qemu-doc: Fix broken URLs of amnhltm.zip and dosidle210.zip
use _Static_assert in QEMU_BUILD_BUG_ON
channel-file: fix wrong parameter comments
block: Make 'replication_state' an enum
util: Use g_malloc/g_free in envlist.c
qga: fix compiler warnings (clang 5)
device_tree: fix compiler warnings (clang 5)
usb-ccid: make ccid_write_data_block() cope with null buffers
tests: Ignore more test executables
Add 'none' as type for drive's if option
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Merge qcrypto 2017/05/09 v1
# gpg: Signature made Tue 09 May 2017 09:43:47 AM EDT
# gpg: using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* danpb/tags/pull-qcrypto-2017-05-09-1:
crypto: qcrypto_random_bytes() now works on windows w/o any other crypto libs
crypto: move 'opaque' parameter to (nearly) the end of parameter list
List SASL config file under the cryptography maintainer's realm
Default to GSSAPI (Kerberos) instead of DIGEST-MD5 for SASL
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Detected by GCC 7's -Wformat-truncation. snprintf writes at most
2 bytes here including the terminating NUL, so the result is
truncated. In addition, the newline at the end is pointless.
Fix the buffer size and the format string.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit 78f86a2b7 added a new test, but forgot to exclude the built
binary from version control.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The default NetBSD package manager is pkgsrc and it installs Perl
along other third party programs under custom and configurable prefix.
The default prefix for binary prebuilt packages is /usr/pkg, and the
Perl executable lands in /usr/pkg/bin/perl.
This change switches "/usr/bin/perl" to "/usr/bin/env perl" as it's
the most portable solution that should work for almost everybody.
Perl's executable is detected automatically.
This change switches -w option passed to the executable with more
modern "use warnings;" approach. There is no functional change to the
default behavior.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Appease pkgsrc and use portable shell variable comparison.
This switches "==" to "=". It should not be a functional change.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
When using a virtfs root filesystem, the mount_tag needs to be set to
/dev/root. This can be done long-hand as
-fsdev local,id=root,path=/path/to/rootfs,...
-device virtio-9p-pci,fsdev=root,mount_tag=/dev/root
but the -virtfs shortcut cannot be used as it hard-codes the device identifier
to match the mount_tag, and device identifiers may not contain '/':
$ qemu-system-x86_64 -virtfs local,path=/foo,mount_tag=/dev/root,security_model=passthrough
qemu-system-x86_64: -virtfs local,path=/foo,mount_tag=/dev/root,security_model=passthrough: duplicate fsdev id: /dev/root
To support this case using -virtfs, we allow the device identifier to be
specified explicitly when the mount_tag is not suitable:
-virtfs local,id=root,path=/path/to/rootfs,mount_tag=/dev/root,...
Signed-off-by: Chris Webb <chris@arachsys.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
When running QEMU with "-M none -device loader,file=kernel.elf", it
currently crashes with a segmentation fault, because the "none"-machine
does not have any CPU by default and the generic loader code tries
to dereference s->cpu. Fix it by adding an appropriate check for a
NULL pointer.
Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Laszlo spotted and studied this wasteful "if". He pointed out:
The original virtio_blk_free_request needed an "if" as it accesses one
field, since 671ec3f056 ("virtio-blk: Convert VirtIOBlockReq.elem to
pointer", 2014-06-11); later on in f897bf751f ("virtio-blk: embed
VirtQueueElement in VirtIOBlockReq", 2014-07-09) the field became
embedded, so the "if" became unnecessary (at which point we were using
g_slice_free(), but it is the same.
Now drop it.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
There are some broken URLs in the qemu-doc which reference tools that
are not available at their original location anymore. Fortunately, they
have been mirrored to archive.org, so point to that location instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
QEMU_BUILD_BUG_ON should use C11's _Static_assert, if the compiler supports it,
to provide more readable messages on failure.
We check for _Static_assert in configure, and set CONFIG_STATIC_ASSERT
accordingly. QEMU_BUILD_BUG_ON invokes _Static_assert if CONFIG_STATIC_ASSERT
is defined, and reverts to the old way otherwise.
That way, systems without C11 conforming compiler will still have the old
messages, as verified by intentionally breaking the configure check.
the following example output was generated by inverting the condition in
QEMU_BUILD_BUG_ON:
without _Static_assert:
> In file included from /qemu/include/qemu/osdep.h:36:0,
> from /qemu/qga/commands.c:13:
> /qemu/qga/commands.c: In function ‘qmp_guest_exec_status’:
> /qemu/include/qemu/compiler.h:89:12: error: negative width in bit-field ‘<anonymous>’
> struct { \
> ^
> /qemu/include/qemu/compiler.h:96:38: note: in expansion of macro QEMU_BUILD_BUG_ON_STRUCT’
> #define QEMU_BUILD_BUG_ON(x) typedef QEMU_BUILD_BUG_ON_STRUCT(x) \
> ^~~~~~~~~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:146:5: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
> QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
> ^~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:417:5: note: in expansion of macro ‘atomic_load_acquire’
> atomic_load_acquire(ptr)
> ^~~~~~~~~~~~~~~~~~~
> /qemu/qga/commands.c:160:21: note: in expansion of macro ‘atomic_mb_read’
> bool finished = atomic_mb_read(&gei->finished);
> ^~~~~~~~~~~~~~
with _Static_assert:
> In file included from /qemu/include/qemu/osdep.h:36:0,
> from /qemu/qga/commands.c:13:
> /qemu/qga/commands.c: In function ‘qmp_guest_exec_status’:
> /qemu/include/qemu/compiler.h:94:30: error: static assertion failed: "not expecting: sizeof(*&gei->finished) > sizeof(void *)"
> #define QEMU_BUILD_BUG_ON(x) _Static_assert(!(x), #x)
> ^
> /qemu/include/qemu/atomic.h:146:5: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
> QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
> ^~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:417:5: note: in expansion of macro ‘atomic_load_acquire’
> atomic_load_acquire(ptr)
> ^~~~~~~~~~~~~~~~~~~
> /qemu/qga/commands.c:160:21: note: in expansion of macro ‘atomic_mb_read’
> bool finished = atomic_mb_read(&gei->finished);
> ^~~~~~~~~~~~~~
Signed-off-by: Andreas Grapentin <andreas@grapentin.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If no crypto library is included in the build, QEMU uses
qcrypto_random_bytes() to generate random data. That function tried to open
/dev/urandom or /dev/random and if opening both files failed it errored out.
Those files obviously do not exist on windows, so there the code uses
CryptGenRandom().
Furthermore there was some refactoring and a new function
qcrypto_random_init() was introduced. If a proper crypto library (gnutls or
libgcrypt) is included in the build, this function does nothing. If neither
is included it initializes the (platform specific) handles that are used by
qcrypto_random_bytes().
Either:
* a handle to /dev/urandom | /dev/random on unix like systems
* a handle to a cryptographic service provider on windows
Signed-off-by: Geert Martin Ijewski <gm.ijewski@web.de>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Previous commit moved 'opaque' to be the 2nd parameter in the list:
commit 375092332e
Author: Fam Zheng <famz@redhat.com>
Date: Fri Apr 21 20:27:02 2017 +0800
crypto: Make errp the last parameter of functions
Move opaque to 2nd instead of the 2nd to last, so that compilers help
check with the conversion.
this puts it back to the 2nd to last position.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
No one is listed as maintainer for qemu.sasl. It is used by the
VNC server for SASL auth, but since it is cryptography related,
list it under the crytography maintainer's realm, rather than
under the UI maintainer.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
RFC 6331 documents a number of serious security weaknesses in
the SASL DIGEST-MD5 mechanism. As such, QEMU should not be
using or recommending it as a default mechanism for VNC auth
with SASL.
GSSAPI (Kerberos) is the only other viable SASL mechanism that
can provide secure session encryption so enable that by defalt
as the replacement. If users have TLS enabled for VNC, they can
optionally decide to use SCRAM-SHA-1 instead of GSSAPI, allowing
plain username and password auth.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Use the existing readline history function we are utilizing
to provide persistent command history across instances of qmp-shell.
This assists entering debug commands across sessions that may be
interrupted by QEMU sessions terminating, where the qmp-shell has
to be relaunched.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20170427223628.20893-1-jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kashyap Chamarthy <kchamart@redhat.com>
Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
SocketAddressLegacy is a simple union, and simple unions are awkward:
they have their variant members wrapped in a "data" object on the
wire, and require additional indirections in C. SocketAddress is the
equivalent flat union. Convert all users of SocketAddressLegacy to
SocketAddress, except for existing external interfaces.
See also commit fce5d53..9445673 and 85a82e8..c5f1ae3.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-7-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Minor editing accident fixed, commit message and a comment tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The next commit will rename SocketAddressFlat to SocketAddress, and
the commit after that will replace most uses of SocketAddressLegacy by
SocketAddress, replacing most of this commit's renames right back.
Note that checkpatch emits a few "line over 80 characters" warnings.
The long lines are all temporary; the SocketAddressLegacy replacement
will shorten them again.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
I'm going to flatten SocketAddress: rename SocketAddress to
SocketAddressLegacy, SocketAddressFlat to SocketAddress, eliminate
SocketAddressLegacy except in external interfaces.
inet_parse() returns a newly allocated InetSocketAddress. Lift the
allocation from inet_parse() into its caller socket_parse() to prepare
for flattening SocketAddress.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Straightforward rebase]
I'm going to flatten SocketAddress: rename SocketAddress to
SocketAddressLegacy, SocketAddressFlat to SocketAddress, eliminate
SocketAddressLegacy except in external interfaces.
vsock_parse() returns a newly allocated VsockSocketAddress. Lift the
allocation from vsock_parse() into its caller socket_parse() to
prepare for flattening SocketAddress.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Commit 62c39b3 introduced test-qga, and at face value, appears
to be testing the 'guest-sync' behavior that is recommended for
guests in sending 0xff to QGA to force the parser to reset. But
this aspect of the test has never actually done anything: the
qmp_fd() call chain converts its string argument into QObject,
then converts that QObject back to the actual string that is
sent over the wire - and the conversion process silently drops
the 0xff byte from the string sent to QGA, thus never resetting
the QGA parser.
An upcoming patch will get rid of the wasteful round trip
through QObject, at which point the string in test-qga will be
directly sent over the wire.
But fixing qmp_fd() to actually send 0xff over the wire is not
all we have to do - the actual QMP parser loudly complains that
0xff is not valid JSON, and sends an error message _prior_ to
actually parsing the 'guest-sync' or 'guest-sync-delimited'
command. With 'guest-sync', we cannot easily tell if this error
message is a result of our command - which is WHY we invented
the 'guest-sync-delimited' command. So for the testsuite, fix
things to only check 0xff behavior on 'guest-sync-delimited',
and to loop until we've consumed all garbage prior to the
requested delimiter, which is compatible with the documented actions
that a real QGA client is supposed to do.
Ideally, we'd fix the QGA JSON parser to silently ignore 0xff
rather than sending an error message back, at which point we
could enhance this test for 'guest-sync' as well as for
'guest-sync-delimited'. But for the sake of this patch, our
testing of 'guest-sync' is no worse than it was pre-patch,
because we have never been sending 0xff over the wire in the
first place.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-11-eblake@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[Additional comment squashed in, along with matching commit message
update]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Use the preferred blockdev-change-medium command instead.
Also, use of 'device' is deprecated; adding an explicit id on
the command line lets us use 'id' for both blockdev-change-medium
and eject.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-10-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Noticed while checking Coccinelle results. Naming a label 'out:'
when it is only used on error paths is weird. Also, we had some
dead stores to 'ret'. Meanwhile we know that snapshot_options
is NULL on success and that QDECREF(NULL) is safe. So merge the
two exit paths into one by careful control over bs_snapshot.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-8-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
We now have macros in place to make it less verbose to add a scalar
to QDict and QList, so use them.
Patch created mechanically via:
spatch --sp-file scripts/coccinelle/qobject.cocci \
--macro-file scripts/cocci-macro-file.h --dir . --in-place
then touched up manually to fix a couple of '?:' back to original
spacing, as well as avoiding a long line in monitor.c.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-7-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Rather than making lots of callers wrap a scalar in a QInt, QString,
or QBool, provide helper macros that do the wrapping automatically.
Update the Coccinelle script to make mass conversions easy, although
the conversion itself will be done as a separate patches to ease
review and backport efforts.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-6-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
We have macros in place to make it less verbose to add a subtype
of QObject to both QDict and QList. While we have made cleanups
like this in the past (see commit fcfcd8ffc, for example), having
it be automated by Coccinelle makes it easier to maintain.
Patch created mechanically via:
spatch --sp-file scripts/coccinelle/qobject.cocci \
--macro-file scripts/cocci-macro-file.h --dir . --in-place
then I verified that no manual touchups were required.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
We have macros in place to make it less verbose to add a subtype
of QObject to both QDict and QList. While we have made cleanups
like this in the past (see commit fcfcd8ffc, for example), having
it be automated by Coccinelle makes it easier to maintain.
The script is separate from the cleanups, for ease of review and
backporting. A later patch will then add further possible cleanups.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-4-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
It's simpler to just use a C struct than it is to bundle things
into a QDict in one function just to pull them back out in the
caller. Plus, doing this gets rid of one more user of dynamic
JSON through qobject_from_jsonf(), as well as a memory leak of
the QDict.
While cleaning the code, fix things to report all errors (the
code was previously silently ignoring a failure of
pcie_aer_inject_error(), at a distance).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-2-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
A large set of small patches. I have not included yet vhost-user-scsi,
but it'll come in the next pull request.
* use GDB XML register description for x86
* use _Static_assert in QEMU_BUILD_BUG_ON
* add "R:" to MAINTAINERS and get_maintainers
* checkpatch improvements
* dump threading fixes
* first part of vhost-user-scsi support
* QemuMutex tracing
* vmw_pvscsi and megasas fixes
* sgabios module update
* use Rev3 (ACPI 2.0) FADT
* deprecate -hdachs
* improve -accel documentation
* hax fix
* qemu-char GSource bugfix
# gpg: Signature made Fri 05 May 2017 06:10:40 AM EDT
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* bonzini/tags/for-upstream: (21 commits)
vhost-scsi: create a vhost-scsi-common abstraction
libvhost-user: replace vasprintf() to fix build
get_maintainer: add subsystem to reviewer output
get_maintainer: --r (list reviewer) is on by default
get_maintainer: it's '--pattern-depth', not '-pattern-depth'
get_maintainer: Teach get_maintainer.pl about the new "R:" tag
MAINTAINERS: Add "R:" tag for self-appointed reviewers
Fix the -accel parameter and the documentation for 'hax'
dump: Acquire BQL around vm_start() in dump thread
hax: Fix memory mapping de-duplication logic
checkpatch: Disallow glib asserts in main code
trace: add qemu mutex lock and unlock trace events
vmw_pvscsi: check message ring page count at initialisation
sgabios: update for "fix wrong video attrs for int 10h,ah==13h"
scsi: avoid an off-by-one error in megasas_mmio_write
vl: deprecate the "-hdachs" option
use _Static_assert in QEMU_BUILD_BUG_ON
target/i386: Add GDB XML register description support
char: Fix removing wrong GSource that be found by fd_in_tag
hw/i386: Build-time assertion on pc/q35 reset register being identical.
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Updating MAINTAINERS to set Pavel Dovgalyuk as record/replay maintainer
and Paolo Bonzini as a reviewer.
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-id: 20170503113304.8704.13997.stgit@PASHA-ISP
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The UST trace backend can only cope with upto 10 arguments. To ensure we
don't exceed the limit when UST is not compiled in, disallow more than
10 arguments upfront.
This prevents the case where:
commit 0fc8aec7de
Author: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Date: Tue Apr 18 10:20:20 2017 +0800
COLO-compare: Optimize tcp compare trace event
Optimize two trace events as one, adjust print format make
it easy to read. rename trace_colo_compare_pkt_info_src/dst
to trace_colo_compare_tcp_info.
regressed the fix done in
commit 2dfe5113b1
Author: Alex Bennée <alex.bennee@linaro.org>
Date: Fri Oct 28 14:25:59 2016 +0100
net: split colo_compare_pkt_info into two trace events
It seems there is a limit to the number of arguments a UST trace event
can take and at 11 the previous trace command broke the build. Split the
trace into a src pkt and dst pkt trace to fix this.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20161028132559.8324-1-alex.bennee@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now we get an immediate fail even when UST is disabled:
GEN net/trace.h
Traceback (most recent call last):
File "/home/berrange/src/virt/qemu/scripts/tracetool.py", line 154, in <module>
main(sys.argv)
File "/home/berrange/src/virt/qemu/scripts/tracetool.py", line 145, in main
events.extend(tracetool.read_events(fh))
File "/home/berrange/src/virt/qemu/scripts/tracetool/__init__.py", line 307, in read_events
event = Event.build(line)
File "/home/berrange/src/virt/qemu/scripts/tracetool/__init__.py", line 244, in build
event = Event(name, props, fmt, args)
File "/home/berrange/src/virt/qemu/scripts/tracetool/__init__.py", line 196, in __init__
"argument count" % name)
ValueError: Event 'colo_compare_tcp_info' has more than maximum permitted argument count
Makefile:96: recipe for target 'net/trace.h-timestamp' failed
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20170426153900.21066-1-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
- decode escape sequences
- decompress run-length encoding escape sequences
- report command parsing problems to output when debug output is enabled
- reject packet checksums that are not valid hex digits
- compute the checksum based on the packet stream, not based on the
decoded packet
Tested with GDB and QtCreator integrated debugger on SMP QEMU instance.
Works for me.
Signed-off-by: Doug Gale <doug16k@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
BDRVReplicationState.replication_state is a name with a bit of
duplication, plus it could be an enum like BDRVReplicationState.mode,
which is more readable and also more straightforward in a debugger.
Rename it, and improve the type while at it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Change malloc/strdup/free to g_malloc/g_strdup/g_free in
util/envlist.c.
Remove NULL checks for pointers returned from g_malloc and g_strdup
as they exit in case of failure. Also, update calls to envlist_create
to reflect this.
Free array and array contents returned by envlist_to_environ using
g_free in bsd-user/main.c and linux-user/main.c.
Update comments to reflect change in semantics.
Signed-off-by: Saurav Sachidanand <sauravsachidanand@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Ignore test executables when building in-tree:
test-arm-mptimer introduced in commit 882fac3
test-crypto-hmac introduced in commit 4fd460b
test-aio-multithread introduced in commit 0c330a7
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The tb_env variable is set two lines above. So just drop the double assignment.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Basic support for using channel-attached 3270 'green-screen'
devices via tn3270. Actual handling of the data stream is
delegated to x3270; more info at http://wiki.qemu.org/Features/3270
# gpg: Signature made Thu 04 May 2017 11:36:51 AM BST
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg: aka "Cornelia Huck <cohuck@kernel.org>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* cohuck/tags/s390x-3270-20170504:
s390x/3270: Mark non-migratable and enable the device
s390x/3270: Detect for continued presence of a 3270 client
s390x/3270: Add the TCP socket events handler for 3270
s390x/3270: 3270 data stream handling
s390x/3270: Add emulated terminal3270 device
s390x/3270: Add abstract emulated ccw-attached 3270 device
s390x/css: Add an algorithm to find a free chpid
chardev: Basic support for TN3270
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
migration/next for 20170504
# gpg: Signature made Thu 04 May 2017 10:35:41 AM BST
# gpg: using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg: aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* quintela/tags/migration/20170504:
migration: Extra tracing
migration: Move postcopy-ram.h to migration/
monitor: Move hmp_info_snapshots from savevm.c to hmp.c
monitor: Move hmp_delvm from savevm.c to hmp.c
monitor: Move hmp_savevm from savevm.c to hmp.c
monitor: Move hmp_loadvm from monitor.c to hmp.c
monitor: Remove monitor parameter from save_vmstate
migration: to_dst_file at that point is NULL
migration: setup bi-directional I/O channel for exec: protocol
ram: Split dirty bitmap by RAMBlock
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Openrisc Features and Fixes for qemu 2.10
# gpg: Signature made Thu 04 May 2017 01:41:45 AM BST
# gpg: using RSA key 0xC3B31C2D5E6627E4
# gpg: Good signature from "Stafford Horne <shorne@gmail.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: D9C4 7354 AEF8 6C10 3A25 EFF1 C3B3 1C2D 5E66 27E4
* shorne/tags/pull-or-20170504:
target/openrisc: Support non-busy idle state using PMR SPR
target/openrisc: Remove duplicate features property
target/openrisc: Implement full vmstate serialization
migration: Add VMSTATE_STRUCT_2DARRAY()
target/openrisc: implement shadow registers
migration: Add VMSTATE_UINTTL_2DARRAY()
target/openrisc: add numcores and coreid support
target/openrisc: Fixes for memory debugging
target/openrisc: Implement EPH bit
target/openrisc: Implement EVBAR register
MAINTAINERS: Add myself as openrisc maintainer
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
More s390x patches, this time boot related:
- LOADPARM machine property, exposed to the guest via SCLP and
diagnose 308
- Use LOADPARM in the s390-ccw bios to select a boot entry
- Fix a crash in the ipl device code when a virtio-scsi-pci device
has been specified
# gpg: Signature made Tue 02 May 2017 02:29:26 PM BST
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg: aka "Cornelia Huck <cohuck@kernel.org>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* cohuck/tags/s390x-20170502:
hw/s390x/ipl: Fix crash with virtio-scsi-pci device
pc-bios/s390-ccw.img: update image
pc-bios/s390-ccw: add boot entry selection to El Torito routine
pc-bios/s390-ccw: add boot entry selection for ECKD DASD
pc-bios/s390-ccw: provide entry selection on LOADPARM for SCSI disk
pc-bios/s390-ccw: provide a function to interpret LOADPARM value
pc-bios/s390-ccw: get LOADPARM stored in SCP Read Info
pc-bios/s390-ccw: Make ebcdic/ascii conversion public
util/qemu-config: Add loadparm to qemu machine_opts
hw/s390x/sclp: update LOADPARM in SCP Info
hw/s390x/ipl: enable LOADPARM in IPIB for a boot device
hw/s390x: provide loadparm property for the machine
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In order to introduce a new vhost-user-scsi host device type, it makes
sense to abstract part of vhost-scsi into a common parent class. This
commit does exactly that.
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Message-Id: <1488479153-21203-3-git-send-email-felipe@nutanix.com>
On gcc 3.4 and newer, simply using (void) in front of WUR functions is
not sufficient to ignore the return value. That prevents a build when
handling warnings as errors.
libvhost-user had a usage of (void)vasprintf() which triggered such a
condition. This fixes it by replacing this call with g_strdup_vprintf()
which aborts on OOM.
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Message-Id: <1488479153-21203-2-git-send-email-felipe@nutanix.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewer output currently does not include the subsystem
that matched. Add it.
Miscellanea:
o Add a get_subsystem_name routine to centralize this
Cherry picked from Linux commit 2a7cb1dc82fc2a52e747b4c496c13f6575fb1790.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We don't consistenly document the default value next to the option
listing, but we do have a list of defaults here, so let's keep it up to
date.
Cherry picked from Linux commit 4f07510df2e8c47fd65b8ffaaf6c5d334d59d598.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Though it appears that Perl's GetOptions will take either, the latter is
not documented in the options listing.
Cherry picked from Linux commit cc7ff0ef6eca3deeea4a424ca47a67c8450d5424.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can now designate reviewers in the MAINTAINERS file with the new
"R:" tag, so this commit teaches get_maintainers.pl to add their
email addresses.
Cherry picked from Linux commit c1c3f2c906e35bcb6e4cdf5b8e077660fead14fe,
with fixes to avoid \C as in QEMU commit ba10f729f1 ("get_maintainer.pl:
\C is deprecated", 2015-09-25).
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some people are not content with the amount of mail they get, and would
like to be CCed on patches for areas they do not maintain. Let them
satisfy their own appetite for qemu-devel messages.
Seriously: the purpose here is a bit different from the Linux kernel.
While Linux uses "R" to designate non-maintainers for reviewing patches
in a given area, in QEMU I would also like to use "R" so that people can
delegate sending pull requests while keeping some degree of oversight.
Based on Linux commit eafbaac3093760d1fd3b2a5b9f016362dd68af36.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This fixes an assertion failure in the following backtrace:
__GI___assert_fail
memory_region_transaction_commit
memory_region_add_eventfd
virtio_pci_ioeventfd_assign
virtio_bus_set_host_notifier
virtio_blk_data_plane_start
virtio_bus_start_ioeventfd
virtio_vmstate_change
vm_state_notify
vm_prepare_start
vm_start
dump_cleanup
dump_process
dump_thread
start_thread
clone
vm_start need BQL, acquire it if doing cleaning up from main thread.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170503072819.14462-1-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
hax_update_mapping() avoids unnecessary and potentially expensive
calls to HAX_VM_IOCTL_SET_RAM by computing the net result (i.e.
effective mapping changes) of each MemoryRegion transaction, with
the help of a linked list of HAXMapping objects.
However, when processing a new mapping that overlaps with an
existing mapping in the list, it fails to handle the case where the
start address of the new mapping is above that of the existing
mapping in the guest physical address space. This happens when QEMU
is launched with "-machine q35 -enable-hax", which involves the
following MemoryRegion transaction for digging the VGA hole:
region_del: 0x00000000->0x08000000 VA 05fa0000 ('pc.ram')
region_add: 0x00000000->0x000a0000 VA 05fa0000 ('pc.ram')
region_add: 0x000a0000->0x000c0000 VA 00000000 ('vga-lowmem')
region_add: 0x000c0000->0x08000000 VA 06060000 ('pc.ram')
where the third MemoryRegion is MMIO and is ignored. The current
de-duplication logic handles the last MemoryRegion incorrectly and
produces the following result:
hax_mapping_dump_list updates:
+ 0x000c0000->0x08000000 VA 0x06060000
- 0x07fe0000->0x08000000 VA 0x0df80000
which is why VGA emulation does not work for Q35.
With this patch, one can see VGA output as Q35 boots up. Note that
Q35 support also requires a change to HAXM kernel module, which is
not available in the current HAXM release (6.1.2).
+ Add a warning if the input MemoryRegion is a ROM device, which is
not supported by HAXM kernel module at this time.
Signed-off-by: Yu Ning <yu.ning@linux.intel.com>
Message-Id: <20170428072723.7036-1-yu.ning@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Glib commit a6a875068779 (from 2013) made many of the glib assert
macros non-fatal if a flag is set.
This causes two problems:
a) Compilers moan that your code is unsafe even though you've
put an assert in before the point of use.
b) Someone evil could, in a library, call
g_test_set_nonfatal_assertions() and cause our assertions in
important places not to fail and potentially allow memory overruns.
Ban most of the glib assertion functions (basically everything except
g_assert and g_assert_not_reached) except in tests/
This makes checkpatch gives an error such as:
ERROR: Use g_assert or g_assert_not_reached
#77: FILE: vl.c:4725:
+ g_assert_cmpstr("Chocolate", >, "Cheese");
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170427165526.19836-1-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These trace events were very useful to help me to understand and find a
reordering issue in vfio, for example:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x2020c, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
that also helped me to see the desired result after the fix:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x2000c, 4)
vfio_region_write (0001:03:00.0:region1+0xc4, 0xb0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
So it could be a good idea to have these traces implemented. It's worth
mentioning that they should be surgically enabled during the debugging,
otherwise it can flood the trace logs with lock/unlock messages.
How to use it:
trace-event qemu_mutex_lock on|off
trace-event qemu_mutex_unlock on|off
or
trace-event qemu_mutex* on|off
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Message-Id: <1493054398-26013-1-git-send-email-joserz@linux.vnet.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
[Also handle trylock, cond_wait and win32; trace "unlocked" while still
in the critical section, so that "unlocked" always comes before the
next "locked" tracepoint. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the user needs to specify the disk geometry, the corresponding
parameters of the "-device ide-hd" option should be used instead.
"-hdachs" is considered as deprecated and might be removed soon.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1493270454-1448-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU_BUILD_BUG_ON should use C11's _Static_assert, if the compiler supports it,
to provide more readable messages on failure.
We check for _Static_assert in configure, and set CONFIG_STATIC_ASSERT
accordingly. QEMU_BUILD_BUG_ON invokes _Static_assert if CONFIG_STATIC_ASSERT
is defined, and reverts to the old way otherwise.
That way, systems without C11 conforming compiler will still have the old
messages, as verified by intentionally breaking the configure check.
the following example output was generated by inverting the condition in
QEMU_BUILD_BUG_ON:
without _Static_assert:
> In file included from /qemu/include/qemu/osdep.h:36:0,
> from /qemu/qga/commands.c:13:
> /qemu/qga/commands.c: In function ‘qmp_guest_exec_status’:
> /qemu/include/qemu/compiler.h:89:12: error: negative width in bit-field ‘<anonymous>’
> struct { \
> ^
> /qemu/include/qemu/compiler.h:96:38: note: in expansion of macro QEMU_BUILD_BUG_ON_STRUCT’
> #define QEMU_BUILD_BUG_ON(x) typedef QEMU_BUILD_BUG_ON_STRUCT(x) \
> ^~~~~~~~~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:146:5: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
> QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
> ^~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:417:5: note: in expansion of macro ‘atomic_load_acquire’
> atomic_load_acquire(ptr)
> ^~~~~~~~~~~~~~~~~~~
> /qemu/qga/commands.c:160:21: note: in expansion of macro ‘atomic_mb_read’
> bool finished = atomic_mb_read(&gei->finished);
> ^~~~~~~~~~~~~~
with _Static_assert:
> In file included from /qemu/include/qemu/osdep.h:36:0,
> from /qemu/qga/commands.c:13:
> /qemu/qga/commands.c: In function ‘qmp_guest_exec_status’:
> /qemu/include/qemu/compiler.h:94:30: error: static assertion failed: "not expecting: sizeof(*&gei->finished) > sizeof(void *)"
> #define QEMU_BUILD_BUG_ON(x) _Static_assert((x), #x)
> ^
> /qemu/include/qemu/atomic.h:146:5: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
> QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
> ^~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:417:5: note: in expansion of macro ‘atomic_load_acquire’
> atomic_load_acquire(ptr)
> ^~~~~~~~~~~~~~~~~~~
> /qemu/qga/commands.c:160:21: note: in expansion of macro ‘atomic_mb_read’
> bool finished = atomic_mb_read(&gei->finished);
> ^~~~~~~~~~~~~~
Signed-off-by: Andreas Grapentin <andreas@grapentin.org>
Message-Id: <20170314165953.18506-1-andreas@grapentin.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch implements XML target description support for X86 and X86-64
architectures in the GDB stub, as the way with ARM and PowerPC:
- gdb-xml/32bit-core.xml & gdb-xml/64bit-core.xml: Adding the XML target
description files, these files are picked from GDB source code.
- configure: Define gdb_xml_files for X86 targets.
- target/i386/cpu.c: Define gdb_core_xml_file and gdb_arch_name to add
XML awareness for this architecture, modify the gdb_num_core_regs to
fit the registers number defined in each XML file.
Signed-off-by: Abdallah Bouassida <abdallah.bouassida@lauterbach.com>
Message-Id: <2b3c8119-1602-28c7-eab4-296593877103@lauterbach.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Coverity warns that multiplying two 32-bit values gives a 32-bit result which
is assigned to a 64-bit variable. Add an explicit ram_addr_t cast to silence
the warning.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Commit ee72bed0 "tcx: remove primitives for non-32-bit surfaces" accidentally
left a trailing break in update_palette_entries() causing the palette update
routine to exit after just one iteration. Remove it.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Block layer patches
# gpg: Signature made Fri 28 Apr 2017 09:20:17 PM BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* kwolf/tags/for-upstream: (34 commits)
progress: Show current progress on SIGINFO
iotests: fix exclusion option
iotests: clarify help text
qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed
qemu-img: improve convert_iteration_sectors()
block: assert no image modification under BDRV_O_INACTIVE
block: fix obvious coding style mistakes in block_int.h
qcow2: Allow discard of final unaligned cluster
block: Add .bdrv_truncate() error messages
block: Add errp to BD.bdrv_truncate()
block: Add errp to b{lk,drv}_truncate()
block/vhdx: Make vhdx_create() always set errp
qemu-img: Document backing options
qemu-img/convert: Move bs_n > 1 && -B check down
qemu-img/convert: Use @opts for one thing only
block: fix alignment calculations in bdrv_co_do_zero_pwritev
block: Do not unref bs->file on error in BD's open
iotests: 109: Filter out "len" of failed jobs
iotests: Fix typo in 026
Issue a deprecation warning if the user specifies the "-hdachs" option.
...
Message-id: 1493411622-5343-1-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of flushing the buffer byte by byte, call qemu_chr_be_write()
with as much byte possible accepted by the front-end.
Factor out buffer flushing in a common function udp_chr_flush_buffer().
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This is mainly useful to know the actual bound port when using port 0.
For example, when starting qemu with socket on port 0, before:
QEMU waiting for connection on: disconnected:tcp:localhost:0,server
After:
QEMU waiting for connection on: disconnected:tcp:localhost:32454,server
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
This helper will be used in yet another place in the following patch.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
The list is now empty, the chardev cleanup is taken care of by the unref
of the root container.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
qemu_chardev_new() now uses object_new_with_props() with /chardevs
parent container. It will fail to insert the object if the same "id"
already exists. "chardevs" list usage has been removed in previous
commits.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Use object_resolve_path_component() and object_child_foreach() on
/chardevs container instead of iterating over chardevs list.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a /chardevs container object to hold the list of chardevs.
(Note: QTAILQ chardevs is going away in the following commits)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
object_property_add_child() references the child, unref it after to
avoid ref leaks.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
mux_chr_event() already send events to all backends, rename it,
export it, and use it from muxes_realize_done. This should help abstract
away mux implementation.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
A couple more traces that would have made fixing that postcopy
bug a bit easier.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It is internal to migration, not intended for other users.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Mark 3270 as non-migratable for the experimental stage. Enable
the 3270 device so that we can use x3270 client to operate the guest.
Run qemu with the arguments:
-chardev socket,id=char3270_0,host=0.0.0.0,port=23,nowait,server,tn3270 \
-device x-terminal3270,chardev=char3270_0,devno=fe.0.000a,id=terminal3270_0 \
There are some restrictions for the first stage: We don't support SSL
connections, multiple client connections and client resizing. Only
tested with the x3270 client.
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Signed-off-by: Yang Chen <bjcyang@linux.vnet.ibm.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
To ensure that we do not keep any 3270 sockets where the client is not
connected anymore, we send a packet with the timing mark option after
ten minutes of client inactivity. If the client does not answer it,
then the socket will be closed automatically.
This helps to ensure that there is no half-open situation on the 3270
socket.
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This introduces the input and output handlers for 3270 device, setting
up the data tunnel among guest kernel, qemu and the 3270 client.
After the client connected and TN3270 handshake done, signal the not-ready
to ready status by an unsolicited device-end interrupt, and then the 3270
data stream could be handled correctly between the channel and socket.
Multiple commands generated by "Reset" key on x3270 are not supported now,
just simply terminate the connection.
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Signed-off-by: Yang Chen <bjcyang@linux.vnet.ibm.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This is a basic implementation of the emulated ccw-attached 3270
called x-terminal3270, which provides visibility of the device in
the qemu monitor and guest. The x prefix indicates that this is
just an experimental implementation for the current stage. This
device will not be compiled until the basic functions are available.
Signed-off-by: Yang Chen <bjcyang@linux.vnet.ibm.com>
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This introduces a function named css_find_free_chpid() to find a
free channel path. Because virtio-ccw device used zero as its
channel path number, it would be sensible to skip the reserved one
and search upwards.
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This introduces basic support for TN3270, which needs to negotiate
three Telnet options during handshake:
- End of Record
- Binary Transmission
- Terminal-Type
As a basic implementation, this simply ignores NOP and Interrupt
Process(IP) commands. More work should be done for them later.
For more details, please refer to RFC 854 and 1576.
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Signed-off-by: Yang Chen <bjcyang@linux.vnet.ibm.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
It is a monitor command, and has nothing migration specific in it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We are going to move the rest of hmp snapshots functions there instead
of monitor.c.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
load_vmstate() already use error_report, so be consistent. There is
an identical error message in load_vmstate() that ends in a
period. Remove it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We have just arrived as:
migration.c: qemu_migrate()
....
s = migrate_init() <- puts it to NULL
....
{tcp,unix}_start_outgoing_migration ->
socket_outgoing_migration
migration_channel_connect()
sets to_dst_file
if tls is enabled, we do another round through
migrate_channel_tls_connect(), but we only set it up if there is no
error. So we don't need the assignation. I am removing it to remove
in the follwing patches the knowledge about MigrationState in that two
files.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Historically the migration data channel has only needed to be
unidirectional. Thus the 'exec:' protocol was requesting an
I/O channel with O_RDONLY on incoming side, and O_WRONLY on
the outgoing side.
This is fine for classic migration, but if you then try to run
TLS over it, this fails because the TLS handshake requires a
bi-directional channel.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Both the ram bitmap and the unsent bitmap are split by RAMBlock.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Fix compilation when DEBUG_POSTCOPY is enabled (thanks Hailiang)
The OpenRISC architecture has the Power Management Register (PMR)
special purpose register to manage cpu power states. The interesting
modes are:
* Doze Mode (DME) - Stop cpu except timer & pic - wake on interrupt
* Sleep Mode (SME) - Stop cpu and all units - wake on interrupt
* Suspend Model (SUME) - Stop cpu and all units - wake on reset
The linux kernel will set DME when idle.
This patch implements the PMR SPR and halts the qemu cpu when there is a
change to DME or SME. This means that openrisc qemu in no longer peggs
a host cpu at 100%.
In order for this to work we need to kick the CPU when timers are
expired. Update the cpu timer to kick the cpu upon each timer event.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stafford Horne <shorne@gmail.com>
The features property has stored the exact same thing as the cpucfgr
spr. Remove the feature enum and property as it is not needed.
In order to preserve the behavior or keeping features accross reset this
patch moves cpucfgr into the non reset region of the state struct. Since
the cpucfgr is read only this means we only need to sset cpucfgr once
during class init.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Previously serialization did not persist the tlb, timer, pic and other
key state items. This meant snapshotting and restoring a running os
would crash. After adding these I am able to take snapshots of a
running linux os and restore at a later time.
I am currently not trying to maintain capatibility with older versions
as I do not believe this really worked before or anyone used it.
Signed-off-by: Stafford Horne <shorne@gmail.com>
For openrisc we implement tlb state as a 2d array of tlb entry structs.
This is added to allow easy storing of state of 2d arrays.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Shadow registers are part of the openrisc spec along with sr[cid], as
part of the fast context switching feature. When exceptions occur,
instead of having to save registers to the stack if enabled the CID will
increment and a new set of registers will be available.
This patch only implements shadow registers which can be used as extra
scratch registers via the mfspr and mtspr if required. This is
implemented in a way where it would be easy to add on the fast context
switching, currently cid is hardcoded to 0.
This is need for openrisc linux smp kernels to boot correctly.
Signed-off-by: Stafford Horne <shorne@gmail.com>
In openRISC we are implementing the shadow registers as a 2d array.
Using this target long method rather than direct 32-bit alternatives is
consistent with the rest of our vm state serialization logic.
Signed-off-by: Stafford Horne <shorne@gmail.com>
When debugging in gdb you might want to inspect instructions in mapped
pages or in exception vectors like 0x800 etc. This was previously not
possible in qemu since the *get_phys_page_debug() routine only looked
into the data tlb.
Change to fall back to look into instruction tlb and plain physical
pages.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stafford Horne <shorne@gmail.com>
When the "No host device provided" error occurs, the hint message
that starts with "Use -vfio-pci," makes no sense, since "-vfio-pci"
is not a valid command line parameter.
Correct this by replacing "-vfio-pci" with "-device vfio-pci".
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch enables 8-byte writes and reads to VFIO. Such implemention
is already done but it's missing the 'case' to handle such accesses in
both vfio_region_write and vfio_region_read and the MemoryRegionOps:
impl.max_access_size and impl.min_access_size.
After this patch, 8-byte writes such as:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x4140c, 4)
vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
goes like this:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0xbfd0008, 8)
qemu_mutex_unlock unlocked mutex 0x10905ad8
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Sets valid.max_access_size and valid.min_access_size to ensure safe
8-byte accesses to vfio. Today, 8-byte accesses are broken into pairs
of 4-byte calls that goes unprotected:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x2020c, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
which occasionally leads to:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x2030c, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x1000c, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc4, 0xb0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
causing strange errors in guest OS. With this patch, such accesses
are protected by the same lock guard:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x2000c, 4)
vfio_region_write (0001:03:00.0:region1+0xc4, 0xb0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
This happens because the 8-byte write should be broken into 4-byte
writes by memory.c:access_with_adjusted_size() in order to be under
the same lock. Today, it's done in exec.c:address_space_write_continue()
which was able to handle only 4 bytes due to a zero'ed
valid.max_access_size (see exec.c:memory_access_size()).
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When driving QEMU from the outside, we have basically no chance to
determine how quickly the guest OS picks up key events, so we usually
have to limit ourselves to very slow keyboard presses to make sure
the guest always has enough chance to pick them up.
This patch adds a trace events when the keyboarde queue is drained.
An external driver can use that as hint that new keys can be pressed.
Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1490883775-94658-1-git-send-email-agraf@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
qemu_input_event_send() discards key event when the guest is paused,
but not the delay.
The delay ends up in the input queue, and qemu_input_event_send_key()
will further fill the queue with upcoming events.
VNC uses qemu_input_event_send_key_delay(), not SPICE, which results
in a different input behaviour on pause: VNC will queue the events
(except the first that is discarded), SPICE will discard all events.
Don't queue delay if paused, and provide same behaviour on SPICE and
VNC clients on resume (and potentially avoid over-allocating the
buffer queue)
Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1444326
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20170425130520.31819-1-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This updates the FADT generated for x86/64 machine types from Revision 1 to 3. (Based on ACPI standard 2.0 instead of 1.0) The intention is to expose the reset register information to guest operating systems which require it, specifically OS X/macOS. Revision 1 FADTs do not contain the fields relating to the reset register.
The new layout and contents remains backwards-compatible with operating systems which only support ACPI 1.0, as the existing fields are not modified by this change, as the 64-bit and 32-bit variants are allowed to co-exist according to the ACPI 2.0 standard. No regressions became apparent in tests with a range of Windows (XP-10) and Linux versions.
The BIOS tables test suite's FADT checksum test has also been updated to reflect the new FADT layout and content.
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Message-Id: <1489558827-28971-2-git-send-email-phil@philjordan.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu-system-s390x currently crashes when it is started with a
virtio-scsi-pci device, e.g.:
qemu-system-s390x -nographic -enable-kvm -device virtio-scsi-pci \
-drive file=/tmp/disk.dat,if=none,id=d1,format=raw \
-device scsi-cd,drive=d1,bootindex=1
The problem is that the code in s390_gen_initial_iplb() currently assumes
that all SCSI devices are also CCW devices, which is not the case for
virtio-scsi-pci of course. Fix it by adding an appropriate check for
TYPE_CCW_DEVICE here.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1493126327-13162-1-git-send-email-thuth@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Contains the following commits:
- pc-bios/s390-ccw: Make ebcdic/ascii conversion public
- pc-bios/s390-ccw: get LOADPARM stored in SCP Read Info
- pc-bios/s390-ccw: provide a function to interpret LOADPARM value
- pc-bios/s390-ccw: provide entry selection on LOADPARM for SCSI disk
- pc-bios/s390-ccw: add boot entry selection for ECKD DASD
- pc-bios/s390-ccw: add boot entry selection to El Torito routine
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
1. change a bit definition of ScsiMbr to allow an array of pointers
2. add loadparm fetch to boot script processing
3. apply loadparm index to boot entry selection, if any
Initial patch from Eugene (jno) Dvurechenski.
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Fix SCSI bootmap interpreter to make use of any specified entry of the
Program Table using the leftmost numeric value from the LOADPARM, if specified.
Initial patch from Eugene (jno) Dvurechenski.
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The LOADPARM value is fetched from SCP Read Info, but it's applied
only at the phase of bootmap interpretation. So let's read the LOARPARM
value and store it. Also provide a parsing function to detect numbers in
the LOADPARM which can be used during bootmap interpretation.
Remove a stray whitespace.
Initial patch from Eugene (jno) Dvurechenski.
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Obtain the loadparm value stored in SCP Read Info by performing
a SCLP Read Info request.
Rename sclp-ascii.c to sclp.c to reflect the changed scope of
the file.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Make the ebcdic_to_ascii function public to the rest of the
"bios" code, as the volume label is no more the single thing
to be converted.
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Add S390CcwMachineState machine parameter "loadparm" to qemu machine_opts so
libvirt can query for it.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
LOADPARM has two copies:
1. in SCP Information Block
2. in IPL Information Parameter Block
So, update SCLP intrinsics now. We always store LOADPARM in SCP
information block even if we don't have a valid IPL Information
Parameter Block.
Initial patch from Eugene (jno) Dvurechenski.
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Insert the LOADPARM value to the IPL Information Parameter Block.
An IPL Information Parameter Block is created when "bootindex" is
specified for a device. If a user specifies "loadparm=", then we
store the loadparm value in the created IPIB for that boot device.
Initial patch from Eugene (jno) Dvurechenski.
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
In order to specify the LOADPARM value one may now add ",loadparm=xxx"
parameter to the "-machine s390-ccw-virtio" option.
The property setter will normalize and check the value provided much
like the way the HMC does.
The value is stored, but not used at the moment.
Initial patch from Eugene (jno) Dvurechenski.
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Working up the stack, this replaces the slirp_socket_load/save
with VMState definitions.
A place holder for IPv6 support is added as a comment; it needs
testing once the rest of the IPv6 code is there.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
The socket structure has a pair of unions for lhost and fhost
addresses; the unions are identical so split them out into
a separate union declaration.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Convert the sbuf structure to a VMStateDescription.
Note this uses the VMSTATE_WITH_TMP mechanism to calculate
and reload the offsets based on the pointers.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Convert the migration of the struct tcpcb to use a VMStateDescription,
the rest of it will come later.
Mostly mechanical, except for conversion of some 'char' to uint8_t
to ensure portability.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
ASAN detects an "unknown-crash" when running pxe-test:
/ppc64/pxe/spapr-vlan: =================================================================
==7143==ERROR: AddressSanitizer: unknown-crash on address 0x7f6dcd298d30 at pc 0x55e22218830d bp 0x7f6dcd2989e0 sp 0x7f6dcd2989d0
READ of size 128 at 0x7f6dcd298d30 thread T2
#0 0x55e22218830c in tftp_session_allocate /home/elmarco/src/qq/slirp/tftp.c:73
#1 0x55e22218a1f8 in tftp_handle_rrq /home/elmarco/src/qq/slirp/tftp.c:289
#2 0x55e22218b54c in tftp_input /home/elmarco/src/qq/slirp/tftp.c:446
#3 0x55e2221833fe in udp6_input /home/elmarco/src/qq/slirp/udp6.c:82
#4 0x55e222137b17 in ip6_input /home/elmarco/src/qq/slirp/ip6_input.c:67
Address 0x7f6dcd298d30 is located in stack of thread T2 at offset 96 in frame
#0 0x55e222182420 in udp6_input /home/elmarco/src/qq/slirp/udp6.c:13
This frame has 3 object(s):
[32, 48) '<unknown>'
[96, 124) 'lhost' <== Memory access at offset 96 partially overflows this variable
[160, 200) 'save_ip' <== Memory access at offset 96 partially underflows this variable
The sockaddr_storage pointer is the sockaddr_in6 lhost on the
stack. Copy only the source addr size.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
gcc 7 (on fedora 26) objects to many of the snprintf's
in the smb path and command creation because it can't
figure out that the smb_dir (i.e. the /tmp dir for the configuration)
is known to be short.
Replace all these fixed length buffers by g_str* functions that dynamically
allocate and use g_dir_make_tmp to make the directory.
(It's fairly new glib but we have a compat function for it).
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
The OS will allocate automatically a free port. This is useful if you
want to be sure to not get any port conflict. You still have to figure
out which port you got, for example with "lsof" (this could be exposed
in the monitor if needed).
Example of use:
$ qemu-system-x86_64 -net user,hostfwd=127.0.0.1:0-:22 ...
Then, get your port with:
$ lsof -np 1474 | grep LISTEN
qemu-syst 31777 bernat 12u IPv4 [...] TCP 127.0.0.1:35145 (LISTEN)
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Block patches for the block queue
# gpg: Signature made Fri Apr 28 20:50:48 2017 CEST
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2017-04-28:
progress: Show current progress on SIGINFO
iotests: fix exclusion option
iotests: clarify help text
qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed
qemu-img: improve convert_iteration_sectors()
block: assert no image modification under BDRV_O_INACTIVE
block: fix obvious coding style mistakes in block_int.h
qcow2: Allow discard of final unaligned cluster
block: Add .bdrv_truncate() error messages
block: Add errp to BD.bdrv_truncate()
block: Add errp to b{lk,drv}_truncate()
block/vhdx: Make vhdx_create() always set errp
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since commit "c53eeaf75a04 configure: eliminate Python dependency for
--help", configure --help fails to produce the list of available trace
backends if invoked out-of-tree. It also spits the following error:
grep: scripts/tracetool/backend/*.py: No such file or directory
This patch simply adds the missing $source_path to fix it.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-id: 149321376763.7874.12797658801011614451.stgit@bahia
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu-ga patch queue
* new commands: guest-get-timezone, guest-get-users, guest-get-host-name
* fix hang on w32 when stopping qemu-ga service while fs frozen
* fix missing setting of can-offline in guest-get-vcpus
* make qemu-ga VSS w32 service on-demand rather than on-startup
* fix unecessary errors to EventLog on w32
* improvements to fsfreeze documentation
v2:
* document 'zone' field of guest-get-timezone as informational-only
(Daniel, Eric)
* fix build error for glib < 2.32 (Peter)
# gpg: Signature made Thu 27 Apr 2017 06:43:42 AM BST
# gpg: using RSA key 0x3353C9CEF108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg: aka "Michael Roth <mdroth@utexas.edu>"
# gpg: aka "Michael Roth <mdroth@linux.vnet.ibm.com>"
# Primary key fingerprint: CEAC C9E1 5534 EBAB B82D 3FA0 3353 C9CE F108 B584
* mdroth/tags/qga-pull-2017-04-25-v2-tag:
qga: Add `guest-get-timezone` command
qga: Add 'guest-get-users' command
qga: improve fsfreeze documentations
qga: Add 'guest-get-host-name' command
qga-win: Fix Event Viewer errors caused by qemu-ga
qga-win: Fix a bug where qemu-ga service is stuck during stop operation
qga-win: Enable 'can-offline' field in 'guest-get-vcpus' reply
qemu-ga: Make QGA VSS provider service run only when needed
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As mentioned in commit 0c1bd46, we ignored requests to
discard the trailing cluster of an unaligned image. While
discard is an advisory operation from the guest standpoint,
(and we are therefore free to ignore any request), our
qcow2 implementation exploits the fact that a discarded
cluster reads back as 0. As long as we discard on cluster
boundaries, we are fine; but that means we could observe
non-zero data leaked at the tail of an unaligned image.
Enhance iotest 66 to cover this case, and fix the implementation
to honor a discard request on the final partial cluster.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170407013709.18440-1-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add missing error messages for the block driver implementations of
.bdrv_truncate(); drop the generic one from block.c's bdrv_truncate().
Since one of these changes touches a mis-indented block in
block/file-posix.c, this patch fixes that coding style issue along the
way.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170328205129.15138-5-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add an Error parameter to the block drivers' bdrv_truncate() interface.
If a block driver does not set this in case of an error, the generic
bdrv_truncate() implementation will do so.
Where it is obvious, this patch also makes some block drivers set this
value.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170328205129.15138-4-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch makes vhdx_create() always set errp in case of an error. It
also adds errp parameters to vhdx_create_bat() and
vhdx_create_new_region_table() so we can pass on the error object
generated by blk_truncate() as of a future commit.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20170328205129.15138-2-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The create and convert subcommands have shorthands to set the
backing_file and, in the case of create, the backing_fmt options for the
new image. However, they have not been documented so far, which is
remedied by this patch.
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It does not make much sense to use a backing image for the target when
you concatenate multiple images (because then there is no correspondence
between the source images' backing files and the target's); but it was
still possible to give one by using -o backing_file=X instead of -B X.
Fix this by moving the check.
(Also, change the error message because -B is not the only way to
specify the backing file, evidently.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After storing the creation options for the new image into @opts, we
fetch some things for our own information, like the backing file name,
or whether to use encryption or preallocation.
With the -n parameter, there will not be any creation options; this is
not too bad because this just means that querying a NULL @opts will
always return the default value.
However, we also use @opts for the --object options. Therefore, @opts is
not necessarily NULL if -n was specified; instead, it may contain those
options. In practice, this probably does not cause any problems because
there most likely is no object that supports any of the parameters we
query here, but this is neither something we should rely on nor does
this variable reuse make the code very nice to read.
Therefore, just use a separate variable for the --object options.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
tail_padding_bytes is calculated wrong. F.e. for
offset = 0
bytes = 2048
align = 512
we will have tail_padding_bytes = 512 which is definitely wrong. The patch
fixes that arithmetics.
Fortunately this problem is harmless, we will have 1 extra allocation and
free thus there is no need to put this into stable. The problem is here
from the very beginning.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block layer takes care of removing the bs->file child if the block
driver's bdrv_open()/bdrv_file_open() implementation fails. The block
driver therefore does not need to do so, and indeed should not unless it
sets bs->file to NULL afterwards -- because if this is not done, the
bdrv_unref_child() in bdrv_open_inherit() will dereference the freed
memory block at bs->file afterwards, which is not good.
We can now decide whether to add a "bs->file = NULL;" after each of the
offending bdrv_unref_child() invocations, or just drop them altogether.
The latter is simpler, so let's do that.
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mirror calculates job len from current I/O progress:
s->common.len = s->common.offset +
(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;
The final "len" of a failed mirror job in iotests 109 depends on the
subtle timing of the completion of read and write issued in the first
mirror iteration. The second iteration may or may not have run when the
I/O error happens, resulting in non-deterministic output of the
BLOCK_JOB_COMPLETED event text.
Similar to what was done in a752e4786, filter out the field to make the
test robust.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the user needs to specify the disk geometry, the corresponding
parameters of the "-device ide-hd" option should be used instead.
"-hdachs" is considered as deprecated and might be removed soon.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no reason for the qemu-nbd server used for tests not to accept
an arbitrary number of clients. In fact, test 181 will require it to
accept two clients at the same time (and thus it fails before this
patch).
This patch updates common.rc to launch qemu-nbd with -e 42 which should
be enough for all of our current and future tests.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reported by Coverity. We already use bs in bdrv_inc_in_flight before
checking for NULL. It is unnecessary as all callers pass non-NULL bs, so
drop it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We test for the presence of perl and bc and save their path in the
variables PERL_PROG and BC_PROG, but never actually make use of them.
Remove the checks and assignments so qemu-iotests can run even when
bc isn't installed.
Reported-by: Yash Mankad <ymankad@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reproducer:
$ ./qemu-img info ''
qemu-img: ./block.c:1008: bdrv_open_driver: Assertion
`!drv->bdrv_needs_filename || bs->filename[0]' failed.
[1] 26105 abort (core dumped) ./qemu-img info ''
This patch fixes this to be:
$ ./qemu-img info ''
qemu-img: Could not open '': The 'file' block driver requires a file
name
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only thing the escape characters achieve is making the reference
output unreadable and lines that are potentially so long that git
doesn't want to put them into an email any more. Let's filter them out.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Commit d35ff5e6 ('block: Ignore guest dev permissions during incoming
migration') added blk_resume_after_migration() to the precopy migration
path, but neglected to add it to the duplicated code that is used for
postcopy migration. This means that the guest device doesn't request the
necessary permissions, which ultimately led to failing assertions.
Add the missing blk_resume_after_migration() to the postcopy path.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
img_convert has been around before there was an ImgConvertState or
a block backend, but it has never been modified to directly use
these structs. Change this by parsing parameters directly into
the ImgConvertState and directly use BlockBackend where possible.
Furthermore variable initialization has been reworked and sorted.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This reverts commit e3e0003a8f.
This commit was necessary for the 2.9 release because we were unable to
fix the underlying issue(s) in time. However, we will be for 2.10.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
blk_name() is not modifying data passed to it through pointer and it
returns also a pointer to const so the argument can be made const for
code safeness.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Adds a new command `guest-get-timezone` reporting the currently
configured timezone on the system. The information on what timezone is
currently is configured is useful in case of Windows VMs where the
offset of the hardware clock is required to have the same offset. This
can be used for management systems like `oVirt` to detect the timezone
difference and warn administrators of the misconfiguration.
Signed-off-by: Vinzenz Feenstra <vfeenstr@redhat.com>
Reviewed-by: Sameeh Jubran <sameeh@daynix.com>
Tested-by: Sameeh Jubran <sameeh@daynix.com>
* moved stub implementation to end of function for consistency
* document that timezone names are for informational use only.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
A command that will list all currently logged in users, and the time
since when they are logged in.
Examples:
virsh # qemu-agent-command F25 '{ "execute": "guest-get-users" }'
{"return":[{"login-time":1490622289.903835,"user":"root"}]}
virsh # qemu-agent-command Win2k12r2 '{ "execute": "guest-get-users" }'
{"return":[{"login-time":1490351044.670552,"domain":"LADIDA",
"user":"Administrator"}]}
Signed-off-by: Vinzenz Feenstra <vfeenstr@redhat.com>
* make g_hash_table_contains compat func inline to avoid
unused warnings
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Retrieving the guest host name is a very useful feature for virtual management
systems. This information can help to have more user friendly VM access
details, instead of an IP there would be the host name. Also the host name
reported can be used to have automated checks for valid SSL certificates.
virsh # qemu-agent-command F25 '{ "execute": "guest-get-host-name" }'
{"return":{"host-name":"F25.lab.evilissimo.net"}}
Signed-off-by: Vinzenz Feenstra <vfeenstr@redhat.com>
* minor whitespace fix-ups
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When the command "guest-fsfreeze-freeze" is executed it causes
the VSS service to log the error below in the Event Viewer. This
error is caused by an issue in the function "CommitSnapshots" in
provider.cpp:
* When VSS_TIMEOUT_MSEC expires the funtion returns E_ABORT. This causes
the error #12293.
|event id| error |
* 12293 : Volume Shadow Copy Service error: Error calling a routine on a
Shadow Copy Provider {00000000-0000-0000-0000-000000000000}.
Routine details CommitSnapshots [hr = 0x80004004, Operation
aborted.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
After triggering a freeze command without any following thaw command,
qemu-ga will not respond to stop operation. This behaviour is wanted on Linux
as there is no time limit for a freeze command and we want to prevent
quitting in the middle of freeze, on the other hand on Windows the time
limit for freeze is 10 seconds, so we should wait for the timeout, thaw
the file system and quit.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The QGA schema states:
@can-offline: Whether offlining the VCPU is possible. This member
is always filled in by the guest agent when the structure
is returned, and always ignored on input (hence it can be
omitted then).
Currently 'can-offline' is missing entirely from the reply. This causes
errors in libvirt which is expecting the reply to be compliant with the
schema docs.
BZ#1438735: https://bugzilla.redhat.com/show_bug.cgi?id=1438735
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently the service runs in background on boot even though it is not
needed and once it is running it never stops. The service needs to be
running only during freeze operation and it should be stopped after
executing thaw.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Users of tcg_gen_atomic_cmpxchg and do_atomic_op rightfully utilize
the output. Even though this code is dead, it gets translated, and
without the initialization we encounter a tcg_error.
Reported-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Tested-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This reverts commit 0fc8aec7de.
In commit 2dfe5113b1 we split a trace event with a lot of arguments
in two, because the UST trace backend has a limit on the number
of arguments you can have in a single trace event. Unfortunately
we subsequently forgot about this, and in commit 0fc8aec7de
we merged the two trace events again, recreating the "UST backend
doesn't build" bug.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
HMP pull, with tcg fix
# gpg: Signature made Wed 26 Apr 2017 14:55:30 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-hmp-20170426:
tests: Add a tester for HMP commands
libqtest: Add a generic function to run a callback function for every machine
libqtest: Ignore QMP events when parsing the response for HMP commands
monitor: Check whether TCG is enabled before running the "info jit" code
hmp: gpa2hva and gpa2hpa hostaddr command
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
HMP commands do not get any automatic testing yet, so on certain
QEMU machines, some HMP commands were causing crashes in the past.
Thus we should test HMP commands in our test suite, too, to avoid
that such problems creep in again in the future.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1493097407-20482-1-git-send-email-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Some tests need to run single tests for every available machine of the
current QEMU binary. To avoid code duplication, let's extract this
code that deals with 'query-machines' into a separate function.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1490860207-8302-3-git-send-email-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When running certain HMP commands (like "device_del") via QMP, we
can sometimes get a QMP event in the response first, so that the
"g_assert(ret)" statement in qtest_hmp() triggers and the test
fails. Fix this by ignoring such QMP events while looking for the
real return value from QMP.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1490860207-8302-2-git-send-email-thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Added note to qtest_hmp/qtest_hmpv's header description to say
it discards events
The "info jit" command currently aborts on Mac OS X with the message
"qemu_mutex_lock: Invalid argument" when running with "-M accel=qtest".
We should only call into the TCG code here if TCG has really been
enabled and initialized.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1493179907-22516-1-git-send-email-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
ppc patch queue 2017-04-26
Here's a respind of my first pull request for qemu-2.10, consisting of
assorted patches which have accumulated while qemu-2.9 stabilized.
Highlights are:
* Rework / cleanup of the XICS interrupt controller
* Substantial improvement to the 'powernv' machine type
- Includes an MMIO XICS version
* POWER9 support improvements
- POWER9 guests with KVM
- Partial support for POWER9 guests with TCG
* IOMMU and VFIO improvements
* Assorted minor changes
There are several IPMI patches here that aren't usually in my area of
maintenance, but there isn't a regular maintainer and these patches
are for the benefit of the powernv machine type.
This pull request supersedes my 2017-04-26 pull request. This new set
fixes a bug in one of the aforementioned IPMI patches which caused
clang sanitizer failures (and may have crashed on some libc / host
versions).
# gpg: Signature made Wed 26 Apr 2017 07:58:10 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.10-20170426: (48 commits)
MAINTAINERS: Remove myself from e500
target/ppc: Style fixes
e500,book3s: mfspr 259: Register mapped/aliased SPRG3 user read
target/ppc: Flush TLB on write to PIDR
spapr-cpu-core: Release ICPState object during CPU unrealization
ppc/pnv: generate an OEM SEL event on shutdown
ppc/pnv: add initial IPMI sensors for the BMC simulator
ppc/pnv: populate device tree for IPMI BT devices
ppc/pnv: populate device tree for serial devices
ppc/pnv: populate device tree for RTC devices
ppc/pnv: scan ISA bus to populate device tree
ppc/pnv: enable only one LPC bus
ppc/pnv: Add support for POWER8+ LPC Controller
spapr: remove the 'nr_servers' field from the machine
target/ppc: Fix size of struct PPCElfPrstatus
ipmi: introduce an ipmi_bmc_gen_event() API
ipmi: introduce an ipmi_bmc_sdr_find() API
ipmi: provide support for FRUs
ipmi: use a file to load SDRs
ppc: add IPMI support
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Xen 2017/04/21 + fix
# gpg: Signature made Tue 25 Apr 2017 19:10:37 BST
# gpg: using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# gpg: aka "Stefano Stabellini <sstabellini@kernel.org>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini/tags/xen-20170421-v2-tag: (21 commits)
move xen-mapcache.c to hw/i386/xen/
move xen-hvm.c to hw/i386/xen/
move xen-common.c to hw/xen/
add xen-9p-backend to MAINTAINERS under Xen
xen/9pfs: build and register Xen 9pfs backend
xen/9pfs: send responses back to the frontend
xen/9pfs: implement in/out_iov_from_pdu and vmarshal/vunmarshal
xen/9pfs: receive requests from the frontend
xen/9pfs: connect to the frontend
xen/9pfs: introduce Xen 9pfs backend
9p: introduce a type for the 9p header
xen: import ring.h from xen
configure: use pkg-config for obtaining xen version
xen: additionally restrict xenforeignmemory operations
xen: use libxendevice model to restrict operations
xen: use 5 digit xen versions
xen: use libxendevicemodel when available
configure: detect presence of libxendevicemodel
xen: create wrappers for all other uses of xc_hvm_XXX() functions
xen: rename xen_modified_memory() to xen_hvm_modified_memory()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
I recently left Freescale/NXP, and even before that it'd been a few years
since I was actively involved in KVM/QEMU work.
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This makes a small step fixing one of many style problems that exist in
the older ppc code. This removes spaces between function (or macro) name
and the following '('.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch registers mfspr 259 for Book3S and e500 family cores
following this research:
mfspr 259 provides read-only mapped user access to SPRG3(SPR 275) according to:
- PowerISA 2.02, Book III (documents implementation starting with POWER4+ @ p20)
- IBM PowerPC 970MP RISC Microprocessor User's Manual v2.1, page 48
- Amit Singh: "Mac OS X Internals: A Systems Approach" on 970 and 970FX cores:
He demonstrates mfspr 259 reading TLS data from Mac OS X on G5 on page 588
- NXP documents it in the Core Reference Manuals of: e500, e500mc and e5500
- getcpu() of the 32 & 64-bit Book3S Linux vDSOs use it to read the core number
mfspr 259 does not appear to be implemented in these cores according to:
- 74xx series: MPC7410/MPC7400 and MPC7450 RISC Microprocessor Reference Manuals
- 4xx series: PPC440 Processor User's Manual, Revision 1.09 by AMCC
- 750 series: IBM PowerPC 750CL RISC Microprocessor User's Manual
- e200 series: e200z4 Power Architectureâ Core Reference Manual
Implementation: gen_spr_usprg3() is called from init_proc_book3s_common()
(covers the 970 and POWER cores) and init_proc_e500() (covers the e500 family)
to register spr_read_ureg() in the same way which it already provides
the mapped SPR access for SPR_USPRG4-7 in gen_spr_usprgh() for cores
which have the same read-only mapped SPRG register access for SPRG4-7.
Verified using Linux by pinning a thread to a core and checking sched_getcpu()
using qemu-system-ppc64 -M pseries -cpu POWER8 using MTTCG on a x86_64 host.
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@thalesgroup.com>
Reviewed-by: Stefan Resch <stefan.resch@thalesgroup.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PIDR (process id register) is used to store the id of the currently
running process, which is used to select the process table entry used to
perform address translation. This means that when we write to this register
all the translations in the TLB become outdated as they are for a
previously running process. Thus when this register is written to we need
to invalidate the TLB entries to ensure stale entries aren't used to
to perform translation for the new process, which would result in at best
segfaults or alternatively just random memory being accessed.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Fixed compile error for 32-bit targets]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Recent commits that re-organized ICPState object missed to destroy
the object when CPU is unrealized. Fix this so that CPU unplug
doesn't abort QEMU.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
OpenPOWER systems expect to be notified with such an event before a
shutdown or a reboot. An OEM SEL message is sent with specific
identifiers and a user data containing the request : OFF or REBOOT.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Skiboot, the firmware for the PowerNV platform, expects the BMC to
provide some specific IPMI sensors. These sensors are exposed in the
device tree and their values are updated by the firmware at boot time.
Sensors of interest are :
"FW Boot Progress"
"Boot Count"
As such a device is defined on the command line, we can only detect
its presence at reset time.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is an empty shell that we will use to include nodes in the device
tree for ISA devices. We expect RTC, UART and IPMI BT devices.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The default LPC bus of a multichip system is on chip 0. It's
recognized by the firmware (skiboot) using a "primary" property in the
device tree.
We introduce a pnv_chip_lpc_offset() routine to locate the LPC node of
a chip and set the property directly from the machine level.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It adds the Naples chip which supports proper LPC interrupts via the
LPC controller rather than via an external CPLD.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.9
- ported on latest PowerNV patchset
- moved the IRQ handler in pnv_lpc.c
- introduced pnv_lpc_isa_irq_create() to create the ISA IRQs ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
xics_system_init() does not need 'nr_servers' anymore as it is only
used to define the 'interrupt-controller' node in the device tree. So
let's just compute the value when calling spapr_dt_xics().
This also gives us an opportunity to simplify the xics_system_init()
routine and introduce a specific spapr_ics_create() helper to create
the sPAPR ICS object.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
gdb refuses to parse QEMU memory dumps because struct PPCElfPrstatus
is the wrong size. Fix it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Fixes: e62fbc54d4 ("target-ppc: dump-guest-memory support")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It will be used to fill the message buffer with custom events expected
by some systems. Typically, an Open PowerNV platform guest is notified
with an OEM SEL message before a shutdown or a reboot.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch exposes a new IPMI routine to query a sdr entry from the
sdr table maintained by the IPMI BMC simulator. The API is very
similar to the internal sdr_find_entry() routine and should be used
the same way to query one or all sdrs.
A typical use would be to loop on the sdrs to build nodes of a device
tree.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch provides a simple FRU support for the BMC simulator. FRUs
are loaded from a file which name is specified in the object
properties, each entry having a fixed size, also specified in the
properties. If the file is unknown or not accessible for some reason,
a unique entry of 1024 bytes is created as a default. Just enough to
start some simulation.
These commands complies with the IPMI spec : "34. FRU Inventory Device
Commands".
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
[dwg: Folded in subsequent fix to handle NULL filename]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The IPMI BMC simulator populates the sdr/sensor tables with a minimal
set of entries (Watchdog). But some qemu platforms might want to use
extra entries for their custom needs.
This patch modifies slighty the initializing routine to take into
account a larger set read from a file. The name of the file to use is
defined through a new 'sdr' property of the simulator device.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
OpenPOWER systems use a BT device to communicate with the BMC.
Provide support for it.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OCC is an on-chip microcontroller based on a ppc405 core used
for various power management tasks. It comes with a pile of additional
hardware sitting on the PIB (aka XSCOM bus). At this point we don't
emulate it (nor plan to do so). However there is one facility which
is provided by the surrounding hardware that we do need, which is the
interrupt generation facility. OPAL uses it to send itself interrupts
under some circumstances and there are other uses around the corner.
So this implement just enough to support this.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.9
- changed the XSCOM interface to fit new model
- QOMified the model ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Processor Service Interface (PSI) Controller is one of the engines
of the "Bridge" unit which connects the different interfaces to the
Power Processor.
This adds just enough of the PSI bridge to handle various on-chip and
the one external interrupt. The rest of PSI has to do with the link to
the IBM FSP service processor which we don't plan to emulate (not used
on OpenPower machines).
The ics_get() and ics_resend() handlers of the XICSFabric interface of
the PowerNV machine are now defined to handle the Interrupt Control
Source of PSI. The InterruptStatsProvider interface is also modified
to dump the new ICS.
Originally from Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This provides to a PowerNV chip (POWER8) access to the Interrupt
Management area, which contains the registers of the Interrupt Control
Presenters of each thread. These are used to accept, return, forward
interrupts in the system.
This area is modeled with a per-chip container memory region holding
all the ICP registers. Each thread of a chip is then associated with
its ICP registers using a memory subregion indexed by its PIR number
in the overall region.
The device tree is populated accordingly.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each thread of a core is linked to an ICP. This allocates a PnvICPState
object before the PowerPCCPU object is realized and lets the XICSFabric
do the store under the 'intc' backlink when xics_cpu_setup() is
called.
This modeling removes the need of maintaining an array of ICP objects
under the PowerNV machine and also simplifies the XICSFabric icp_get()
handler.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A XICSFabric QOM interface is used by the XICS layer to manipulate the
ICP and ICS objects. Let's define the associated handlers for the
PowerNV machine. All handlers should be defined even if there is no
ICS under the PowerNV machine yet.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This provides a new ICPState object for the PowerNV machine (POWER8).
Access to the Interrupt Management area is done though a memory
region. It contains the registers of the Interrupt Control Presenters
of each thread which are used to accept, return, forward interrupts in
the system.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, all the ICPs are created before the CPUs, stored in an array
under the sPAPR machine and linked to the CPU when the core threads
are realized. This modeling brings some complexity when a lookup in
the array is required and it can be simplified by allocating the ICPs
when the CPUs are.
This is the purpose of this proposal which introduces a new 'icp_type'
field under the machine and creates the ICP objects of the right type
(KVM or not) before the PowerPCCPU object are.
This change allows more cleanups : the removal of the icps array under
the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
helper.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is the second step to abstract the IRQ 'server' number of the
XICS layer. Now that the prereq cleanups have been done in the
previous patch, we can move down the 'cpu_dt_id' to 'cpu_index'
mapping in the sPAPR machine handler.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICPState array of the sPAPR machine is indexed with
'cpu_index' of the CPUState. This numbering of CPUs is internal to
QEMU and the guest only knows about what is exposed in the device
tree, that is the 'cpu_dt_id'. This is why sPAPR uses the helper
xics_get_cpu_index_by_dt_id() to do the mapping in a couple of places.
To provide a more generic XICS layer, we need to abstract the IRQ
'server' number and remove any assumption made on its nature. It
should not be used as a 'cpu_index' for lookups like xics_cpu_setup()
and xics_cpu_destroy() do.
To reach that goal, we choose to introduce a generic 'intc' backlink
under PowerPCCPU, and let the machine core init routine do the
ICPState lookup. The resulting object is passed on to xics_cpu_setup()
which does the store under PowerPCCPU. The IRQ 'server' number in XICS
is now generic. sPAPR uses 'cpu_dt_id' and PowerNV will use 'PIR'
number.
This also has the benefit of simplifying the sPAPR hcall routines
which do not need to do any ICPState lookups anymore.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The ibm,processor-radix-AP-encodings device tree property of the cpu node
is used to specify the radix mode supported page sizes of the processor
to the guest os. Contained in the top 3 bits of the msb is the actual
page size (AP) encoding associated with the corresponding radix mode
supported page size. Add this property for a TCG guest, note the TCG code
is capable of translating any format so just add the 4 default page sizes.
The ibm,processor-radix-AP-encodings device tree property is defined as:
One to n cells in ascending order of radix mode supported page sizes
encoded as BE ints (32bit on ppc) in the form:
0bxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
- 0bxxx -> AP encoding
- 0byyyyyyyyyyyyyyyyyyyyyyyyyyyyy -> supported page size encoded as a shift
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If a page size used by QEMU is not enabled in the PHB IOMMU page mask,
in-kernel acceleration of TCE handling won't be enabled and performance
might be slower than expected.
This prints a warning if system page size is not enabled. This should
print a warning if huge pages are enabled but sphb.pgsz still uses
the default value of 4K|64K.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This enables in-kernel handling of H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls. The host kernel support is there since v4.6,
in particular d3695aa4f452
("KVM: PPC: Add support for multiple-TCE hcalls").
H_PUT_TCE is already accelerated and does not need any special enablement.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For a little while around 4.9, Linux kernels that saw the radix bit in
ibm,pa-features would attempt to set up the MMU as if they were a
hypervisor, even if they were a guest, which would cause them to
crash.
Work around this by detecting pre-ISA 3.0 guests by their lack of that
bit in option vector 1, and then removing the radix bit from
ibm,pa-features. Note: This now requires regeneration of that node
after CAS negotiation.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add the new node, /chosen/ibm,arch-vec-5-platform-support to the
device tree. This allows the guest to determine which modes are
supported by the hypervisor.
Update the option vector processing in h_client_architecture_support()
to handle the new MMU bits. This allows guests to request hash or
radix mode and QEMU to create the guest's HPT at this time if it is
necessary but hasn't yet been done. QEMU will terminate the guest if
it requests an unavailable mode, as required by the architecture.
Extend the ibm,pa-features node with the new ISA 3.0 values
and set the radix bit if KVM supports radix mode. This probably won't
be used directly by guests to determine the availability of radix mode
(that is indicated by the new node added above) but the architecture
requires that it be set when the hardware supports it.
If QEMU is using KVM, and KVM is capable of running in radix mode,
guests can be run in real-mode without allocating a HPT (because KVM
will use a minimal RPT). So in this case, we avoid creating the HPT
at reset time and later (during CAS) create it if it is necessary.
ISA 3.0 guests will now begin to call h_register_process_table(),
which has been added previously.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Strip some unneeded prefix from error messages]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In the next patch, spapr_fixup_cpu_dt() will need to call
spapr_populate_pa_features() so move it's definition up without making
any other changes.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the
hypervisor where in memory its process table is and how translation should
be performed using this process table.
Provide the implementation of this H_CALL for a guest.
We first check for invalid flags, then parse the flags to determine the
operation, and then check the other parameters for valid values based on
the operation (register new table/deregister table/maintain registration).
The process table is then stored in the appropriate location and registered
with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits
are updated as required.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Correct missing prototype and uninitialized variable]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The use of the new in memory tables introduced in ISAv3.00 for translation,
also referred to as process tables, requires the introduction of 3 new
H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID.
Add shells for each of these and register them as the hypercall handlers.
Currently they all log an unimplemented hypercall and return H_FUNCTION.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Query and cache the value of two new KVM capabilities that indicate
KVM's support for new radix and hash modes of the MMU.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the new ioctl, KVM_PPC_GET_RMMU_INFO, to fetch radix MMU
information from KVM and present the page encodings in the device tree
under ibm,processor-radix-AP-encodings. This provides page size
information to the guest which is necessary for it to use radix mode.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Compile fix for 32-bit targets, style nit fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM_CAP_SPAPR_TCE capability allows creating TCE tables in KVM which
allows having in-kernel acceleration for H_PUT_TCE_xxx hypercalls.
However it only supports 32bit DMA windows at zero bus offset.
There is a new KVM_CAP_SPAPR_TCE_64 capability which supports 64bit
window size, variable page size and bus offset.
This makes use of the new capability. The kernel headers are already
updated as the kernel support went in to v4.6.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The devices that are derived from TYPE_PNV_CHIP currently show up
as "uncategorized" devices in the help text of "-device ?". Since
they obviously are related to the CPU, let's put them into the
CPU category instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also use an 'sPAPRRTCState' attribute under the sPAPR machine to hold
the RTC object. Overall, these changes remove an unnecessary and
implicit dependency on SysBus.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On Power8 hosts it is currently theoretically possible for QEMU/KVM-HV guests
to receive a ibm,pa-features property indicating that HTM support is available
when it is not. The situation would occur if the platform firmware of
a Power8 host cleared the HTM bit of the ibm,pa-features property.
QEMU would query KVM for the availability of HTM, which will return no
support, but workaround code in kvm_arch_init_vcpu() would then
re-enable it because KVM_HV is in use and the processor is P8.
This patch adjusts the workaround in kvm_arch_init_vcpu() so that it does not
enable HTM (in the above case) unless the host kernel indicates to the QEMU
process, via the auxiliary vector, that userspace can use HTM (via the HWCAP2
bit KVM_FEATURE2_HTM).
The reason to use the value from the auxiliary vector is that it is
set based only on what the host kernel found in the ibm,pa-features
HTM bit at boot time.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Once a request is completed, xen_9pfs_push_and_notify gets called. In
xen_9pfs_push_and_notify, update the indexes (data has already been
copied to the sg by the common code) and send a notification to the
frontend.
Schedule the bottom-half to check if we already have any other requests
pending.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Upon receiving an event channel notification from the frontend, schedule
the bottom half. From the bottom half, read one request from the ring,
create a pdu and call pdu_submit to handle it.
For now, only handle one request per ring at a time.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Write the limits of the backend to xenstore. Connect to the frontend.
Upon connection, allocate the rings according to the protocol
specification.
Initialize a QEMUBH to schedule work upon receiving an event channel
notification from the frontend.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
# gpg: Signature made Tue 25 Apr 2017 12:22:03 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
COLO-compare: Optimize tcp compare trace event
COLO-compare: Optimize tcp compare for option field
slirp: add a fake NC-SI backend
aspeed: add a FTGMAC100 nic
net/ftgmac100: add a 'aspeed' property
net: add FTGMAC100 support
hw/net: add MII definitions
colo-compare: Fix old packet check bug.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
s390_virtio_hypercall can trigger IO events and interrupts, most notably
when using virtio-ccw devices.
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Fixes: 278f5e98c6 ("s390x/misc_helper.c: wrap IO instructions in BQL")
Signed-off-by: Alexander Graf <agraf@suse.de>
According to "CPU Signaling and Response", "Signal-Processor Orders",
the order field is bit position 56-63. Without this, the Linux
guest kernel is sometimes unable to stop emulation and enters
an infinite loop of "XXX unknown sigp: 0xffffffff00000005".
Signed-off-by: Philipp Kern <phil@philkern.de>
Reviewed-by: Thomas Huth <thuth@tuxfamily.org>
[agraf: add comment according to email]
Signed-off-by: Alexander Graf <agraf@suse.de>
Optimize two trace events as one, adjust print format make
it easy to read. rename trace_colo_compare_pkt_info_src/dst
to trace_colo_compare_tcp_info.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In this patch we support packet that have tcp options field.
Add tcp options field check, If the packet have options
field we just skip it and compare tcp payload,
Avoid unnecessary checkpoint, optimize performance.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
NC-SI (Network Controller Sideband Interface) enables a BMC to manage
a set of NICs on a system. This model takes the simplest approach and
reverses the NC-SI packets to pretend a NIC is present and exercise
the Linux driver.
The NCSI header file <ncsi-pkt.h> comes from mainline Linux and was
untabified.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
There is a second NIC but we do not use it for the moment. We use the
'aspeed' property to tune the definition of the end of ring buffer bit
for the Aspeed SoCs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The Aspeed SoCs have a different definition of the end of the ring
buffer bit. Add a property to specify which set of bits should be used
by the NIC.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Exynos4210 has four SD/MMC controllers supporting:
- SD Standard Host Specification Version 2.0,
- MMC Specification Version 4.3,
- SDIO Card Specification Version 2.0,
- DMA and ADMA.
Add emulation of SDHCI devices which allows accessing storage through SD
cards. Differences from real hardware:
- Devices are shipped with eMMC memory, not SD card.
- The Exynos4210 SDHCI has few more registers, e.g. for
controlling the clocks, additional status (0x80, 0x84, 0x8c). These
are not implemented.
Testing on smdkc210 machine with "-drive file=FILE,if=sd,bus=0,index=2".
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Message-id: 20170422190709.8676-1-krzk@kernel.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# gpg: Signature made Mon 24 Apr 2017 20:18:05 BST
# gpg: using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg: aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg: aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057
* remotes/cody/tags/block-pull-request:
qemu-iotests: _cleanup_qemu must be called on exit
block/rbd: Add support for reopen()
block/rbd - update variable names to more apt names
block: use bdrv_can_set_read_only() during reopen
block: introduce bdrv_can_set_read_only()
block: code movement
block: honor BDRV_O_ALLOW_RDWR when clearing bs->read_only
block: do not set BDS read_only if copy_on_read enabled
block: add bdrv_set_read_only() helper function
qemu-iotests: exclude vxhs from image creation via protocol
block/vxhs.c: Add qemu-iotests for new block device type "vxhs"
block/vxhs.c: Add support for a new block device type called "vxhs"
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For the tests that use the common.qemu functions for running a QEMU
process, _cleanup_qemu must be called in the exit function.
If it is not, if the qemu process aborts, then not all of the droppings
are cleaned up (e.g. pidfile, fifos).
This updates those tests that did not have a cleanup in qemu-iotests.
(I swapped spaces for tabs in test 102 as well)
Reported-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: d59c2f6ad6c1da8b9b3c7f357c94a7122ccfc55a.1492544096.git.jcody@redhat.com
Update 'clientname' to be 'user', which tracks better with both
the QAPI and rados variable naming.
Update 'name' to be 'image_name', as it indicates the rbd image.
Naming it 'image' would have been ideal, but we are using that for
the rados_image_t value returned by rbd_open().
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: b7ec1fb2e1cf36f9b6911631447a5b0422590b7d.1491597120.git.jcody@redhat.com
A few block drivers will set the BDS read_only flag from their
.bdrv_open() function. This means the bs->read_only flag could
be set after we enable copy_on_read, as the BDRV_O_COPY_ON_READ
flag check occurs prior to the call to bdrv->bdrv_open().
This adds an error return to bdrv_set_read_only(), and an error will be
return if we try to set the BDS to read_only while copy_on_read is
enabled.
This patch also changes the behavior of vvfat. Before, vvfat could
override the drive 'readonly' flag with its own, internal 'rw' flag.
For instance, this -drive parameter would result in a writable image:
"-drive format=vvfat,dir=/tmp/vvfat,rw,if=virtio,readonly=on"
This is not correct. Now, attempting to use the above -drive parameter
will result in an error (i.e., 'rw' is incompatible with 'readonly=on').
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 0c5b4c1cc2c651471b131f21376dfd5ea24d2196.1491597120.git.jcody@redhat.com
The protocol VXHS does not support image creation. Some tests expect
to be able to create images through the protocol. Exclude VXHS from
these tests.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Source code for the qnio library that this code loads can be downloaded from:
https://github.com/VeritasHyperScale/libqnio.git
Sample command line using JSON syntax:
./x86_64-softmmu/qemu-system-x86_64 -name instance-00000008 -S -vnc 0.0.0.0:0
-k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
-msg timestamp=on
'json:{"driver":"vxhs","vdisk-id":"c3e9095a-a5ee-4dce-afeb-2a59fb387410",
"server":{"host":"172.172.17.4","port":"9999"}}'
Sample command line using URI syntax:
qemu-img convert -f raw -O raw -n
/var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad
vxhs://192.168.0.1:9999/c6718f6b-0401-441d-a8c3-1f0064d75ee0
Sample command line using TLS credentials (run in secure mode):
./qemu-io --object
tls-creds-x509,id=tls0,dir=/etc/pki/qemu/vxhs,endpoint=client -c 'read
-v 66000 2.5k' 'json:{"server.host": "127.0.0.1", "server.port": "9999",
"vdisk-id": "/test.raw", "driver": "vxhs", "tls-creds":"tls0"}'
[Jeff: Modified trace-events with the correct string formatting]
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 1491277689-24949-2-git-send-email-Ashish.Mittal@veritas.com
Error reporting patches for 2017-04-24
# gpg: Signature made Mon 24 Apr 2017 08:16:34 BST
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-error-2017-04-24:
error: Apply error_propagate_null.cocci again
qga: Make errp the last parameter of qga_vss_fsfreeze
migration: Make errp the last parameter of local functions
scsi: Make errp the last parameter of virtio_scsi_common_realize
fdc: Make errp the last parameter of fdctrl_connect_drives
nfs: Make errp the last parameter of nfs_client_open
block: Make errp the last parameter of commit_active_start
mirror: Make errp the last parameter of mirror_start_job
crypto: Make errp the last parameter of functions
block: Make errp the last parameter of bdrv_img_create
socket: Make errp the last parameter of vsock_connect_saddr
socket: Make errp the last parameter of unix_connect_saddr
socket: Make errp the last parameter of inet_connect_saddr
socket: Make errp the last parameter of socket_connect
util/error: Fix leak in error_vprepend()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This is to allow clients to initialise these without failing as long
as no 2D engine function is called that would use the written value.
Saved values are not used yet (may get used when more of 2D engine is
added sometimes) and clients normally only write to most of these
registers, nothing is known to ever read them but they are documented
as read/write so also implement read for these.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 80adf8e4d084ec6cc30d149f8e8215debb67314a.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Rework HWC handling to simplify it and fix cursor not updating on
screen as needed. Previously cursor was not updated because checking
for changes in a line overrode the update flag set for the cursor but
fixing this is not enough because the cursor should also be updated if
its shape or location changes. Introduce hwc_invalidate() function to
handle that similar to other display controller models.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 6970a5e9868b7246656c1d02038dc5d5fa369507.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We only emulate the sysbus device in its default LE mode and PCI is LE
as well so specify this for registers and framebuffer memory.
Note that though the Linux kernel driver has code which claims to
handle both big and little endian, it is obviously bogus for 16 bit
and cannot be trusted as a source of information on the framebuffer
pixel format. This is our best guess about device behaviour based on
the specs and testing with MorphOS that is known to work on real HW.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 8b9605a569f8bf54074e15903620b18cd9967c89.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu-sparc update
# gpg: Signature made Fri 21 Apr 2017 20:09:35 BST
# gpg: using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F
* remotes/mcayland/tags/qemu-sparc-signed:
tcx: switch to load_image_mr() and remove prom_addr hack
tcx: use tcx_set_dirty() for accelerated ops
tcx: remove primitives for non-32-bit surfaces
tcx: remove TARGET_PAGE_SIZE from tcx24_update_display()
tcx: remove TARGET_PAGE_SIZE from tcx_update_display()
tcx: remove page24 and cpage from tcx24_update_display()
tcx: alter tcx24_reset_dirty() to accept address and length parameters
tcx: alter tcx24_check_dirty() to accept address and length parameters
tcx: ensure tcx_set_dirty() also invalidates the 24-bit plane and cplane
tcx: alter tcx_set_dirty() to accept address and length parameters
cg3: switch to load_image_mr() and remove prom-addr hack
cg3: fix up size parameter for memory_region_get_dirty()
cg3: remove TARGET_PAGE_SIZE rounding on dirty page detection
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add properties for the default display resolution, pass
on that information to the guest so the driver can use it.
Also move up qxl_crc32() function so we don't need a
forward declaration.
Additionally guest driver updates are needed so the
guest driver will actually pick this up, which will
probably land in linux kernel 4.12.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421092234.8368-1-kraxel@redhat.com
Fix standard vga mode check: Both s->config and s->enabled must be set
to enable vmware command fifo processing.
Drop dirty tracking code from the fifo rendering code path, it isn't
used anyway because vmsvga turns off dirty tracking when leaving
standard vga mode.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-9-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The vga code clears the dirty bits *after* reading the framebuffer
memory. So if the guest framebuffer updates hits the race window
between vga reading the framebuffer and vga clearing the dirty bits
vga will miss that update
Fix it by using the new memory_region_copy_and_clear_dirty()
memory_region_copy_get_dirty() functions. That way we clear the
dirty bitmap before reading the framebuffer. Any guest display
updates happening in parallel will be properly tracked in the
dirty bitmap then and the next display refresh will pick them up.
Problem triggers with mttcg only. Before mttcg was merged tcg
never ran in parallel to vga emulation. Using kvm will hide the
problem too, due to qemu operating on a userspace copy of the
kernel's dirty bitmap.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-5-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add vga_scanline_invalidated helper to check whenever a scanline was
invalidated. Add a sanity check to fix OOB read access for display
heights larger than 2048.
Only cirrus uses this, for hardware cursor rendering, so having this
work properly for the first 2048 scanlines only shouldn't be a problem
as the cirrus can't handle large resolutions anyway. Also changing the
invalidated_y_table size would break live migration.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-4-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This patch adds support for getting and using a local copy of the dirty
bitmap.
memory_region_snapshot_and_clear_dirty() will create a snapshot of the
dirty bitmap for the specified range, clear the dirty bitmap and return
the copy. The returned bitmap can be a bit larger than requested, the
range is expanded so the code can copy unsigned longs from the bitmap
and avoid atomic bit update operations.
memory_region_snapshot_get_dirty() will return the dirty status of
pages, pretty much like memory_region_get_dirty(), but using the copy
returned by memory_region_copy_and_clear_dirty().
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-3-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The FTGMAC100 device is an Ethernet controller with DMA function that
can be found on Aspeed SoCs (which include NCSI).
It is fully compliant with IEEE 802.3 specification for 10/100 Mbps
Ethernet and IEEE 802.3z specification for 1000 Mbps Ethernet and
includes Reduced Media Independent Interface (RMII) and Reduced
Gigabit Media Independent Interface (RGMII) interfaces. It adopts an
AHB bus interface and integrates a link list DMA engine with direct
M-Bus accesses for transmitting and receiving packets. It has
independent TX/RX fifos, supports half and full duplex (1000 Mbps mode
only supports full duplex), flow control for full duplex and
backpressure for half duplex.
The FTGMAC100 also implements IP, TCP, UDP checksum offloads and
supports IEEE 802.1Q VLAN tag insertion and removal. It offers
high-priority transmit queue for QoS and CoS applications
This model is backed with a RealTek 8211E PHY which is the chip found
on the AST2500 EVB. It is complete enough to satisfy two different
Linux drivers and a U-Boot driver. Not supported features are :
- IEEE 802.1Q VLAN
- High Priority Transmit Queue
- Wake-On-LAN functions
The code is based on the Coldfire Fast Ethernet Controller model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This adds comments on the Basic mode control and status registers bit
definitions. It also adds a couple of bits for 1000BASE-T and the
RealTek 8211E PHY for the FTGMAC100 model to use.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
If colo-compare find one old packet,we can notify colo-frame
do checkpoint, no need continue find more old packet here.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Do not use the ring.h header installed on the system. Instead, import
the header into the QEMU codebase. This avoids problems when QEMU is
built against a Xen version too old to provide all the ring macros.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
Instead of trying to guess the Xen version to use by compiling various
test programs first just ask the system via pkg-config. Only if it
can't return the version fall back to the test program scheme.
If configure is being called with dedicated flags for the Xen libraries
use those instead of the pkg-config output. This will avoid breaking
an in-tree Xen build of an old Xen version while a new Xen version is
installed on the build machine: pkg-config would pick up the installed
Xen config files as the Xen tree wouldn't contain any of them.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Commit f0f272baf3a7 "xen: use libxendevice model to restrict operations"
added a command-line option (-xen-domid-restrict) to limit operations
using the libxendevicemodel API to a specified domid. The commit also
noted that the restriction would be extended to cover operations issued
via other xen libraries by subsequent patches.
My recent Xen patch [1] added a call to the xenforeignmemory API to allow
it to be restricted. This patch now makes use of that new call when the
-xen-domid-restrict option is passed.
[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=5823d6eb
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
This patch adds a command-line option (-xen-domid-restrict) which will
use the new libxendevicemodel API to restrict devicemodel [1] operations
to the specified domid. (Such operations are not applicable to the xenpv
machine type).
This patch also adds a tracepoint to allow successful enabling of the
restriction to be monitored.
[1] I.e. operations issued by libxendevicemodel. Operation issued by other
xen libraries (e.g. libxenforeignmemory) are currently still unrestricted
but this will be rectified by subsequent patches.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Today qemu is using e.g. the value 480 for Xen version 4.8.0. As some
Xen version tests are using ">" relations this scheme will lead to
problems when Xen version 4.10.0 is being reached.
Instead of the 3 digit schem use a 5 digit scheme (e.g. 40800 for
version 4.8.0).
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
This patch modifies the wrapper functions in xen_common.h to use the
new xendevicemodel interface if it is available along with compatibility
code to use the old libxenctrl interface if it is not.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
This patch adds code in configure to set CONFIG_XEN_CTRL_INTERFACE_VERSION
to a new value of 490 if libxendevicemodel is present in the build
environment.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
migration/next for 20170421
# gpg: Signature made Fri 21 Apr 2017 11:28:13 BST
# gpg: using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg: aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* remotes/juanquintela/tags/migration/20170421: (65 commits)
hmp: info migrate_parameters format tunes
hmp: info migrate_capability format tunes
migration: rename max_size to threshold_size
migration: set current_active_state once
virtio-rng: stop virtqueue while the CPU is stopped
migration: don't close a file descriptor while it can be in use
ram: Remove migration_bitmap_extend()
migration: Disable hotplug/unplug during migration
qdev: Move qdev_unplug() to qdev-monitor.c
qdev: Export qdev_hot_removed
qdev: qdev_hotplug is really a bool
migration: Remove MigrationState parameter from migration_is_idle()
ram: Use RAMBitmap type for coherence
ram: rename last_ram_offset() last_ram_pages()
ram: Use ramblock and page offset instead of absolute offset
ram: Change offset field in PageSearchStatus to page
ram: Remember last_page instead of last_offset
ram: Use page number instead of an address for the bitmap operations
ram: reorganize last_sent_block
ram: ram_discard_range() don't use the mis parameter
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Exception Prefix High (EPH) control bit of the Supervision Register
(SR).
The significant bits (31-12) of the vector offset address for each
exception depend on the setting of the Supervision Register (SR)'s EPH
bit and the Exception Vector Base Address Register (EVBAR).
If SR[EPH] is set, the vector offset is logically ORed with the offset
0xF0000000.
This means if EPH is;
* 0 - Exceptions vectors start at EVBAR
* 1 - Exception vectors start at EVBAR | 0xF0000000
Signed-off-by: Tim 'mithro' Ansell <mithro@mithis.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Exception Vector Base Address Register (EVBAR) - This optional register
can be used to apply an offset to the exception vector addresses.
The significant bits (31-12) of the vector offset address for each
exception depend on the setting of the Supervision Register (SR)'s EPH
bit and the Exception Vector Base Address Register (EVBAR).
Its presence is indicated by the EVBARP bit in the CPU Configuration
Register (CPUCFGR).
Signed-off-by: Tim 'mithro' Ansell <mithro@mithis.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
# gpg: Signature made Fri 21 Apr 2017 10:43:04 BST
# gpg: using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
MAINTAINERS: update my email address
MAINTAINERS: update Wen's email address
migration/block: use blk_pwrite_zeroes for each zero cluster
throttle: make throttle_config(throttle_get_config()) symmetric
throttle: do not use invalid config in test
qemu-options: explain disk I/O throttling options
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The first batch of s390x changes for 2.10:
- the new compat machine
- several cleanups and optimizations
- introspection for css ids
# gpg: Signature made Fri 21 Apr 2017 08:36:25 BST
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20170421:
s390x: Drop useless casts
s390x: register I/O adapters per ISC during init
s390x/flic: cache flic in s390_get_flic
s390x: initialize flic before I/O subsystems
s390x: use enum for adapter type and standardize its naming
s390x/css: consolidate the devno property for ccw devices
s390x/css: provide introspection for virtual subchannel and device busid
s390x/css: introduce read-only property type for device ids
s390x/pci: make printf always compile in debug output
s390x/kvm: make printf always compile in debug output
s390x: introduce 2.10 compat machine
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Dump the info in a single line is hard to read. Do it one per line.
Also, the first "capabilities:" didn't help much. Let's remove it.
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In migration codes (especially in migration_thread()), max_size is used
in many place for the threshold value that we will start to do the final
flush and jump to the next stage to dump the whole rest things to
destination. However its name is confusing to first readers. Let's
rename it to "threshold_size" when proper and add a comment for it. No
functional change is made.
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If we modify the virtio-rng virqueue while the
vmstate is already migrated we can have some
inconsistencies between the virtqueue state and
the memory content.
To avoid this, stop the virtqueue while the CPU
is stopped.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Amit Shah <amit@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If we close the QEMUFile descriptor in process_incoming_migration_co()
while it has been stopped by an error, the postcopy_ram_listen_thread()
can try to continue to use it. And as the memory has been freed
it is working with an invalid pointer and crashes.
Fix this by releasing the memory after having managed the error
case (which, in fact, calls exit())
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Until we have reviewed what can/can't be hotplugged during migration,
disable it. We can enable it later for the things that we know that
work. For instance, memory hotplug during postcopy doesn't work
currently.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
--
- Fix typo. Thanks Thomas.
- Delay migration check after we have checked that we can hotplug that
device.
- more typos
Only user don't have a MigrationState handly.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This removes the needto pass also the absolute offset.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We are moving everything to work on pages, not addresses.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We use an unsigned long for the page number. Notice that our bitmaps
already got that for the index, so we have that limit.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
rename page to page_abs everywhere.
fix trace types for pages
We were setting it far away of when we changed it. Now everything is
done inside save_page_header. Once there, reorganize code to pass
RAMState. We also set CONTINUE flag in a single place.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We change the meaning of start to be the offset from the beggining of
the block.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The number of dirty pages is output in 'pages' in the command
'info migrate', so add page-size to calculate the number of dirty
pages in bytes.
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It was used as a size in all cases except one.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We need to call for the migrate_get_current() in more that half of the
uses, so call that inside.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We can calculate its value, so we don't create a variable for it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
After Peter and Dave review, I dropped the variable and just inlined
the condition.
Fix typo
We receive the file from save_live operations and we don't use it
until 3 or 4 levels of calls down.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Treat it like the rest of ram stats counters. Export its value the
same way. As an added bonus, no more MigrationState used in
migration_bitmap_sync();
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Again, dave was the one reviewing it
It can be recalculated from dirty_pages_rate.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Dave was the one that reviewed it O:-)
This is a ram field that was inside MigrationState. Move it to
RAMState and make it the same that the other ram stats.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
This are the last postcopy fields still at MigrationState. Once there
Move MigrationSrcPageRequest to ram.c and remove MigrationState
parameters where appropiate.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
It was on MigrationState when it is only used inside ram.c for
postcopy. Problem is that we need to access it without being able to
pass it RAMState directly.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Just unfold it. Move ram_bytes_remaining() with the rest of exported
functions.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Its value can be calculated by other exported.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
For compatibility, we need to still send a value, but just specify it
and comment the fact.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We create a struct where to put all the ram state
Start with the following fields:
last_seen_block, last_sent_block, last_offset, last_version and
ram_bulk_stage are globals that are really related together.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Fix typo and warnings
So all places are consistent on the naming of a block name parameter.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Added doc comments for existing functions comment and rewrite them in
a common style.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Fix Peter Xu comments
Improve postcopy comments as per reviews.
Users can inherit from the simpletrace.Analyzer class and receive
callbacks when events of interest occur in a trace file. The method
signature is a little magic because the timestamp and pid arguments are
optional. Document this.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20170411095654.18383-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently all trace.o are linked into qemu-system, qemu-img,
qemu-nbd, qemu-io etc., even the corresponding components
are not included.
Put all trace.o into libqemuutil.a that the linker would only pull in .o
files containing symbols that are actually referenced by the
program.
Signed-off -by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The ./configure script should produce --help output even if Python is
not installed.
Listing trace backends is simple: show the names of all Python modules
in scripts/tracetool/backend/ whose source code contains 'PUBLIC =
True'.
Perform the backend enumeration in shell instead of Python so that we
can move the Python check until after ./configure --help.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20170328134418.3426-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default,
this may cause the qcow2 file size to be bigger after migration.
This patch checks each cluster, using blk_pwrite_zeroes for each
zero cluster.
[Initialize cluster_size to BLOCK_SIZE to prevent a gcc uninitialized
variable compiler warning. In reality we always initialize cluster_size
in a conditional but gcc doesn't know that.
--Stefan]
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Message-id: 1492050868-16200-1-git-send-email-lidongchen@tencent.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Throttling has a weird property that throttle_get_config() does not
always return the same throttling settings that were given with
throttle_config(). In other words, the set and get functions aren't
symmetric.
If .max is 0 then the throttling code assigns a default value of .avg /
10 in throttle_config(). This is an implementation detail of the
throttling algorithm. When throttle_get_config() is called the .max
value returned should still be 0.
Users are exposed to this quirk via "info block" or "query-block"
monitor commands. This has caused confusion because it looks like a bug
when an unexpected value is reported.
This patch hides the .max value adjustment in throttle_get_config() and
updates test-throttle.c appropriately.
Reported-by: Nini Gu <ngu@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170301115026.22621-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The (burst) max parameter cannot be smaller than the avg parameter.
There is a test case that uses avg = 56, max = 1 and gets away with it
because no input validation is performed by the test case.
This patch switches to valid test input parameters.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170301115026.22621-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Machine queue for 2.10
# gpg: Signature made Thu 20 Apr 2017 19:44:27 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/machine-pull-request:
qdev: Constify local variable returned by blk_bs
qdev: Constify value passed to qdev_prop_set_macaddr
hostmem: use host_memory_backend_mr_inited() where proper
hostmem: introduce host_memory_backend_mr_inited()
hw/core/null-machine: Print error message when using the -kernel parameter
qdev: Make "hotplugged" property read-only
intel_iommu: enable remote IOTLB
intel_iommu: allow dynamic switch of IOMMU region
intel_iommu: provide its own replay() callback
intel_iommu: use the correct memory region for device IOTLB notification
memory: add MemoryRegionIOMMUOps.replay() callback
memory: introduce memory_region_notify_one()
memory: provide iommu_replay_all()
memory: provide IOMMU_NOTIFIER_FOREACH macro
memory: add section range info for IOMMU notifier
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Previous to the existence of load_image_mr(), the only way to load in the
FCode ROM image was to pass in its physical address via qdev properties
and use load_image_targphys().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Rather than calling memory_region_set_dirty() directly, make sure that we call
tcx_set_dirty() instead. This ensures that the 24-bit plane and cplane are
also invalidated correctly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
As all surfaces in QEMU are now either shared or 32-bit ARGB regardless of
the guest depth, remove all non-32-bit primitives from tcx_update_display()
and consequence their implementation which are no longer required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Now that page alignment is handled by the memory API, there is no need to
duplicate the code 4 times (4 * 1024 == 4096 == TARGET_PAGE_SIZE).
Finally we have now removed all traces of TARGET_PAGE_SIZE.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Now that page alignment is handled by the memory API, there is no need to
duplicate the code 4 times (4 * 1024 == 4096 == TARGET_PAGE_SIZE).
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Since all of the tcx_*_dirty() functions now calculate the 24-bit and
cplane offsets themselves from the base address, these variables are no
longer needed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
This can now be used by both the 8-bit and 24-bit display code, so rename
to tcx_check_dirty().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
This can now be used by both the 8-bit and 24-bit display code, so rename
to tcx_check_dirty().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Previous to the existence of load_image_mr(), the only way to load in the
FCode ROM image was to pass in its physical address via qdev properties
and use load_image_targphys().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
An upcoming Coccinelle cleanup script wanted to reformat the casts
present in this file - but on closer look, we don't need the casts
at all because C automatically converts void* to any other pointer.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170405194741.18956-4-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The I/O adapters should exist as soon as the bus/infrastructure
exists, and not only when the guest is actually trying to do something
with them. While the lazy allocation was not wrong, allocating at init
time is cleaner, both for the architecture and the code. Let's adjust
this by having each device type (currently for PCI and virtio-ccw)
register the adapters for each ISC (as now we don't know which ISC the
guest will use) as soon as it initializes.
Use a two-dimensional array io_adapters[type][isc] to store adapters
in ChannelSubSys, so that we can conveniently get the adapter id by
the helper function css_get_adapter_id(type, isc).
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
s390_get_flic() is called many times to obtain the flic. This wastes a
lot of time as it calls object_resolve_path() every time. Let's cache
S390FLICState by defining it as static.
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Let's use an enum for io adapter type, and standardize its naming to
CSS_IO_ADAPTER_* by changing S390_PCIPT_ADAPTER to CSS_IO_ADAPTER_PCI.
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
'devno' should rather be a property of the ccw device, instead of a
property of a specific virtio-ccw device. Let's consolidate it.
While we are at here, also rename CcwDevice.bus_id to CcwDevice.devno to
make things clearer.
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Expose the busids of the virtual I/O subchannel and the virtual CCW
device to ease debugging. This is needed because:
1. subchannel id are assigned dynamically, and cannot be set from
outside.
2. device busid could possibly be auto generated.
An example of using HMP to retrieve the property values of a
virtio-balloon-ccw device looks like:
[root@localhost ~]# lscss -d 0.0.0004
Device Subchan. DevType CU Type Use PIM PAM POM CHPIDs
----------------------------------------------------------------------
0.0.0004 0.0.0003 0000/00 3832/05 yes 80 80 ff 00000000 00000000
(qemu) info qtree
... ...
dev: virtio-balloon-ccw, id "balloon0"
devno = "<unset>"
ioeventfd = true
max_revision = 2 (0x2)
dev_id = "fe.0.0004"
subch_id = "fe.0.0003"
... ...
After migration, if we have the same device that shows up on a
different subchannel, we must re-fill the subch_id of the ccw
device with the new schid, or the subch_id will have an old wrong
schid value. So this also re-fills the subch_id after migration.
While we are at it, also neaten the related error handling a bit.
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Let's introduce a read-only property type that handles device ids of the
CssDevId type used for channel devices for future use. e.g. exposing the
busid of an I/O subchannel that is assigned to a ccw device.
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The code was incorrectly calculating the end address rather than the size of
the required region.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
This was an artifact from very early versions of the code from before the
memory API and is no longer needed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
cannot_destroy_with_object_finalize_yet was added by 4c315c2
("qdev: Protect device-list-properties against broken devices")
because "realview_pci" and "versatile_pci" were hanging
during "device-list-properties" cleanup (an infinite loop in
bus_unparent()).
We have this problem because the child is not removed from
the list of the PCI bus children because it has no defined parent:
qdev_set_parent_bus() set the device parent_bus pointer to bus, and
adds the device in the bus children list, but doesn't update the
device parent pointer.
To fix the problem, move all the involved parts to the realize function.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170414083717.13641-4-lvivier@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This removes the assert(kvm_enabled()) from kvmppc_host_cpu_initfn()
This assert can never be triggered as the function is only registered
when KVM is available (see also 4c315c2
"qdev: Protect device-list-properties against broken devices").
So we can remove the cannot_destroy_with_object_finalize_yet from
kvmppc_host_cpu_class_init() without fear and beyond reproach.
(as it has already be done for i386 with 771a13e "i386: Unset
cannot_destroy_with_object_finalize_yet on "host" model" and
e435601 "target-i386: Remove assert(kvm_enabled()) from
host_x86_cpu_initfn()")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170414083717.13641-3-lvivier@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Inside qdev_prop_set_drive() the value returned by blk_bs() is passed
only as pointer to const to bdrv_get_node_name() and pointed values is
not modified in other places so this can be made const for code
safeness.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Message-Id: <20170310200550.13313-3-krzk@kernel.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
If the user currently tries to use the -kernel parameter, simply nothing
happens, and the user might get confused that there is nothing loaded
to memory, but also no error message has been issued. Since there is no
real generic way to load a kernel on all CPU types (but on some targets,
the generic loader can be used instead), issue an appropriate error
message here now to avoid the possible confusion.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1488271971-12624-1-git-send-email-thuth@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The "hotplugged" property is user visible, but it was never meant
to be set by the user. There are probably multiple ways to break
or crash device code by overriding the property. For example, we
recently fixed a crash in rtc_set_memory() related to the
property (commit 26ef65beab).
There has been some discussion about making management software
use "hotplugged=on" on migration, to indicate devices that were
hotplugged in the migration source. There were other suggestions
to address this, like including the "hotplugged" field in the
migration stream instead of requiring it to be set explicitly.
Whatever solution we choose in the future, this patch disables
setting "hotplugged" explicitly in the command-line by now,
because the ability to set the property is unused, untested, and
undocumented.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170222192647.19690-1-ehabkost@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch is based on Aviv Ben-David (<bd.aviv@gmail.com>)'s patch
upstream:
"IOMMU: enable intel_iommu map and unmap notifiers"
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html
However I removed/fixed some content, and added my own codes.
Instead of translate() every page for iotlb invalidations (which is
slower), we walk the pages when needed and notify in a hook function.
This patch enables vfio devices for VT-d emulation.
And, since we already have vhost DMAR support via device-iotlb, a
natural benefit that this patch brings is that vt-d enabled vhost can
live even without ATS capability now. Though more tests are needed.
Signed-off-by: Aviv Ben-David <bdaviv@cs.technion.ac.il>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-10-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This is preparation work to finally enabled dynamic switching ON/OFF for
VT-d protection. The old VT-d codes is using static IOMMU address space,
and that won't satisfy vfio-pci device listeners.
Let me explain.
vfio-pci devices depend on the memory region listener and IOMMU replay
mechanism to make sure the device mapping is coherent with the guest
even if there are domain switches. And there are two kinds of domain
switches:
(1) switch from domain A -> B
(2) switch from domain A -> no domain (e.g., turn DMAR off)
Case (1) is handled by the context entry invalidation handling by the
VT-d replay logic. What the replay function should do here is to replay
the existing page mappings in domain B.
However for case (2), we don't want to replay any domain mappings - we
just need the default GPA->HPA mappings (the address_space_memory
mapping). And this patch helps on case (2) to build up the mapping
automatically by leveraging the vfio-pci memory listeners.
Another important thing that this patch does is to seperate
IR (Interrupt Remapping) from DMAR (DMA Remapping). IR region should not
depend on the DMAR region (like before this patch). It should be a
standalone region, and it should be able to be activated without
DMAR (which is a common behavior of Linux kernel - by default it enables
IR while disabled DMAR).
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-9-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The default replay() don't work for VT-d since vt-d will have a huge
default memory region which covers address range 0-(2^64-1). This will
normally consumes a lot of time (which looks like a dead loop).
The solution is simple - we don't walk over all the regions. Instead, we
jump over the regions when we found that the page directories are empty.
It'll greatly reduce the time to walk the whole region.
To achieve this, we provided a page walk helper to do that, invoking
corresponding hook function when we found an page we are interested in.
vtd_page_walk_level() is the core logic for the page walking. It's
interface is designed to suite further use case, e.g., to invalidate a
range of addresses.
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-8-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Originally we have one memory_region_iommu_replay() function, which is
the default behavior to replay the translations of the whole IOMMU
region. However, on some platform like x86, we may want our own replay
logic for IOMMU regions. This patch adds one more hook for IOMMUOps for
the callback, and it'll override the default if set.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-6-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
In this patch, IOMMUNotifier.{start|end} are introduced to store section
information for a specific notifier. When notification occurs, we not
only check the notification type (MAP|UNMAP), but also check whether the
notified iova range overlaps with the range of specific IOMMU notifier,
and skip those notifiers if not in the listened range.
When removing an region, we need to make sure we removed the correct
VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well.
This patch is solving the problem that vfio-pci devices receive
duplicated UNMAP notification on x86 platform when vIOMMU is there. The
issue is that x86 IOMMU has a (0, 2^64-1) IOMMU region, which is
splitted by the (0xfee00000, 0xfeefffff) IRQ region. AFAIK
this (splitted IOMMU region) is only happening on x86.
This patch also helps vhost to leverage the new interface as well, so
that vhost won't get duplicated cache flushes. In that sense, it's an
slight performance improvement.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com>
[ehabkost: included extra vhost_iommu_region_del() change from Peter Xu]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We already require gcc 4.1 or newer (for the atomic
support), so the fallback codepaths for older gcc
versions than that are now dead code and we can
just delete them.
NB: clang reports itself as gcc 4.2 (regardless of
clang version), so clang won't be using the fallbacks
either.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
target-arm queue:
* implement M profile exception return properly
* cadence GEM: fix multiqueue handling bugs
* pxa2xx.c: QOMify a device
* arm/kvm: Remove trailing newlines from error_report()
* stellaris: Don't hw_error() on bad register accesses
* Add assertion about FSC format for syndrome registers
* Move excnames[] array into arm_log_exceptions()
* exynos: minor code cleanups
* hw/arm/boot: take Linux/arm64 TEXT_OFFSET header field into account
* Fix APSR writes via M profile MSR
# gpg: Signature made Thu 20 Apr 2017 17:39:35 BST
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20170420: (24 commits)
arm: Remove workarounds for old M-profile exception return implementation
arm: Implement M profile exception return properly
arm: Track M profile handler mode state in TB flags
arm: Abstract out "are we singlestepping" test to utility function
arm: Move condition-failed codepath generation out of if()
arm: Move gen_set_condexec() and gen_set_pc_im() up in the file
arm: Factor out "generate right kind of step exception"
arm: Thumb shift operations should not permit interworking branches
arm: Don't implement BXJ on M-profile CPUs
xlnx-zynqmp: Set the Cadence GEM revision
cadence_gem: Make the revision a property
cadence_gem: Correct the interupt logic
cadence_gem: Correct the multi-queue can rx logic
cadence_gem: Read the correct queue descriptor
hw/arm: Qomify pxa2xx.c
arm/kvm: Remove trailing newlines from error_report()
stellaris: Don't hw_error() on bad register accesses
target/arm: Add assertion about FSC format for syndrome registers
arm: Move excnames[] array into arm_log_exceptions()
target/arm: Add missing entries to excnames[] for log strings
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On M profile, return from exceptions happen when code in Handler mode
executes one of the following function call return instructions:
* POP or LDM which loads the PC
* LDR to PC
* BX register
and the new PC value is 0xFFxxxxxx.
QEMU tries to implement this by not treating the instruction
specially but then catching the attempt to execute from the magic
address value. This is not ideal, because:
* there are guest visible differences from the architecturally
specified behaviour (for instance jumping to 0xFFxxxxxx via a
different instruction should not cause an exception return but it
will in the QEMU implementation)
* we have to account for it in various places (like refusing to take
an interrupt if the PC is at a magic value, and making sure that
the MPU doesn't deny execution at the magic value addresses)
Drop these hacks, and instead implement exception return the way the
architecture specifies -- by having the relevant instructions check
for the magic value and raise the 'do an exception return' QEMU
internal exception immediately.
The effect on the generated code is minor:
bx lr, old code (and new code for Thread mode):
TCG:
mov_i32 tmp5,r14
movi_i32 tmp6,$0xfffffffffffffffe
and_i32 pc,tmp5,tmp6
movi_i32 tmp6,$0x1
and_i32 tmp5,tmp5,tmp6
st_i32 tmp5,env,$0x218
exit_tb $0x0
set_label $L0
exit_tb $0x7f2aabd61993
x86_64 generated code:
0x7f2aabe87019: mov %ebx,%ebp
0x7f2aabe8701b: and $0xfffffffffffffffe,%ebp
0x7f2aabe8701e: mov %ebp,0x3c(%r14)
0x7f2aabe87022: and $0x1,%ebx
0x7f2aabe87025: mov %ebx,0x218(%r14)
0x7f2aabe8702c: xor %eax,%eax
0x7f2aabe8702e: jmpq 0x7f2aabe7c016
bx lr, new code when in Handler mode:
TCG:
mov_i32 tmp5,r14
movi_i32 tmp6,$0xfffffffffffffffe
and_i32 pc,tmp5,tmp6
movi_i32 tmp6,$0x1
and_i32 tmp5,tmp5,tmp6
st_i32 tmp5,env,$0x218
movi_i32 tmp5,$0xffffffffff000000
brcond_i32 pc,tmp5,geu,$L1
exit_tb $0x0
set_label $L1
movi_i32 tmp5,$0x8
call exception_internal,$0x0,$0,env,tmp5
x86_64 generated code:
0x7fe8fa1264e3: mov %ebp,%ebx
0x7fe8fa1264e5: and $0xfffffffffffffffe,%ebx
0x7fe8fa1264e8: mov %ebx,0x3c(%r14)
0x7fe8fa1264ec: and $0x1,%ebp
0x7fe8fa1264ef: mov %ebp,0x218(%r14)
0x7fe8fa1264f6: cmp $0xff000000,%ebx
0x7fe8fa1264fc: jae 0x7fe8fa126509
0x7fe8fa126502: xor %eax,%eax
0x7fe8fa126504: jmpq 0x7fe8fa122016
0x7fe8fa126509: mov %r14,%rdi
0x7fe8fa12650c: mov $0x8,%esi
0x7fe8fa126511: mov $0x56095dbeccf5,%r10
0x7fe8fa12651b: callq *%r10
which is a difference of one cmp/branch-not-taken. This will
be lost in the noise of having to exit generated code and
look up the next TB anyway.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491844419-12485-9-git-send-email-peter.maydell@linaro.org
For M profile exception-return handling we'd like to generate different
code for some instructions depending on whether we are in Handler
mode or Thread mode. This isn't the same as "are we privileged
or user", so we need an extra bit in the TB flags to distinguish.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491844419-12485-8-git-send-email-peter.maydell@linaro.org
We now test for "are we singlestepping" in several places and
it's not a trivial check because we need to care about both
architectural singlestep and QEMU gdbstub singlestep. We're
also about to add another place that needs to make this check,
so pull the condition out into a function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491844419-12485-7-git-send-email-peter.maydell@linaro.org
Move the code to generate the "condition failed" instruction
codepath out of the if (singlestepping) {} else {}. This
will allow adding support for handling a new is_jmp type
which can't be neatly split into "singlestepping case"
versus "not singlestepping case".
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1491844419-12485-6-git-send-email-peter.maydell@linaro.org
We currently have two places that do:
if (dc->ss_active) {
gen_step_complete_exception(dc);
} else {
gen_exception_internal(EXCP_DEBUG);
}
Factor this out into its own function, as we're about to add
a third place that needs the same logic.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1491844419-12485-4-git-send-email-peter.maydell@linaro.org
In Thumb mode, the only instructions which can cause an interworking
branch by writing the PC are BLX, BX, BXJ, LDR, POP and LDM. Unlike
ARM mode, data processing instructions which target the PC do not
cause interworking branches.
When we added support for doing interworking branches on writes to
PC from data processing instructions in commit 21aeb3430c, we
accidentally changed a Thumb instruction to have interworking
branch behaviour for writes to PC. (MOV, MOVS register-shifted
register, encoding T2; this is the standard encoding for
LSL/LSR/ASR/ROR (register).)
For this encoding, behaviour with Rd == R15 is specified as
UNPREDICTABLE, so allowing an interworking branch is within
spec, but it's confusing and differs from our handling of this
class of UNPREDICTABLE for other Thumb ALU operations. Make
it perform a simple (non-interworking) branch like the others.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491844419-12485-3-git-send-email-peter.maydell@linaro.org
This patch fixes two mistakes in the interrupt logic.
First we only trigger single-queue or multi-queue interrupts if the status
register is set. This logic was already used for non multi-queue interrupts
but it also applies to multi-queue interrupts.
Secondly we need to lower the interrupts if the ISR isn't set. As part
of this we can remove the other interrupt lowering logic and consolidate
it inside gem_update_int_status().
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 438bcc014f8f8a2f8f68f322cb6a53f4c04688c2.1491947224.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Current recommended style is to log a guest error on bad register
accesses, not kill the whole system with hw_error(). Change the
hw_error() calls to log as LOG_GUEST_ERROR or LOG_UNIMP or use
g_assert_not_reached() as appropriate.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491486314-25823-1-git-send-email-peter.maydell@linaro.org
In tlb_fill() we construct a syndrome register value from a
fault status register value which is filled in by arm_tlb_fill().
arm_tlb_fill() returns FSR values which might be in the format
used with short-format page descriptors, or the format used
with long-format (LPAE) descriptors. The syndrome register
always uses LPAE-format FSR status codes.
It isn't actually possible to end up delivering a syndrome
register value to the guest for a fault which is reported
with a short-format FSR (that kind of stage 1 fault will only
happen for an AArch32 translation regime which doesn't have
a syndrome register, and can never be redirected to an AArch64
or Hyp exception level). Add an assertion which checks this,
and adjust the code so that we construct a syndrome with
an invalid status code, rather than allowing set bits in
the FSR input to randomly corrupt other fields in the syndrome.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1491486152-24304-1-git-send-email-peter.maydell@linaro.org
The excnames[] array is defined in internals.h because we used
to use it from two different source files for handling logging
of AArch32 and AArch64 exception entry. Refactoring means that
it's now used only in arm_log_exception() in helper.c, so move
the array into that function.
Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491821097-5647-1-git-send-email-peter.maydell@linaro.org
Recent changes have added new EXCP_ values to ARM but forgot
to update the excnames[] array which is used to provide
human-readable strings when printing information about the
exception for debug logging. Add the missing entries, and
add a comment to the list of #defines to help avoid the mistake
being repeated in future.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1491486340-25988-1-git-send-email-peter.maydell@linaro.org
Short declaration of 'i' was in the middle of declarations with
assignments. Make it a little bit more readable. Additionally switch
from "unsigned" to "unsigned int" as this pattern is more widely used.
No functional change.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170313184750.429-4-krzk@kernel.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The static array exynos4210_uart_regs with register values is not
modified so it can be made const.
Few other functions accept driver or uart state as an argument but they
do not change it and do not cast it so this can be made const for code
safeness.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Message-id: 20170313184750.429-3-krzk@kernel.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu_log_mask() and error_report() are preferred over fprintf() for
logging errors. Also remove square brackets [] and additional new line
characters in printed messages.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170313184750.429-2-krzk@kernel.org
[PMM: wrapped long line]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The arm64 boot protocol stipulates that the kernel must be loaded
TEXT_OFFSET bytes beyond a 2 MB aligned base address, where TEXT_OFFSET
could be any 4 KB multiple between 0 and 2 MB, and whose value can be
found in the header of the Image file.
So after attempts to load the arm64 kernel image as an ELF file or as a
U-Boot image have failed (both of which have their own way of specifying
the load offset), try to determine the TEXT_OFFSET from the image after
loading it but before mapping it as a ROM mapping into the guest address
space.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1489414630-21609-1-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch creates inline wrapper functions in xen_common.h for all open
coded calls to xc_hvm_XXX() functions outside of xen_common.h so that use
of xen_xc can be made implicit. This again is in preparation for the move
to using libxendevicemodel.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Doing this will make the transition to using the new libxendevicemodel
interface less intrusive on the callers of these functions, since using
the new library will require a change of handle.
NOTE: The patch also moves the 'externs' for xen_xc and xen_fmem from
xen_backend.h to xen_common.h, and the declarations from
xen_backend.c to xen-common.c, which is where they belong.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-03-22 11:47:39 -07:00
691 changed files with 22798 additions and 8759 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.