Compare commits

258 Commits

Author SHA1 Message Date
Gerd Hoffmann
371ec54e9f ui: egl-headless requires dmabuf support
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170517122744.3541-1-kraxel@redhat.com
2017-05-19 10:46:00 +02:00
Stefan Hajnoczi
56821559f0 Merge remote-tracking branch 'dgilbert/tags/pull-hmp-20170517' into staging
HMP pull

# gpg: Signature made Wed 17 May 2017 07:03:39 PM BST
# gpg:                using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* dgilbert/tags/pull-hmp-20170517:
  ramblock: add new hmp command "info ramblock"
  utils: provide size_to_str()
  ramblock: add RAMBLOCK_FOREACH()

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-18 13:36:15 +01:00
Stefan Hajnoczi
2ccbd47c1d Merge remote-tracking branch 'quintela/tags/migration/20170517' into staging
migration/next for 20170517

# gpg: Signature made Wed 17 May 2017 11:46:36 AM BST
# gpg:                using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* quintela/tags/migration/20170517:
  migration: Move check_migratable() into qdev.c
  migration: Move postcopy stuff to postcopy-ram.c
  migration: Move page_cache.c to migration/
  migration: Create migration/blocker.h
  ram: Rename RAM_SAVE_FLAG_COMPRESS to RAM_SAVE_FLAG_ZERO
  migration: Pass Error ** argument to {save,load}_vmstate
  migration: Fix regression with compression threads

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-18 10:05:52 +01:00
Stefan Hajnoczi
adb354dd1e Merge remote-tracking branch 'mst/tags/for_upstream' into staging
pci, virtio, vhost: fixes

A bunch of fixes that missed the release.
Most notably we are reverting shpc back to the enabled-by-default state
as guests use that as an indicator that hotplug is supported
(even though it's unused). Unfortunately we can't fix this
on the stable branch since that would break migration.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Wed 17 May 2017 10:42:06 PM BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* mst/tags/for_upstream:
  exec: abstract address_space_do_translate()
  pci: deassert intx when pci device unrealize
  virtio: allow broken device to notify guest
  Revert "hw/pci: disable pci-bridge's shpc by default"
  acpi-defs: clean up open brace usage
  ACPI: don't call acpi_pcihp_device_plug_cb on xen
  iommu: Don't crash if machine is not PC_MACHINE
  pc: add 2.10 machine type
  pc/fwcfg: unbreak migration from qemu-2.5 and qemu-2.6 during firmware boot
  libvhost-user: fix crash when rings aren't ready
  hw/virtio: fix vhost user fails to startup when MQ
  hw/arm/virt: generate 64-bit addressable ACPI objects
  hw/acpi-defs: replace leading X with x_ in FADT field names

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-18 10:01:08 +01:00
Peter Xu
a764040cc8 exec: abstract address_space_do_translate()
This function is an abstraction helper for address_space_translate() and
address_space_get_iotlb_entry(). It looks up the memory region section for
an address, then performs the IOMMU translation if necessary.
Refactor the two existing functions to use it.

This fixes vhost when IOMMU is disabled by guest.

Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-18 00:35:15 +03:00
Herongguang (Stephen)
3936161f1f pci: deassert intx when pci device unrealize
If a PCI device is not reset by the VM (by writing into config space)
and is then unplugged by the VM, QEMU may hit this assertion when the VM
next reboots:
pcibus_reset: Assertion `bus->irq_count[i] == 0' failed
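
The failure mode can be illustrated with a minimal sketch. All names below
(MockPCIBus, MockPCIDevice, mock_set_intx and so on) are hypothetical
stand-ins, not QEMU's real structures, and the sketch assumes a device pin
maps directly onto the bus counter with the same index. It only shows why
lowering every INTx line at unrealize time leaves the per-bus counters at
zero for the next reset.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_PCI_NUM_PINS 4

/* Hypothetical bus: counts how many devices currently assert each pin. */
typedef struct MockPCIBus {
    int irq_count[MOCK_PCI_NUM_PINS];
} MockPCIBus;

/* Hypothetical device: remembers the level it drives on each pin. */
typedef struct MockPCIDevice {
    MockPCIBus *bus;
    int intx_level[MOCK_PCI_NUM_PINS];
} MockPCIDevice;

/* Change one INTx level and keep the bus counter in sync. */
static void mock_set_intx(MockPCIDevice *dev, int pin, int level)
{
    assert(pin >= 0 && pin < MOCK_PCI_NUM_PINS);
    dev->bus->irq_count[pin] += level - dev->intx_level[pin];
    dev->intx_level[pin] = level;
}

/* On unrealize, lower every line the device may still be asserting. */
static void mock_unrealize(MockPCIDevice *dev)
{
    for (int pin = 0; pin < MOCK_PCI_NUM_PINS; pin++) {
        mock_set_intx(dev, pin, 0);
    }
}

/* The condition a bus reset expects: no pin is still asserted. */
static bool mock_bus_is_quiescent(const MockPCIBus *bus)
{
    for (int pin = 0; pin < MOCK_PCI_NUM_PINS; pin++) {
        if (bus->irq_count[pin] != 0) {
            return false;
        }
    }
    return true;
}
```

In this model, a device that raises a pin and is removed without
mock_unrealize() leaves a non-zero counter behind, which is the state the
assertion above complains about.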

Cc: qemu-stable@nongnu.org
Signed-off-by: herongguang <herongguang.he@huawei.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-18 00:35:15 +03:00
Greg Kurz
66453cff9e virtio: allow broken device to notify guest
According to section 2.1.2 of the virtio-1 specification:

"The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state that
a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET,
the device MUST send a device configuration change notification to the
driver."

Commit "f5ed36635d8f virtio: stop virtqueue processing if device is broken"
introduced a virtio_error() call that just does that:

- internally mark the device as broken
- set the DEVICE_NEEDS_RESET bit in the status
- send a configuration change notification

Unfortunately, virtio_notify_vector(), called by virtio_notify_config(),
returns right away when the device is marked as broken and the notification
isn't sent in this case.

The spec doesn't say whether a broken device can send notifications
in other situations or not. But since the driver isn't supposed to do
anything but to reset the device, it makes sense to keep the check in
virtio_notify_config().

Marking the device as broken AFTER the configuration change notification was
sent is enough to fix the issue.
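
The ordering problem can be sketched as follows. This is a simplified
model, not QEMU's code: MockVirtIODevice and the mock_* functions are
hypothetical, the DRIVER_OK condition from the spec is left out, and the
only thing shown is how the position of the "broken" assignment decides
whether the configuration change notification gets through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Status bit value as defined by the virtio specification. */
#define DEVICE_NEEDS_RESET 0x40

/* Hypothetical, simplified device state. */
typedef struct MockVirtIODevice {
    uint8_t status;
    bool broken;
    int config_notifications;   /* notifications that reached the driver */
} MockVirtIODevice;

/* Mirrors the check described above: a broken device sends nothing. */
static void mock_notify_config(MockVirtIODevice *vdev)
{
    assert(vdev != NULL);
    if (vdev->broken) {
        return;
    }
    vdev->config_notifications++;
}

/* Ordering before the fix: the broken flag suppresses the notification. */
static void mock_error_broken_first(MockVirtIODevice *vdev)
{
    vdev->broken = true;
    vdev->status |= DEVICE_NEEDS_RESET;
    mock_notify_config(vdev);
}

/* Ordering after the fix: notify first, then mark the device broken. */
static void mock_error_notify_first(MockVirtIODevice *vdev)
{
    vdev->status |= DEVICE_NEEDS_RESET;
    mock_notify_config(vdev);
    vdev->broken = true;
}
```

Both variants leave the device marked as broken with DEVICE_NEEDS_RESET
set; only the second one delivers the notification to the driver.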

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-18 00:35:15 +03:00
Marcel Apfelbaum
2fa356629e Revert "hw/pci: disable pci-bridge's shpc by default"
This reverts commit dc0ae76770.

Disabling the shpc controller has an undesired side effect.
The PCI bridge remains with no attached devices at boot time,
and the guest operating systems do not allocate any resources
for it, leaving the bridge unusable. Note that the behaviour
is dictated by the pci bridge specification.

Revert the commit and leave the shpc controller even if it is not
actually used by any architecture. Slot 0 remains unusable at boot time.

Keep shpc off for QEMU 2.9 machines.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-18 00:35:15 +03:00
Peter Xu
be9b23c4a5 ramblock: add new hmp command "info ramblock"
Add a command that dumps information about ramblocks. Its output looks like:

(qemu) info ramblock
              Block Name    PSize              Offset               Used              Total
            /objects/mem    2 MiB  0x0000000000000000 0x0000000080000000 0x0000000080000000
                vga.vram    4 KiB  0x0000000080060000 0x0000000001000000 0x0000000001000000
    /rom@etc/acpi/tables    4 KiB  0x00000000810b0000 0x0000000000020000 0x0000000000200000
                 pc.bios    4 KiB  0x0000000080000000 0x0000000000040000 0x0000000000040000
  0000:00:03.0/e1000.rom    4 KiB  0x0000000081070000 0x0000000000040000 0x0000000000040000
                  pc.rom    4 KiB  0x0000000080040000 0x0000000000020000 0x0000000000020000
    0000:00:02.0/vga.rom    4 KiB  0x0000000081060000 0x0000000000010000 0x0000000000010000
   /rom@etc/table-loader    4 KiB  0x00000000812b0000 0x0000000000001000 0x0000000000001000
      /rom@etc/acpi/rsdp    4 KiB  0x00000000812b1000 0x0000000000001000 0x0000000000001000

Ramblocks are internal to QEMU's implementation, and this command is
mostly useful to QEMU developers working on RAM-related code. It is not
suitable for the QMP interface, so only an HMP interface is provided
for it.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1494562661-9063-4-git-send-email-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-05-17 17:31:16 +01:00
Peter Xu
22951aaaeb utils: provide size_to_str()
Move the algorithm from print_type_size() into size_to_str() so that
other components can also use it. With that, refactor
print_type_size().

The assert() in that logic is removed though, since even UINT64_MAX
would not overflow.
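
A helper of this kind could look roughly like the sketch below. This is
not the implementation from the patch: sketch_size_to_str(), its
buffer-based signature, and the choice of two decimal places are all
assumptions made for illustration. It does show why no overflow check is
needed: UINT64_MAX is just under 16 EiB, so the suffix index never goes
past "EiB".

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: format a byte count with binary (IEC) suffixes
 * into a caller-provided buffer.
 */
static void sketch_size_to_str(uint64_t val, char *buf, size_t len)
{
    static const char *const suffixes[] = {
        "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
    };
    size_t i = 0;
    uint64_t div = 1;

    assert(buf != NULL && len > 0);

    /* Pick the largest unit for which the value is at least 1. */
    while (i < 6 && val / div >= 1024) {
        div *= 1024;    /* at most 2^60, which fits in 64 bits */
        i++;
    }

    if (val % div == 0) {
        /* Exact multiple of the unit: print it as an integer. */
        snprintf(buf, len, "%" PRIu64 " %s", val / div, suffixes[i]);
    } else {
        snprintf(buf, len, "%.2f %s", (double)val / (double)div, suffixes[i]);
    }
}
```

With this sketch, 2097152 is rendered as "2 MiB" and 4096 as "4 KiB",
which matches the style of the PSize column in the "info ramblock" output
shown for the related commit.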

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1494562661-9063-3-git-send-email-peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-05-17 17:30:45 +01:00
Peter Xu
99e15582de ramblock: add RAMBLOCK_FOREACH()
This simplifies the iterators.
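
The idea of an iterator macro can be sketched as below. This is an
illustration only: MockRAMBlock, the plain singly linked list, and
MOCK_RAMBLOCK_FOREACH are hypothetical, and QEMU's real list type and
locking rules are not modelled. The point is that callers no longer spell
out how the list is walked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified block descriptor. */
typedef struct MockRAMBlock {
    const char *idstr;
    uint64_t used_length;
    struct MockRAMBlock *next;
} MockRAMBlock;

static MockRAMBlock *mock_ram_list_head;

/* Hide the list-walking details behind a single iterator macro. */
#define MOCK_RAMBLOCK_FOREACH(block) \
    for ((block) = mock_ram_list_head; (block) != NULL; (block) = (block)->next)

/* Push a block onto the front of the list. */
static void mock_ram_list_insert(MockRAMBlock *block)
{
    assert(block != NULL);
    block->next = mock_ram_list_head;
    mock_ram_list_head = block;
}

/* Example caller: sum the used length of every block. */
static uint64_t mock_total_used(void)
{
    MockRAMBlock *block;
    uint64_t total = 0;

    MOCK_RAMBLOCK_FOREACH(block) {
        total += block->used_length;
    }
    return total;
}
```

If the underlying list type changes later, only the macro needs to be
updated, not every loop that uses it.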

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1494562661-9063-2-git-send-email-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-05-17 17:30:37 +01:00
Stefan Hajnoczi
897eee242b Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging
x86 and machine queue, 2017-05-17

# gpg: Signature made Wed 17 May 2017 02:37:54 PM BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* ehabkost/tags/x86-and-machine-pull-request: (22 commits)
  tests: Add [+-]feature and feature=on|off test cases
  s390-pcibus: No need to set user_creatable=false explicitly
  xen-sysdev: Remove user_creatable flag
  virtio-mmio: Remove user_creatable flag
  sysbus-ohci: Remove user_creatable flag
  hpet: Remove user_creatable flag
  generic-sdhci: Remove user_creatable flag
  esp: Remove user_creatable flag
  fw_cfg: Remove user_creatable flag
  unimplemented-device: Remove user_creatable flag
  isabus-bridge: Remove user_creatable flag
  allwinner-ahci: Remove user_creatable flag
  sysbus-ahci: Remove user_creatable flag
  kvmvapic: Remove user_creatable flag
  ioapic: Remove user_creatable flag
  kvmclock: Remove user_creatable flag
  pflash_cfi01: Remove user_creatable flag
  fdc: Remove user_creatable flag from sysbus-fdc & SUNW,fdtwo
  iommu: Remove FIXME comment about user_creatable=true
  xen-backend: Remove FIXME comment about user_creatable flag
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-17 16:34:35 +01:00
Eduardo Habkost
17e8f54126 tests: Add [+-]feature and feature=on|off test cases
Add test code to ensure features are enabled/disabled correctly on the
command line. The test cases use the "feature-words" and
"filtered-features" properties to check if the features were
enabled/disabled correctly.
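
The two syntaxes named in the subject line can be illustrated with a small
parser sketch. This is not QEMU's option parser: sketch_parse_feature()
and its signature are hypothetical, and it handles a single option token
only. It shows that "+name" and "name=on" are meant to express the same
request, as are "-name" and "name=off".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: parse one feature token of the form
 * "+name", "-name", "name=on" or "name=off".
 * Returns true on success and fills in name and enabled.
 */
static bool sketch_parse_feature(const char *arg, char *name, size_t name_len,
                                 bool *enabled)
{
    const char *eq;
    size_t n;

    assert(arg != NULL && name != NULL && enabled != NULL);

    if (arg[0] == '+' || arg[0] == '-') {
        n = strlen(arg + 1);
        if (n == 0 || n >= name_len) {
            return false;
        }
        *enabled = (arg[0] == '+');
        memcpy(name, arg + 1, n + 1);   /* copies the trailing NUL too */
        return true;
    }

    eq = strchr(arg, '=');
    if (eq == NULL || eq == arg) {
        return false;
    }
    n = (size_t)(eq - arg);
    if (n >= name_len) {
        return false;
    }
    if (strcmp(eq + 1, "on") == 0) {
        *enabled = true;
    } else if (strcmp(eq + 1, "off") == 0) {
        *enabled = false;
    } else {
        return false;
    }
    memcpy(name, arg, n);
    name[n] = '\0';
    return true;
}
```

A test in the spirit of this commit would feed both spellings of the same
request and check that the resulting feature state is identical.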

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170508183205.10884-1-ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:15 -03:00
Eduardo Habkost
8ae5059df5 s390-pcibus: No need to set user_creatable=false explicitly
TYPE_S390_PCI_HOST_BRIDGE is a subclass of TYPE_PCI_HOST_BRIDGE,
which is a subclass of TYPE_SYS_BUS_DEVICE. TYPE_SYS_BUS_DEVICE
already sets user_creatable=false, so we don't require an
explicit user_creatable=false assignment in
s390_pcihost_class_init().
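
The inheritance argument can be sketched as a chain of class initializers.
The sketch is a simplified model, not QOM: MockDeviceClass and the mock_*
functions are hypothetical, and class inheritance is modelled by each
initializer calling its parent's first. It shows that a value set once in
the sysbus base class is already in place for every subclass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified class structure. */
typedef struct MockDeviceClass {
    bool hotpluggable;
    bool user_creatable;
} MockDeviceClass;

/* Generic devices default to being creatable with -device. */
static void mock_device_class_init(MockDeviceClass *dc)
{
    assert(dc != NULL);
    dc->hotpluggable = true;
    dc->user_creatable = true;
}

/* The sysbus base class turns the flag off for all of its subclasses. */
static void mock_sysbus_class_init(MockDeviceClass *dc)
{
    mock_device_class_init(dc);
    dc->user_creatable = false;
}

/* Intermediate subclass: inherits the sysbus default unchanged. */
static void mock_pci_host_bridge_class_init(MockDeviceClass *dc)
{
    mock_sysbus_class_init(dc);
}

/* Leaf subclass: no explicit user_creatable = false is needed here. */
static void mock_s390_pcihost_class_init(MockDeviceClass *dc)
{
    mock_pci_host_bridge_class_init(dc);
}
```

A sysbus device that really works with -device would, in this model, set
user_creatable back to true in its own initializer after calling the
parent's.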

Cc: Alexander Graf <agraf@suse.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Pierre Morel <pmorel@linux.vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-22-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
74c505c6fd xen-sysdev: Remove user_creatable flag
TYPE_XENSYSDEV is only used internally by xen_be_init(), and is
not supposed to be plugged/unplugged dynamically. Remove the
user_creatable flag from the device class.

Cc: Juergen Gross <jgross@suse.com>,
Cc: Peter Maydell <peter.maydell@linaro.org>,
Cc: Thomas Huth <thuth@redhat.com>
Cc: sstabellini@kernel.org
Cc: Markus Armbruster <armbru@redhat.com>,
Cc: Marcel Apfelbaum <marcel@redhat.com>,
Cc: Laszlo Ersek <lersek@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-21-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
ae3ac6caca virtio-mmio: Remove user_creatable flag
virtio-mmio needs to be wired and mapped by other device or board
code, and won't work with -device. Remove the user_creatable flag
from the device class.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Shannon Zhao <zhaoshenglong@huawei.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-20-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
8f04d26a4e sysbus-ohci: Remove user_creatable flag
sysbus-ohci needs to be mapped and wired by device or board code,
and won't work with -device. Remove the user_creatable flag from
the device class.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-19-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
cae9d4cdd4 hpet: Remove user_creatable flag
hpet needs to be mapped and wired by the board code and won't
work with -device. Remove the user_creatable flag from the device
class.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-18-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
bdbae0ef01 generic-sdhci: Remove user_creatable flag
generic-sdhci needs to be wired by other devices' code, so it
can't be used with -device. Remove the user_creatable flag from
the device class.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Prasad J Pandit <pjp@fedoraproject.org>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-17-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
f4afad3878 esp: Remove user_creatable flag
esp devices aren't going to work with -device, as they need IRQs
to be connected and mmio to be mapped (this is done by
esp_init()). Remove the user_creatable flag from the device
class.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
731fec79ae fw_cfg: Remove user_creatable flag
fw_cfg won't work with -device, as:
* fw_cfg_init1() won't get called for the device;
* The device won't appear at /machine/fw_cfg, and won't work with
  the -fw_cfg command-line option.

Remove the user_creatable flag from the device class.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Gabriel L. Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-15-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
68aecefcd4 unimplemented-device: Remove user_creatable flag
unimplemented-device needs to be created and mapped using
create_unimplemented_device() (or equivalent code), and won't
work with -device. Remove the user_creatable flag from the device
class.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-14-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
0980a1c0e2 isabus-bridge: Remove user_creatable flag
isabus-bridge needs to be created by isa_bus_new(), and won't
work with -device, as it won't create the TYPE_ISA_BUS bus
itself. Remove the user_creatable flag from the device class.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-13-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
c803129ce5 allwinner-ahci: Remove user_creatable flag
allwinner-ahci needs its IRQ to be connected and mmio to be
mapped (this is done by the allwinner-a10 device realize method),
and won't work with -device. Remove the user_creatable flag from
the device class.

Cc: John Snow <jsnow@redhat.com>
Cc: qemu-block@nongnu.org
Cc: Beniamino Galvani <b.galvani@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-arm@nongnu.org
Cc: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-12-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
a081ab8f7b sysbus-ahci: Remove user_creatable flag
The sysbus-ahci devices are supposed to be created and wired by
code from other devices, like calxeda_init() and
xlnx_zynqmp_realize(), and won't work with -device. Remove the
user_creatable flag from the device class.

Cc: John Snow <jsnow@redhat.com>
Cc: qemu-block@nongnu.org
Cc: Rob Herring <robh@kernel.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
040e99686a kvmvapic: Remove user_creatable flag
The kvmvapic device is only usable when created by
apic_common_realize(), not using -device. Remove the
user_creatable flag from the device class.

Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-10-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
6c4672cae7 ioapic: Remove user_creatable flag
An ioapic device is already created by the q35 initialization
code, and using "-device ioapic" or "-device kvm-ioapic" will
always fail with "Only 1 ioapics allowed". Remove the
user_creatable flag from the ioapic device classes.

Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-9-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
642c1e0546 kvmclock: Remove user_creatable flag
kvmclock should be used by guests only when the appropriate CPUID
feature flags are set on the VCPU, and it is automatically
created by kvmclock_create() when those feature flags are set.
This means creating a kvmclock device using -device is useless.
Remove user_creatable from its device class.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Thomas Huth <thuth@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-8-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
c1ce65f710 pflash_cfi01: Remove user_creatable flag
TYPE_CFI_PFLASH01 devices need to be mapped by
pflash_cfi01_register() (or equivalent) and can't be used with
-device. Remove user_creatable from the device class.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-7-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
8ca97a1ff7 fdc: Remove user_creatable flag from sysbus-fdc & SUNW,fdtwo
sysbus-fdc and SUNW,fdtwo devices need IRQs to be wired and mmio
to be mapped, and can't be used with -device. Unset
user_creatable on their device classes.

Cc: John Snow <jsnow@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org
Cc: Thomas Huth <thuth@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-6-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
8ab5700ca0 iommu: Remove FIXME comment about user_creatable=true
amd-iommu and intel-iommu are really meant to be used with
-device, so they need user_creatable=true. Remove the FIXME
comment.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-5-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
950b31dd17 xen-backend: Remove FIXME comment about user_creatable flag
xen-backend can be plugged/unplugged dynamically when using the
Xen accelerator, so keep the user_creatable flag on the device
class and remove the FIXME comment.

Cc: Juergen Gross <jgross@suse.com>,
Cc: Peter Maydell <peter.maydell@linaro.org>,
Cc: Thomas Huth <thuth@redhat.com>
Cc: sstabellini@kernel.org
Cc: Markus Armbruster <armbru@redhat.com>,
Cc: Marcel Apfelbaum <marcel@redhat.com>,
Cc: Laszlo Ersek <lersek@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-4-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
e4f4fb1eca sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE
Commit 33cd52b5d7 unset
cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all
sysbus devices appear on "-device help" and lack the "no-user"
flag in "info qdm".

To fix this, we can set user_creatable=false by default on
TYPE_SYS_BUS_DEVICE, but this requires setting
user_creatable=true explicitly on the sysbus devices that
actually work with -device.

Fortunately today we have just a few has_dynamic_sysbus=1
machines: virt, pc-q35-*, ppce500, and spapr.

virt, ppce500, and spapr have extra checks to ensure just a few
device types can be instantiated:

* virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE.
* ppce500 supports only TYPE_ETSEC_COMMON.
* spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE.

This patch sets user_creatable=true explicitly on those 4 device
classes.

Now, the more complex cases:

pc-q35-*: q35 has no sysbus device whitelist yet (which is a
separate bug). We are in the process of fixing it and building a
sysbus whitelist on q35, but in the meantime we can fix the
"-device help" and "info qdm" bugs mentioned above. Also, despite
not being strictly necessary for fixing the q35 bug, reducing the
list of user_creatable=true devices will help us be more
confident when building the q35 whitelist.

xen: We also have a hack at xen_set_dynamic_sysbus(), that sets
has_dynamic_sysbus=true at runtime when using the Xen
accelerator. This hack is only used to allow xen-backend devices
to be dynamically plugged/unplugged.

This means today we can use -device with the following 22 device
types, that are the ones compiled into the qemu-system-x86_64 and
qemu-system-i386 binaries:

* allwinner-ahci
* amd-iommu
* cfi.pflash01
* esp
* fw_cfg_io
* fw_cfg_mem
* generic-sdhci
* hpet
* intel-iommu
* ioapic
* isabus-bridge
* kvmclock
* kvm-ioapic
* kvmvapic
* SUNW,fdtwo
* sysbus-ahci
* sysbus-fdc
* sysbus-ohci
* unimplemented-device
* virtio-mmio
* xen-backend
* xen-sysdev

This patch adds user_creatable=true explicitly to those devices,
temporarily, just to keep 100% compatibility with existing
behavior of q35. Subsequent patches will remove
user_creatable=true from the devices that are really not meant to be
user-creatable on any machine, and remove the FIXME comment from
the ones that are really supposed to be user-creatable. This is
being done in separate patches because we still don't have an
obvious list of devices that will be whitelisted by q35, and I
would like to get each device reviewed individually.

Cc: Alexander Graf <agraf@suse.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Beniamino Galvani <b.galvani@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: Gabriel L. Somlo <somlo@cmu.edu>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Pierre Morel <pmorel@linux.vnet.ibm.com>
Cc: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-arm@nongnu.org
Cc: qemu-block@nongnu.org
Cc: qemu-ppc@nongnu.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Shannon Zhao <zhaoshenglong@huawei.com>
Cc: sstabellini@kernel.org
Cc: Thomas Huth <thuth@redhat.com>
Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-3-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[ehabkost: Small changes at sysbus_device_class_init() comments]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:01 -03:00
Eduardo Habkost
e90f2a8c3e qdev: Replace cannot_instantiate_with_device_add_yet with !user_creatable
cannot_instantiate_with_device_add_yet was introduced by commit
efec3dd631 to replace no_user. It was
supposed to be a temporary measure.

When it was introduced, we had 54
cannot_instantiate_with_device_add_yet=true lines in the code.
Today (3 years later) this number has not shrunk: we now have
57 cannot_instantiate_with_device_add_yet=true lines. I think it
is safe to say it is not a temporary measure, and we won't see
the flag go away soon.

Instead of a long field name that misleads people to believe it
is temporary, replace it with a shorter and less misleading field:
user_creatable.

Except for code comments, changes were generated using the
following Coccinelle patch:

  @@
  expression DC;
  @@
  (
  -DC->cannot_instantiate_with_device_add_yet = false;
  +DC->user_creatable = true;
  |
  -DC->cannot_instantiate_with_device_add_yet = true;
  +DC->user_creatable = false;
  )

  @@
  typedef ObjectClass;
  expression dc;
  identifier class, data;
  @@
   static void device_class_init(ObjectClass *class, void *data)
   {
   ...
   dc->hotpluggable = true;
  +dc->user_creatable = true;
   ...
   }

  @@
  @@
   struct DeviceClass {
   ...
  -bool cannot_instantiate_with_device_add_yet;
  +bool user_creatable;
   ...
  }

  @@
  expression DC;
  @@
  (
  -!DC->cannot_instantiate_with_device_add_yet
  +DC->user_creatable
  |
  -DC->cannot_instantiate_with_device_add_yet
  +!DC->user_creatable
  )

Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Thomas Huth <thuth@redhat.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-2-ehabkost@redhat.com>
[ehabkost: kept "TODO remove once we're there" comment]
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-17 10:37:00 -03:00
Stefan Hajnoczi
599c9cb641 Merge remote-tracking branch 'sstabellini/tags/xen-20170516-tag' into staging
Xen 2017/05/16

# gpg: Signature made Tue 16 May 2017 08:18:32 PM BST
# gpg:                using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# gpg:                 aka "Stefano Stabellini <sstabellini@kernel.org>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* sstabellini/tags/xen-20170516-tag:
  xen: call qemu_set_cloexec instead of fcntl
  xen/9pfs: fix two resource leaks on error paths, discovered by Coverity
  configure: Remove -lxencall for Xen detection
  xen/mapcache: store dma information in revmapcache entries for debugging

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-17 14:03:35 +01:00
Stefan Hajnoczi
fefb28a471 Merge remote-tracking branch 'jtc/tags/block-pull-request' into staging
# gpg: Signature made Tue 16 May 2017 04:47:09 PM BST
# gpg:                using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* jtc/tags/block-pull-request:
  curl: do not do aio_poll when waiting for a free CURLState
  curl: convert readv to coroutines
  curl: convert CURLAIOCB to byte values
  curl: split curl_find_state/curl_init_state
  curl: avoid recursive locking of BDRVCURLState mutex
  curl: never invoke callbacks with s->mutex held
  curl: strengthen assertion in curl_clean_state
  block: curl: Allow passing cookies via QCryptoSecret

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-17 13:52:07 +01:00
Juan Quintela
1bfe5f0586 migration: Move check_migratable() into qdev.c
The function is only used once, and nothing else in migration knows
about objects.  Create the function vmstate_device_is_migratable() in
savevm.c that does only the part that is related to migration.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-05-17 12:04:59 +02:00
Juan Quintela
bac3b21218 migration: Move postcopy stuff to postcopy-ram.c
Yes, we don't have a good place to put that stuff.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-05-17 12:04:59 +02:00
Juan Quintela
aa3544c371 migration: Move page_cache.c to migration/
It is only used by migration, so move it there.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-05-17 12:04:59 +02:00
Juan Quintela
795c40b8bd migration: Create migration/blocker.h
This allows us to remove lots of includes of migration/migration.h

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-05-17 12:04:59 +02:00
Juan Quintela
bb890ed551 ram: Rename RAM_SAVE_FLAG_COMPRESS to RAM_SAVE_FLAG_ZERO
This better reflects what it does now, and avoids confusion with
RAM_SAVE_FLAG_COMPRESS_PAGE.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-05-17 12:04:59 +02:00
Juan Quintela
927d663819 migration: Pass Error ** argument to {save,load}_vmstate
This way we use the "normal" way of printing errors for hmp commands.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-05-17 12:04:59 +02:00
Juan Quintela
2bf3aa85f0 migration: Fix regression with compression threads
Compression threads got broken on commit

  commit 2479569466
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Tue Mar 21 11:45:01 2017 +0100

      ram: reorganize last_sent_block

On do_compress_ram_page() we use a different QEMUFile than the
migration one.  We need to pass it there.  The failure can be seen as:

(qemu) qemu-system-x86_64: Unknown combination of migration flags: 0
qemu-system-x86_64: error while loading state section id 3(ram)
qemu-system-x86_64: load of migration failed: Invalid argument

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
2017-05-17 12:04:59 +02:00
Stefano Stabellini
01cd90b641 xen: call qemu_set_cloexec instead of fcntl
Use the common utility function, which contains checks on return values
and first calls F_GETFD as recommended by POSIX.1-2001, instead of
manually calling fcntl.
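
A minimal sketch of the read-modify-write pattern described above,
using only POSIX fcntl().  set_cloexec_checked() is a hypothetical
helper written for this note; it is not the body of qemu_set_cloexec():

```c
#include <fcntl.h>

/*
 * Hypothetical helper: set FD_CLOEXEC while preserving the other
 * descriptor flags.  Returns 0 on success, -1 on failure.
 */
int set_cloexec_checked(int fd)
{
    /* Read the current descriptor flags first. */
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    /* Write them back with FD_CLOEXEC added, and check the result. */
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        return -1;
    }
    return 0;
}
```

Calling fcntl(fd, F_SETFD, FD_CLOEXEC) directly would overwrite any
other descriptor flags and, without a check, silently ignore failures.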

CID: 1374831

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
CC: anthony.perard@citrix.com
CC: groug@kaod.org
CC: aneesh.kumar@linux.vnet.ibm.com
CC: Eric Blake <eblake@redhat.com>
2017-05-16 11:51:25 -07:00
Stefano Stabellini
c0c24b9554 xen/9pfs: fix two resource leaks on error paths, discovered by Coverity
CID: 1374836

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
CC: anthony.perard@citrix.com
CC: groug@kaod.org
CC: aneesh.kumar@linux.vnet.ibm.com
2017-05-16 11:50:30 -07:00
Anthony PERARD
d9506cab36 configure: Remove -lxencall for Xen detection
QEMU does not depend on libxencall. It was added because it was a
missing link dependency of libxendevicemodel, but now the latter should
be built properly.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-05-16 11:49:46 -07:00
Stefano Stabellini
1ff7c5986a xen/mapcache: store dma information in revmapcache entries for debugging
The Xen mapcache is able to create long term mappings, which are called
"locked" mappings. The third parameter of the xen_map_cache call
specifies if a mapping is a "locked" mapping.

From the QEMU point of view there are two kinds of long term mappings:

[a] device memory mappings, such as option roms and video memory
[b] dma mappings, created by dma_memory_map & friends

After certain operations, ballooning a VM in particular, Xen asks QEMU
kindly to destroy all mappings. However, [a] mappings are certainly
present and cannot be removed. That's not a problem as they are not
affected by ballooning. The *real* problem is that if there are any
mappings of type [b], any outstanding dma operations could fail. This is
a known shortcoming. In other words, when Xen asks QEMU to destroy all
mappings, it is an error if any [b] mappings exist.

However today we have no way of distinguishing [a] from [b]. Because of
that, we cannot even print a decent warning.

This patch introduces a new "dma" bool field to MapCacheRev entries, to
remember if a given mapping is for dma or is a long term device memory
mapping. When xen_invalidate_map_cache is called, we print a warning if
any [b] mappings exist. We ignore [a] mappings.

Mappings created by qemu_map_ram_ptr are assumed to be [a], while
mappings created by address_space_map->qemu_ram_ptr_length are assumed
to be [b].

The goal of the patch is to make debugging and system understanding
easier.
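
As a rough sketch of the idea (not the actual xen-mapcache code), the
check at invalidation time could look like the following.  RevEntry is
a hypothetical, simplified stand-in for MapCacheRev, and
warn_on_locked_dma() and the warning text are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified reverse-map entry; the real one has more fields. */
typedef struct RevEntry {
    unsigned long paddr; /* guest physical address of the mapping */
    bool dma;            /* true: [b] dma mapping, false: [a] device memory */
} RevEntry;

/*
 * Walk the locked mappings when asked to invalidate the cache.
 * [a] entries are ignored; each [b] entry triggers a warning.
 * Returns the number of [b] entries found.
 */
size_t warn_on_locked_dma(const RevEntry *entries, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!entries[i].dma) {
            continue;
        }
        fprintf(stderr, "locked DMA mapping still present at 0x%lx\n",
                entries[i].paddr);
        count++;
    }
    return count;
}
```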

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
2017-05-16 11:49:09 -07:00
Paolo Bonzini
2bb5c936c5 curl: do not do aio_poll when waiting for a free CURLState
Instead, put the CURLAIOCB on a wait list and yield; curl_clean_state will
wake the corresponding coroutine.

Because of CURL's callback-based structure, we cannot easily convert
everything to CoMutex/CoQueue; keeping the QemuMutex is simpler.  However,
CoQueue is a simple wrapper around a linked list, so we can easily
use QSIMPLEQ and open-code a CoQueue, protected by the BDRVCURLState
QemuMutex instead of a CoMutex.

Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-8-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16 10:34:50 -04:00
Paolo Bonzini
28256d8246 curl: convert readv to coroutines
This is pretty simple.  The bottom half goes away because, unlike
bdrv_aio_readv, coroutine-based read can return immediately without
yielding.  However, for simplicity I kept the former bottom half
handler in a separate function.

Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-7-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16 10:34:50 -04:00
Paolo Bonzini
2125e5ea6e curl: convert CURLAIOCB to byte values
This is in preparation for the conversion from bdrv_aio_readv to
bdrv_co_preadv, and it also requires changing some of the size_t values
to uint64_t.  This was broken before for disks > 2TB, but now it would
break at 4GB.
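
The two limits can be reproduced with a little arithmetic, assuming the
problem case is a host where size_t is 32 bits wide and sectors are 512
bytes (both are assumptions of this sketch, as are the helper names):

```c
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumed sector size */

/* First byte offset that a 32-bit sector number can no longer address. */
uint64_t limit_with_sector_units(void)
{
    return ((uint64_t)UINT32_MAX + 1) * SECTOR_SIZE;
}

/* What is left of a byte offset after passing through a 32-bit variable. */
uint32_t truncate_offset(uint64_t byte_offset)
{
    return (uint32_t)byte_offset;
}
```

Under these assumptions a 32-bit sector number runs out at 2^41 bytes
(2 TiB), while a 32-bit byte offset already wraps at 2^32 bytes (4 GiB),
hence the switch to uint64_t.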

Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-6-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16 10:34:50 -04:00
Paolo Bonzini
3ce6a729b5 curl: split curl_find_state/curl_init_state
If curl_easy_init fails, a CURLState is left with s->in_use = 1.  Split
curl_init_state in two, so that we can distinguish the two failures and
call curl_clean_state if needed.

While at it, simplify curl_find_state, removing a dummy loop.  The
aio_poll loop is moved to the sole caller that needs it.

Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-5-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16 10:34:45 -04:00
Gerd Hoffmann
cdece0467c block/win32: fix 'ret not initialized' warning
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170516074256.24731-1-kraxel@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-16 15:34:18 +01:00
Paolo Bonzini
456af34629 curl: avoid recursive locking of BDRVCURLState mutex
The curl driver has an ugly hack where, if it cannot find an empty CURLState,
it just uses aio_poll to wait for one to be empty.  This is probably
buggy when used together with dataplane, and the simplest way to fix it
is to use coroutines instead.

A more immediate effect of the bug however is that it can cause a
recursive call to curl_readv_bh_cb and recursively taking the
BDRVCURLState mutex.  This causes a deadlock.

The fix is to unlock the mutex around aio_poll, but for cleanliness we
should also take the mutex around all calls to curl_init_state, even if
reaching the unlock/lock pair is impossible.  The same is true for
curl_clean_state.

Reported-by: Kun Wei <kuwei@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170515100059.15795-4-pbonzini@redhat.com
Cc: qemu-stable@nongnu.org
Cc: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16 10:34:17 -04:00
Paolo Bonzini
34db05e7ff curl: never invoke callbacks with s->mutex held
All curl callbacks go through curl_multi_do, and hence are called with
s->mutex held.  Note that with comments, and make curl_read_cb drop the
lock before invoking the callback.

Likewise for curl_find_buf, where the callback can be invoked by the
caller.

Cc: qemu-stable@nongnu.org
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-3-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16 10:34:17 -04:00
Paolo Bonzini
675a775633 curl: strengthen assertion in curl_clean_state
curl_clean_state should only be called after all AIOCBs have been
completed.  This is not so obvious for the call from curl_detach_aio_context,
so assert that.

Cc: qemu-stable@nongnu.org
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-2-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16 10:34:03 -04:00
Gerd Hoffmann
612fc05ad2 fix mingw build failure
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170516052439.16214-1-kraxel@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-16 15:33:25 +01:00
Kamil Rytarowski
3c2bdbc1e4 maintainers: Add myself as a NetBSD reviewer
I volunteer to review NetBSD patches.
Adding myself will help to not miss some of them.

Restore NetBSD as a maintained host.

All patches needed to build qemu/pkgsrc have been submitted for review.

Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Message-id: 20170513022143.2838-1-n54@gmx.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-16 15:33:06 +01:00
Peter Krempa
327c8ebd70 block: curl: Allow passing cookies via QCryptoSecret
Since cookies can contain sensitive data (session ID, etc ...) it is
desired to hide them from the prying eyes of users. Add a possibility to
pass them via the secret infrastructure.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1447413

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: f4a22cdebdd0bca6a13a43a2a6deead7f2ec4bb3.1493906281.git.pkrempa@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16 10:31:08 -04:00
Stefan Hajnoczi
96cd599818 Merge remote-tracking branch 'gkurz/tags/security-fix-for-2.10' into staging
Fix for CVE-2017-7493.

# gpg: Signature made Mon 15 May 2017 07:48:20 PM BST
# gpg:                using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg:                 aka "Greg Kurz <groug@free.fr>"
# gpg:                 aka "Greg Kurz <gkurz@fr.ibm.com>"
# gpg:                 aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg:                 aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg:                 aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>"
# gpg:                 aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894  DBA2 02FC 3AEB 0101 DBC2

* gkurz/tags/security-fix-for-2.10:
  9pfs: local: forbid client access to metadata (CVE-2017-7493)

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-16 15:26:24 +01:00
Stefan Hajnoczi
6a8d834986 Merge remote-tracking branch 'aurel32/tags/pull-target-sh4-20170513' into staging
Queued target/sh4 patches

# gpg: Signature made Sat 13 May 2017 10:25:41 AM BST
# gpg:                using RSA key 0xBA9C78061DDD8C9B
# gpg: Good signature from "Aurelien Jarno <aurelien@aurel32.net>"
# gpg:                 aka "Aurelien Jarno <aurelien@jarno.fr>"
# gpg:                 aka "Aurelien Jarno <aurel32@debian.org>"
# Primary key fingerprint: 7746 2642 A9EF 94FD 0F77  196D BA9C 7806 1DDD 8C9B

* aurel32/tags/pull-target-sh4-20170513:
  target/sh4: use cpu_loop_exit_restore
  target/sh4: trap unaligned accesses
  target/sh4: movua.l is an SH4-A only instruction
  target/sh4: implement tas.b using atomic helper
  target/sh4: generate fences for SH4
  target/sh4: optimize gen_write_sr using extract op
  target/sh4: optimize gen_store_fpr64
  target/sh4: fold ctx->bstate = BS_BRANCH into gen_conditional_jump
  target/sh4: only save flags state at the end of the TB
  target/sh4: fix BS_EXCP exit
  target/sh4: fix BS_STOP exit
  target/sh4: move DELAY_SLOT_TRUE flag into a separate global
  target/sh4: do not include DELAY_SLOT_TRUE in the TB state
  target/sh4: get rid of DELAY_SLOT_CLEARME
  target/sh4: split ctx->flags into ctx->tbflags and ctx->envflags

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-16 15:26:17 +01:00
Stefan Hajnoczi
eba0161990 Merge remote-tracking branch 'rth/tags/pull-s390-20170512' into staging
Queued target/s390 patches

# gpg: Signature made Sat 13 May 2017 12:33:08 AM BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* rth/tags/pull-s390-20170512:
  target/s390x: implement serialization in BRANCH CONDITION
  target/s390x: fix SIGNAL PROCESSOR return value
  target/s390x: mask the SIGP order_code using SIGP_ORDER_MASK
  target/s390x: Use atomic operations for LOAD AND OP
  target/s390x: Use atomic operations for COMPARE SWAP
  target/s390x: Implement LOAD PAIR DISJOINT
  target/s390x: Diagnose specification exception for atomics
  target/s390x: Implement LOAD PROGRAM PARAMETER
  target/s390x: Implement STORE FACILITIES LIST EXTENDED

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-16 15:26:06 +01:00
Stefan Hajnoczi
8a813c9868 Merge remote-tracking branch 'kraxel/tags/pull-usb-20170512-1' into staging
usb: bugfixes, doc update

# gpg: Signature made Fri 12 May 2017 01:20:29 PM BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* kraxel/tags/pull-usb-20170512-1:
  hw/usb/dev-serial: Do not try to set vendorid or productid properties
  xhci: relax link check
  usb-hub: clear PORT_STAT_SUSPEND on wakeup
  xhci: fix logging
  usb-redir: fix stack overflow in usbredir_log_data
  qemu-doc: Update to use the new way of attaching USB devices

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-15 14:29:58 +01:00
Stefan Hajnoczi
384d9d554a Merge remote-tracking branch 'kraxel/tags/pull-ui-20170512-1' into staging
ui: add egl-headless
ui: some vnc cleanups
ui: absolute events for input-linux

# gpg: Signature made Fri 12 May 2017 12:50:07 PM BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* kraxel/tags/pull-ui-20170512-1:
  vnc: replace hweight_long() with ctpopl()
  vnc: simple clean up
  opengl: add egl-headless display
  egl: explicitly ask for core context
  egl-helpers: add missing error check
  egl-helpers: fix display init for x11
  egl-helpers: drop support for gles and debug logging
  virtio-gpu: move virtio_gpu_gl_block
  ui: input-linux: Add absolute event support
  ui: Support non-zero minimum values for absolute input axes

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-15 14:26:47 +01:00
Greg Kurz
7a95434e0c 9pfs: local: forbid client access to metadata (CVE-2017-7493)
When using the mapped-file security mode, we shouldn't let the client mess
with the metadata. The current code already tries to hide the metadata dir
from the client by skipping it in local_readdir(). But the client can still
access or modify it through several other operations. This can be used to
escalate privileges in the guest.

Affected backend operations are:
- local_mknod()
- local_mkdir()
- local_open2()
- local_symlink()
- local_link()
- local_unlinkat()
- local_renameat()
- local_rename()
- local_name_to_path()

Other operations are safe because they are only passed a fid path, which
is computed internally in local_name_to_path().

This patch converts all the functions listed above to fail and return
EINVAL when being passed the name of the metadata dir. This may look
like a poor choice for errno, but there's no such thing as an illegal
path name on Linux and I could not think of anything better.
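
The kind of check involved can be sketched as follows.  This is an
illustration only: check_client_name() is a hypothetical helper, and
the directory name used for VIRTFS_META_DIR is an assumption of this
sketch rather than something stated in this message:

```c
#include <errno.h>
#include <string.h>

/* Assumed name of the mapped-file metadata directory. */
#define VIRTFS_META_DIR ".virtfs_metadata"

/*
 * Hypothetical helper: reject a client-supplied name that refers to
 * the metadata directory.  Returns 0 if the name is allowed, or -1
 * with errno set to EINVAL otherwise.
 */
int check_client_name(const char *name)
{
    if (strcmp(name, VIRTFS_META_DIR) == 0) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

Each of the backend operations listed above would perform such a check
on the name it receives before touching the host file system.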

This fixes CVE-2017-7493.

Reported-by: Leo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-15 15:20:57 +02:00
Stefan Hajnoczi
ba9915e1f8 Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging
x86 and machine queue, 2017-05-11

Highlights:
* New "-numa cpu" option
* NUMA distance configuration
* migration/i386 vmstatification

# gpg: Signature made Thu 11 May 2017 08:16:07 PM BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: Note: This key has expired!
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* ehabkost/tags/x86-and-machine-pull-request: (29 commits)
  migration/i386: Remove support for pre-0.12 formats
  vmstatification: i386 FPReg
  migration/i386: Remove old non-softfloat 64bit FP support
  tests: check -numa node,cpu=props_list usecase
  numa: add '-numa cpu,...' option for property based node mapping
  numa: remove node_cpu bitmaps as they are no longer used
  numa: use possible_cpus for not mapped CPUs check
  machine: call machine init from wrapper
  numa: remove no longer need numa_post_machine_init()
  tests: numa: add case for QMP command query-cpus
  QMP: include CpuInstanceProperties into query_cpus output output
  virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
  numa: mirror cpu to node mapping in MachineState::possible_cpus
  numa: add check that board supports cpu_index to node mapping
  virt-arm: add node-id property to CPU
  pc: add node-id property to CPU
  spapr: add node-id property to sPAPR core
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-15 14:12:03 +01:00
Stefan Hajnoczi
43ad494c04 Merge remote-tracking branch 'kraxel/tags/pull-vga-20170511-1' into staging
make display updates thread safe, batch #2

# gpg: Signature made Thu 11 May 2017 03:41:51 PM BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* kraxel/tags/pull-vga-20170511-1:
  vga: fix display update region calculation
  sm501: make display updates thread safe
  tcx: make display updates thread safe
  cg3: make display updates thread safe

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-15 14:07:07 +01:00
Stefan Hajnoczi
2f77ec7390 Merge remote-tracking branch 'dgibson/tags/ppc-for-2.10-20170511' into staging
ppc patch queue for 2017-05-11

This pull request supersedes the one from yesterday (20170510), fixing
an important style bug in one patch, and adding an extra couple of
simple patches.

Highlights of this set:
  * Some fixes for POWER9
  * TCG support for POWER9 radix MMU
  * VGA rom for Mac machine types
  * Fixes for the XICS interrupt controller
  * MTTCG support for ppc targets

As suggested by Paolo, I've tried to add the Docker tests to my
standard pre-pull-request tests.  I haven't wholly succeeded; this has
been tested with some of the Docker images, but others I haven't
managed due to problems that, as best I can tell, are not caused by
this patch series.  I'll continue working on this for
future pull requests.  Specifically, 'travis', 'fedora', and 'centos6'
seem to work.  'min-glib' jammed while gtesting moxie, which seems
very unlikely to be caused by this series.  'ubuntu', 'debian' and
'debian-bootstrap' hit build errors almost immediately that look like
problems with the container configuration, and 'debian-*-cross' hit
build errors later on which also look like missing dependencies from
the container.

# gpg: Signature made Thu 11 May 2017 05:13:46 AM BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* dgibson/tags/ppc-for-2.10-20170511: (23 commits)
  target/ppc: Avoid printing wrong aliases in CPU help text
  pnv: Fix build failures on some host platforms
  target/ppc: Allow workarounds for POWER9 DD1
  spapr: Don't accidentally advertise HTM support on POWER9
  ppc: xics: fix compilation with CentOS 6
  target/ppc: Enable RADIX mmu mode for pseries TCG guest
  target/ppc: Implement ISA V3.00 radix page fault handler
  target/ppc: Change tlbie invalid fields for POWER9 support
  target/ppc: Update tlbie to check privilege level based on GTSE
  target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE
  ppc: add qemu_vga.ndrv ROM to fw_cfg interface for NewWorld Macs
  ppc: add qemu_vga.ndrv ROM to fw_cfg interface for OldWorld Macs
  Add QemuMacDrivers qemu_vga.ndrv revision d4e7d7a built as submodule
  Add QemuMacDrivers as submodule
  ppc/xics: preserve P and Q bits for KVM IRQs
  ppc/xics: Fix stale irq->status bits after get
  target/ppc: do not reset reserve_addr in exec_enter
  tcg: enable MTTCG by default for PPC64 on x86
  cpus: Fix CPU unplug for MTTCG
  target/ppc: Generate fence operations
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-15 14:00:15 +01:00
Aurelien Jarno
57e2d417d3 target/sh4: use cpu_loop_exit_restore
Use cpu_loop_exit_restore when using cpu_restore_state and cpu_loop_exit
together.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:27 +02:00
Aurelien Jarno
34257c2117 target/sh4: trap unaligned accesses
SH4 requires that memory accesses are naturally aligned, except for the
SH4-A movua.l instructions which can do unaligned loads.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:27 +02:00
Aurelien Jarno
143021b26f target/sh4: movua.l is an SH4-A only instruction
At the same time, change the comment describing the instruction to match
the other instructions, so that the code is easier to read and search.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:27 +02:00
Aurelien Jarno
cb32f179e0 target/sh4: implement tas.b using atomic helper
We only emulate UP SH4, however as the tas.b instruction is used in the GNU
libc, this improves linux-user emulation.
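
For reference, the semantics of tas.b (set T if the byte was zero, then
set the byte's top bit) map onto a single atomic fetch-or.  The C11
sketch below only illustrates that idea; it is not the TCG code
generated by this patch, and tas_b() is a made-up name:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Atomically set bit 7 of *byte and report whether the byte was zero
 * beforehand.  The return value corresponds to the new T bit.
 */
bool tas_b(_Atomic uint8_t *byte)
{
    uint8_t old = atomic_fetch_or(byte, 0x80);
    return old == 0;
}
```

Doing the load and the store as one atomic operation is what keeps two
threads from both observing zero and both believing they took the lock.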

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:27 +02:00
Aurelien Jarno
aa3513176f target/sh4: generate fences for SH4
synco is an SH4-A only instruction.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:26 +02:00
Aurelien Jarno
a380f9db96 target/sh4: optimize gen_write_sr using extract op
This doesn't change the generated code on x86, but optimizes it on most
RISC architectures and makes the code simpler to read.
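
What an extract op computes can be shown in plain C.  The helper below
is a sketch written for this note (it assumes 1 <= length <= 32 and
start + length <= 32) and is not the TCG implementation:

```c
#include <stdint.h>

/*
 * Return the 'length'-bit field of 'value' that starts at bit 'start'.
 * Sketch only: assumes 1 <= length <= 32 and start + length <= 32.
 */
uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}
```

For example, assuming the usual SR layout with T at bit 0, Q at bit 8
and M at bit 9, extract32(sr, 8, 1) yields the Q bit in one step
instead of a separate shift and mask.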

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:26 +02:00
Aurelien Jarno
58d2a9aef4 target/sh4: optimize gen_store_fpr64
Using extr and avoiding intermediate temps.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:26 +02:00
Aurelien Jarno
b3995c23ed target/sh4: fold ctx->bstate = BS_BRANCH into gen_conditional_jump
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:22 +02:00
Aurelien Jarno
ac9707eaf6 target/sh4: only save flags state at the end of the TB
There is no need to save flags when entering and exiting the delay slot.
They can be saved only when reaching the end of the TB. If the TB is
interrupted before by an exception, they will be restored using
restore_state_to_opc.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:22 +02:00
Aurelien Jarno
632056651a target/sh4: fix BS_EXCP exit
In case of exception, there is no need to call tcg_gen_exit_tb as the
exception helper won't return.

Also fix a few cases where BS_BRANCH is used instead of BS_EXCP.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:22 +02:00
Aurelien Jarno
0fc37a8b0c target/sh4: fix BS_STOP exit
When stopping the translation because the state has changed, goto_tb
should not be used as it might link TB with different flags.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:18:13 +02:00
Aurelien Jarno
47b9f4d5a4 target/sh4: move DELAY_SLOT_TRUE flag into a separate global
Instead of using one bit of the env flags to store the condition of the
next delay slot, use a separate global. It simplifies reading and
writing the flags variable and also removes some confusion between
ctx->envflags and env->flags.

Note that the global is first transferred to a temp in order to be
able to discard the global before the brcond.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:17:29 +02:00
Aurelien Jarno
24b09d9d8b target/sh4: do not include DELAY_SLOT_TRUE in the TB state
DELAY_SLOT_TRUE is used as a dynamic condition for the branch after the
delay slot instruction. It is not used in code generation, so there is
no need to include it in the TB state.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:17:29 +02:00
Aurelien Jarno
3968260811 target/sh4: get rid of DELAY_SLOT_CLEARME
Now that ctx->flags has been split, it becomes clear that
DELAY_SLOT_CLEARME has no impact on the code generation: in both cases
ctx->envflags is cleared, either by clearing all the flags, or by
setting it to 0. This is a leftover from the pre-TCG era.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:17:29 +02:00
Aurelien Jarno
a6215749dc target/sh4: split ctx->flags into ctx->tbflags and ctx->envflags
There is confusion (and not only in the SH4 target) between tb->flags,
env->flags and ctx->flags. To avoid it, split ctx->flags into
ctx->tbflags and ctx->envflags. ctx->tbflags stays unchanged during the
whole TB translation, while ctx->envflags evolves and is kept in sync
with env->flags using TCG instructions. ctx->envflags now only contains
the part of env->flags that is contained in the TB state, i.e. the
DELAY_SLOT* flags.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-13 11:17:28 +02:00
Aurelien Jarno
538fad597d target/s390x: implement serialization in BRANCH CONDITION
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170509082800.10756-4-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:48:41 -07:00
Aurelien Jarno
1e8e69f08b target/s390x: fix SIGNAL PROCESSOR return value
The SIGNAL PROCESSOR helper returns its value through the CC register.
set_cc_static should be called just after the helper.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170509082800.10756-3-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:48:02 -07:00
Aurelien Jarno
a7c1fadf00 target/s390x: mask the SIGP order_code using SIGP_ORDER_MASK
To do so, move the definition from kvm.c to cpu.h.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170509082800.10756-2-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:47:13 -07:00
Richard Henderson
4dba4d6fef target/s390x: Use atomic operations for LOAD AND OP
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:47:13 -07:00
Richard Henderson
303a9ab887 target/s390x: Use atomic operations for COMPARE SWAP
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:47:13 -07:00
Eric Bischoff
1807aaa565 target/s390x: Implement LOAD PAIR DISJOINT
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Eric Bischoff <ebischoff@nerim.net>
Message-Id: <20170228120134.7921-1-ebischoff@suse.com>
[rth: Combine the two via insn->data; free the address temps.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:47:13 -07:00
Richard Henderson
44977a8fe7 target/s390x: Diagnose specification exception for atomics
All of the interlocked access facility instructions raise a
specification exception for unaligned accesses.  Do this by
using the (previously unused) unaligned_access hook.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:40:29 -07:00
Miroslav Benes
190b2422e6 target/s390x: Implement LOAD PROGRAM PARAMETER
Linux arch/s390/kernel/head(64).S uses the LPP instruction if it is
listed in the facilities list provided by the stfl/stfle instructions.
This is the case for newer z/System generations and their QEMU
definitions.

The description of LPP is at
http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Message-Id: <20170227085353.20787-1-mbenes@suse.cz>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:40:29 -07:00
Richard Henderson
5bf83628dc target/s390x: Implement STORE FACILITIES LIST EXTENDED
At the same time, improve STORE FACILITIES LIST
so that we don't hard-code the list for all cpus.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-05-12 15:40:29 -07:00
Stefan Hajnoczi
3a8760664d Merge tag 'tracing-pull-request' into staging
# gpg: Signature made Fri 12 May 2017 10:38:07 AM EDT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'tracing-pull-request':
  trace: add sanity check

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12 10:39:35 -04:00
Stefan Hajnoczi
b54933eed5 Merge tag 'block-pull-request' into staging
# gpg: Signature made Fri 12 May 2017 10:37:12 AM EDT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request':
  aio: add missing aio_notify() to aio_enable_external()
  block: Simplify BDRV_BLOCK_RAW recursion
  coroutine: remove GThread implementation

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12 10:39:23 -04:00
Stefan Hajnoczi
3753e255da Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Thu 11 May 2017 10:31:37 AM EDT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* kwolf/tags/for-upstream: (58 commits)
  MAINTAINERS: Add qemu-progress to the block layer
  qcow2: Discard/zero clusters by byte count
  qcow2: Assert that cluster operations are aligned
  qcow2: Optimize write zero of unaligned tail cluster
  iotests: Add test 179 to cover write zeroes with unmap
  iotests: Improve _filter_qemu_img_map
  qcow2: Optimize zero_single_l2() to minimize L2 churn
  qcow2: Make distinction between zero cluster types obvious
  qcow2: Name typedef for cluster type
  qcow2: Correctly report status of preallocated zero clusters
  block: Update comments on BDRV_BLOCK_* meanings
  qcow2: Use consistent switch indentation
  qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
  tests: Add coverage for recent block geometry fixes
  blkdebug: Add ability to override unmap geometries
  blkdebug: Simplify override logic
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Refactor error injection
  blkdebug: Sanity check block layer guarantees
  qemu-io: Switch 'map' output to byte-based reporting
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12 10:39:08 -04:00
Anthony Xu
5651743c90 trace: add sanity check
If the trace backend is set to TRACE_NOP, trace_get_vcpu_event_count()
returns 0, which causes the bitmap_new() call to abort.

The abort can be triggered as follows:

  $ ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu
  $ gdb ./x86_64-softmmu/qemu-system-x86_64 -M q35,accel=kvm -m 1G
  (gdb) bt
  #0  0x00007ffff04e25f7 in raise () from /lib64/libc.so.6
  #1  0x00007ffff04e3ce8 in abort () from /lib64/libc.so.6
  #2  0x00005555559de905 in bitmap_new (nbits=<optimized out>)
      at /home/root/git/qemu2.git/include/qemu/bitmap.h:96
  #3  cpu_common_initfn (obj=0x555556621d30) at qom/cpu.c:399
  #4  0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bbb0) at qom/object.c:341
  #5  0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bd30) at qom/object.c:341
  #6  0x0000555555a11efc in object_initialize_with_type (data=data@entry=0x555556621d30, size=76560,
      type=type@entry=0x55555656bd30) at qom/object.c:376
  #7  0x0000555555a12061 in object_new_with_type (type=0x55555656bd30) at qom/object.c:484
  #8  0x0000555555a121c5 in object_new (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu")
      at qom/object.c:494
  #9  0x00005555557f6e3d in pc_new_cpu (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu", apic_id=0,
      errp=errp@entry=0x5555565391b0 <error_fatal>) at /home/root/git/qemu2.git/hw/i386/pc.c:1101
  #10 0x00005555557fa33e in pc_cpus_init (pcms=pcms@entry=0x5555565f9690)
      at /home/root/git/qemu2.git/hw/i386/pc.c:1184
  #11 0x00005555557fe0f6 in pc_q35_init (machine=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121
  #12 0x000055555574fbad in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4562

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Message-id: 1494369432-15418-1-git-send-email-anthony.xu@intel.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12 10:37:40 -04:00
Stefan Hajnoczi
321d1dba8b aio: add missing aio_notify() to aio_enable_external()
The main loop uses aio_disable_external()/aio_enable_external() to
temporarily disable processing of external AioContext clients like
device emulation.

This allows monitor commands to quiesce I/O and prevent the guest from
submitting new requests while a monitor command is in progress.

The aio_enable_external() API is currently broken when an IOThread is in
aio_poll() waiting for fd activity when the main loop re-enables
external clients.  Decrementing ctx->external_disable_cnt does not wake
the IOThread from ppoll(2) so fd processing remains suspended and leads
to unresponsive emulated devices.

This patch adds an aio_notify() call to aio_enable_external() so the
IOThread is kicked out of ppoll(2) and will re-arm the file descriptors.
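
The wakeup pattern described above can be sketched in isolation. This is
only an illustration under stated assumptions, not the code touched by
this patch: LoopCtx and the loop_*() helpers are hypothetical names, a
plain pipe stands in for whatever notifier the real event loop uses, and
the choice to notify only when the counter drops to zero is an assumption
of the sketch.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for an event-loop context; not QEMU's AioContext. */
typedef struct {
    int notify_fd[2];          /* self-pipe: [0] is polled, [1] is written */
    int external_disable_cnt;  /* > 0 means external clients are disabled */
} LoopCtx;

static int loop_init(LoopCtx *ctx)
{
    ctx->external_disable_cnt = 0;
    return pipe(ctx->notify_fd);
}

/* Make the notification fd readable so that a blocked poll() returns. */
static void loop_notify(LoopCtx *ctx)
{
    char byte = 1;
    ssize_t ret = write(ctx->notify_fd[1], &byte, 1);
    (void)ret; /* error handling omitted in this sketch */
}

static void loop_disable_external(LoopCtx *ctx)
{
    ctx->external_disable_cnt++;
}

static void loop_enable_external(LoopCtx *ctx)
{
    assert(ctx->external_disable_cnt > 0);
    if (--ctx->external_disable_cnt == 0) {
        /* Without this kick a poller would keep sleeping. */
        loop_notify(ctx);
    }
}

/* Returns 1 if a wakeup becomes pending within timeout_ms, 0 otherwise. */
static int loop_wakeup_pending(LoopCtx *ctx, int timeout_ms)
{
    struct pollfd pfd = { .fd = ctx->notify_fd[0], .events = POLLIN };
    return poll(&pfd, 1, timeout_ms) > 0;
}
```

In the sketch, a thread blocked in poll() on notify_fd[0] only learns
about the counter change because loop_enable_external() writes to the
pipe; changing the counter alone would leave it asleep.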

The bug can be reproduced as follows:

  $ qemu -M accel=kvm -m 1024 \
         -object iothread,id=iothread0 \
         -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \
         -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \
         -device scsi-hd,id=scsi-hd0,drive=drive0 \
         -qmp tcp::5555,server,nowait

  $ scripts/qmp/qmp-shell localhost:5555
  (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2
         mode=absolute-paths format=qcow2

After blockdev-snapshot-sync completes the SCSI disk will be
unresponsive.  This leads to request timeouts inside the guest.

Reported-by: Qianqian Zhu <qizhu@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170508180705.20609-1-stefanha@redhat.com
Suggested-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12 10:36:46 -04:00
Eric Blake
ee29d6adef block: Simplify BDRV_BLOCK_RAW recursion
Since we are already in coroutine context during the body of
bdrv_co_get_block_status(), we can shave off a few layers of
wrappers when recursing to query the protocol when a format driver
returned BDRV_BLOCK_RAW.

Note that we are already using the correct recursion later on in
the same function, when probing whether the protocol layer is sparse
in order to find out if we can add BDRV_BLOCK_ZERO to an existing
BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170504173745.27414-1-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12 10:36:46 -04:00
Daniel P. Berrange
33c53c54e4 coroutine: remove GThread implementation
The GThread implementation is not functional enough to actually
run QEMU reliably. While it was potentially useful for debugging,
we have a scripts/qemugdb/coroutine.py to enable tracing of
ucontext coroutines in GDB, so that removes the only reason for
GThread to exist.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12 10:36:46 -04:00
Thomas Huth
aa612b364e hw/usb/dev-serial: Do not try to set vendorid or productid properties
When starting QEMU with the legacy USB serial device like this:

 qemu-system-x86_64 -usbdevice serial:vendorid=0x1234:stdio

it currently aborts since the vendorid property does not exist
anymore (it has been removed by commit f29783f72e):

 Unexpected error in object_property_find() at qemu/qom/object.c:1008:
 qemu-system-x86_64: -usbdevice serial:vendorid=0x1234:stdio: Property
                     '.vendorid' not found
 Aborted (core dumped)

Fix this crash by issuing a friendlier error message instead
(which also simplifies the code a little).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1493883704-27604-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-05-12 12:30:23 +02:00
Ladi Prosek
99f9aeba5d xhci: relax link check
The strict td link limit added by commit "05f43d4 xhci: limit the
number of link trbs we are willing to process" causes problems with
Windows guests. Let's raise the limit.

This change is analogous to:

  commit ab6b1105a2
  Author: Gerd Hoffmann <kraxel@redhat.com>
  Date:   Tue Mar 7 09:40:18 2017 +0100

      ohci: relax link check

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 20170512102100.22675-1-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-05-12 12:26:40 +02:00
Ladi Prosek
66849dcfbe usb-hub: clear PORT_STAT_SUSPEND on wakeup
The spec says:

  Suspend: (PORT_SUSPEND) This field indicates whether or not the device
  on this port is suspended. Setting this field causes the device to
  suspend by not propagating bus traffic downstream. This field may be
  reset by a request or by resume signaling from the device attached to
  the port.

I can't find any specific statement like "the PORT_SUSPEND field is reset
automatically on remote wakeup", but without this patch, the only way to
reset it is via the ClearPortFeature request so the ".. or by resume
signaling from the device" clause is clearly not implemented on the remote
wakeup path.

The default xhci Windows driver does not issue the ClearPortFeature request
and suspended devices attached to a hub don't properly get out of the
suspended state. Interestingly, the default uhci Windows driver *does*
issue the ClearPortFeature request and does not exhibit this problem.
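
A minimal sketch of the behavior described above, assuming the hub keeps
the usual wPortStatus/wPortChange pair per port. HubPort and
hub_port_remote_wakeup() are hypothetical names for illustration, and
setting the suspend-change bit on resume is an assumption of the sketch,
not something stated in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of wPortStatus and of wPortChange in the USB 2.0 hub class. */
#define PORT_STAT_SUSPEND    0x0004
#define PORT_STAT_C_SUSPEND  0x0004

/* Hypothetical per-port state; not the emulated hub's real structure. */
typedef struct {
    uint16_t wPortStatus;
    uint16_t wPortChange;
} HubPort;

/* Returns true if the port was suspended and has now been resumed. */
static bool hub_port_remote_wakeup(HubPort *port)
{
    if (!(port->wPortStatus & PORT_STAT_SUSPEND)) {
        return false;
    }
    /* Resume signaling from the device clears the suspend state... */
    port->wPortStatus &= (uint16_t)~PORT_STAT_SUSPEND;
    /* ...and (assumed here) latches a change bit for the host to see. */
    port->wPortChange |= PORT_STAT_C_SUSPEND;
    return true;
}
```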

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 20170511125314.24549-3-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-05-12 12:26:40 +02:00
Ladi Prosek
ee56264af8 xhci: fix logging
slotid and epid were deleted from XHCITransfer in commit d6fcb29.
Also deleting one unused forward declaration.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 20170511125314.24549-2-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-05-12 12:26:40 +02:00
Gerd Hoffmann
bd4a683505 usb-redir: fix stack overflow in usbredir_log_data
Don't reinvent a broken wheel, just use the hexdump function we have.

Impact: low, broken code doesn't run unless you have debug logging
enabled.

Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170509110128.27261-1-kraxel@redhat.com
2017-05-12 12:26:40 +02:00
Thomas Huth
a92ff8c123 qemu-doc: Update to use the new way of attaching USB devices
The preferred way of adding USB devices is via "-device" and
"device_add" nowadays, so let's start to get rid of "-usbdevice"
and "usb_add" in the documentation. While we're at it, also
add the new USB devices there which have been added to QEMU
during the last years, and get rid of the old "vendorid" and
"productid" parameters of "-usbdevice serial" which have been
removed in QEMU version 0.14.0 already.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1494256429-31720-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-05-12 12:26:40 +02:00
Dr. David Alan Gilbert
08b277ac46 migration/i386: Remove support for pre-0.12 formats
Remove support for versions of the CPU state prior to 11
which is the version used in qemu 0.12 - you'd be pretty
lucky if you got a migration stream to work from anything
that old anyway.  This doesn't affect the machine type
definition in any way.

My main reason for doing this is the hack for sysenter_esp/eip
that uses .get/.put's in state versions less than 7 (that is,
from some point before 0.10).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170405190024.27581-4-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:51 -03:00
Dr. David Alan Gilbert
ab808276f8 vmstatification: i386 FPReg
Convert the fpreg save/restore to use VMSTATE_ macros rather than
.get/.put.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170405190024.27581-3-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Dr. David Alan Gilbert
46baa9007f migration/i386: Remove old non-softfloat 64bit FP support
Long long ago, we used to support storing the x86 FP registers in
a 64bit format.

Then c31da136a0 in v0.14-rc0 removed
the last support for writing that in the migration format.
Even before that, it was only used if you had softfloat disabled
(i.e. !USE_X86LDOUBLE), so in practice its use in even earlier
QEMU versions is unlikely for most users.

Kill it off; it's complicated and possibly broken.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170405190024.27581-2-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov
2941020a47 tests: check -numa node,cpu=props_list usecase
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-19-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov
419fcdec3c numa: add '-numa cpu,...' option for property based node mapping
The legacy CPU-to-node mapping uses cpu index values to map
VCPUs to nodes with the help of the
'-numa node,nodeid=node,cpus=x[-y]' option. However, the cpu index
is an internal concept, and QEMU users have to guess (or
reimplement QEMU's logic) to map it to a concrete cpu
socket/core/thread in order to place CPUs sanely across NUMA nodes.

This patch allows mapping cpu objects to NUMA nodes using
the same properties as used for cpus with -device/device_add
(socket-id/core-id/thread-id/node-id).

At present, the valid properties/values for addressing CPUs can be
fetched using the hotpluggable-cpus monitor/qmp command. This
requires the user to start QEMU twice when creating a domain: first
to fetch the possible CPUs for a machine type/-smp layout, and
then a second time with the explicit NUMA mapping for actual
usage. The results of the first step can be saved and reused to
set/change the mapping later, as long as the machine type/-smp
layout stays the same.

The proposed implementation supports exact and wildcard matching to
simplify the CLI and to allow setting the mapping for a specific cpu
or for a group of cpu objects selected by the matched properties.

For example:

   # exact mapping x86
   -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n

   # exact mapping SPAPR
   -numa cpu,node-id=x,core-id=y

   # wildcard mapping, all cpu objects that match socket-id=y
   # are mapped to node-id=x
   -numa cpu,node-id=x,socket-id=y
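
The exact/wildcard matching described above can be sketched as follows.
This is an illustration only: CpuProps, CpuSlot, props_match() and
map_cpus_to_node() are hypothetical names, loosely modelled on the idea
of optional socket/core/thread properties, and are not the helpers added
by this patch. A property left unset in the filter acts as a wildcard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical property set; an unset has_* flag acts as a wildcard. */
typedef struct {
    bool has_socket_id, has_core_id, has_thread_id;
    int socket_id, core_id, thread_id;
} CpuProps;

/* Hypothetical possible-CPU slot with its node assignment. */
typedef struct {
    CpuProps props;
    int node_id;   /* -1 while unmapped */
} CpuSlot;

/* True if every property set in filter is set to the same value in cpu. */
static bool props_match(const CpuProps *cpu, const CpuProps *filter)
{
    if (filter->has_socket_id &&
        (!cpu->has_socket_id || cpu->socket_id != filter->socket_id)) {
        return false;
    }
    if (filter->has_core_id &&
        (!cpu->has_core_id || cpu->core_id != filter->core_id)) {
        return false;
    }
    if (filter->has_thread_id &&
        (!cpu->has_thread_id || cpu->thread_id != filter->thread_id)) {
        return false;
    }
    return true;
}

/* Map every slot matching filter to node_id; return how many matched. */
static size_t map_cpus_to_node(CpuSlot *slots, size_t n,
                               const CpuProps *filter, int node_id)
{
    size_t matched = 0;

    for (size_t i = 0; i < n; i++) {
        if (props_match(&slots[i].props, filter)) {
            slots[i].node_id = node_id;
            matched++;
        }
    }
    return matched;
}
```

With a filter that only sets socket_id, all cores and threads of that
socket are mapped in one call; setting all three properties selects a
single cpu object.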

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1494415802-227633-18-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov
1171ae9a5b numa: remove node_cpu bitmaps as they are no longer used
The post-factum "CPU(s) present in multiple NUMA nodes" check
was the last user of the node_cpu bitmaps, but it is not needed,
as machine_set_cpu_numa_node() does a similar check at
the time the mapping is set for cpus (i.e. when -numa cpus=
is parsed) and ensures that a cpu can be mapped to only
one node.

Remove the duplicate check based on node_cpu bitmaps and,
since the last user is gone, remove node_cpu as well,
which completes the internal transition from legacy bitmap
based mapping storage to possible_cpus storage.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-17-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov
ec78f8114b numa: use possible_cpus for not mapped CPUs check
and remove corresponding part in numa.c that uses
node_cpu bitmaps.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-16-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov
482dfe9a9e machine: call machine init from wrapper
Add a machine_run_board_init() wrapper that for now only calls
machine init; in follow-up patches it will be used
to run generic machine code that must run before
machine init.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1494415802-227633-15-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov
3b8a8557f7 numa: remove no longer needed numa_post_machine_init()
CPUState::numa_node is still in use, but it is now set by the
board when it creates CPU objects. So there is no
need to set it again after all CPUs are created,
since it has already been set.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-14-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov
6accfb7823 tests: numa: add case for QMP command query-cpus
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-13-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov
afed5a5a70 QMP: include CpuInstanceProperties into query_cpus output
If the board supports CpuInstanceProperties, report them for
each CPU thread listed. The main motivation is to make
these properties introspectable via the QMP interface
so that test cases can verify the NUMA node to cpu mapping.
This covers not only boards that support cpu hotplug
and have this info in query-hotpluggable-cpus (pc/spapr),
but also boards that do not support hotpluggable-cpus
but do support NUMA mapping (virt-arm).

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1494415802-227633-12-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov
4ccf5826f9 virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-11-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov
722387e78d spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
It's safe to remove the 'thread node_id != core node_id' error
branch, as machine_set_cpu_numa_node() also does a mismatch
check and is called before any CPU is created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-10-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov
ea2650724c pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-9-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov
af9b20e8d2 numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-8-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov
7c88e65d9e numa: mirror cpu to node mapping in MachineState::possible_cpus
Introduce a machine_set_cpu_numa_node() helper that stores
the node mapping for a CPU in MachineState::possible_cpus.
The CPU and the node it belongs to are specified by the 'props'
argument.

The patch doesn't remove the old way of storing the mapping in
numa_info[X].node_cpu, as removing it at the same time
would make the patch rather big. Instead it just mirrors the mapping
in possible_cpus; follow-up per-target patches will
switch to possible_cpus, and numa_info[X].node_cpu will
be removed once there are no users left.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-7-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov
64c2a8f6d3 numa: add check that board supports cpu_index to node mapping
Default node mapping initialization already checks that the board
supports cpu_index to node mapping and refuses to start if
it's not supported. Do the same for an explicitly provided
mapping "-numa node,cpus=...".

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-6-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov
bd4c1bfe3e virt-arm: add node-id property to CPU
it will allow switching from cpu_index to property based
numa mapping in follow up patches.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-5-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov
93b2a8cb0b pc: add node-id property to CPU
it will allow switching from cpu_index to property based
numa mapping in follow up patches.

PS:
This patch changes the default value of CPUState::numa_node from 0
to CPU_UNSET_NUMA_NODE_ID. The only place on x86 that
would be affected is the monitor's 'info numa' command, which
uses that field. However, the legacy 0 value is still preserved
by pc_cpu_pre_plug() in this patch if user/numa.c hasn't
set it explicitly, so there is no change in behavior.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1494415802-227633-4-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Igor Mammedov
0b8497f08c spapr: add node-id property to sPAPR core
it will allow switching from cpu_index to core based numa
mapping in follow up patches.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Igor Mammedov
ea089eebbd numa: move source of default CPUs to NUMA node mapping into boards
Originally CPU threads were by default assigned in
round-robin fashion. However, this caused issues in the
guest, since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node,
introducing the cpu_index_to_socket_id() callback, and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused the callback to fix similar issues for the SPAPR machine,
even though socket doesn't make much sense there.

As a result, QEMU ended up having 3 default distribution rules
used by 3 targets (virt-arm, spapr, pc).

As part of the effort to move the NUMA mapping for CPUs into
possible_cpus, generalize the default mapping in numa.c by making
boards decide on the default mapping and explicitly tell the generic
numa code which node a CPU thread belongs to. This is done by
replacing cpu_index_to_socket_id() with @cpu_index_to_instance_props(),
which provides the default node_id assigned by the board to the
specified cpu_index.
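
The difference between the original round-robin rule and the
socket-grouping rule mentioned above can be sketched as below. Both
functions are hypothetical illustrations with made-up names; in
particular, deriving the socket from cpu_index / threads_per_socket is an
assumption of the sketch, not the board callbacks themselves.

```c
#include <assert.h>

/*
 * Round-robin by cpu_index: sibling threads of one socket can end up
 * on different nodes.
 */
static int node_round_robin(int cpu_index, int nb_nodes)
{
    assert(nb_nodes > 0);
    return cpu_index % nb_nodes;
}

/*
 * Group by socket first, so that all threads of a socket share a node.
 * threads_per_socket is a hypothetical parameter (cores * threads).
 */
static int node_by_socket(int cpu_index, int threads_per_socket,
                          int nb_nodes)
{
    assert(threads_per_socket > 0 && nb_nodes > 0);
    return (cpu_index / threads_per_socket) % nb_nodes;
}
```

With two threads per socket and two nodes, round-robin puts cpu_index 0
and 1 (the same socket) on nodes 0 and 1, while the socket-based rule
keeps both on node 0 and places cpu_index 2 and 3 on node 1.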

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Igor Mammedov
d9c34f9c6c hw/arm/virt: explicitly allocate cpu_index for cpus
Currently cpu_index is implicitly auto-assigned at
cpu.realize() time, via cpu_exec_realizefn()->cpu_list_add().

It happens to match the index in possible_cpus, so take
control over it and make the board initialize cpu_index
to the possible_cpus index explicitly. It will at least
document that the board is in control of it, and when
'-device cpu' support comes it will keep cpu_index
stable regardless of the order cpus are created in, so it won't
break migration.
Within this series it will be used for the internal
conversion from storing cpu_index based NUMA node
bitmaps to property based mapping with possible_cpus,
and will allow mapping cpu_index to a CPU entry in the
possible_cpus array.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1493816238-33120-5-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Igor Mammedov
17d3d0e2d9 hw/arm/virt: use machine->possible_cpus for storing possible topology info
For now, precalculate and store mp_affinity in possible_cpus,
as ARM cpus don't have socket/core/thread-id properties yet.
In follow-up patches possible_cpus will be used for storing
and setting the NUMA node mapping, replacing the legacy bitmap
based numa_info[node_id].node_cpu/numa_get_node_for_cpu().

For lack of a better idea, this patch cannibalizes
possible_cpus.cpus[x].props.thread_id so that the
*_cpu_index_to_props() callback can return a CPU addressable
by props, which will be used by machine_set_cpu_numa_node()
in follow-up patches to assign a CPU to a node.
Cannibalizing is fine for now, as thread_id isn't exposed
to users (no hotpluggable_cpus callback support for ARM yet)
and it will be used only internally until 'device_add cpu'
is supported, at which point we can decide which properties to use.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1493816238-33120-4-git-send-email-imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Igor Mammedov
46de5913b6 hw/arm/virt: extract mp-affinity calculation in separate function
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1493816238-33120-3-git-send-email-imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Igor Mammedov
63baf8bf01 tests: add CPUs to numa node mapping test
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1493816238-33120-2-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
He Chen
fda4096fca tests: acpi: extend cphp and memhp testcase with numa distance check
Signed-off-by: He Chen <he.chen@linux.intel.com>
Message-Id: <1493803036-4048-1-git-send-email-he.chen@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
[ehabkost: regenerated tests/acpi-tst-data, included SLIT table]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Laurent Vivier
3bfe57165b numa: equally distribute memory on nodes
When there is not enough memory to give every node the minimum
allowed amount, all the memory is put on the last node.

This is because each node is given (ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1), and in this case that
value is 0. This is particularly likely with pseries,
as the memory must be aligned to 256MB.

To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute the memory equally across nodes.

We introduce a numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and earlier), the new one with all others.

Example:

qemu-system-ppc64 -S -nographic  -nodefaults -monitor stdio -m 1G -smp 8 \
                  -numa node -numa node -numa node \
                  -numa node -numa node -numa node

Before:

(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB

After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB
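
The two listings above can be reproduced with a small sketch of the error
diffusion idea. It is an illustration only: it assumes the 256MB alignment
mentioned for pseries, and the function names are hypothetical rather than
QEMU's own.

```python
MB = 1024 * 1024


def legacy_assign(size, nb_nodes, align):
    """Give every node the aligned share; the last node takes the rest."""
    share = (size // nb_nodes) & ~(align - 1)
    nodes = [share] * (nb_nodes - 1)
    nodes.append(size - sum(nodes))
    return nodes


def error_diffusion_assign(size, nb_nodes, align):
    """Carry each node's rounding error forward to the next node."""
    granularity = size // nb_nodes
    propagate = 0
    nodes = []
    for _ in range(nb_nodes - 1):
        # Round the share plus the carried error down to the alignment.
        node_mem = (granularity + propagate) & ~(align - 1)
        # Whatever was rounded away is carried to the next node.
        propagate = granularity + propagate - node_mem
        nodes.append(node_mem)
    # The last node receives whatever is still unassigned.
    nodes.append(size - sum(nodes))
    return nodes


# 1G of RAM, 6 nodes, 256MB alignment, as in the example above.
print([m // MB for m in legacy_assign(1024 * MB, 6, 256 * MB)])
# [0, 0, 0, 0, 0, 1024]
print([m // MB for m in error_diffusion_assign(1024 * MB, 6, 256 * MB)])
# [0, 256, 0, 256, 256, 256]
```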

[1] https://en.wikipedia.org/wiki/Error_diffusion

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170502162955.1610-2-lvivier@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:47 -03:00
He Chen
0f203430dd numa: Allow setting NUMA distance for different NUMA nodes
This patch adds SLIT table support to QEMU and provides an
additional `dist` option for `-numa`, allowing the user to set
vNUMA distances on the QEMU command line.

With this patch, when a user wants to create a guest that contains
several vNUMA nodes and also wants to set the distances among those
nodes, the QEMU command line would look like:

```
-numa node,nodeid=0,cpus=0 \
-numa node,nodeid=1,cpus=1 \
-numa node,nodeid=2,cpus=2 \
-numa node,nodeid=3,cpus=3 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=0,dst=2,val=31 \
-numa dist,src=0,dst=3,val=41 \
-numa dist,src=1,dst=2,val=21 \
-numa dist,src=1,dst=3,val=31 \
-numa dist,src=2,dst=3,val=21 \
```
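
As a rough illustration of what these options describe, the sketch below
turns the six `dist` entries into a full distance matrix. Two things are
assumptions rather than statements from this commit: a node's distance to
itself is taken to be 10 (the conventional SLIT value for local access), and
a distance given in only one direction is mirrored for the reverse direction.
The helper name is hypothetical.

```python
LOCAL_DISTANCE = 10  # assumed: conventional SLIT value for a node to itself


def build_distance_matrix(nb_nodes, dist_options):
    """Build an nb_nodes x nb_nodes matrix from (src, dst, val) tuples."""
    matrix = [[None] * nb_nodes for _ in range(nb_nodes)]
    for node in range(nb_nodes):
        matrix[node][node] = LOCAL_DISTANCE
    for src, dst, val in dist_options:
        matrix[src][dst] = val
    # Assumption: a missing reverse direction takes the forward value.
    for src in range(nb_nodes):
        for dst in range(nb_nodes):
            if matrix[src][dst] is None:
                matrix[src][dst] = matrix[dst][src]
    return matrix


# The (src, dst, val) triples from the command line above.
dists = [(0, 1, 21), (0, 2, 31), (0, 3, 41),
         (1, 2, 21), (1, 3, 31), (2, 3, 21)]
for row in build_distance_matrix(4, dists):
    print(row)
```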

Signed-off-by: He Chen <he.chen@linux.intel.com>
Message-Id: <1493260558-20728-1-git-send-email-he.chen@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:37 -03:00
Laurent Vivier
ecc1f5adee maintainers: Add myself as linux-user reviewer
I volunteer to review linux-user patches.
Adding myself will help to not miss some of them.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Acked-by: Riku Voipio <riku.voipio@linaro.org>
Message-id: 20170510153950.29343-1-laurent@vivier.eu
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-11 13:31:11 -04:00
Daniel P. Berrange
4ed3d478c6 i386: rewrite way CPUID index is validated
Change the nested if statements into a flat format, to make
it clearer what validation / capping is being performed on
different CPUID index values.

NB this changes behaviour when "index > env->cpuid_xlevel2".
This won't have any guest-visible effect because there is
no CPUID[0xC0000001] feature supported by TCG, and KVM code
will never call cpu_x86_cpuid() with such an index value.
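
A flat validation of this kind might look like the sketch below. The range
boundaries (0x80000000 and 0xC0000000) and the fallback to the basic level
are assumptions made for illustration; the commit message does not spell
them out, and the function name is hypothetical.

```python
def validate_cpuid_index(index, level, xlevel, xlevel2):
    """Pick the limit for the index's range, then cap the index once."""
    if index >= 0xC0000000:
        limit = xlevel2      # assumed: Centaur/VIA range
    elif index >= 0x80000000:
        limit = xlevel       # assumed: extended range
    else:
        limit = level        # basic range
    if index > limit:
        # Assumed fallback: out-of-range input behaves like the highest
        # basic leaf.
        index = level
    return index


print(hex(validate_cpuid_index(0x5, 0xD, 0x80000008, 0)))         # in range
print(hex(validate_cpuid_index(0x80000009, 0xD, 0x80000008, 0)))  # capped
print(hex(validate_cpuid_index(0xC0000001, 0xD, 0x80000008, 0)))  # capped
```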

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20170509132736.10071-2-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 10:54:04 -03:00
Kevin Wolf
d541e201bd Merge remote-tracking branch 'mreitz/tags/pull-block-2017-05-11' into queue-block
Block patches for the block queue.

# gpg: Signature made Thu May 11 14:28:41 2017 CEST
# gpg:                using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-05-11: (22 commits)
  MAINTAINERS: Add qemu-progress to the block layer
  qcow2: Discard/zero clusters by byte count
  qcow2: Assert that cluster operations are aligned
  qcow2: Optimize write zero of unaligned tail cluster
  iotests: Add test 179 to cover write zeroes with unmap
  iotests: Improve _filter_qemu_img_map
  qcow2: Optimize zero_single_l2() to minimize L2 churn
  qcow2: Make distinction between zero cluster types obvious
  qcow2: Name typedef for cluster type
  qcow2: Correctly report status of preallocated zero clusters
  block: Update comments on BDRV_BLOCK_* meanings
  qcow2: Use consistent switch indentation
  qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
  tests: Add coverage for recent block geometry fixes
  blkdebug: Add ability to override unmap geometries
  blkdebug: Simplify override logic
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Refactor error injection
  blkdebug: Sanity check block layer guarantees
  qemu-io: Switch 'map' output to byte-based reporting
  ...

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 14:34:56 +02:00
Max Reitz
8dd30c86dd MAINTAINERS: Add qemu-progress to the block layer
util/qemu-progress.c is currently unmaintained. The only user of its
functionality is qemu-img, so it effectively is part of the block layer.

Suggested-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170428165517.30341-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:07 +02:00
Eric Blake
d2cb36af2b qcow2: Discard/zero clusters by byte count
Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster.  Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().

The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once.  All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.
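
The arithmetic behind the 2M-cluster example can be checked with a few
lines. The sketch assumes 8-byte L2 entries and an L2 table that occupies
exactly one cluster; the variable names are illustrative.

```python
INT_MAX = 2**31 - 1

cluster_size = 2 * 1024 * 1024      # 2M clusters
l2_entry_size = 8                   # assumed: one 64-bit entry per cluster
entries_per_l2 = cluster_size // l2_entry_size
bytes_per_l2 = entries_per_l2 * cluster_size

print(entries_per_l2)               # 262144, i.e. 256k entries
print(bytes_per_l2 // 2**30)        # 512 (GiB covered by one L2 table)
print(bytes_per_l2 > INT_MAX)       # True
```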

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-13-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:07 +02:00
Eric Blake
f10ee139ad qcow2: Assert that cluster operations are aligned
We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
is only passed cluster-aligned start values; but we can further
tighten the assertion that the only unaligned end value is at EOF.

Recent commits have taken advantage of an unaligned tail cluster,
for both discard and write zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-12-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:07 +02:00
Eric Blake
fbaa6bb3d3 qcow2: Optimize write zero of unaligned tail cluster
We've already improved discards to operate efficiently on the tail
of an unaligned qcow2 image; it's time to make a similar improvement
to write zeroes.  The special case is only valid at the tail
cluster of a file, where we must recognize that any sectors beyond
the image end would implicitly read as zero, and therefore should
not penalize our logic for widening a partial cluster into writing
the whole cluster as zero.

However, note that for now, the special case of end-of-file is only
recognized if there is no backing file, or if the backing file has
the same length; that's because when the backing file is shorter
than the active layer, we don't have code in place to recognize
that reads of a sector unallocated at the top and beyond the backing
end-of-file are implicitly zero.  It's not much of a real loss,
because most people don't use images that aren't cluster-aligned,
or where the active layer is a different size than the backing
layer (especially where the difference falls within a single cluster).

Update test 154 to cover the new scenarios, using two images of
intentionally differing length.

While at it, fix the test to gracefully skip when run as
./check -qcow2 -o compat=0.10 154
since the older format lacks zero clusters already required earlier
in the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-11-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:07 +02:00
Eric Blake
e249d51952 iotests: Add test 179 to cover write zeroes with unmap
No tests were covering write zeroes with unmap.  Additionally,
I needed to prove that my previous patches for correct status
reporting and write zeroes optimizations actually had an impact.

The test works for cluster_size between 8k and 2M (for smaller
sizes, it fails because our allocation patterns are not contiguous
with small clusters - in part, the largest consecutive allocation
we tend to get is often bounded by the size covered by one L2
table).

Note that testing for zero clusters is tricky: 'qemu-io map'
reports whether data comes from the current layer of the image
(useful for sniffing out which regions of the file have
QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings;
while 'qemu-img map' sees "zero":true for both unallocated and
zero clusters for any qcow2 with no backing layer (so less useful
at detecting true zero clusters), but reliably shows mappings.
So we have to rely on both queries side-by-side at each point of
the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-10-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:07 +02:00
Eric Blake
d9ca2214bd iotests: Improve _filter_qemu_img_map
Although _filter_qemu_img_map documents that it scrubs offsets, it
was only doing so for human mode.  Of the existing tests using the
filter (97, 122, 150, 154, 176), two of them are affected, but it
does not hurt the validity of the tests to not require particular
mappings (another test, 66, uses offsets but intentionally does not
pass through _filter_qemu_img_map, because it checks that offsets
are unchanged before and after an operation).

Another justification for this patch is that it will allow a future
patch to utilize 'qemu-img map --output=json' to check the status of
preallocated zero clusters without regard to the mapping (since
the qcow2 mapping can be very sensitive to the chosen cluster size,
when preallocation is not in use).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-9-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:07 +02:00
Eric Blake
06cc5e2b2d qcow2: Optimize zero_single_l2() to minimize L2 churn
Similar to discard_single_l2(), we should try to avoid dirtying
the L2 cache when the cluster we are changing already has the
right characteristics.

Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
is a requirement to unallocate a cluster (this is because the block
layer clears that flag if discard.* flags during open requested that
we never punch holes - see the conversation around commit 170f4b2e,
https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
Therefore, this patch can only reuse a zero cluster as-is if either
unmapping is not requested, or if the zero cluster was not associated
with an allocation.

Technically, there are some cases where an unallocated cluster
already reads as all zeroes (namely, when there is no backing file
[easy: check bs->backing], or when the backing file also reads as
zeroes [harder: we can't check bdrv_get_block_status since we are
already holding the lock]), where the guest would not immediately see
a difference if we left that cluster unallocated.  But if the user
did not request unmapping, leaving an unallocated cluster is wrong;
and even if the user DID request unmapping, keeping a cluster
unallocated risks a subtle semantic change of guest-visible contents
if a backing file is later added, and it is not worth auditing
whether all internal uses such as mirror properly avoid an unmap
request.  Thus, this patch is intentionally limited to just clusters
that are already marked as zero.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-8-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:07 +02:00
Eric Blake
fdfab37dfe qcow2: Make distinction between zero cluster types obvious
Treat plain zero clusters differently from allocated ones, so that
we can simplify the logic of checking whether an offset is present.
Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.

I tried to arrange the enum so that we could use
'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
I didn't actually end up taking advantage of the layout.

In many cases, this leads to simpler code, by properly combining
cases (sometimes, both zero types pair together, other times,
plain zero is more like unallocated while allocated zero is more
like normal).
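
The layout idea can be sketched as follows. Only the relative position of
the two zero types is taken from the text above; the remaining members and
their order are assumptions, and the helper names are hypothetical.

```python
from enum import IntEnum


class ClusterType(IntEnum):
    # Types without a host offset come first...
    UNALLOCATED = 0
    ZERO_PLAIN = 1
    # ...followed by types that do have one (order of the last two assumed).
    ZERO_ALLOC = 2
    NORMAL = 3
    COMPRESSED = 4


def has_allocation(cluster_type):
    """Range check made possible by the ordering of the enum."""
    return cluster_type >= ClusterType.ZERO_ALLOC


def reads_as_zero(cluster_type):
    """Case where both zero types pair together."""
    return cluster_type in (ClusterType.ZERO_PLAIN, ClusterType.ZERO_ALLOC)


for t in ClusterType:
    print(t.name, has_allocation(t), reads_as_zero(t))
```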

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-7-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:07 +02:00
Eric Blake
3ef9521893 qcow2: Name typedef for cluster type
Although it doesn't add all that much type safety (this is C, after
all), it does add a bit of legibility to use the name QCow2ClusterType
instead of a plain int.

In particular, qcow2_get_cluster_offset() has an overloaded return
type; a QCow2ClusterType on success, and -errno on failure; keeping
the cluster type in a separate variable makes it slightly easier for
the next patch to make further computations based on the type.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-6-eblake@redhat.com
[mreitz: Use the new type in two more places (one of them pulled from
         the next patch)]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
4341df8a83 qcow2: Correctly report status of preallocated zero clusters
We were throwing away the preallocation information associated with
zero clusters.  But we should be matching the well-defined semantics
in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
while still reminding the user that reading from that offset is
likely to read garbage.

count_contiguous_clusters_by_type() is now used only for unallocated
cluster runs, hence it gets renamed and tightened.

Making this change lets us see which portions of an image are zero
but preallocated, when using qemu-img map --output=json.  The
--output=human side intentionally ignores all zero clusters, whether
or not they are preallocated.

The fact that there is no change to qemu-iotests './check -qcow2'
merely means that we aren't yet testing this aspect of qemu-img;
a later patch will add a test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-5-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
4c41cb4955 block: Update comments on BDRV_BLOCK_* meanings
We had some conflicting documentation: a nice 8-way table that
described all possible combinations of DATA, ZERO, and
OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
always meant raw data could be read directly.  Furthermore, the
text refers a lot to bs->file, even though the interface was
updated back in 67a0fd2a to let the driver pass back a specific
BDS (not necessarily bs->file).  As the 8-way table is the
intended semantics, simplify the rest of the text to get rid of
the confusion.

ALLOCATED is always set by the block layer for convenience (drivers
do not have to worry about it).  RAW is used only internally, but
by more than the raw driver.  Document these additional items on
the driver callback.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-4-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
bbd995d830 qcow2: Use consistent switch indentation
Fix a couple of inconsistent indentations, before an upcoming
patch further tweaks the switch statements.
(best viewed with 'git diff -b').

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-3-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
b32cbae111 qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
In order to keep checkpatch happy when the next patch changes
indentation, we first have to shorten some long lines.  The easiest
approach is to use a new variable in place of
'offset & L2E_OFFSET_MASK', except that 'offset' is the best name
for that variable.  Change '[old_]offset' to '[old_]entry' to
make room.

While touching things, also fix checkpatch warnings about unusual
'for' statements.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-2-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
40812d9373 tests: Add coverage for recent block geometry fixes
Use blkdebug's new geometry constraints to emulate setups that
have needed past regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries.  Also, add
coverage that the block layer is honoring max transfer limits.

For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.

For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:

qemu-io: discard 30M at 80000001, passed to blkdebug
  blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
blkdebug's 512 align)
  blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
    qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
qcow2's 1M align)
    qcow2: discard 13M bytes at 77M, succeeds
  blkdebug: discard 15M bytes at 90M, passed to qcow2
    qcow2: discard 15M bytes at 90M, succeeds
  blkdebug: discard 1356800 bytes at 105M, passed to qcow2
    qcow2: discard 1M at 105M, succeeds
    qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
  blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)
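
The blkdebug-level lines of this trace can be reproduced with a simplified
model of discard fragmentation. This is a sketch, not the block layer's
code: the function name is hypothetical, and the 15M preferred alignment and
15M maximum discard size are inferred from the trace rather than stated in
this message.

```python
M = 1024 * 1024


def fragment_discard(offset, count, request_align, preferred_align,
                     max_discard):
    """Split one discard into (offset, length) pieces for the next layer."""
    align = max(preferred_align, request_align)
    head = offset % align              # misalignment at the start
    tail = (offset + count) % align    # misalignment at the end
    max_discard -= max_discard % align
    assert max_discard > 0, "maximum must cover at least one aligned unit"
    fragments = []
    while count > 0:
        num = count
        if head:
            # Go no further than the next preferred boundary.
            num = min(count, align - head)
            if num % request_align:
                # First isolate the sub-sector sliver.
                num %= request_align
            head = (head + num) % align
        elif tail:
            if num > align:
                # Shorten to the last preferred boundary.
                num -= tail
            elif tail % request_align and tail > request_align:
                # Split off the trailing sub-sector sliver.
                tail %= request_align
                num -= tail
        num = min(num, max_discard)
        fragments.append((offset, num))
        offset += num
        count -= num
    return fragments


for off, length in fragment_discard(80000001, 30 * M, 512, 15 * M, 15 * M):
    print(f"discard {length} bytes at {off}")
```

The 511-byte and 1-byte pieces are the ones blkdebug then rejects with
-ENOTSUP because they are smaller than its 512-byte alignment.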

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-10-eblake@redhat.com
[mreitz: For cooperation with image locking, add -r to the qemu-io
         invocation which verifies the image content]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
430b26a82d blkdebug: Add ability to override unmap geometries
Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9b and b8d0a98 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fe deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-9-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
3dc834f879 blkdebug: Simplify override logic
Rather than storing into a local variable, copying to the struct
if the value is valid, and reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid.  This however requires that the struct store
a 64-bit number, rather than a narrower type.  Likewise, setting
a sane errno value in ret prior to the sequence of parsing and
jumping to out: on error makes it easier for the next patch to
add a chain of similar checks.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-8-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
63188c2450 blkdebug: Add pass-through write_zero and discard support
In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-7-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
d157ed5f72 blkdebug: Refactor error injection
Rather than repeat the logic at each caller of checking if a Rule
exists that warrants an error injection, fold that logic into
inject_error(); and rename it to rule_check() for legibility.
This will help the next patch, which adds two more callers that
need to check rules for the potential of injecting errors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-6-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
e0ef439588 blkdebug: Sanity check block layer guarantees
Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
any I/O to fit within device boundaries. Additionally, when using a
minimum alignment of 4k, we want to ensure the block layer does proper
read-modify-write rather than requesting I/O on a slice of a sector.
Let's enforce that the contract is obeyed when using blkdebug.  For
now, blkdebug only allows alignment overrides, and just inherits other
limits from whatever device it is wrapping, but a future patch will
further enhance things.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-5-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
6f3c90af3c qemu-io: Switch 'map' output to byte-based reporting
Mixing byte offset and sector allocation counts is a bit
confusing.  Also, reporting n/m sectors, where m decreases
according to the remaining size of the file, isn't really
adding any useful information; and reporting an offset at
both the front and end of the line, with large amounts of
whitespace, is pointless.  Update the output to use byte
counts and shorter lines, then adjust the affected tests
(./check -qcow2 102, ./check -vpc 146).

Note that 'qemu-io map' is MUCH weaker than 'qemu-img map';
the former only shows which regions of the active layer are
allocated, without regard to where the allocation comes from
or whether the allocated portion is known to read as zero
(because it is using the weaker bdrv_is_allocated()); while the
latter (especially in --output=json mode) reports more details
from bdrv_get_block_status().

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-4-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Eric Blake
4401fdc77c qemu-io: Switch 'alloc' command to byte-based length
For the 'alloc' command, accepting an offset in bytes but a length
in sectors, and reporting output in sectors, is confusing.  Do
everything in bytes, and adjust the expected output accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-3-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:05 +02:00
Eric Blake
1bce6b4ce3 qemu-io: Improve alignment checks
Several copy-and-pasted alignment checks exist in qemu-io, which
could use some minor improvements:

- Manual comparison against 0x1ff is not as clean as using our
alignment macros (QEMU_IS_ALIGNED) from osdep.h.

- The error messages aren't quite grammatically correct.
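
To illustrate the first point, the sketch below compares the manual 0x1ff
mask with a named alignment helper. The helper is a hypothetical Python
stand-in for the idea behind an alignment macro, not the definition used in
osdep.h.

```python
SECTOR_SIZE = 512


def is_aligned(value, alignment):
    """True when value is a multiple of alignment."""
    return value % alignment == 0


# The mask 0x1ff tests the low nine bits, which is the same as asking
# whether the value is a multiple of 512; the helper states that intent.
for value in (0, 511, 512, 4096, 4097):
    manual = (value & 0x1ff) == 0
    assert manual == is_aligned(value, SECTOR_SIZE)
    print(value, is_aligned(value, SECTOR_SIZE))
```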

Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-2-eblake@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:05 +02:00
John Snow
698bdfa07d blockdev: use drained_begin/end for qmp_block_resize
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1447551

If one tries to issue a block_resize while a guest is busy
accessing the disk, it is possible that qemu may deadlock
when invoking aio_poll from both the main loop and the iothread.

Replace another instance of bdrv_drain_all that doesn't
quite belong.

Cc: qemu-stable@nongnu.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 12:08:24 +02:00
Christoph Hellwig
c03e7ef12a nvme: Implement Write Zeroes
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: ported over from qemu-nvme.git to mainline]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 12:08:24 +02:00
Anton Nefedov
b91127edd0 qemu-img: wait for convert coroutines to complete
On an error path (such as an I/O error in one of the coroutines), it is required to
  - wait for the coroutines to complete before cleaning up the common structures
  - reenter dependent coroutines so that they eventually finish

Introduced in 2d9187bc65.

Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 12:08:24 +02:00
Kevin Wolf
22d5cd82e9 file-posix: Remove .bdrv_inactivate/invalidate_cache
Now that the block layer takes care to request far fewer permissions
for inactive nodes, the special-casing in file-posix isn't necessary any
more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11 12:08:24 +02:00
Kevin Wolf
9c5e6594f1 block: Fix write/resize permissions for inactive images
Format drivers for inactive nodes don't need write/resize permissions on
their bs->file and can share write/resize with another VM (in fact, this
is the whole point of keeping images inactive). Represent this fact in
the op blocker system, so that image locking does the right thing
without special-casing inactive images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11 12:08:24 +02:00
Kevin Wolf
38701b6aef block: Inactivate parents before children
The proper order for inactivating block nodes is that first the parents
get inactivated and then the children. If we do things in this order, we
can assert that we didn't accidentally leave a parent activated when one
of its child nodes is inactive.
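
The ordering and the assertion it enables can be sketched on a toy node
tree. The class and function names are hypothetical and the model ignores
everything a real block node does; it only shows a parent-first walk.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parents = []
        self.active = True
        for child in self.children:
            child.parents.append(self)


def inactivate(node, order):
    """Inactivate the node itself first, then recurse into its children."""
    # Holds only because parents are always handled before their children.
    assert all(not parent.active for parent in node.parents), \
        f"{node.name}: a parent is still active"
    node.active = False
    order.append(node.name)
    for child in node.children:
        inactivate(child, order)


# A format node on top of a protocol node and a backing chain.
backing = Node("backing", [Node("backing-file")])
root = Node("qcow2", [Node("file"), backing])

order = []
inactivate(root, order)
print(order)  # ['qcow2', 'file', 'backing', 'backing-file']
```

Walking children first would trip the assertion as soon as a child is
reached while its parent is still active.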

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11 12:08:24 +02:00
Kevin Wolf
cfa1a5723f block: Drop permissions when migration completes
With image locking, permissions affect other qemu processes as well. We
want to be sure that the destination can run, so let's drop permissions
on the source when migration completes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11 12:08:24 +02:00
Kevin Wolf
4417ab7adf block: New BdrvChildRole.activate() for blk_resume_after_migration()
Instead of manually calling blk_resume_after_migration() in migration
code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend
activation with cache invalidation into a single function. This is
achieved with a new callback in BdrvChildRole that is called by
bdrv_invalidate_cache_all().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11 12:08:24 +02:00
Kevin Wolf
ace21a5875 migration: Unify block node activation error handling
Migration code activates all block driver nodes on the destination when
the migration completes. It does so by calling
bdrv_invalidate_cache_all() and blk_resume_after_migration(). There is
one code path for precopy and one for postcopy migration, resulting in
four function calls, which used to have three different failure modes.

This patch unifies the behaviour so that failure to activate all block
nodes is non-fatal, but the error message is logged and the VM isn't
automatically started. 'cont' will retry activating the block nodes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11 12:08:24 +02:00
Max Reitz
aa93c834f9 iotests: Extend test 066
066 was supposed to be a test "for discarding preallocated zero
clusters", but it did so incompletely: While it did check the image
file's integrity after the operation, it did not confirm that the
clusters are indeed freed. This patch adds this test.

In addition, new cases for writing to preallocated zero clusters are
added.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 12:08:24 +02:00
Max Reitz
293073a56c qcow2: Discard preallocated zero clusters
In discard_single_l2(), we completely discard normal clusters instead of
simply turning them into preallocated zero clusters. That means we
should probably do the same with such preallocated zero clusters:
Discard them instead of keeping them allocated.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 12:08:24 +02:00
Max Reitz
564a6b6938 qcow2: Reuse preallocated zero clusters
Instead of just freeing preallocated zero clusters and completely
allocating them from scratch, reuse them.

We cannot do this in handle_copied(), however, since this is a COW
operation. Therefore, we have to add the new logic to handle_alloc() and
simply return the existing offset if it exists. The only catch is that
we have to convince qcow2_alloc_cluster_link_l2() not to free the old
clusters (because we have reused them).

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 12:08:24 +02:00
Max Reitz
92413c16be qcow2: Fix preallocation size formula
When calculating the number of reftable entries, we should actually use
the number of refblocks and not (wrongly[1]) re-calculate it.

[1] "Wrongly" means: Dividing the number of clusters by the number of
    entries per refblock and rounding down instead of up.
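
The rounding issue in [1] can be illustrated with a small sketch. The function names and the example numbers below are hypothetical and are not taken from the qcow2 code; the sketch only shows why rounding down undercounts refblocks when the cluster count is not a multiple of the entries per refblock.

```python
def refblocks_rounded_down(nb_clusters, entries_per_refblock):
    # The "wrong" variant: integer division truncates, so a partially
    # filled last refblock is not counted.
    return nb_clusters // entries_per_refblock


def refblocks_rounded_up(nb_clusters, entries_per_refblock):
    # Ceiling division: every cluster gets a slot, including those that
    # spill into a partially filled last refblock.
    return (nb_clusters + entries_per_refblock - 1) // entries_per_refblock


# Hypothetical numbers: 65 clusters, 64 refcount entries per refblock.
down = refblocks_rounded_down(65, 64)
up = refblocks_rounded_up(65, 64)
```

With these numbers the rounded-down count covers only 64 of the 65 clusters, which is why reusing the already computed number of refblocks is the safer choice.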

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 12:08:24 +02:00
Fam Zheng
de9efdb334 tests: Add POSIX image locking test case 182
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 12:08:20 +02:00
Fam Zheng
ba8980784d qemu-iotests: Add test case 153 for image locking
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:15:32 +02:00
Fam Zheng
244a566810 file-posix: Add image locking to perm operations
This extends the permission bits of the op blocker API to external
processes using Linux OFD locks.

Each permission in @perm and @shared_perm is represented by a locked
byte in the image file.  Requesting a permission in @perm is translated
to a shared lock of the corresponding byte; refusing to share the same
permission is translated to a shared lock of a separate byte. With that,
we use twice as many bytes as there are distinct permission types.

virtlockd in libvirt locks the first byte, so we do locking from a
higher offset.
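
A rough model of the byte layout described above, as a sketch only. The base offsets (100 and 200), the permission names, and the helper functions are assumptions made for illustration; the real code takes fcntl locks on the image file instead of manipulating Python sets.

```python
# Illustrative model only: offsets and names below are assumptions.
PERM_BASE = 100     # hypothetical first byte of the "permission in use" range
SHARED_BASE = 200   # hypothetical first byte of the "not shared" range
PERMS = ["consistent-read", "write", "write-unchanged", "resize", "graph-mod"]


def lock_bytes(perm, shared_perm):
    """Return the byte offsets one opener would hold shared locks on."""
    held = set()
    for bit, name in enumerate(PERMS):
        if name in perm:
            held.add(PERM_BASE + bit)      # "I use this permission"
        if name not in shared_perm:
            held.add(SHARED_BASE + bit)    # "I refuse to share this permission"
    return held


def conflicts(a, b):
    """Two openers conflict if one uses a permission the other won't share."""
    def uses_unshared(user, other):
        return any(off - PERM_BASE + SHARED_BASE in other
                   for off in user
                   if PERM_BASE <= off < PERM_BASE + len(PERMS))
    return uses_unshared(a, b) or uses_unshared(b, a)


writer = lock_bytes({"write"}, {"consistent-read"})
reader = lock_bytes({"consistent-read"}, set(PERMS))
```

In this model a reader that shares everything coexists with the writer, while a second writer with the same settings does not, because it uses "write" and the first writer holds the byte that refuses to share it.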

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:15:32 +02:00
Fam Zheng
e8c1094a0e osdep: Fall back to posix lock when OFD lock is unavailable
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:15:32 +02:00
Fam Zheng
13461fdba6 osdep: Add qemu_lock_fd and qemu_unlock_fd
They are wrappers of POSIX fcntl "file private locking", with a
convenient "try lock" wrapper implemented with F_OFD_GETLK.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:15:32 +02:00
Fam Zheng
fc0932fdcf block: Reuse bs as backing hd for drive-backup sync=none
Opening the backing image for the second time is bad, especially here
when it is also in use as the active image as the source. The
drive-backup job itself doesn't read from target->backing for COW,
instead it gets data from the write notifier, so it's not a big problem.
However, exporting the target to NBD etc. won't work, because of the
likely stale metadata cache.

Use BDRV_O_NO_BACKING in this case and manually set up the backing
BdrvChild.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:15:32 +02:00
Fam Zheng
9c77fec2d3 tests: Disable image lock in test-replication
The COLO block replication architecture requires one disk to be shared
between primary and secondary. In the test, both processes use the posix
file protocol (instead of going over NBD), so it is affected by image
locking. Disable the lock.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:41 +02:00
Fam Zheng
1c3a555c35 file-win32: Error out if locking=on
We share the same set of QAPI options with file-posix, but locking is
not supported here. So error out if it is specified as 'on' for now.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:41 +02:00
Fam Zheng
16b48d5d66 file-posix: Add 'locking' option
Making this option available even before implementing it will make
converting tests easier: in coming patches they can specify the option
already when necessary, before we actually write code to lock the
images.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
2420d369a2 tests: Use null-co:// instead of /dev/null as the dummy image
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
7ceb4fc114 iotests: 172: Use separate images for multiple devices
To avoid image lock failures.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
8b084489b0 iotests: 091: Quit QEMU before checking image
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
d5b8336a62 iotests: 087: Don't attach test image twice
The test scenario doesn't require the same image; instead it focuses on
the duplicated node-name, so use null-co to avoid a locking conflict.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
ecffa63421 iotests: 085: Avoid image locking conflict
In the case where we test the expected error when a blockdev-snapshot
target already has a backing image, the backing chain is opened multiple
times. This will be a problem when we use image locking, so use a
different backing file that is not already open.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
4797aeabdc iotests: 055: Don't attach the target image already for drive-backup
Double attach is not a valid usage of the target image. drive-backup
will open the blockdev itself, so skip the add_drive call in this case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
55e5a3b65e iotests: 046: Prepare for image locking
The qemu-img info command is executed while the VM is running, so add the
-U option to avoid the image locking error.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
aca7063a56 iotests: 030: Prepare for image locking
qemu-img and qemu-io commands run while the guest is running need the
"-U" option, so add it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
459571f7b2 qemu-io: Add --force-share option
Add --force-share/-U to program options and -U to open subcommand.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
a8d16f9ca2 qemu-img: Update documentation for -U
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
335e993784 qemu-img: Add --force-share option to subcommands
This will force the opened images to allow sharing all permissions with other
programs.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:08:40 +02:00
Fam Zheng
ffd1a5a25c block: Respect "force-share" in perm propagating
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:02:38 +02:00
Fam Zheng
5a9347c673 block: Add, parse and store "force-share" option
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:02:38 +02:00
Fam Zheng
5176196c32 block: Make bdrv_perm_names public
It can be used outside of block.c for making user friendly messages.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:02:38 +02:00
Gerd Hoffmann
bfc56535f7 vga: fix display update region calculation
vga display update mis-calculated the region for the dirty bitmap
snapshot in case the scanlines are padded.  This can trigger an
assert in cpu_physical_memory_snapshot_get_dirty().

Fixes: fec5e8c92b
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170509104839.19415-1-kraxel@redhat.com
2017-05-11 09:50:32 +02:00
Gerd Hoffmann
ca7f544123 sm501: make display updates thread safe
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170509111928.30935-1-kraxel@redhat.com
2017-05-11 09:50:29 +02:00
Mark Cave-Ayland
2dd285b5f3 tcx: make display updates thread safe
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-id: 1494449551-20227-3-git-send-email-mark.cave-ayland@ilande.co.uk
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-05-11 09:49:27 +02:00
Mark Cave-Ayland
344a68bf9d cg3: make display updates thread safe
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-id: 1494449551-20227-2-git-send-email-mark.cave-ayland@ilande.co.uk
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-05-11 09:49:27 +02:00
Thomas Huth
e9edd931eb target/ppc: Avoid printing wrong aliases in CPU help text
When running with KVM, we update the "family" CPU alias to point
to the right host CPU type, so that it is for example possible to
use "-cpu POWER8" on a POWER8NVL host. However, the function for
printing the list of available CPU models is called earlier than
the KVM setup code, so the output of "-cpu help" is wrong in that
case. Since it would be somewhat ugly anyway to have different
help texts depending on whether "-enable-kvm" has been specified
or not, we had better always print the same text, so fix this
issue by printing "alias for preferred XXX CPU" instead.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
David Gibson
eaf87a3976 pnv: Fix build failures on some host platforms
This makes some changes to fix build failures on the 'min-glib' docker
image, and maybe other platforms with a buildchain that's less tolerant
about duplicated typedefs.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
David Gibson
5f3066d8b1 target/ppc: Allow workarounds for POWER9 DD1
POWER9 DD1 silicon has some bugs which mean it a) isn't really compliant
with the ISA v3.00 and b) requires a number of special workarounds in the
kernel.

At the moment, qemu isn't aware of DD1.  For TCG we don't really want it to
be (why bother emulating buggy silicon).  But with KVM, the guest does need
to be aware of DD1 so it can apply the necessary workarounds.

Meanwhile, the feature negotiation between qemu and the guest strongly
favours architected compatibility modes to "raw" CPU modes.  In combination
with the above, this means the guest sees architected POWER9 mode, and
doesn't apply the DD1 workarounds.  Well, unless it has yet another
workaround to partially ignore what qemu tells it.

This patch addresses this by disabling support for compatibility modes when
using KVM on a POWER9 DD1 host.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
David Gibson
9bf502fe12 spapr: Don't accidentally advertise HTM support on POWER9
Logic in spapr_populate_pa_features() enables the bit advertising
Hardware Transactional Memory (HTM) in the guest's device tree only when
KVM advertises its availability with the KVM_CAP_PPC_HTM feature.

However, this assumes that the HTM bit is off in the base template used for
the device tree value.  That is true for POWER8, but not for POWER9.

It looks like that was accidentally changed in 9fb4541 "spapr: Enable ISA
3.0 MMU mode selection via CAS".

Fixes: 9fb4541f58

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2017-05-11 09:45:15 +10:00
Paolo Bonzini
5c6b487d67 ppc: xics: fix compilation with CentOS 6
The PowerPCCPU typedef is included twice if a file includes
both hw/ppc/xics.h and target/ppc/cpu-qom.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh
545d6e2b5c target/ppc: Enable RADIX mmu mode for pseries TCG guest
Now that we have added all the infrastructure we can enable a pseries TCG
guest to use radix.

In order to do this we have to add the appropriate bits to the
ibm,arch-vec-5-platform-support vector to represent that we support both
hash and radix mmu models.

A radix guest can now be booted in pseries tcg mode by specifying:
-cpu POWER9

Note that we assume hash, that is we allocate a hpt, until a guest tells
us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in
which case we free the hpt. If we were right and the guest is hash then
there's nothing for us to do.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh
d5fee0bbe6 target/ppc: Implement ISA V3.00 radix page fault handler
ISA V3.00 introduced a new radix mmu model. Implement the page fault
handler for this so we can run a tcg guest in radix mode and perform
address translation correctly.

In real mode (mmu turned off) addresses are masked to remove the top
4 bits and then are subject to partition scoped translation, since we only
support pseries at this stage it is only necessary to perform the masking
and then we're done.

In virtual mode (mmu turned on) address translation is performed as
follows:

1. Use the quadrant to determine the fully qualified address.

The fully qualified address is defined as the combination of the effective
address, the effective logical partition id (LPID) and the effective
process id (PID). Based on the quadrant (EA63:62) we set the pid and lpid
like so:

quadrant 0: lpid = LPIDR, pid = PIDR
quadrant 1: HV only (not allowed in pseries)
quadrant 2: HV only (not allowed in pseries)
quadrant 3: lpid = LPIDR, pid = 0

If we can't get the fully qualified address we raise a segment interrupt.

2. Find the guest radix tree

We ask the virtual hypervisor for the partition table which was registered
with H_REGISTER_PROC_TBL which points us to the process table in guest
memory. We then index this table by pid to get the process table entry
which points us to the appropriate radix tree to translate the address.

If the process table isn't big enough to contain an entry for the current
pid then we raise a storage interrupt.

3. Walk the radix tree

Next we walk the radix tree where each level is a table of page directory
entries indexed by some number of bits from the effective address, where
the number of bits is determined by the table size. We continue to walk
the tree (while entries are valid and the table is of minimum size) until
we reach a table of page table entries, indicated by having the leaf bit
set. The appropriate pte is then checked for sufficient access permissions,
the reference and change bits are updated and the real address is
calculated from the real page number bits of the pte and the low bits of
the effective address.

If we can't find an entry or can't access the entry because of permissions
then we raise a storage interrupt.
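
The real-mode masking and the quadrant selection from step 1 can be sketched as follows. This is a simplified model, not the handler itself: the function names are hypothetical, and it assumes that "EA63:62" refers to the two most significant bits of a 64-bit effective address.

```python
MASK64 = (1 << 64) - 1
REAL_MODE_MASK = MASK64 >> 4   # clears the top 4 bits of the address


def real_mode_address(ea):
    # Real mode (mmu off): mask off the top 4 bits and we are done.
    return ea & REAL_MODE_MASK


def fully_qualified(ea, lpidr, pidr):
    """Return (lpid, pid) for the address, or None for a segment interrupt."""
    quadrant = (ea >> 62) & 0x3      # assumed: the two most significant bits
    if quadrant == 0:
        return (lpidr, pidr)
    if quadrant == 3:
        return (lpidr, 0)
    return None                      # quadrants 1 and 2 are HV only


q0 = fully_qualified(0x0000000000001000, 5, 7)
q3 = fully_qualified(0xC000000000001000, 5, 7)
q1 = fully_qualified(0x4000000000001000, 5, 7)
```

Returning None here stands in for raising the segment interrupt described above; the later steps (process table lookup and radix tree walk) are not modelled.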

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Add missing parentheses to macro]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh
c88305027d target/ppc: Change tlbie invalid fields for POWER9 support
The tlbie[l] instructions are used to invalidate TLB entries used to cache
address translations.

In ISAv3.00 (POWER9) more fields were added to the tlbie[l] instructions
which were previously invalid. We don't care about any of these new fields
since we just invalidate the whole world anyway but we need to not
cause an illegal instruction exception when the instructions are called.
We also don't want to allow an older processor to have these fields set
since that would be invalid.

Add a new GEN_HANDLER for the ISAv3 instructions with the correct invalid
mask. These will only be generated for a POWER9 processor for now based on
the instruction flag. Also remove the PPC_MEM_TLBIE instruction flag from
the POWER9 processor definition to ensure the old tlbie isn't generated.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh
c6fd28fd57 target/ppc: Update tlbie to check privilege level based on GTSE
The Guest Translation Shootdown Enable (GTSE) bit in the Logical Partition
Control Register (LPCR) can be set to enable a guest to use the tlbie
instruction directly to invalidate translations.

When the GTSE bit is set then the tlbie instruction is supervisor
privileged, otherwise it is hypervisor privileged.

Add a guest translation shootdown enable (gtse) field to the disassembly
context and use this to check the correct privilege level at code
generation time.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh
6de833070c target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE
The UPRT and GTSE bits are set when a guest calls H_REGISTER_PROCESS_TABLE
to determine how address translation is performed. Currently these
bits in the LPCR are only set for the cpu which handles the H_CALL, however
they need to be set for all cpus for that guest as address translation
cannot be performed differently on a per cpu basis.

Update the H_CALL handler to set these bits in the LPCR correctly for all
cpus of the guest.

Note it is the responsibility of the guest to ensure that any secondary cpus
are suspended when the H_CALL is made and thus we can safely update these
values here.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Mark Cave-Ayland
53ecf09df3 ppc: add qemu_vga.ndrv ROM to fw_cfg interface for NewWorld Macs
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Mark Cave-Ayland
b50de5cd77 ppc: add qemu_vga.ndrv ROM to fw_cfg interface for OldWorld Macs
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Mark Cave-Ayland
fbe9214318 Add QemuMacDrivers qemu_vga.ndrv revision d4e7d7a built as submodule
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Mark Cave-Ayland
0806b30c8d Add QemuMacDrivers as submodule
The QemuMacDrivers project provides virtualisation drivers for PPC MacOS
guests.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Sam Bobroff
229e16fd24 ppc/xics: preserve P and Q bits for KVM IRQs
Kernel commit 17d48610ae0f ("KVM: PPC: Book 3S: XICS: Implement ICS
P/Q states") added new bits to the state used by KVM IRQs. Currently,
QEMU does not preserve these bits, so migrating (or otherwise saving
and restoring) the guest state causes the P and Q bits to be cleared.

Clearing the P bit has no effect, because the kernel will set it based
on other data, but the loss of a set Q bit will cause a lost
interrupt.

This patch preserves the P and Q bits, correcting the problem.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Sam Bobroff
063cb7cbc9 ppc/xics: Fix stale irq->status bits after get
ics_get_kvm_state() "or"s set bits into irq->status but does not mask
out clear bits.

Correct this by initializing the IRQ status to zero before adding bits
to it.
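
The difference between the two behaviours can be shown with a small sketch. The flag names and values are hypothetical and stand in for the real irq->status bits; only the "or without clearing" pattern is taken from the description above.

```python
# Hypothetical status flags, for illustration only.
FLAG_A = 0x1
FLAG_B = 0x2


def refresh_status_buggy(status, a_set, b_set):
    # Only ORs bits in: a bit that was set earlier stays set even if the
    # freshly read state says it is clear.
    if a_set:
        status |= FLAG_A
    if b_set:
        status |= FLAG_B
    return status


def refresh_status_fixed(status, a_set, b_set):
    # Start from zero so the result reflects only the freshly read state.
    status = 0
    if a_set:
        status |= FLAG_A
    if b_set:
        status |= FLAG_B
    return status


stale = refresh_status_buggy(FLAG_A, a_set=False, b_set=True)
clean = refresh_status_fixed(FLAG_A, a_set=False, b_set=True)
```

Here `stale` still carries FLAG_A from the previous state, while `clean` contains only FLAG_B.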

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Nikunj A Dadhania
139d9023f1 target/ppc: do not reset reserve_addr in exec_enter
When an atomic operation is not supported, exit_atomic is called
and we stop the world and execute the atomic operation. This results
in the following call chain:

tcg_gen_atomic_cmpxchg_tl()
  -> gen_helper_exit_atomic()
     -> HELPER(exit_atomic)
        -> cpu_loop_exit_atomic() -> EXCP_ATOMIC
           -> qemu_tcg_cpu_thread_fn() => case EXCP_ATOMIC
              -> cpu_exec_step_atomic()
                 -> cpu_step_atomic()
                    -> cc->cpu_exec_enter() = ppc_cpu_exec_enter()
                       Sets env->reserve_addr = -1;

But by the time it returns, the reservation is erased and the code
fails. This continues forever and the lock is never taken.

Instead, reset reserve_addr in powerpc_excp().

Now that ppc_cpu_exec_enter() doesn't have anything meaningful to do,
let us get rid of the function.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Nikunj A Dadhania
f0b0685d66 tcg: enable MTTCG by default for PPC64 on x86
This enables the multi-threaded system emulation by default for PPC64
guests using the x86_64 TCG back-end.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Bharata B Rao
a3e53273ad cpus: Fix CPU unplug for MTTCG
Ensure that the unplugged CPU thread is destroyed and the waiting
thread is notified about it. This is needed for CPU unplug to work
correctly in MTTCG mode.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Nikunj A Dadhania
4771df23ed target/ppc: Generate fence operations
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:14 +10:00
Nikunj A Dadhania
7f9af1abdc cputlb: handle first atomic write to the page
In the case where the conditional write is the first write to the page,
TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as
a special case and set the dirty bit. After that fall through to the
actual atomic instruction below.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:14 +10:00
Nikunj A Dadhania
253ce7b2cf target/ppc: Emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can suffer from
the ABA problem. However, portable parallel code is written assuming
only cmpxchg which means that in practice this is a viable alternative.
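
A single-threaded sketch of the ABA problem mentioned above. The `Cell` class and the helper names are hypothetical, and the "other CPU" is simulated by direct assignments; the sketch only models a store-conditional that is implemented as a compare-and-swap against the value seen by the load-reserve.

```python
class Cell:
    """A single memory location (illustrative)."""
    def __init__(self, value):
        self.value = value


def cmpxchg(cell, expected, new):
    # Compare-and-swap: store `new` only if the current value equals
    # `expected`; always return the value that was found.
    old = cell.value
    if old == expected:
        cell.value = new
    return old


def load_reserve(cell):
    # Emulated load-linked: just remember the value that was read.
    return cell.value


def store_conditional(cell, reserved_value, new):
    # Emulated store-conditional: succeeds if the value still matches.
    return cmpxchg(cell, reserved_value, new) == reserved_value


cell = Cell(1)
reserved = load_reserve(cell)   # reads A (1)
cell.value = 2                  # another CPU writes B ...
cell.value = 1                  # ... and then writes A again
succeeded = store_conditional(cell, reserved, 9)
```

The emulated store-conditional succeeds because the value is back to what was read, even though the location was written in between; a true LL/SC pair would be expected to fail in this situation.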

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:14 +10:00
Cédric Le Goater
a1a636b8b4 ppc/pnv: restrict BMC object to the BMC simulator
Today, when a PowerNV guest runs, it uses the sensor definitions of
the BMC simulator to populate the device tree. But an external IPMI
BMC could also be used and, in that case, it is not (yet) possible to
retrieve the sensor list. Generating the OEM SEL event for shutdown or
reboot also does not make sense as it should be generated on the BMC
side.

This change allows a guest to use an 'ipmi-bmc-extern' backend to the
'isa-ipmi-bt' device and a 'chardev' for transport such as :

	-chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \
	-device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \
	-device isa-ipmi-bt,bmc=bmc0,irq=10

and connect to a BMC simulator, the OpenIPMI ipmi_sim simulator for
instance.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:14 +10:00
Michael S. Tsirkin
8b12e48950 acpi-defs: clean up open brace usage
patchew has been saying:
ERROR: open brace '{' following struct go on the same line

Fix up acpi-defs.h to follow this rule.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-10 22:04:23 +03:00
Bruce Rogers
153eba4726 ACPI: don't call acpi_pcihp_device_plug_cb on xen
Commit f0c9d64a exposed the issue that with a xenfv machine using
pci passthrough, acpi pci hotplug code was being executed by mistake.
Guard calls to acpi_pcihp_device_plug_cb (and corresponding
acpi_pcihp_device_unplug_cb) with a check for xen_enabled(). Without
this check I am seeing an error that the bus doesn't have the
acpi-pcihp-bsel property set.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-10 22:04:23 +03:00
Eduardo Habkost
ef0e8fc768 iommu: Don't crash if machine is not PC_MACHINE
Currently it's possible to crash QEMU using "-device *-iommu" and
"-machine none":

  $ qemu-system-x86_64 -machine none -device amd-iommu
  qemu/hw/i386/amd_iommu.c:1140:amdvi_realize: Object 0x55627dafbc90 is not an instance of type generic-pc-machine
  Aborted (core dumped)
  $ qemu-system-x86_64 -machine none -device intel-iommu
  qemu/hw/i386/intel_iommu.c:2972:vtd_realize: Object 0x56292ec0bc90 is not an instance of type generic-pc-machine
  Aborted (core dumped)

Fix amd-iommu and intel-iommu to ensure the current machine is really a
TYPE_PC_MACHINE instance at their realize methods.

Resulting error messages:

  $ qemu-system-x86_64 -machine none -device amd-iommu
  qemu-system-x86_64: -device amd-iommu: Machine-type 'none' not supported by amd-iommu
  $ qemu-system-x86_64 -machine none -device intel-iommu
  qemu-system-x86_64: -device intel-iommu: Machine-type 'none' not supported by intel-iommu

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-10 22:04:23 +03:00
Peter Xu
465238d9f8 pc: add 2.10 machine type
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-10 22:04:23 +03:00
Igor Mammedov
98e753a6e5 pc/fwcfg: unbreak migration from qemu-2.5 and qemu-2.6 during firmware boot
The 2.7 commit b2a575a ("Add optionrom compatible with fw_cfg DMA version")
regressed migration during firmware execution time by
abusing the fwcfg.dma_enabled property to decide whether to load the
DMA version of the option rom AND by mistakenly disabling DMA
for 2.6 and earlier globally instead of only for the option rom.

So a 2.6 machine type guest breaks when it is already running
firmware in DMA mode and is migrated to qemu-2.7 (pc-2.6)
at that time:

a) qemu-2.6:pc2.6 (fwcfg.dma=on,firmware=dma,oprom=ioport)
b) qemu-2.7:pc2.6 (fwcfg.dma=off,firmware=ioport,oprom=ioport)

  to:   a     b
from
a       OK   FAIL
b       OK   OK

So we currently have broken forward migration from
qemu-2.6 to qemu-2.[789] that however could be fixed
for 2.10 by re-enabling DMA for 2.[56] machine types
and allowing dma capable option rom only since 2.7.
As a result qemu should end up with:

c) qemu-2.10:pc2.6 (fwcfg.dma=on,firmware=dma,oprom=ioport)

   to:  a     b    c
from
a      OK   FAIL  OK
b      OK   OK    OK
c      OK   FAIL  OK

where forward migration from qemu-2.6 to qemu-2.10 should
work again leaving only qemu-2.[789]:pc-2.6 broken.

Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-10 22:04:23 +03:00
Marc-André Lureau
640601c7cb libvhost-user: fix crash when rings aren't ready
Calling libvhost-user functions like vu_queue_get_avail_bytes() when the
queue doesn't yet have addresses will result in crashes like the
following:

Program received signal SIGSEGV, Segmentation fault.
0x000055c414112ce4 in vring_avail_idx (vq=0x55c41582fd68, vq=0x55c41582fd68)
    at /home/dgilbert/git/qemu/contrib/libvhost-user/libvhost-user.c:940
940            vq->shadow_avail_idx = vq->vring.avail->idx;
(gdb) p vq
$1 = (VuVirtq *) 0x55c41582fd68
(gdb) p vq->vring
$2 = {num = 0, desc = 0x0, avail = 0x0, used = 0x0, log_guest_addr = 0, flags = 0}

    at /home/dgilbert/git/qemu/contrib/libvhost-user/libvhost-user.c:940
No locals.
    at /home/dgilbert/git/qemu/contrib/libvhost-user/libvhost-user.c:960
        num_heads = <optimized out>
    out_bytes=out_bytes@entry=0x7fffd035d7c4, max_in_bytes=max_in_bytes@entry=0,
    max_out_bytes=max_out_bytes@entry=0) at /home/dgilbert/git/qemu/contrib/libvhost-user/libvhost-user.c:1034

Add pre-condition checks on vring.avail before accessing it.

Fix documentation and return type of vu_queue_empty() while at it.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-10 22:04:23 +03:00
Zhiyong Yang
60cd11024f hw/virtio: fix vhost user fails to startup when MQ
QEMU 2.7 through 2.9 and the vhost-user backend of the DPDK 17.02
release fail to establish a new connection when negotiating multiqueue
(MQ); a single queue pair works well.
   The cause is a bug introduced in QEMU together with
VHOST_USER_PROTOCOL_F_REPLY_ACK. When vhost_user_set_mem_table
is invoked to deal with the vhost message VHOST_USER_SET_MEM_TABLE
for the second time, QEMU does not send the message (it needs to be
sent only once) but still waits for DPDK's reply ack. QEMU then hangs
forever, while DPDK keeps waiting for the next vhost message from
QEMU.
  This patch fixes the bug so that MQ works.
  The same bug exists in vhost_user_net_set_mtu and is fixed at the
same time.
  The related DPDK patch is:
  http://www.dpdk.org/dev/patchwork/patch/23955/
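
The hang described above comes down to waiting for a reply to a message that
was never sent. A minimal sketch of the intended behaviour, assuming a
hypothetical device structure and function name (none of this is the QEMU
vhost-user API):

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one vhost-user connection. */
struct sketch_dev {
    bool reply_ack_negotiated;    /* REPLY_ACK feature was negotiated */
    bool mem_table_already_sent;  /* one-time message already went out */
    int sent;                     /* messages actually written */
    int waited;                   /* reply acks actually waited for */
};

/* Only wait for a reply ack when a message was really sent. */
int sketch_set_mem_table(struct sketch_dev *dev)
{
    if (dev->mem_table_already_sent) {
        /* Nothing was written, so there is no reply to wait for. */
        return 0;
    }
    dev->sent++;
    dev->mem_table_already_sent = true;
    if (dev->reply_ack_negotiated) {
        dev->waited++;
    }
    return 0;
}
```

Calling this once per queue pair sends the message once and waits once, which
is the pairing the backend expects.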

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Cc: qemu-stable@nongnu.org
Fixes: ca525ce561 ("vhost-user: Introduce a new protocol feature REPLY_ACK.")
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jens Freimann <jfreiman@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-05-10 22:04:23 +03:00
Ard Biesheuvel
cb51ac2ffe hw/arm/virt: generate 64-bit addressable ACPI objects
Our current ACPI table generation code limits the placement of ACPI
tables to 32-bit addressable memory, in order to be able to emit the
root pointer (RSDP) and root table (RSDT) using table types from the
ACPI 1.0 days.

Since ARM was not supported by ACPI before version 5.0, it makes sense
to lift this restriction. This is not crucial for mach-virt, which is
guaranteed to have some memory available below the 4 GB mark, but it
is a nice to have for QEMU machines that do not have any 32-bit
addressable memory, which is not uncommon for real world 64-bit ARM
systems.

Since we already emit a version of the RSDP root pointer that has a
secondary 64-bit wide address field for the 64-bit root table (XSDT),
all we need to do is replace the RSDT generation with the generation
of an XSDT table, and use a different slot in the FADT table to refer
to the DSDT.
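
To illustrate why the RSDT forces tables below 4 GB: RSDT entries are 32-bit
physical addresses, while XSDT entries are 64 bits wide. The helpers below are
a hypothetical sketch of that constraint, not code from the table generator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An RSDT entry can only hold a 32-bit physical address. */
bool sketch_fits_rsdt_entry(uint64_t table_addr)
{
    return table_addr <= UINT32_MAX;
}

/* True only if every table could be referenced from an RSDT. */
bool sketch_rsdt_can_describe(const uint64_t *table_addrs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!sketch_fits_rsdt_entry(table_addrs[i])) {
            return false;
        }
    }
    return true;
}
```

An XSDT has no such limit, so emitting it unconditionally removes the need for
any 32-bit addressable memory.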

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
2017-05-10 22:04:23 +03:00
Ard Biesheuvel
5ee8534731 hw/acpi-defs: replace leading X with x_ in FADT field names
At the request of Michael, replace the leading capital X in the FADT
field name Xfacs and Xdsdt with lower case x + underscore.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-10 22:04:23 +03:00
Stefan Hajnoczi
f465706e59 Merge remote-tracking branch 'mjt/tags/trivial-patches-fetch' into staging
trivial patches for 2017-05-10

# gpg: Signature made Wed 10 May 2017 03:19:30 AM EDT
# gpg:                using RSA key 0x701B4F6B1A693E59
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931  4B22 701B 4F6B 1A69 3E59

* mjt/tags/trivial-patches-fetch: (23 commits)
  tests: Remove redundant assignment
  MAINTAINERS: Update paths for AioContext implementation
  MAINTAINERS: Update paths for main loop
  jazz_led: fix bad snprintf
  tests: Ignore another built executable (test-hmp)
  scripts: Switch to more portable Perl shebang
  scripts/qemu-binfmt-conf.sh: Fix shell portability issue
  virtfs: allow a device id to be specified in the -virtfs option
  hw/core/generic-loader: Fix crash when running without CPU
  virtio-blk: Remove useless condition around g_free()
  qemu-doc: Fix broken URLs of amnhltm.zip and dosidle210.zip
  use _Static_assert in QEMU_BUILD_BUG_ON
  channel-file: fix wrong parameter comments
  block: Make 'replication_state' an enum
  util: Use g_malloc/g_free in envlist.c
  qga: fix compiler warnings (clang 5)
  device_tree: fix compiler warnings (clang 5)
  usb-ccid: make ccid_write_data_block() cope with null buffers
  tests: Ignore more test executables
  Add 'none' as type for drive's if option
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-10 12:31:19 -04:00
Stefan Hajnoczi
1effe6ad5e Merge remote-tracking branch 'danpb/tags/pull-qcrypto-2017-05-09-1' into staging
Merge qcrypto 2017/05/09 v1

# gpg: Signature made Tue 09 May 2017 09:43:47 AM EDT
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* danpb/tags/pull-qcrypto-2017-05-09-1:
  crypto: qcrypto_random_bytes() now works on windows w/o any other crypto libs
  crypto: move 'opaque' parameter to (nearly) the end of parameter list
  List SASL config file under the cryptography maintainer's realm
  Default to GSSAPI (Kerberos) instead of DIGEST-MD5 for SASL

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-10 11:22:13 -04:00
Fam Zheng
e1ae9fb6c2 tests: Remove redundant assignment
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:24 +03:00
Paolo Bonzini
36c697bda5 MAINTAINERS: Update paths for AioContext implementation
Moved by c2b38b2
("block: move AioContext, QEMUTimer, main-loop to libqemuutil")

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:24 +03:00
Paolo Bonzini
3ecb29a328 MAINTAINERS: Update paths for main loop
Moved by c2b38b2 ("block: move AioContext, QEMUTimer, main-loop to
libqemuutil"), let's update MAINTAINERS too.

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:24 +03:00
Paolo Bonzini
e9c6ab62c7 jazz_led: fix bad snprintf
Detected by GCC 7's -Wformat-truncation.  snprintf writes at most
2 bytes here including the terminating NUL, so the result is
truncated.  In addition, the newline at the end is pointless.
Fix the buffer size and the format string.
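
The truncation can be reproduced with a small sketch. The function names and
the "%X" format are illustrative assumptions, not the jazz_led code; the point
is that snprintf() returns the length it wanted to write, so a return value
greater than or equal to the buffer size means the output was cut short.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if snprintf had to truncate, 0 if the whole text fit. */
int sketch_digit_with_newline(char *buf, size_t size, unsigned digit)
{
    int wanted = snprintf(buf, size, "%X\n", digit & 0xfu);
    return wanted < 0 || (size_t)wanted >= size;
}

/* Same digit without the pointless trailing newline. */
int sketch_digit_only(char *buf, size_t size, unsigned digit)
{
    int wanted = snprintf(buf, size, "%X", digit & 0xfu);
    return wanted < 0 || (size_t)wanted >= size;
}
```

With a 2-byte buffer the first variant loses the newline, while the second
fits exactly.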

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:24 +03:00
Eric Blake
fafa2e6702 tests: Ignore another built executable (test-hmp)
Commit 78f86a2b7 added a new test, but forgot to exclude the built
binary from version control.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:24 +03:00
Kamil Rytarowski
b7d5a9c2c6 scripts: Switch to more portable Perl shebang
The default NetBSD package manager is pkgsrc, and it installs Perl
along with other third-party programs under a custom, configurable prefix.
The default prefix for binary prebuilt packages is /usr/pkg, and the
Perl executable lands in /usr/pkg/bin/perl.

This change switches "/usr/bin/perl" to "/usr/bin/env perl" as it's
the most portable solution that should work for almost everybody.
Perl's executable is detected automatically.

This change also replaces the -w option passed to the executable with
the more modern "use warnings;" approach. There is no functional change
to the default behavior.

Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:24 +03:00
Kamil Rytarowski
6f75023ab8 scripts/qemu-binfmt-conf.sh: Fix shell portability issue
Appease pkgsrc and use portable shell variable comparison.
This switches "==" to "=". It should not be a functional change.

Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:23 +03:00
Chris Webb
3baa0a6a65 virtfs: allow a device id to be specified in the -virtfs option
When using a virtfs root filesystem, the mount_tag needs to be set to
/dev/root. This can be done long-hand as

  -fsdev local,id=root,path=/path/to/rootfs,...
  -device virtio-9p-pci,fsdev=root,mount_tag=/dev/root

but the -virtfs shortcut cannot be used as it hard-codes the device identifier
to match the mount_tag, and device identifiers may not contain '/':

  $ qemu-system-x86_64 -virtfs local,path=/foo,mount_tag=/dev/root,security_model=passthrough
  qemu-system-x86_64: -virtfs local,path=/foo,mount_tag=/dev/root,security_model=passthrough: duplicate fsdev id: /dev/root

To support this case using -virtfs, we allow the device identifier to be
specified explicitly when the mount_tag is not suitable:

  -virtfs local,id=root,path=/path/to/rootfs,mount_tag=/dev/root,...

Signed-off-by: Chris Webb <chris@arachsys.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:23 +03:00
Thomas Huth
6516367fc0 hw/core/generic-loader: Fix crash when running without CPU
When running QEMU with "-M none -device loader,file=kernel.elf", it
currently crashes with a segmentation fault, because the "none"-machine
does not have any CPU by default and the generic loader code tries
to dereference s->cpu. Fix it by adding an appropriate check for a
NULL pointer.

Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:23 +03:00
Fam Zheng
1d29b5b049 virtio-blk: Remove useless condition around g_free()
Laszlo spotted and studied this wasteful "if". He pointed out:

The original virtio_blk_free_request needed an "if" as it accesses one
field, since 671ec3f056 ("virtio-blk: Convert VirtIOBlockReq.elem to
pointer", 2014-06-11); later on in f897bf751f ("virtio-blk: embed
VirtQueueElement in VirtIOBlockReq", 2014-07-09) the field became
embedded, so the "if" became unnecessary (at which point we were using
g_slice_free(), but it is the same).

Now drop it.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:23 +03:00
Thomas Huth
3ba34a7022 qemu-doc: Fix broken URLs of amnhltm.zip and dosidle210.zip
There are some broken URLs in the qemu-doc which reference tools that
are not available at their original location anymore. Fortunately, they
have been mirrored to archive.org, so point to that location instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:23 +03:00
Andreas Grapentin
09d352042f use _Static_assert in QEMU_BUILD_BUG_ON
QEMU_BUILD_BUG_ON should use C11's _Static_assert, if the compiler supports it,
to provide more readable messages on failure.

We check for _Static_assert in configure, and set CONFIG_STATIC_ASSERT
accordingly. QEMU_BUILD_BUG_ON invokes _Static_assert if CONFIG_STATIC_ASSERT
is defined, and reverts to the old way otherwise.

That way, systems without a C11-conforming compiler will still have the old
messages, as verified by intentionally breaking the configure check.
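
A minimal sketch of such a macro with a fallback. All SKETCH_* names are
hypothetical, the __STDC_VERSION__ test merely stands in for the configure
check, and the fallback uses a negative array size rather than the bit-field
trick of the old macro.

```c
#include <stdint.h>

/* Stand-in for a configure-time result such as CONFIG_STATIC_ASSERT. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define SKETCH_HAVE_STATIC_ASSERT 1
#endif

#ifdef SKETCH_HAVE_STATIC_ASSERT
/* C11 path: the failing expression is quoted in the error message. */
#define SKETCH_BUILD_BUG_ON(x) _Static_assert(!(x), "not expecting: " #x)
#else
/* Fallback: a negative array size is a compile error when x is true. */
#define SKETCH_GLUE_(a, b) a##b
#define SKETCH_GLUE(a, b) SKETCH_GLUE_(a, b)
#define SKETCH_BUILD_BUG_ON(x) \
    typedef char SKETCH_GLUE(sketch_build_bug_on_, __LINE__)[(x) ? -1 : 1]
#endif

/* Compile-time guarantee relied upon by the function below. */
SKETCH_BUILD_BUG_ON(sizeof(uint32_t) != 4);

/* Assemble a little-endian 32-bit value from four bytes. */
uint32_t sketch_load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Inverting the asserted condition makes the build fail on either path; only
the readability of the diagnostic differs.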

The following example output was generated by inverting the condition in
QEMU_BUILD_BUG_ON:

without _Static_assert:

> In file included from /qemu/include/qemu/osdep.h:36:0,
>                  from /qemu/qga/commands.c:13:
> /qemu/qga/commands.c: In function ‘qmp_guest_exec_status’:
> /qemu/include/qemu/compiler.h:89:12: error: negative width in bit-field ‘<anonymous>’
>      struct { \
>             ^
> /qemu/include/qemu/compiler.h:96:38: note: in expansion of macro  QEMU_BUILD_BUG_ON_STRUCT’
>  #define QEMU_BUILD_BUG_ON(x) typedef QEMU_BUILD_BUG_ON_STRUCT(x) \
>                                       ^~~~~~~~~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:146:5: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
>      QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
>      ^~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:417:5: note: in expansion of macro ‘atomic_load_acquire’
>      atomic_load_acquire(ptr)
>      ^~~~~~~~~~~~~~~~~~~
> /qemu/qga/commands.c:160:21: note: in expansion of macro ‘atomic_mb_read’
>      bool finished = atomic_mb_read(&gei->finished);
>                      ^~~~~~~~~~~~~~

with _Static_assert:

> In file included from /qemu/include/qemu/osdep.h:36:0,
>                  from /qemu/qga/commands.c:13:
> /qemu/qga/commands.c: In function ‘qmp_guest_exec_status’:
> /qemu/include/qemu/compiler.h:94:30: error: static assertion failed: "not expecting: sizeof(*&gei->finished) > sizeof(void *)"
>  #define QEMU_BUILD_BUG_ON(x) _Static_assert(!(x), #x)
>                               ^
> /qemu/include/qemu/atomic.h:146:5: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
>      QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
>      ^~~~~~~~~~~~~~~~~
> /qemu/include/qemu/atomic.h:417:5: note: in expansion of macro ‘atomic_load_acquire’
>      atomic_load_acquire(ptr)
>      ^~~~~~~~~~~~~~~~~~~
> /qemu/qga/commands.c:160:21: note: in expansion of macro ‘atomic_mb_read’
>      bool finished = atomic_mb_read(&gei->finished);
>                      ^~~~~~~~~~~~~~

Signed-off-by: Andreas Grapentin <andreas@grapentin.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:19:23 +03:00
sochin.jiang
bcd711feb0 channel-file: fix wrong parameter comments
Signed-off-by: sochin.jiang <sochin@aliyun.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-10 10:18:57 +03:00
Geert Martin Ijewski
a37278169d crypto: qcrypto_random_bytes() now works on windows w/o any other crypto libs
If no crypto library is included in the build, QEMU uses
qcrypto_random_bytes() to generate random data. That function tried to open
/dev/urandom or /dev/random and if opening both files failed it errored out.

Those files obviously do not exist on windows, so there the code uses
CryptGenRandom().

Furthermore there was some refactoring and a new function
qcrypto_random_init() was introduced. If a proper crypto library (gnutls or
libgcrypt) is included in the build, this function does nothing. If neither
is included it initializes the (platform specific) handles that are used by
qcrypto_random_bytes().
Either:
* a handle to /dev/urandom | /dev/random on unix like systems
* a handle to a cryptographic service provider on windows
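
For the unix-like case, the init-then-read split could look roughly like the
sketch below. The sketch_* names are hypothetical and error reporting is
reduced to a return code; this is not the qcrypto implementation.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Handle opened once by sketch_random_init(); -1 means "not initialized". */
static int sketch_random_fd = -1;

/* Prefer /dev/urandom and fall back to /dev/random. */
int sketch_random_init(void)
{
    sketch_random_fd = open("/dev/urandom", O_RDONLY);
    if (sketch_random_fd < 0) {
        sketch_random_fd = open("/dev/random", O_RDONLY);
    }
    return sketch_random_fd < 0 ? -1 : 0;
}

/* Fill buf with len random bytes, retrying short and interrupted reads. */
int sketch_random_bytes(unsigned char *buf, size_t len)
{
    if (sketch_random_fd < 0) {
        return -1;
    }
    while (len > 0) {
        ssize_t got = read(sketch_random_fd, buf, len);
        if (got < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        if (got == 0) {
            return -1;   /* unexpected end of file */
        }
        buf += got;
        len -= (size_t)got;
    }
    return 0;
}
```

Opening the handle once up front means later calls fail only on read errors,
not because the device node has become unreachable in the meantime.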

Signed-off-by: Geert Martin Ijewski <gm.ijewski@web.de>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-05-09 14:41:47 +01:00
Daniel P. Berrange
e4a3507e86 crypto: move 'opaque' parameter to (nearly) the end of parameter list
Previous commit moved 'opaque' to be the 2nd parameter in the list:

  commit 375092332e
  Author: Fam Zheng <famz@redhat.com>
  Date:   Fri Apr 21 20:27:02 2017 +0800

    crypto: Make errp the last parameter of functions

    Move opaque to 2nd instead of the 2nd to last, so that compilers help
    check with the conversion.

this puts it back to the 2nd to last position.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-05-09 14:41:47 +01:00
Daniel P. Berrange
899833cd65 List SASL config file under the cryptography maintainer's realm
No one is listed as maintainer for qemu.sasl. It is used by the
VNC server for SASL auth, but since it is cryptography related,
list it under the cryptography maintainer's realm, rather than
under the UI maintainer.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-05-09 14:41:47 +01:00
Daniel P. Berrange
c6a9a9f575 Default to GSSAPI (Kerberos) instead of DIGEST-MD5 for SASL
RFC 6331 documents a number of serious security weaknesses in
the SASL DIGEST-MD5 mechanism. As such, QEMU should not be
using or recommending it as a default mechanism for VNC auth
with SASL.

GSSAPI (Kerberos) is the only other viable SASL mechanism that
can provide secure session encryption so enable that by default
as the replacement. If users have TLS enabled for VNC, they can
optionally decide to use SCRAM-SHA-1 instead of GSSAPI, allowing
plain username and password auth.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-05-09 14:41:47 +01:00
Fam Zheng
3c76c606da block: Make 'replication_state' an enum
BDRVReplicationState.replication_state is a name with a bit of
duplication, plus it could be an enum like BDRVReplicationState.mode,
which is more readable and also more straightforward in a debugger.

Rename it, and improve the type while at it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
Saurav Sachidanand
ec45bbe5f1 util: Use g_malloc/g_free in envlist.c
Change malloc/strdup/free to g_malloc/g_strdup/g_free in
util/envlist.c.

Remove NULL checks for pointers returned from g_malloc and g_strdup
as they exit in case of failure. Also, update calls to envlist_create
to reflect this.

Free array and array contents returned by envlist_to_environ using
g_free in bsd-user/main.c and linux-user/main.c.

Update comments to reflect change in semantics.

Signed-off-by: Saurav Sachidanand <sauravsachidanand@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
Philippe Mathieu-Daudé
9879f5ac62 qga: fix compiler warnings (clang 5)
The static code analyzer complains:

qga/commands-posix.c:2127:9: warning: Null pointer passed as an argument to a 'nonnull' parameter
        closedir(dp);
        ^~~~~~~~~~~~
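
The usual way to address this class of warning is to make sure the DIR
pointer is only used after a successful opendir(). A hypothetical sketch
(sketch_count_entries is not a QEMU function):

```c
#include <dirent.h>
#include <stddef.h>

/* Counts the entries of a directory; returns -1 if it cannot be opened. */
long sketch_count_entries(const char *path)
{
    DIR *dp = opendir(path);
    long n = 0;

    if (dp == NULL) {
        /* Nothing to read or close: never pass NULL to closedir(). */
        return -1;
    }
    while (readdir(dp) != NULL) {
        n++;
    }
    closedir(dp);
    return n;
}
```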

Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
Philippe Mathieu-Daudé
21a9ad2f15 device_tree: fix compiler warnings (clang 5)
The static code analyzer complains:

device_tree.c:155:18: warning: Null pointer passed as an argument to a 'nonnull' parameter
    while ((de = readdir(d)) != NULL) {
                 ^~~~~~~~~~

Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
Philippe Mathieu-Daudé
6b1de1484e usb-ccid: make ccid_write_data_block() cope with null buffers
The static code analyzer complains:

hw/usb/dev-smartcard-reader.c:816:5: warning: Null pointer passed as an argument to a 'nonnull' parameter
    memcpy(p->abData, data, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
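
One way to make a copy helper cope with null buffers is to skip memcpy()
entirely when there is no data. The sketch below uses hypothetical names and
is an assumption about the general approach, not the body of
ccid_write_data_block().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies at most cap bytes; a NULL or empty source copies nothing. */
size_t sketch_copy_payload(uint8_t *dst, size_t cap,
                           const uint8_t *data, size_t len)
{
    if (data == NULL || len == 0) {
        return 0;            /* memcpy() is never called with NULL */
    }
    if (len > cap) {
        len = cap;           /* clamp to the destination capacity */
    }
    memcpy(dst, data, len);
    return len;
}
```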

Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
Eric Blake
46bbbec2d3 tests: Ignore more test executables
Ignore test executables when building in-tree:
test-arm-mptimer introduced in commit 882fac3
test-crypto-hmac introduced in commit 4fd460b
test-aio-multithread introduced in commit 0c330a7

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
Craig Jellick
ed1fcd0009 Add 'none' as type for drive's if option
Signed-off-by: Craig Jellick <craig@rancher.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
Marc-André Lureau
61f7c6a0c2 doc: fix function spelling
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
KONRAD Frederic
2d812d6dff ppc_booke: drop useless assignment
The tb_env variable is set two lines above. So just drop the double assignment.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
Ishani Chugh
d0e31a105e Remove reduntant qemu: from error functions
This patch removes redundant "qemu:" from error functions. The link to the bitesized task is:
http://wiki.qemu-project.org/Contribute/BiteSizedTasks#Error_checking

Signed-off-by: Ishani Chugh <chugh.ishani@research.iiit.ac.in>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-05-07 09:57:51 +03:00
306 changed files with 6629 additions and 2487 deletions

.gitmodules

@@ -34,3 +34,6 @@
[submodule "roms/skiboot"]
path = roms/skiboot
url = git://git.qemu.org/skiboot.git
[submodule "roms/QemuMacDrivers"]
path = roms/QemuMacDrivers
url = git://git.qemu.org/QemuMacDrivers.git


@@ -86,9 +86,6 @@ matrix:
- env: CONFIG="--enable-trace-backends=ust"
TEST_CMD=""
compiler: gcc
- env: CONFIG="--with-coroutine=gthread"
TEST_CMD=""
compiler: gcc
- env: CONFIG=""
os: osx
compiler: clang
@@ -191,7 +188,7 @@ matrix:
compiler: none
env:
- COMPILER_NAME=gcc CXX=g++-5 CC=gcc-5
- CONFIG="--cc=gcc-5 --cxx=g++-5 --disable-pie --disable-linux-user --with-coroutine=gthread"
- CONFIG="--cc=gcc-5 --cxx=g++-5 --disable-pie --disable-linux-user"
- TEST_CMD=""
before_script:
- ./configure ${CONFIG} --extra-cflags="-g3 -O0 -fsanitize=thread -fuse-ld=gold" || cat config.log


@@ -354,6 +354,12 @@ L: qemu-devel@nongnu.org
S: Maintained
F: *posix*
NETBSD
L: qemu-devel@nongnu.org
M: Kamil Rytarowski <kamil@netbsd.org>
S: Maintained
K: (?i)NetBSD
W32, W64
L: qemu-devel@nongnu.org
M: Stefan Weil <sw@weilnetz.de>
@@ -1170,6 +1176,7 @@ F: include/block/
F: qemu-img*
F: qemu-io*
F: tests/qemu-iotests/
F: util/qemu-progress.c
T: git git://repo.or.cz/qemu/kevin.git block
Block I/O path
@@ -1177,8 +1184,8 @@ M: Stefan Hajnoczi <stefanha@redhat.com>
M: Fam Zheng <famz@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: async.c
F: aio-*.c
F: util/async.c
F: util/aio-*.c
F: block/io.c
F: migration/block*
F: include/block/aio.h
@@ -1307,8 +1314,8 @@ Main loop
M: Paolo Bonzini <pbonzini@redhat.com>
S: Maintained
F: cpus.c
F: main-loop.c
F: qemu-timer.c
F: util/main-loop.c
F: util/qemu-timer.c
F: vl.c
Human Monitor (HMP)
@@ -1487,6 +1494,7 @@ S: Maintained
F: crypto/
F: include/crypto/
F: tests/test-crypto-*
F: qemu.sasl
Coroutines
M: Stefan Hajnoczi <stefanha@redhat.com>
@@ -1577,6 +1585,7 @@ F: default-configs/*-bsd-user.mak
Linux user
M: Riku Voipio <riku.voipio@iki.fi>
R: Laurent Vivier <laurent@vivier.eu>
S: Maintained
F: linux-user/
F: default-configs/*-linux-user.mak


@@ -552,7 +552,8 @@ multiboot.bin linuxboot.bin linuxboot_dma.bin kvmvapic.bin \
s390-ccw.img \
spapr-rtas.bin slof.bin skiboot.lid \
palcode-clipper \
u-boot.e500
u-boot.e500 \
qemu_vga.ndrv
else
BLOBS=
endif


@@ -49,7 +49,6 @@ common-obj-$(CONFIG_POSIX) += os-posix.o
common-obj-$(CONFIG_LINUX) += fsdev/
common-obj-y += migration/
common-obj-y += page_cache.o #aio.o
common-obj-$(CONFIG_SPICE) += spice-qemu-char.o

block.c

@@ -192,11 +192,20 @@ void path_combine(char *dest, int dest_size,
}
}
/* Returns whether the image file is opened as read-only. Note that this can
* return false and writing to the image file is still not possible because the
* image is inactivated. */
bool bdrv_is_read_only(BlockDriverState *bs)
{
return bs->read_only;
}
/* Returns whether the image file can be written to right now */
bool bdrv_is_writable(BlockDriverState *bs)
{
return !bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_INACTIVE);
}
int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
{
/* Do not set read_only if copy_on_read is enabled */
@@ -762,6 +771,13 @@ static void bdrv_child_cb_drained_end(BdrvChild *child)
bdrv_drained_end(bs);
}
static int bdrv_child_cb_inactivate(BdrvChild *child)
{
BlockDriverState *bs = child->opaque;
assert(bs->open_flags & BDRV_O_INACTIVE);
return 0;
}
/*
* Returns the options and flags that a temporary snapshot should get, based on
* the originally requested flags (the originally requested image will have
@@ -800,6 +816,7 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
* the parent. */
qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);
qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);
/* Inherit the read-only option from the parent if it's not set */
qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
@@ -821,6 +838,7 @@ const BdrvChildRole child_file = {
.inherit_options = bdrv_inherited_options,
.drained_begin = bdrv_child_cb_drained_begin,
.drained_end = bdrv_child_cb_drained_end,
.inactivate = bdrv_child_cb_inactivate,
};
/*
@@ -842,6 +860,7 @@ const BdrvChildRole child_format = {
.inherit_options = bdrv_inherited_fmt_options,
.drained_begin = bdrv_child_cb_drained_begin,
.drained_end = bdrv_child_cb_drained_end,
.inactivate = bdrv_child_cb_inactivate,
};
static void bdrv_backing_attach(BdrvChild *c)
@@ -908,6 +927,7 @@ static void bdrv_backing_options(int *child_flags, QDict *child_options,
* which is only applied on the top level (BlockBackend) */
qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);
qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);
/* backing files always opened read-only */
qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
@@ -926,6 +946,7 @@ const BdrvChildRole child_backing = {
.inherit_options = bdrv_backing_options,
.drained_begin = bdrv_child_cb_drained_begin,
.drained_end = bdrv_child_cb_drained_end,
.inactivate = bdrv_child_cb_inactivate,
};
static int bdrv_open_flags(BlockDriverState *bs, int flags)
@@ -1150,6 +1171,11 @@ QemuOptsList bdrv_runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "discard operation (ignore/off, unmap/on)",
},
{
.name = BDRV_OPT_FORCE_SHARE,
.type = QEMU_OPT_BOOL,
.help = "always accept other writers (default: off)",
},
{ /* end of list */ }
},
};
@@ -1189,6 +1215,16 @@ static int bdrv_open_common(BlockDriverState *bs, BlockBackend *file,
drv = bdrv_find_format(driver_name);
assert(drv != NULL);
bs->force_share = qemu_opt_get_bool(opts, BDRV_OPT_FORCE_SHARE, false);
if (bs->force_share && (bs->open_flags & BDRV_O_RDWR)) {
error_setg(errp,
BDRV_OPT_FORCE_SHARE
"=on can only be used with read-only images");
ret = -EINVAL;
goto fail_opts;
}
if (file != NULL) {
filename = blk_bs(file)->filename;
} else {
@@ -1448,6 +1484,22 @@ static int bdrv_child_check_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
static void bdrv_child_abort_perm_update(BdrvChild *c);
static void bdrv_child_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared);
static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
BdrvChild *c,
const BdrvChildRole *role,
uint64_t parent_perm, uint64_t parent_shared,
uint64_t *nperm, uint64_t *nshared)
{
if (bs->drv && bs->drv->bdrv_child_perm) {
bs->drv->bdrv_child_perm(bs, c, role,
parent_perm, parent_shared,
nperm, nshared);
}
if (child_bs && child_bs->force_share) {
*nshared = BLK_PERM_ALL;
}
}
/*
* Check whether permissions on this node can be changed in a way that
* @cumulative_perms and @cumulative_shared_perms are the new cumulative
@@ -1467,7 +1519,7 @@ static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
/* Write permissions never work with read-only images */
if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
bdrv_is_read_only(bs))
!bdrv_is_writable(bs))
{
error_setg(errp, "Block node is read-only");
return -EPERM;
@@ -1492,9 +1544,9 @@ static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
/* Check all children */
QLIST_FOREACH(c, &bs->children, next) {
uint64_t cur_perm, cur_shared;
drv->bdrv_child_perm(bs, c, c->role,
cumulative_perms, cumulative_shared_perms,
&cur_perm, &cur_shared);
bdrv_child_perm(bs, c->bs, c, c->role,
cumulative_perms, cumulative_shared_perms,
&cur_perm, &cur_shared);
ret = bdrv_child_check_perm(c, cur_perm, cur_shared, ignore_children,
errp);
if (ret < 0) {
@@ -1554,9 +1606,9 @@ static void bdrv_set_perm(BlockDriverState *bs, uint64_t cumulative_perms,
/* Update all children */
QLIST_FOREACH(c, &bs->children, next) {
uint64_t cur_perm, cur_shared;
drv->bdrv_child_perm(bs, c, c->role,
cumulative_perms, cumulative_shared_perms,
&cur_perm, &cur_shared);
bdrv_child_perm(bs, c->bs, c, c->role,
cumulative_perms, cumulative_shared_perms,
&cur_perm, &cur_shared);
bdrv_child_set_perm(c, cur_perm, cur_shared);
}
}
@@ -1586,7 +1638,7 @@ static char *bdrv_child_user_desc(BdrvChild *c)
return g_strdup("another user");
}
static char *bdrv_perm_names(uint64_t perm)
char *bdrv_perm_names(uint64_t perm)
{
struct perm_name {
uint64_t perm;
@@ -1752,7 +1804,7 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
bdrv_filter_default_perms(bs, c, role, perm, shared, &perm, &shared);
/* Format drivers may touch metadata even if the guest doesn't write */
if (!bdrv_is_read_only(bs)) {
if (bdrv_is_writable(bs)) {
perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
}
@@ -1778,6 +1830,10 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
BLK_PERM_WRITE_UNCHANGED;
}
if (bs->open_flags & BDRV_O_INACTIVE) {
shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
}
*nperm = perm;
*nshared = shared;
}
@@ -1891,8 +1947,8 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
assert(parent_bs->drv);
assert(bdrv_get_aio_context(parent_bs) == bdrv_get_aio_context(child_bs));
parent_bs->drv->bdrv_child_perm(parent_bs, NULL, child_role,
perm, shared_perm, &perm, &shared_perm);
bdrv_child_perm(parent_bs, child_bs, NULL, child_role,
perm, shared_perm, &perm, &shared_perm);
child = bdrv_root_attach_child(child_bs, child_name, child_role,
perm, shared_perm, parent_bs, errp);
@@ -3916,7 +3972,8 @@ void bdrv_init_with_whitelist(void)
void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
{
BdrvChild *child;
BdrvChild *child, *parent;
uint64_t perm, shared_perm;
Error *local_err = NULL;
int ret;
@@ -3952,6 +4009,26 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
error_setg_errno(errp, -ret, "Could not refresh total sector count");
return;
}
/* Update permissions, they may differ for inactive nodes */
bdrv_get_cumulative_perm(bs, &perm, &shared_perm);
ret = bdrv_check_perm(bs, perm, shared_perm, NULL, &local_err);
if (ret < 0) {
bs->open_flags |= BDRV_O_INACTIVE;
error_propagate(errp, local_err);
return;
}
bdrv_set_perm(bs, perm, shared_perm);
QLIST_FOREACH(parent, &bs->parents, next_parent) {
if (parent->role->activate) {
parent->role->activate(parent, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
}
}
}
void bdrv_invalidate_cache_all(Error **errp)
@@ -3976,7 +4053,7 @@ void bdrv_invalidate_cache_all(Error **errp)
static int bdrv_inactivate_recurse(BlockDriverState *bs,
bool setting_flag)
{
BdrvChild *child;
BdrvChild *child, *parent;
int ret;
if (!setting_flag && bs->drv->bdrv_inactivate) {
@@ -3986,6 +4063,27 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
}
}
if (setting_flag) {
uint64_t perm, shared_perm;
bs->open_flags |= BDRV_O_INACTIVE;
QLIST_FOREACH(parent, &bs->parents, next_parent) {
if (parent->role->inactivate) {
ret = parent->role->inactivate(parent);
if (ret < 0) {
bs->open_flags &= ~BDRV_O_INACTIVE;
return ret;
}
}
}
/* Update permissions, they may differ for inactive nodes */
bdrv_get_cumulative_perm(bs, &perm, &shared_perm);
bdrv_check_perm(bs, perm, shared_perm, NULL, &error_abort);
bdrv_set_perm(bs, perm, shared_perm);
}
QLIST_FOREACH(child, &bs->children, next) {
ret = bdrv_inactivate_recurse(child->bs, setting_flag);
if (ret < 0) {
@@ -3993,9 +4091,6 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
}
}
if (setting_flag) {
bs->open_flags |= BDRV_O_INACTIVE;
}
return 0;
}

@@ -1,6 +1,7 @@
/*
* Block protocol for I/O error injection
*
* Copyright (C) 2016-2017 Red Hat, Inc.
* Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -37,7 +38,12 @@
typedef struct BDRVBlkdebugState {
int state;
int new_state;
int align;
uint64_t align;
uint64_t max_transfer;
uint64_t opt_write_zero;
uint64_t max_write_zero;
uint64_t opt_discard;
uint64_t max_discard;
/* For blkdebug_refresh_filename() */
char *config_file;
@@ -342,6 +348,31 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_SIZE,
.help = "Required alignment in bytes",
},
{
.name = "max-transfer",
.type = QEMU_OPT_SIZE,
.help = "Maximum transfer size in bytes",
},
{
.name = "opt-write-zero",
.type = QEMU_OPT_SIZE,
.help = "Optimum write zero alignment in bytes",
},
{
.name = "max-write-zero",
.type = QEMU_OPT_SIZE,
.help = "Maximum write zero size in bytes",
},
{
.name = "opt-discard",
.type = QEMU_OPT_SIZE,
.help = "Optimum discard alignment in bytes",
},
{
.name = "max-discard",
.type = QEMU_OPT_SIZE,
.help = "Maximum discard size in bytes",
},
{ /* end of list */ }
},
};
@@ -352,8 +383,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
BDRVBlkdebugState *s = bs->opaque;
QemuOpts *opts;
Error *local_err = NULL;
uint64_t align;
int ret;
uint64_t align;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -382,19 +413,69 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
goto out;
}
/* Set request alignment */
align = qemu_opt_get_size(opts, "align", 0);
if (align < INT_MAX && is_power_of_2(align)) {
s->align = align;
} else if (align) {
error_setg(errp, "Invalid alignment");
ret = -EINVAL;
bs->supported_write_flags = BDRV_REQ_FUA &
bs->file->bs->supported_write_flags;
bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
bs->file->bs->supported_zero_flags;
ret = -EINVAL;
/* Set alignment overrides */
s->align = qemu_opt_get_size(opts, "align", 0);
if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
error_setg(errp, "Cannot meet constraints with align %" PRIu64,
s->align);
goto out;
}
align = MAX(s->align, bs->file->bs->bl.request_alignment);
s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
if (s->max_transfer &&
(s->max_transfer >= INT_MAX ||
!QEMU_IS_ALIGNED(s->max_transfer, align))) {
error_setg(errp, "Cannot meet constraints with max-transfer %" PRIu64,
s->max_transfer);
goto out;
}
s->opt_write_zero = qemu_opt_get_size(opts, "opt-write-zero", 0);
if (s->opt_write_zero &&
(s->opt_write_zero >= INT_MAX ||
!QEMU_IS_ALIGNED(s->opt_write_zero, align))) {
error_setg(errp, "Cannot meet constraints with opt-write-zero %" PRIu64,
s->opt_write_zero);
goto out;
}
s->max_write_zero = qemu_opt_get_size(opts, "max-write-zero", 0);
if (s->max_write_zero &&
(s->max_write_zero >= INT_MAX ||
!QEMU_IS_ALIGNED(s->max_write_zero,
MAX(s->opt_write_zero, align)))) {
error_setg(errp, "Cannot meet constraints with max-write-zero %" PRIu64,
s->max_write_zero);
goto out;
}
s->opt_discard = qemu_opt_get_size(opts, "opt-discard", 0);
if (s->opt_discard &&
(s->opt_discard >= INT_MAX ||
!QEMU_IS_ALIGNED(s->opt_discard, align))) {
error_setg(errp, "Cannot meet constraints with opt-discard %" PRIu64,
s->opt_discard);
goto out;
}
s->max_discard = qemu_opt_get_size(opts, "max-discard", 0);
if (s->max_discard &&
(s->max_discard >= INT_MAX ||
!QEMU_IS_ALIGNED(s->max_discard,
MAX(s->opt_discard, align)))) {
error_setg(errp, "Cannot meet constraints with max-discard %" PRIu64,
s->max_discard);
goto out;
}
ret = 0;
goto out;
out:
if (ret < 0) {
g_free(s->config_file);
@@ -403,11 +484,30 @@ out:
return ret;
}
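
Note on the hunk above: the option checks added to blkdebug_open() all follow one pattern. A value of 0 means "no override"; any other value must be below INT_MAX and either a power of two (align) or a multiple of the effective alignment (the max-*/opt-* options). The standalone sketch below restates that pattern outside QEMU. The sketch_* names are hypothetical stand-ins for is_power_of_2() and QEMU_IS_ALIGNED(), not QEMU's own definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for is_power_of_2(): true for 1, 2, 4, ... */
bool sketch_is_power_of_2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical stand-in for QEMU_IS_ALIGNED(n, m): n is a multiple of m. */
bool sketch_is_aligned(uint64_t n, uint64_t m)
{
    assert(m != 0);   /* the effective alignment is assumed to be nonzero */
    return n % m == 0;
}

/* "align": 0 disables the override, otherwise a power of two below INT_MAX. */
bool sketch_align_ok(uint64_t align)
{
    return align == 0 || (align < INT_MAX && sketch_is_power_of_2(align));
}

/* max-transfer, opt-write-zero, ...: 0 disables the override, otherwise a
 * multiple of the given granularity and below INT_MAX. */
bool sketch_limit_ok(uint64_t limit, uint64_t granularity)
{
    return limit == 0 ||
           (limit < INT_MAX && sketch_is_aligned(limit, granularity));
}
```

In the diff, the granularity passed for max-write-zero and max-discard is the larger of the matching opt-* value and the effective alignment, so a maximum can never split an optimum-sized chunk.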
static int inject_error(BlockDriverState *bs, BlkdebugRule *rule)
static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
{
BDRVBlkdebugState *s = bs->opaque;
int error = rule->options.inject.error;
bool immediately = rule->options.inject.immediately;
BlkdebugRule *rule = NULL;
int error;
bool immediately;
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
uint64_t inject_offset = rule->options.inject.offset;
if (inject_offset == -1 ||
(bytes && inject_offset >= offset &&
inject_offset < offset + bytes))
{
break;
}
}
if (!rule || !rule->options.inject.error) {
return 0;
}
immediately = rule->options.inject.immediately;
error = rule->options.inject.error;
if (rule->options.inject.once) {
QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
@@ -426,21 +526,18 @@ static int coroutine_fn
blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVBlkdebugState *s = bs->opaque;
BlkdebugRule *rule = NULL;
int err;
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
uint64_t inject_offset = rule->options.inject.offset;
if (inject_offset == -1 ||
(inject_offset >= offset && inject_offset < offset + bytes))
{
break;
}
/* Sanity check block layer guarantees */
assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
if (bs->bl.max_transfer) {
assert(bytes <= bs->bl.max_transfer);
}
if (rule && rule->options.inject.error) {
return inject_error(bs, rule);
err = rule_check(bs, offset, bytes);
if (err) {
return err;
}
return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
@@ -450,21 +547,18 @@ static int coroutine_fn
blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVBlkdebugState *s = bs->opaque;
BlkdebugRule *rule = NULL;
int err;
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
uint64_t inject_offset = rule->options.inject.offset;
if (inject_offset == -1 ||
(inject_offset >= offset && inject_offset < offset + bytes))
{
break;
}
/* Sanity check block layer guarantees */
assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
if (bs->bl.max_transfer) {
assert(bytes <= bs->bl.max_transfer);
}
if (rule && rule->options.inject.error) {
return inject_error(bs, rule);
err = rule_check(bs, offset, bytes);
if (err) {
return err;
}
return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@@ -472,22 +566,81 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
static int blkdebug_co_flush(BlockDriverState *bs)
{
BDRVBlkdebugState *s = bs->opaque;
BlkdebugRule *rule = NULL;
int err = rule_check(bs, 0, 0);
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
if (rule->options.inject.offset == -1) {
break;
}
}
if (rule && rule->options.inject.error) {
return inject_error(bs, rule);
if (err) {
return err;
}
return bdrv_co_flush(bs->file->bs);
}
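
Note on rule_check(): the flush path calls it with offset 0 and bytes 0, which is why the new range test is guarded by `bytes &&`. With an empty range, only a wildcard rule (inject offset -1) can match. A minimal sketch of that predicate, assuming the inject offset is held in a uint64_t as in the diff (so -1 compares equal to UINT64_MAX); sketch_rule_matches is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a rule with the given inject offset applies to the
 * request [offset, offset + bytes). An inject offset of UINT64_MAX
 * (written as -1 in the diff) matches every request, including the
 * zero-length one used for flush. */
bool sketch_rule_matches(uint64_t inject_offset, uint64_t offset,
                         uint64_t bytes)
{
    /* Assumed precondition: the request range does not wrap around. */
    assert(offset <= UINT64_MAX - bytes);

    if (inject_offset == UINT64_MAX) {
        return true;
    }
    return bytes != 0 &&
           inject_offset >= offset &&
           inject_offset < offset + bytes;
}
```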
static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int count,
BdrvRequestFlags flags)
{
uint32_t align = MAX(bs->bl.request_alignment,
bs->bl.pwrite_zeroes_alignment);
int err;
/* Only pass through requests that are larger than requested
* preferred alignment (so that we test the fallback to writes on
* unaligned portions), and check that the block layer never hands
* us anything unaligned that crosses an alignment boundary. */
if (count < align) {
assert(QEMU_IS_ALIGNED(offset, align) ||
QEMU_IS_ALIGNED(offset + count, align) ||
DIV_ROUND_UP(offset, align) ==
DIV_ROUND_UP(offset + count, align));
return -ENOTSUP;
}
assert(QEMU_IS_ALIGNED(offset, align));
assert(QEMU_IS_ALIGNED(count, align));
if (bs->bl.max_pwrite_zeroes) {
assert(count <= bs->bl.max_pwrite_zeroes);
}
err = rule_check(bs, offset, count);
if (err) {
return err;
}
return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
}
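
Note on the small-request assertion in blkdebug_co_pwrite_zeroes(): for a request shorter than the alignment, the three-way condition accepts a fragment that starts on a boundary, ends on a boundary, or lies entirely inside one aligned block. Only a fragment that straddles a boundary is rejected. The sketch below reproduces the condition with plain integers; the sketch_* names are hypothetical and sketch_div_round_up() is assumed to behave like DIV_ROUND_UP().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for DIV_ROUND_UP(n, d). */
uint64_t sketch_div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* True if a request shorter than align does not cross an alignment
 * boundary: it is a head fragment (starts aligned), a tail fragment
 * (ends aligned), or sits strictly inside one aligned block. */
bool sketch_small_request_ok(uint64_t offset, uint64_t count, uint64_t align)
{
    assert(align != 0 && count < align);
    return offset % align == 0 ||
           (offset + count) % align == 0 ||
           sketch_div_round_up(offset, align) ==
               sketch_div_round_up(offset + count, align);
}
```

With a 4096-byte alignment, a 512-byte request at offset 3840 ends at 4352 and crosses the boundary at 4096, so the check fails for it.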
static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
int64_t offset, int count)
{
uint32_t align = bs->bl.pdiscard_alignment;
int err;
/* Only pass through requests that are larger than requested
* minimum alignment, and ensure that unaligned requests do not
* cross optimum discard boundaries. */
if (count < bs->bl.request_alignment) {
assert(QEMU_IS_ALIGNED(offset, align) ||
QEMU_IS_ALIGNED(offset + count, align) ||
DIV_ROUND_UP(offset, align) ==
DIV_ROUND_UP(offset + count, align));
return -ENOTSUP;
}
assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
if (align && count >= align) {
assert(QEMU_IS_ALIGNED(offset, align));
assert(QEMU_IS_ALIGNED(count, align));
}
if (bs->bl.max_pdiscard) {
assert(count <= bs->bl.max_pdiscard);
}
err = rule_check(bs, offset, count);
if (err) {
return err;
}
return bdrv_co_pdiscard(bs->file->bs, offset, count);
}
static void blkdebug_close(BlockDriverState *bs)
{
@@ -715,6 +868,21 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
if (s->align) {
bs->bl.request_alignment = s->align;
}
if (s->max_transfer) {
bs->bl.max_transfer = s->max_transfer;
}
if (s->opt_write_zero) {
bs->bl.pwrite_zeroes_alignment = s->opt_write_zero;
}
if (s->max_write_zero) {
bs->bl.max_pwrite_zeroes = s->max_write_zero;
}
if (s->opt_discard) {
bs->bl.pdiscard_alignment = s->opt_discard;
}
if (s->max_discard) {
bs->bl.max_pdiscard = s->max_discard;
}
}
static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
@@ -742,6 +910,8 @@ static BlockDriver bdrv_blkdebug = {
.bdrv_co_preadv = blkdebug_co_preadv,
.bdrv_co_pwritev = blkdebug_co_pwritev,
.bdrv_co_flush_to_disk = blkdebug_co_flush,
.bdrv_co_pwrite_zeroes = blkdebug_co_pwrite_zeroes,
.bdrv_co_pdiscard = blkdebug_co_pdiscard,
.bdrv_debug_event = blkdebug_debug_event,
.bdrv_debug_breakpoint = blkdebug_debug_breakpoint,

@@ -130,6 +130,56 @@ static const char *blk_root_get_name(BdrvChild *child)
return blk_name(child->opaque);
}
/*
* Notifies the user of the BlockBackend that migration has completed. qdev
* devices can tighten their permissions in response (specifically revoke
* shared write permissions that we needed for storage migration).
*
* If an error is returned, the VM cannot be allowed to be resumed.
*/
static void blk_root_activate(BdrvChild *child, Error **errp)
{
BlockBackend *blk = child->opaque;
Error *local_err = NULL;
if (!blk->disable_perm) {
return;
}
blk->disable_perm = false;
blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
if (local_err) {
error_propagate(errp, local_err);
blk->disable_perm = true;
return;
}
}
static int blk_root_inactivate(BdrvChild *child)
{
BlockBackend *blk = child->opaque;
if (blk->disable_perm) {
return 0;
}
/* Only inactivate BlockBackends for guest devices (which are inactive at
* this point because the VM is stopped) and unattached monitor-owned
* BlockBackends. If there is still any other user like a block job, then
* we simply can't inactivate the image. */
if (!blk->dev && !blk->name[0]) {
return -EPERM;
}
blk->disable_perm = true;
if (blk->root) {
bdrv_child_try_set_perm(blk->root, 0, BLK_PERM_ALL, &error_abort);
}
return 0;
}
static const BdrvChildRole child_root = {
.inherit_options = blk_root_inherit_options,
@@ -140,6 +190,9 @@ static const BdrvChildRole child_root = {
.drained_begin = blk_root_drained_begin,
.drained_end = blk_root_drained_end,
.activate = blk_root_activate,
.inactivate = blk_root_inactivate,
};
/*
@@ -601,34 +654,6 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
*shared_perm = blk->shared_perm;
}
/*
* Notifies the user of all BlockBackends that migration has completed. qdev
* devices can tighten their permissions in response (specifically revoke
* shared write permissions that we needed for storage migration).
*
* If an error is returned, the VM cannot be allowed to be resumed.
*/
void blk_resume_after_migration(Error **errp)
{
BlockBackend *blk;
Error *local_err = NULL;
for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
if (!blk->disable_perm) {
continue;
}
blk->disable_perm = false;
blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
if (local_err) {
error_propagate(errp, local_err);
blk->disable_perm = true;
return;
}
}
}
static int blk_do_attach_dev(BlockBackend *blk, void *dev)
{
if (blk->dev) {

@@ -56,10 +56,10 @@ static int block_crypto_probe_generic(QCryptoBlockFormat format,
static ssize_t block_crypto_read_func(QCryptoBlock *block,
void *opaque,
size_t offset,
uint8_t *buf,
size_t buflen,
void *opaque,
Error **errp)
{
BlockDriverState *bs = opaque;
@@ -83,10 +83,10 @@ struct BlockCryptoCreateData {
static ssize_t block_crypto_write_func(QCryptoBlock *block,
void *opaque,
size_t offset,
const uint8_t *buf,
size_t buflen,
void *opaque,
Error **errp)
{
struct BlockCryptoCreateData *data = opaque;
@@ -102,8 +102,8 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
static ssize_t block_crypto_init_func(QCryptoBlock *block,
void *opaque,
size_t headerlen,
void *opaque,
Error **errp)
{
struct BlockCryptoCreateData *data = opaque;

@@ -76,15 +76,12 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle,
#define CURL_TIMEOUT_DEFAULT 5
#define CURL_TIMEOUT_MAX 10000
#define FIND_RET_NONE 0
#define FIND_RET_OK 1
#define FIND_RET_WAIT 2
#define CURL_BLOCK_OPT_URL "url"
#define CURL_BLOCK_OPT_READAHEAD "readahead"
#define CURL_BLOCK_OPT_SSLVERIFY "sslverify"
#define CURL_BLOCK_OPT_TIMEOUT "timeout"
#define CURL_BLOCK_OPT_COOKIE "cookie"
#define CURL_BLOCK_OPT_COOKIE_SECRET "cookie-secret"
#define CURL_BLOCK_OPT_USERNAME "username"
#define CURL_BLOCK_OPT_PASSWORD_SECRET "password-secret"
#define CURL_BLOCK_OPT_PROXY_USERNAME "proxy-username"
@@ -93,14 +90,17 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle,
struct BDRVCURLState;
typedef struct CURLAIOCB {
BlockAIOCB common;
Coroutine *co;
QEMUIOVector *qiov;
int64_t sector_num;
int nb_sectors;
uint64_t offset;
uint64_t bytes;
int ret;
size_t start;
size_t end;
QSIMPLEQ_ENTRY(CURLAIOCB) next;
} CURLAIOCB;
typedef struct CURLSocket {
@@ -115,7 +115,7 @@ typedef struct CURLState
CURL *curl;
QLIST_HEAD(, CURLSocket) sockets;
char *orig_buf;
size_t buf_start;
uint64_t buf_start;
size_t buf_off;
size_t buf_len;
char range[128];
@@ -126,7 +126,7 @@ typedef struct CURLState
typedef struct BDRVCURLState {
CURLM *multi;
QEMUTimer timer;
size_t len;
uint64_t len;
CURLState states[CURL_NUM_STATES];
char *url;
size_t readahead_size;
@@ -136,6 +136,7 @@ typedef struct BDRVCURLState {
bool accept_range;
AioContext *aio_context;
QemuMutex mutex;
QSIMPLEQ_HEAD(, CURLAIOCB) free_state_waitq;
char *username;
char *password;
char *proxyusername;
@@ -147,6 +148,7 @@ static void curl_multi_do(void *arg);
static void curl_multi_read(void *arg);
#ifdef NEED_CURL_TIMER_CALLBACK
/* Called from curl_multi_do_locked, with s->mutex held. */
static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
{
BDRVCURLState *s = opaque;
@@ -163,6 +165,7 @@ static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
}
#endif
/* Called from curl_multi_do_locked, with s->mutex held. */
static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
void *userp, void *sp)
{
@@ -212,6 +215,7 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
return 0;
}
/* Called from curl_multi_do_locked, with s->mutex held. */
static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
{
BDRVCURLState *s = opaque;
@@ -226,6 +230,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
return realsize;
}
/* Called from curl_multi_do_locked, with s->mutex held. */
static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
{
CURLState *s = ((CURLState*)opaque);
@@ -253,7 +258,7 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
continue;
if ((s->buf_off >= acb->end)) {
size_t request_length = acb->nb_sectors * BDRV_SECTOR_SIZE;
size_t request_length = acb->bytes;
qemu_iovec_from_buf(acb->qiov, 0, s->orig_buf + acb->start,
acb->end - acb->start);
@@ -264,9 +269,11 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
request_length - offset);
}
acb->common.cb(acb->common.opaque, 0);
qemu_aio_unref(acb);
acb->ret = 0;
s->acb[i] = NULL;
qemu_mutex_unlock(&s->s->mutex);
aio_co_wake(acb->co);
qemu_mutex_lock(&s->s->mutex);
}
}
@@ -275,18 +282,19 @@ read_end:
return size * nmemb;
}
static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
CURLAIOCB *acb)
/* Called with s->mutex held. */
static bool curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len,
CURLAIOCB *acb)
{
int i;
size_t end = start + len;
size_t clamped_end = MIN(end, s->len);
size_t clamped_len = clamped_end - start;
uint64_t end = start + len;
uint64_t clamped_end = MIN(end, s->len);
uint64_t clamped_len = clamped_end - start;
for (i=0; i<CURL_NUM_STATES; i++) {
CURLState *state = &s->states[i];
size_t buf_end = (state->buf_start + state->buf_off);
size_t buf_fend = (state->buf_start + state->buf_len);
uint64_t buf_end = (state->buf_start + state->buf_off);
uint64_t buf_fend = (state->buf_start + state->buf_len);
if (!state->orig_buf)
continue;
@@ -305,9 +313,8 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
if (clamped_len < len) {
qemu_iovec_memset(acb->qiov, clamped_len, 0, len - clamped_len);
}
acb->common.cb(acb->common.opaque, 0);
return FIND_RET_OK;
acb->ret = 0;
return true;
}
// Wait for unfinished chunks
@@ -325,13 +332,13 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
for (j=0; j<CURL_NUM_ACB; j++) {
if (!state->acb[j]) {
state->acb[j] = acb;
return FIND_RET_WAIT;
return true;
}
}
}
}
return FIND_RET_NONE;
return false;
}
/* Called with s->mutex held. */
@@ -376,11 +383,11 @@ static void curl_multi_check_completion(BDRVCURLState *s)
continue;
}
qemu_mutex_unlock(&s->mutex);
acb->common.cb(acb->common.opaque, -EIO);
qemu_mutex_lock(&s->mutex);
qemu_aio_unref(acb);
acb->ret = -EIO;
state->acb[i] = NULL;
qemu_mutex_unlock(&s->mutex);
aio_co_wake(acb->co);
qemu_mutex_lock(&s->mutex);
}
}
@@ -449,32 +456,28 @@ static void curl_multi_timeout_do(void *arg)
#endif
}
static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
/* Called with s->mutex held. */
static CURLState *curl_find_state(BDRVCURLState *s)
{
CURLState *state = NULL;
int i, j;
do {
for (i=0; i<CURL_NUM_STATES; i++) {
for (j=0; j<CURL_NUM_ACB; j++)
if (s->states[i].acb[j])
continue;
if (s->states[i].in_use)
continue;
int i;
for (i = 0; i < CURL_NUM_STATES; i++) {
if (!s->states[i].in_use) {
state = &s->states[i];
state->in_use = 1;
break;
}
if (!state) {
aio_poll(bdrv_get_aio_context(bs), true);
}
} while(!state);
}
return state;
}
static int curl_init_state(BDRVCURLState *s, CURLState *state)
{
if (!state->curl) {
state->curl = curl_easy_init();
if (!state->curl) {
return NULL;
return -EIO;
}
curl_easy_setopt(state->curl, CURLOPT_URL, s->url);
curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYPEER,
@@ -527,11 +530,18 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
QLIST_INIT(&state->sockets);
state->s = s;
return state;
return 0;
}
/* Called with s->mutex held. */
static void curl_clean_state(CURLState *s)
{
CURLAIOCB *next;
int j;
for (j = 0; j < CURL_NUM_ACB; j++) {
assert(!s->acb[j]);
}
if (s->s->multi)
curl_multi_remove_handle(s->s->multi, s->curl);
@@ -543,6 +553,14 @@ static void curl_clean_state(CURLState *s)
}
s->in_use = 0;
next = QSIMPLEQ_FIRST(&s->s->free_state_waitq);
if (next) {
QSIMPLEQ_REMOVE_HEAD(&s->s->free_state_waitq, next);
qemu_mutex_unlock(&s->s->mutex);
aio_co_wake(next->co);
qemu_mutex_lock(&s->s->mutex);
}
}
static void curl_parse_filename(const char *filename, QDict *options,
@@ -556,6 +574,7 @@ static void curl_detach_aio_context(BlockDriverState *bs)
BDRVCURLState *s = bs->opaque;
int i;
qemu_mutex_lock(&s->mutex);
for (i = 0; i < CURL_NUM_STATES; i++) {
if (s->states[i].in_use) {
curl_clean_state(&s->states[i]);
@@ -571,6 +590,7 @@ static void curl_detach_aio_context(BlockDriverState *bs)
curl_multi_cleanup(s->multi);
s->multi = NULL;
}
qemu_mutex_unlock(&s->mutex);
timer_del(&s->timer);
}
@@ -623,6 +643,11 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "Pass the cookie or list of cookies with each request"
},
{
.name = CURL_BLOCK_OPT_COOKIE_SECRET,
.type = QEMU_OPT_STRING,
.help = "ID of secret used as cookie passed with each request"
},
{
.name = CURL_BLOCK_OPT_USERNAME,
.type = QEMU_OPT_STRING,
@@ -657,6 +682,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
Error *local_err = NULL;
const char *file;
const char *cookie;
const char *cookie_secret;
double d;
const char *secretid;
const char *protocol_delimiter;
@@ -668,6 +694,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
return -EROFS;
}
qemu_mutex_init(&s->mutex);
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
@@ -693,7 +720,22 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
s->sslverify = qemu_opt_get_bool(opts, CURL_BLOCK_OPT_SSLVERIFY, true);
cookie = qemu_opt_get(opts, CURL_BLOCK_OPT_COOKIE);
s->cookie = g_strdup(cookie);
cookie_secret = qemu_opt_get(opts, CURL_BLOCK_OPT_COOKIE_SECRET);
if (cookie && cookie_secret) {
error_setg(errp,
"curl driver cannot handle both cookie and cookie secret");
goto out_noclean;
}
if (cookie_secret) {
s->cookie = qcrypto_secret_lookup_as_utf8(cookie_secret, errp);
if (!s->cookie) {
goto out_noclean;
}
} else {
s->cookie = g_strdup(cookie);
}
file = qemu_opt_get(opts, CURL_BLOCK_OPT_URL);
if (file == NULL) {
@@ -736,14 +778,22 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
}
DPRINTF("CURL: Opening %s\n", file);
QSIMPLEQ_INIT(&s->free_state_waitq);
s->aio_context = bdrv_get_aio_context(bs);
s->url = g_strdup(file);
state = curl_init_state(bs, s);
if (!state)
qemu_mutex_lock(&s->mutex);
state = curl_find_state(s);
qemu_mutex_unlock(&s->mutex);
if (!state) {
goto out_noclean;
}
// Get file size
if (curl_init_state(s, state) < 0) {
goto out;
}
s->accept_range = false;
curl_easy_setopt(state->curl, CURLOPT_NOBODY, 1);
curl_easy_setopt(state->curl, CURLOPT_HEADERFUNCTION,
@@ -771,7 +821,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
}
#endif
s->len = (size_t)d;
s->len = d;
if ((!strncasecmp(s->url, "http://", strlen("http://"))
|| !strncasecmp(s->url, "https://", strlen("https://")))
@@ -780,13 +830,14 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
"Server does not support 'range' (byte ranges).");
goto out;
}
DPRINTF("CURL: Size = %zd\n", s->len);
DPRINTF("CURL: Size = %" PRIu64 "\n", s->len);
qemu_mutex_lock(&s->mutex);
curl_clean_state(state);
qemu_mutex_unlock(&s->mutex);
curl_easy_cleanup(state->curl);
state->curl = NULL;
qemu_mutex_init(&s->mutex);
curl_attach_aio_context(bs, bdrv_get_aio_context(bs));
qemu_opts_del(opts);
@@ -797,53 +848,51 @@ out:
curl_easy_cleanup(state->curl);
state->curl = NULL;
out_noclean:
qemu_mutex_destroy(&s->mutex);
g_free(s->cookie);
g_free(s->url);
qemu_opts_del(opts);
return -EINVAL;
}
static const AIOCBInfo curl_aiocb_info = {
.aiocb_size = sizeof(CURLAIOCB),
};
static void curl_readv_bh_cb(void *p)
static void curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb)
{
CURLState *state;
int running;
int ret = -EINPROGRESS;
CURLAIOCB *acb = p;
BlockDriverState *bs = acb->common.bs;
BDRVCURLState *s = bs->opaque;
size_t start = acb->sector_num * BDRV_SECTOR_SIZE;
size_t end;
uint64_t start = acb->offset;
uint64_t end;
qemu_mutex_lock(&s->mutex);
// In case we have the requested data already (e.g. read-ahead),
// we can just call the callback and be done.
switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) {
case FIND_RET_OK:
qemu_aio_unref(acb);
// fall through
case FIND_RET_WAIT:
goto out;
default:
break;
if (curl_find_buf(s, start, acb->bytes, acb)) {
goto out;
}
// No cache found, so let's start a new request
state = curl_init_state(acb->common.bs, s);
if (!state) {
ret = -EIO;
for (;;) {
state = curl_find_state(s);
if (state) {
break;
}
QSIMPLEQ_INSERT_TAIL(&s->free_state_waitq, acb, next);
qemu_mutex_unlock(&s->mutex);
qemu_coroutine_yield();
qemu_mutex_lock(&s->mutex);
}
if (curl_init_state(s, state) < 0) {
curl_clean_state(state);
acb->ret = -EIO;
goto out;
}
acb->start = 0;
acb->end = MIN(acb->nb_sectors * BDRV_SECTOR_SIZE, s->len - start);
acb->end = MIN(acb->bytes, s->len - start);
state->buf_off = 0;
g_free(state->orig_buf);
@@ -853,14 +902,14 @@ static void curl_readv_bh_cb(void *p)
state->orig_buf = g_try_malloc(state->buf_len);
if (state->buf_len && state->orig_buf == NULL) {
curl_clean_state(state);
ret = -ENOMEM;
acb->ret = -ENOMEM;
goto out;
}
state->acb[0] = acb;
snprintf(state->range, 127, "%zd-%zd", start, end);
DPRINTF("CURL (AIO): Reading %llu at %zd (%s)\n",
(acb->nb_sectors * BDRV_SECTOR_SIZE), start, state->range);
snprintf(state->range, 127, "%" PRIu64 "-%" PRIu64, start, end);
DPRINTF("CURL (AIO): Reading %" PRIu64 " at %" PRIu64 " (%s)\n",
acb->bytes, start, state->range);
curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
curl_multi_add_handle(s->multi, state->curl);
@@ -870,26 +919,24 @@ static void curl_readv_bh_cb(void *p)
out:
qemu_mutex_unlock(&s->mutex);
if (ret != -EINPROGRESS) {
acb->common.cb(acb->common.opaque, ret);
qemu_aio_unref(acb);
}
}
static BlockAIOCB *curl_aio_readv(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
static int coroutine_fn curl_co_preadv(BlockDriverState *bs,
uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
{
CURLAIOCB *acb;
CURLAIOCB acb = {
.co = qemu_coroutine_self(),
.ret = -EINPROGRESS,
.qiov = qiov,
.offset = offset,
.bytes = bytes
};
acb = qemu_aio_get(&curl_aiocb_info, bs, cb, opaque);
acb->qiov = qiov;
acb->sector_num = sector_num;
acb->nb_sectors = nb_sectors;
aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), curl_readv_bh_cb, acb);
return &acb->common;
curl_setup_preadv(bs, &acb);
while (acb.ret == -EINPROGRESS) {
qemu_coroutine_yield();
}
return acb.ret;
}
static void curl_close(BlockDriverState *bs)
@@ -920,7 +967,7 @@ static BlockDriver bdrv_http = {
.bdrv_close = curl_close,
.bdrv_getlength = curl_getlength,
.bdrv_aio_readv = curl_aio_readv,
.bdrv_co_preadv = curl_co_preadv,
.bdrv_detach_aio_context = curl_detach_aio_context,
.bdrv_attach_aio_context = curl_attach_aio_context,
@@ -936,7 +983,7 @@ static BlockDriver bdrv_https = {
.bdrv_close = curl_close,
.bdrv_getlength = curl_getlength,
.bdrv_aio_readv = curl_aio_readv,
.bdrv_co_preadv = curl_co_preadv,
.bdrv_detach_aio_context = curl_detach_aio_context,
.bdrv_attach_aio_context = curl_attach_aio_context,
@@ -952,7 +999,7 @@ static BlockDriver bdrv_ftp = {
.bdrv_close = curl_close,
.bdrv_getlength = curl_getlength,
.bdrv_aio_readv = curl_aio_readv,
.bdrv_co_preadv = curl_co_preadv,
.bdrv_detach_aio_context = curl_detach_aio_context,
.bdrv_attach_aio_context = curl_attach_aio_context,
@@ -968,7 +1015,7 @@ static BlockDriver bdrv_ftps = {
.bdrv_close = curl_close,
.bdrv_getlength = curl_getlength,
.bdrv_aio_readv = curl_aio_readv,
.bdrv_co_preadv = curl_co_preadv,
.bdrv_detach_aio_context = curl_detach_aio_context,
.bdrv_attach_aio_context = curl_attach_aio_context,

@@ -129,12 +129,23 @@ do { \
#define MAX_BLOCKSIZE 4096
/* Posix file locking bytes. Libvirt takes byte 0, we start from higher bytes,
* leaving a few more bytes for its future use. */
#define RAW_LOCK_PERM_BASE 100
#define RAW_LOCK_SHARED_BASE 200
typedef struct BDRVRawState {
int fd;
int lock_fd;
bool use_lock;
int type;
int open_flags;
size_t buf_align;
/* The current permissions. */
uint64_t perm;
uint64_t shared_perm;
#ifdef CONFIG_XFS
bool is_xfs:1;
#endif
@@ -392,6 +403,11 @@ static QemuOptsList raw_runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "host AIO implementation (threads, native)",
},
{
.name = "locking",
.type = QEMU_OPT_STRING,
.help = "file locking mode (on/off/auto, default: auto)",
},
{ /* end of list */ }
},
};
@@ -406,6 +422,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
BlockdevAioOptions aio, aio_default;
int fd, ret;
struct stat st;
OnOffAuto locking;
opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -435,6 +452,37 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
}
s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
locking = qapi_enum_parse(OnOffAuto_lookup, qemu_opt_get(opts, "locking"),
ON_OFF_AUTO__MAX, ON_OFF_AUTO_AUTO, &local_err);
if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
switch (locking) {
case ON_OFF_AUTO_ON:
s->use_lock = true;
#ifndef F_OFD_SETLK
fprintf(stderr,
"File lock requested but OFD locking syscall is unavailable, "
"falling back to POSIX file locks.\n"
"Due to the implementation, locks can be lost unexpectedly.\n");
#endif
break;
case ON_OFF_AUTO_OFF:
s->use_lock = false;
break;
case ON_OFF_AUTO_AUTO:
#ifdef F_OFD_SETLK
s->use_lock = true;
#else
s->use_lock = false;
#endif
break;
default:
abort();
}
s->open_flags = open_flags;
raw_parse_flags(bdrv_flags, &s->open_flags);
@@ -450,6 +498,21 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
}
s->fd = fd;
s->lock_fd = -1;
if (s->use_lock) {
fd = qemu_open(filename, s->open_flags);
if (fd < 0) {
ret = -errno;
error_setg_errno(errp, errno, "Could not open '%s' for locking",
filename);
qemu_close(s->fd);
goto fail;
}
s->lock_fd = fd;
}
s->perm = 0;
s->shared_perm = BLK_PERM_ALL;
#ifdef CONFIG_LINUX_AIO
/* Currently Linux does AIO only for files opened with O_DIRECT */
if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
@@ -537,6 +600,161 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
return raw_open_common(bs, options, flags, 0, errp);
}
typedef enum {
RAW_PL_PREPARE,
RAW_PL_COMMIT,
RAW_PL_ABORT,
} RawPermLockOp;
#define PERM_FOREACH(i) \
for ((i) = 0; (1ULL << (i)) <= BLK_PERM_ALL; i++)
/* Lock bytes indicated by @perm_lock_bits and @shared_perm_lock_bits in the
* file; if @unlock == true, also unlock the unneeded bytes.
* @shared_perm_lock_bits is the mask of all permissions that are NOT shared.
*/
static int raw_apply_lock_bytes(BDRVRawState *s,
uint64_t perm_lock_bits,
uint64_t shared_perm_lock_bits,
bool unlock, Error **errp)
{
int ret;
int i;
PERM_FOREACH(i) {
int off = RAW_LOCK_PERM_BASE + i;
if (perm_lock_bits & (1ULL << i)) {
ret = qemu_lock_fd(s->lock_fd, off, 1, false);
if (ret) {
error_setg(errp, "Failed to lock byte %d", off);
return ret;
}
} else if (unlock) {
ret = qemu_unlock_fd(s->lock_fd, off, 1);
if (ret) {
error_setg(errp, "Failed to unlock byte %d", off);
return ret;
}
}
}
PERM_FOREACH(i) {
int off = RAW_LOCK_SHARED_BASE + i;
if (shared_perm_lock_bits & (1ULL << i)) {
ret = qemu_lock_fd(s->lock_fd, off, 1, false);
if (ret) {
error_setg(errp, "Failed to lock byte %d", off);
return ret;
}
} else if (unlock) {
ret = qemu_unlock_fd(s->lock_fd, off, 1);
if (ret) {
error_setg(errp, "Failed to unlock byte %d", off);
return ret;
}
}
}
return 0;
}
/* Check "unshared" bytes implied by @perm and ~@shared_perm in the file. */
static int raw_check_lock_bytes(BDRVRawState *s,
uint64_t perm, uint64_t shared_perm,
Error **errp)
{
int ret;
int i;
PERM_FOREACH(i) {
int off = RAW_LOCK_SHARED_BASE + i;
uint64_t p = 1ULL << i;
if (perm & p) {
ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
if (ret) {
char *perm_name = bdrv_perm_names(p);
error_setg(errp,
"Failed to get \"%s\" lock",
perm_name);
g_free(perm_name);
error_append_hint(errp,
"Is another process using the image?\n");
return ret;
}
}
}
PERM_FOREACH(i) {
int off = RAW_LOCK_PERM_BASE + i;
uint64_t p = 1ULL << i;
if (!(shared_perm & p)) {
ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
if (ret) {
char *perm_name = bdrv_perm_names(p);
error_setg(errp,
"Failed to get shared \"%s\" lock",
perm_name);
g_free(perm_name);
error_append_hint(errp,
"Is another process using the image?\n");
return ret;
}
}
}
return 0;
}
static int raw_handle_perm_lock(BlockDriverState *bs,
RawPermLockOp op,
uint64_t new_perm, uint64_t new_shared,
Error **errp)
{
BDRVRawState *s = bs->opaque;
int ret = 0;
Error *local_err = NULL;
if (!s->use_lock) {
return 0;
}
if (bdrv_get_flags(bs) & BDRV_O_INACTIVE) {
return 0;
}
assert(s->lock_fd > 0);
switch (op) {
case RAW_PL_PREPARE:
ret = raw_apply_lock_bytes(s, s->perm | new_perm,
~s->shared_perm | ~new_shared,
false, errp);
if (!ret) {
ret = raw_check_lock_bytes(s, new_perm, new_shared, errp);
if (!ret) {
return 0;
}
}
op = RAW_PL_ABORT;
/* fall through to unlock bytes. */
case RAW_PL_ABORT:
raw_apply_lock_bytes(s, s->perm, ~s->shared_perm, true, &local_err);
if (local_err) {
/* Theoretically the above call only unlocks bytes and it cannot
* fail. Something weird happened, report it.
*/
error_report_err(local_err);
}
break;
case RAW_PL_COMMIT:
raw_apply_lock_bytes(s, new_perm, ~new_shared, true, &local_err);
if (local_err) {
/* Theoretically the above call only unlocks bytes and it cannot
* fail. Something weird happened, report it.
*/
error_report_err(local_err);
}
break;
}
return ret;
}
static int raw_reopen_prepare(BDRVReopenState *state,
BlockReopenQueue *queue, Error **errp)
{
@@ -1405,6 +1623,10 @@ static void raw_close(BlockDriverState *bs)
qemu_close(s->fd);
s->fd = -1;
}
if (s->lock_fd >= 0) {
qemu_close(s->lock_fd);
s->lock_fd = -1;
}
}
static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
@@ -1949,6 +2171,25 @@ static QemuOptsList raw_create_opts = {
}
};
static int raw_check_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared,
Error **errp)
{
return raw_handle_perm_lock(bs, RAW_PL_PREPARE, perm, shared, errp);
}
static void raw_set_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared)
{
BDRVRawState *s = bs->opaque;
raw_handle_perm_lock(bs, RAW_PL_COMMIT, perm, shared, NULL);
s->perm = perm;
s->shared_perm = shared;
}
static void raw_abort_perm_update(BlockDriverState *bs)
{
raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL);
}
BlockDriver bdrv_file = {
.format_name = "file",
.protocol_name = "file",
@@ -1979,7 +2220,9 @@ BlockDriver bdrv_file = {
.bdrv_get_info = raw_get_info,
.bdrv_get_allocated_file_size
= raw_get_allocated_file_size,
.bdrv_check_perm = raw_check_perm,
.bdrv_set_perm = raw_set_perm,
.bdrv_abort_perm_update = raw_abort_perm_update,
.create_opts = &raw_create_opts,
};
@@ -2438,6 +2681,9 @@ static BlockDriver bdrv_host_device = {
.bdrv_get_info = raw_get_info,
.bdrv_get_allocated_file_size
= raw_get_allocated_file_size,
.bdrv_check_perm = raw_check_perm,
.bdrv_set_perm = raw_set_perm,
.bdrv_abort_perm_update = raw_abort_perm_update,
.bdrv_probe_blocksizes = hdev_probe_blocksizes,
.bdrv_probe_geometry = hdev_probe_geometry,
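
Reviewer note: the locking hunks above map each permission bit to one byte in the lock file. One range of bytes marks permissions held, a second range marks permissions that are not shared. The sketch below shows only that offset arithmetic. The base offsets (100 and 200) and the five-bit permission mask are assumptions, since the hunks use `RAW_LOCK_PERM_BASE`, `RAW_LOCK_SHARED_BASE` and `BLK_PERM_ALL` without showing their values, and every `sketch_`/`SKETCH_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: the hunks above use these names but do not define them. */
#define SKETCH_PERM_ALL         0x1fULL  /* hypothetical: five permission bits */
#define SKETCH_LOCK_PERM_BASE   100
#define SKETCH_LOCK_SHARED_BASE 200

/*
 * Collect the byte offsets that a holder of @perm, sharing only
 * @shared_perm, would lock. Returns the number of offsets written to @out,
 * which needs room for two entries per permission bit.
 */
static size_t sketch_lock_offsets(uint64_t perm, uint64_t shared_perm,
                                  int *out)
{
    size_t n = 0;
    int i;

    /* One byte per permission that is held. */
    for (i = 0; (1ULL << i) <= SKETCH_PERM_ALL; i++) {
        if (perm & (1ULL << i)) {
            out[n++] = SKETCH_LOCK_PERM_BASE + i;
        }
    }
    /* One byte per permission that is NOT shared with others. */
    for (i = 0; (1ULL << i) <= SKETCH_PERM_ALL; i++) {
        if (~shared_perm & SKETCH_PERM_ALL & (1ULL << i)) {
            out[n++] = SKETCH_LOCK_SHARED_BASE + i;
        }
    }
    return n;
}
```

Under these assumptions, holding bits 0 and 1 while refusing to share bit 1 yields offsets 100, 101 and 201. The check step in the diff then probes the opposite range: a wanted permission conflicts if another process holds the matching "not shared" byte.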

@@ -344,6 +344,12 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
if (qdict_get_try_bool(options, "locking", false)) {
error_setg(errp, "locking=on is not supported on Windows");
ret = -EINVAL;
goto fail;
}
filename = qemu_opt_get(opts, "filename");
use_aio = get_aio_option(opts, flags, &local_err);

@@ -1784,8 +1784,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
if (ret & BDRV_BLOCK_RAW) {
assert(ret & BDRV_BLOCK_OFFSET_VALID);
ret = bdrv_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
*pnum, pnum, file);
ret = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
*pnum, pnum, file);
goto out;
}

@@ -32,7 +32,7 @@
#include <zlib.h>
#include "qapi/qmp/qerror.h"
#include "crypto/cipher.h"
#include "migration/migration.h"
#include "migration/blocker.h"
/**************************************************************/
/* QEMU COW block driver with compression and encryption support */

@@ -309,14 +309,19 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
uint64_t *l2_table, uint64_t stop_flags)
{
int i;
QCow2ClusterType first_cluster_type;
uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
uint64_t first_entry = be64_to_cpu(l2_table[0]);
uint64_t offset = first_entry & mask;
if (!offset)
if (!offset) {
return 0;
}
assert(qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL);
/* must be allocated */
first_cluster_type = qcow2_get_cluster_type(first_entry);
assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
for (i = 0; i < nb_clusters; i++) {
uint64_t l2_entry = be64_to_cpu(l2_table[i]) & mask;
@@ -328,14 +333,21 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
return i;
}
static int count_contiguous_clusters_by_type(int nb_clusters,
uint64_t *l2_table,
int wanted_type)
/*
* Checks how many consecutive unallocated clusters in a given L2
* table have the same cluster type.
*/
static int count_contiguous_clusters_unallocated(int nb_clusters,
uint64_t *l2_table,
QCow2ClusterType wanted_type)
{
int i;
assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
wanted_type == QCOW2_CLUSTER_UNALLOCATED);
for (i = 0; i < nb_clusters; i++) {
int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
uint64_t entry = be64_to_cpu(l2_table[i]);
QCow2ClusterType type = qcow2_get_cluster_type(entry);
if (type != wanted_type) {
break;
@@ -487,6 +499,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
int l1_bits, c;
unsigned int offset_in_cluster;
uint64_t bytes_available, bytes_needed, nb_clusters;
QCow2ClusterType type;
int ret;
offset_in_cluster = offset_into_cluster(s, offset);
@@ -509,13 +522,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
l1_index = offset >> l1_bits;
if (l1_index >= s->l1_size) {
ret = QCOW2_CLUSTER_UNALLOCATED;
type = QCOW2_CLUSTER_UNALLOCATED;
goto out;
}
l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
if (!l2_offset) {
ret = QCOW2_CLUSTER_UNALLOCATED;
type = QCOW2_CLUSTER_UNALLOCATED;
goto out;
}
@@ -544,38 +557,37 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
* true */
assert(nb_clusters <= INT_MAX);
ret = qcow2_get_cluster_type(*cluster_offset);
switch (ret) {
type = qcow2_get_cluster_type(*cluster_offset);
if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
type == QCOW2_CLUSTER_ZERO_ALLOC)) {
qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
" in pre-v3 image (L2 offset: %#" PRIx64
", L2 index: %#x)", l2_offset, l2_index);
ret = -EIO;
goto fail;
}
switch (type) {
case QCOW2_CLUSTER_COMPRESSED:
/* Compressed clusters can only be processed one by one */
c = 1;
*cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
break;
case QCOW2_CLUSTER_ZERO:
if (s->qcow_version < 3) {
qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
" in pre-v3 image (L2 offset: %#" PRIx64
", L2 index: %#x)", l2_offset, l2_index);
ret = -EIO;
goto fail;
}
c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
QCOW2_CLUSTER_ZERO);
*cluster_offset = 0;
break;
case QCOW2_CLUSTER_ZERO_PLAIN:
case QCOW2_CLUSTER_UNALLOCATED:
/* how many empty clusters ? */
c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
QCOW2_CLUSTER_UNALLOCATED);
c = count_contiguous_clusters_unallocated(nb_clusters,
&l2_table[l2_index], type);
*cluster_offset = 0;
break;
case QCOW2_CLUSTER_ZERO_ALLOC:
case QCOW2_CLUSTER_NORMAL:
/* how many allocated clusters ? */
c = count_contiguous_clusters(nb_clusters, s->cluster_size,
&l2_table[l2_index], QCOW_OFLAG_ZERO);
&l2_table[l2_index], QCOW_OFLAG_ZERO);
*cluster_offset &= L2E_OFFSET_MASK;
if (offset_into_cluster(s, *cluster_offset)) {
qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset %#"
qcow2_signal_corruption(bs, true, -1, -1,
"Cluster allocation offset %#"
PRIx64 " unaligned (L2 offset: %#" PRIx64
", L2 index: %#x)", *cluster_offset,
l2_offset, l2_index);
@@ -602,7 +614,7 @@ out:
assert(bytes_available - offset_in_cluster <= UINT_MAX);
*bytes = bytes_available - offset_in_cluster;
return ret;
return type;
fail:
qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
@@ -835,7 +847,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
* Don't discard clusters that reach a refcount of 0 (e.g. compressed
* clusters), the next write will reuse them anyway.
*/
if (j != 0) {
if (!m->keep_old_clusters && j != 0) {
for (i = 0; i < j; i++) {
qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1,
QCOW2_DISCARD_NEVER);
@@ -860,7 +872,7 @@ static int count_cow_clusters(BDRVQcow2State *s, int nb_clusters,
for (i = 0; i < nb_clusters; i++) {
uint64_t l2_entry = be64_to_cpu(l2_table[l2_index + i]);
int cluster_type = qcow2_get_cluster_type(l2_entry);
QCow2ClusterType cluster_type = qcow2_get_cluster_type(l2_entry);
switch(cluster_type) {
case QCOW2_CLUSTER_NORMAL:
@@ -870,7 +882,8 @@ static int count_cow_clusters(BDRVQcow2State *s, int nb_clusters,
break;
case QCOW2_CLUSTER_UNALLOCATED:
case QCOW2_CLUSTER_COMPRESSED:
case QCOW2_CLUSTER_ZERO:
case QCOW2_CLUSTER_ZERO_PLAIN:
case QCOW2_CLUSTER_ZERO_ALLOC:
break;
default:
abort();
@@ -1132,8 +1145,9 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
uint64_t entry;
uint64_t nb_clusters;
int ret;
bool keep_old_clusters = false;
uint64_t alloc_cluster_offset;
uint64_t alloc_cluster_offset = 0;
trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
*bytes);
@@ -1170,31 +1184,54 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
* wrong with our code. */
assert(nb_clusters > 0);
if (qcow2_get_cluster_type(entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
(entry & QCOW_OFLAG_COPIED) &&
(!*host_offset ||
start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
{
/* Try to reuse preallocated zero clusters; contiguous normal clusters
* would be fine, too, but count_cow_clusters() above has limited
* nb_clusters already to a range of COW clusters */
int preallocated_nb_clusters =
count_contiguous_clusters(nb_clusters, s->cluster_size,
&l2_table[l2_index], QCOW_OFLAG_COPIED);
assert(preallocated_nb_clusters > 0);
nb_clusters = preallocated_nb_clusters;
alloc_cluster_offset = entry & L2E_OFFSET_MASK;
/* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
* should not free them. */
keep_old_clusters = true;
}
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
/* Allocate, if necessary at a given offset in the image file */
alloc_cluster_offset = start_of_cluster(s, *host_offset);
ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
&nb_clusters);
if (ret < 0) {
goto fail;
}
/* Can't extend contiguous allocation */
if (nb_clusters == 0) {
*bytes = 0;
return 0;
}
/* !*host_offset would overwrite the image header and is reserved for "no
* host offset preferred". If 0 was a valid host offset, it'd trigger the
* following overlap check; do that now to avoid having an invalid value in
* *host_offset. */
if (!alloc_cluster_offset) {
ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
nb_clusters * s->cluster_size);
assert(ret < 0);
goto fail;
/* Allocate, if necessary at a given offset in the image file */
alloc_cluster_offset = start_of_cluster(s, *host_offset);
ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
&nb_clusters);
if (ret < 0) {
goto fail;
}
/* Can't extend contiguous allocation */
if (nb_clusters == 0) {
*bytes = 0;
return 0;
}
/* !*host_offset would overwrite the image header and is reserved for
* "no host offset preferred". If 0 was a valid host offset, it'd
* trigger the following overlap check; do that now to avoid having an
* invalid value in *host_offset. */
if (!alloc_cluster_offset) {
ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
nb_clusters * s->cluster_size);
assert(ret < 0);
goto fail;
}
}
/*
@@ -1225,6 +1262,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
.offset = start_of_cluster(s, guest_offset),
.nb_clusters = nb_clusters,
.keep_old_clusters = keep_old_clusters,
.cow_start = {
.offset = 0,
.nb_bytes = offset_into_cluster(s, guest_offset),
@@ -1472,24 +1511,25 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
* but rather fall through to the backing file.
*/
switch (qcow2_get_cluster_type(old_l2_entry)) {
case QCOW2_CLUSTER_UNALLOCATED:
if (full_discard || !bs->backing) {
continue;
}
break;
case QCOW2_CLUSTER_UNALLOCATED:
if (full_discard || !bs->backing) {
continue;
}
break;
case QCOW2_CLUSTER_ZERO:
if (!full_discard) {
continue;
}
break;
case QCOW2_CLUSTER_ZERO_PLAIN:
if (!full_discard) {
continue;
}
break;
case QCOW2_CLUSTER_NORMAL:
case QCOW2_CLUSTER_COMPRESSED:
break;
case QCOW2_CLUSTER_ZERO_ALLOC:
case QCOW2_CLUSTER_NORMAL:
case QCOW2_CLUSTER_COMPRESSED:
break;
default:
abort();
default:
abort();
}
/* First remove L2 entries */
@@ -1509,35 +1549,36 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
return nb_clusters;
}
int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
int nb_sectors, enum qcow2_discard_type type, bool full_discard)
int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, enum qcow2_discard_type type,
bool full_discard)
{
BDRVQcow2State *s = bs->opaque;
uint64_t end_offset;
uint64_t end_offset = offset + bytes;
uint64_t nb_clusters;
int64_t cleared;
int ret;
end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
/* The caller must cluster-align start; round end down except at EOF */
/* Caller must pass aligned values, except at image end */
assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
if (end_offset != bs->total_sectors * BDRV_SECTOR_SIZE) {
end_offset = start_of_cluster(s, end_offset);
}
assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
nb_clusters = size_to_clusters(s, end_offset - offset);
nb_clusters = size_to_clusters(s, bytes);
s->cache_discards = true;
/* Each L2 table is handled by its own loop iteration */
while (nb_clusters > 0) {
ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
if (ret < 0) {
cleared = discard_single_l2(bs, offset, nb_clusters, type,
full_discard);
if (cleared < 0) {
ret = cleared;
goto fail;
}
nb_clusters -= ret;
offset += (ret * s->cluster_size);
nb_clusters -= cleared;
offset += (cleared * s->cluster_size);
}
ret = 0;
@@ -1561,6 +1602,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
int l2_index;
int ret;
int i;
bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
if (ret < 0) {
@@ -1573,12 +1615,22 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
for (i = 0; i < nb_clusters; i++) {
uint64_t old_offset;
QCow2ClusterType cluster_type;
old_offset = be64_to_cpu(l2_table[l2_index + i]);
/* Update L2 entries */
/*
* Minimize L2 changes if the cluster already reads back as
* zeroes with correct allocation.
*/
cluster_type = qcow2_get_cluster_type(old_offset);
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
(cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {
continue;
}
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
} else {
@@ -1591,31 +1643,39 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
return nb_clusters;
}
int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
int flags)
int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, int flags)
{
BDRVQcow2State *s = bs->opaque;
uint64_t end_offset = offset + bytes;
uint64_t nb_clusters;
int64_t cleared;
int ret;
/* Caller must pass aligned values, except at image end */
assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
/* The zero flag is only supported by version 3 and newer */
if (s->qcow_version < 3) {
return -ENOTSUP;
}
/* Each L2 table is handled by its own loop iteration */
nb_clusters = size_to_clusters(s, nb_sectors << BDRV_SECTOR_BITS);
nb_clusters = size_to_clusters(s, bytes);
s->cache_discards = true;
while (nb_clusters > 0) {
ret = zero_single_l2(bs, offset, nb_clusters, flags);
if (ret < 0) {
cleared = zero_single_l2(bs, offset, nb_clusters, flags);
if (cleared < 0) {
ret = cleared;
goto fail;
}
nb_clusters -= ret;
offset += (ret * s->cluster_size);
nb_clusters -= cleared;
offset += (cleared * s->cluster_size);
}
ret = 0;
@@ -1699,14 +1759,14 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
for (j = 0; j < s->l2_size; j++) {
uint64_t l2_entry = be64_to_cpu(l2_table[j]);
int64_t offset = l2_entry & L2E_OFFSET_MASK;
int cluster_type = qcow2_get_cluster_type(l2_entry);
bool preallocated = offset != 0;
QCow2ClusterType cluster_type = qcow2_get_cluster_type(l2_entry);
if (cluster_type != QCOW2_CLUSTER_ZERO) {
if (cluster_type != QCOW2_CLUSTER_ZERO_PLAIN &&
cluster_type != QCOW2_CLUSTER_ZERO_ALLOC) {
continue;
}
if (!preallocated) {
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
if (!bs->backing) {
/* not backed; therefore we can simply deallocate the
* cluster */
@@ -1741,7 +1801,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
"%#" PRIx64 " unaligned (L2 offset: %#"
PRIx64 ", L2 index: %#x)", offset,
l2_offset, j);
if (!preallocated) {
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
qcow2_free_clusters(bs, offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS);
}
@@ -1751,7 +1811,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
if (ret < 0) {
if (!preallocated) {
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
qcow2_free_clusters(bs, offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS);
}
@@ -1760,7 +1820,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
ret = bdrv_pwrite_zeroes(bs->file, offset, s->cluster_size, 0);
if (ret < 0) {
if (!preallocated) {
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
qcow2_free_clusters(bs, offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS);
}
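
Reviewer note: the `zero_single_l2()` hunk above reduces to a three-way decision per cluster. The sketch below restates it as a pure function. All names are hypothetical. The `else` branch is cut off in the hunk, so the last case (keep the allocation and only set the zero flag) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the QCow2ClusterType values in the diff. */
typedef enum {
    T_UNALLOCATED,
    T_ZERO_PLAIN,
    T_ZERO_ALLOC,
    T_NORMAL,
    T_COMPRESSED,
} SkType;

typedef enum {
    ZERO_SKIP,          /* entry already reads as zeroes, leave it alone */
    ZERO_FREE_AND_FLAG, /* drop the host cluster, store a bare zero flag */
    ZERO_FLAG_ONLY,     /* assumed: keep the allocation, set the zero flag */
} SkZeroAction;

static SkZeroAction sk_zero_action(SkType type, bool unmap)
{
    /* Minimize L2 changes when the cluster is already zero as requested. */
    if (type == T_ZERO_PLAIN || (type == T_ZERO_ALLOC && !unmap)) {
        return ZERO_SKIP;
    }
    /* Compressed clusters and unmap requests give up the old allocation. */
    if (type == T_COMPRESSED || unmap) {
        return ZERO_FREE_AND_FLAG;
    }
    return ZERO_FLAG_ONLY;
}
```

For example, a preallocated zero cluster is skipped without `BDRV_REQ_MAY_UNMAP` but freed with it, which matches the new `continue` condition in the hunk.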

@@ -1028,18 +1028,17 @@ void qcow2_free_any_clusters(BlockDriverState *bs, uint64_t l2_entry,
}
break;
case QCOW2_CLUSTER_NORMAL:
case QCOW2_CLUSTER_ZERO:
if (l2_entry & L2E_OFFSET_MASK) {
if (offset_into_cluster(s, l2_entry & L2E_OFFSET_MASK)) {
qcow2_signal_corruption(bs, false, -1, -1,
"Cannot free unaligned cluster %#llx",
l2_entry & L2E_OFFSET_MASK);
} else {
qcow2_free_clusters(bs, l2_entry & L2E_OFFSET_MASK,
nb_clusters << s->cluster_bits, type);
}
case QCOW2_CLUSTER_ZERO_ALLOC:
if (offset_into_cluster(s, l2_entry & L2E_OFFSET_MASK)) {
qcow2_signal_corruption(bs, false, -1, -1,
"Cannot free unaligned cluster %#llx",
l2_entry & L2E_OFFSET_MASK);
} else {
qcow2_free_clusters(bs, l2_entry & L2E_OFFSET_MASK,
nb_clusters << s->cluster_bits, type);
}
break;
case QCOW2_CLUSTER_ZERO_PLAIN:
case QCOW2_CLUSTER_UNALLOCATED:
break;
default:
@@ -1059,9 +1058,9 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
int64_t l1_table_offset, int l1_size, int addend)
{
BDRVQcow2State *s = bs->opaque;
uint64_t *l1_table, *l2_table, l2_offset, offset, l1_size2, refcount;
uint64_t *l1_table, *l2_table, l2_offset, entry, l1_size2, refcount;
bool l1_allocated = false;
int64_t old_offset, old_l2_offset;
int64_t old_entry, old_l2_offset;
int i, j, l1_modified = 0, nb_csectors;
int ret;
@@ -1089,15 +1088,16 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
goto fail;
}
for(i = 0;i < l1_size; i++)
for (i = 0; i < l1_size; i++) {
be64_to_cpus(&l1_table[i]);
}
} else {
assert(l1_size == s->l1_size);
l1_table = s->l1_table;
l1_allocated = false;
}
for(i = 0; i < l1_size; i++) {
for (i = 0; i < l1_size; i++) {
l2_offset = l1_table[i];
if (l2_offset) {
old_l2_offset = l2_offset;
@@ -1117,81 +1117,79 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
goto fail;
}
for(j = 0; j < s->l2_size; j++) {
for (j = 0; j < s->l2_size; j++) {
uint64_t cluster_index;
uint64_t offset;
offset = be64_to_cpu(l2_table[j]);
old_offset = offset;
offset &= ~QCOW_OFLAG_COPIED;
entry = be64_to_cpu(l2_table[j]);
old_entry = entry;
entry &= ~QCOW_OFLAG_COPIED;
offset = entry & L2E_OFFSET_MASK;
switch (qcow2_get_cluster_type(offset)) {
case QCOW2_CLUSTER_COMPRESSED:
nb_csectors = ((offset >> s->csize_shift) &
s->csize_mask) + 1;
if (addend != 0) {
ret = update_refcount(bs,
(offset & s->cluster_offset_mask) & ~511,
switch (qcow2_get_cluster_type(entry)) {
case QCOW2_CLUSTER_COMPRESSED:
nb_csectors = ((entry >> s->csize_shift) &
s->csize_mask) + 1;
if (addend != 0) {
ret = update_refcount(bs,
(entry & s->cluster_offset_mask) & ~511,
nb_csectors * 512, abs(addend), addend < 0,
QCOW2_DISCARD_SNAPSHOT);
if (ret < 0) {
goto fail;
}
}
/* compressed clusters are never modified */
refcount = 2;
break;
case QCOW2_CLUSTER_NORMAL:
case QCOW2_CLUSTER_ZERO:
if (offset_into_cluster(s, offset & L2E_OFFSET_MASK)) {
qcow2_signal_corruption(bs, true, -1, -1, "Data "
"cluster offset %#llx "
"unaligned (L2 offset: %#"
PRIx64 ", L2 index: %#x)",
offset & L2E_OFFSET_MASK,
l2_offset, j);
ret = -EIO;
goto fail;
}
cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
if (!cluster_index) {
/* unallocated */
refcount = 0;
break;
}
if (addend != 0) {
ret = qcow2_update_cluster_refcount(bs,
cluster_index, abs(addend), addend < 0,
QCOW2_DISCARD_SNAPSHOT);
if (ret < 0) {
goto fail;
}
}
ret = qcow2_get_refcount(bs, cluster_index, &refcount);
if (ret < 0) {
goto fail;
}
break;
}
/* compressed clusters are never modified */
refcount = 2;
break;
case QCOW2_CLUSTER_UNALLOCATED:
refcount = 0;
break;
case QCOW2_CLUSTER_NORMAL:
case QCOW2_CLUSTER_ZERO_ALLOC:
if (offset_into_cluster(s, offset)) {
qcow2_signal_corruption(bs, true, -1, -1, "Cluster "
"allocation offset %#" PRIx64
" unaligned (L2 offset: %#"
PRIx64 ", L2 index: %#x)",
offset, l2_offset, j);
ret = -EIO;
goto fail;
}
default:
abort();
cluster_index = offset >> s->cluster_bits;
assert(cluster_index);
if (addend != 0) {
ret = qcow2_update_cluster_refcount(bs,
cluster_index, abs(addend), addend < 0,
QCOW2_DISCARD_SNAPSHOT);
if (ret < 0) {
goto fail;
}
}
ret = qcow2_get_refcount(bs, cluster_index, &refcount);
if (ret < 0) {
goto fail;
}
break;
case QCOW2_CLUSTER_ZERO_PLAIN:
case QCOW2_CLUSTER_UNALLOCATED:
refcount = 0;
break;
default:
abort();
}
if (refcount == 1) {
offset |= QCOW_OFLAG_COPIED;
entry |= QCOW_OFLAG_COPIED;
}
if (offset != old_offset) {
if (entry != old_entry) {
if (addend > 0) {
qcow2_cache_set_dependency(bs, s->l2_table_cache,
s->refcount_block_cache);
}
l2_table[j] = cpu_to_be64(offset);
l2_table[j] = cpu_to_be64(entry);
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache,
l2_table);
}
@@ -1441,12 +1439,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
}
break;
case QCOW2_CLUSTER_ZERO:
if ((l2_entry & L2E_OFFSET_MASK) == 0) {
break;
}
/* fall through */
case QCOW2_CLUSTER_ZERO_ALLOC:
case QCOW2_CLUSTER_NORMAL:
{
uint64_t offset = l2_entry & L2E_OFFSET_MASK;
@@ -1476,6 +1469,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
break;
}
case QCOW2_CLUSTER_ZERO_PLAIN:
case QCOW2_CLUSTER_UNALLOCATED:
break;
@@ -1638,10 +1632,10 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
for (j = 0; j < s->l2_size; j++) {
uint64_t l2_entry = be64_to_cpu(l2_table[j]);
uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
int cluster_type = qcow2_get_cluster_type(l2_entry);
QCow2ClusterType cluster_type = qcow2_get_cluster_type(l2_entry);
if ((cluster_type == QCOW2_CLUSTER_NORMAL) ||
((cluster_type == QCOW2_CLUSTER_ZERO) && (data_offset != 0))) {
if (cluster_type == QCOW2_CLUSTER_NORMAL ||
cluster_type == QCOW2_CLUSTER_ZERO_ALLOC) {
ret = qcow2_get_refcount(bs,
data_offset >> s->cluster_bits,
&refcount);
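
Reviewer note: in `qcow2_update_snapshot_refcount()` the renamed `entry` variable has its COPIED flag cleared and then set again only when the refcount is exactly 1. A minimal sketch of that step follows. The function name is hypothetical, and the flag position (bit 63) is restated from the qcow2 format as an assumption, since the hunk does not show it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag position; the hunk uses QCOW_OFLAG_COPIED without defining it. */
#define SK_OFLAG_COPIED (1ULL << 63)

/*
 * Return @entry with the COPIED flag recomputed from @refcount: the flag
 * is set only when exactly one reference to the cluster exists.
 */
static uint64_t sk_update_copied(uint64_t entry, uint64_t refcount)
{
    entry &= ~SK_OFLAG_COPIED;
    if (refcount == 1) {
        entry |= SK_OFLAG_COPIED;
    }
    return entry;
}
```

In the diff the refcount fed into this step is 0 for unallocated and plain zero clusters, a fixed 2 for compressed clusters, and the value read back by `qcow2_get_refcount()` for normal and preallocated zero clusters.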

@@ -440,10 +440,9 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
/* The VM state isn't needed any more in the active L1 table; in fact, it
* hurts by causing expensive COW for the next snapshot. */
qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
align_offset(sn->vm_state_size, s->cluster_size)
>> BDRV_SECTOR_BITS,
QCOW2_DISCARD_NEVER, false);
qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
align_offset(sn->vm_state_size, s->cluster_size),
QCOW2_DISCARD_NEVER, false);
#ifdef DEBUG_ALLOC
{

@@ -1385,7 +1385,7 @@ static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
*file = bs->file->bs;
status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
}
if (ret == QCOW2_CLUSTER_ZERO) {
if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
status |= BDRV_BLOCK_ZERO;
} else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
status |= BDRV_BLOCK_DATA;
@@ -1482,7 +1482,8 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
}
break;
case QCOW2_CLUSTER_ZERO:
case QCOW2_CLUSTER_ZERO_PLAIN:
case QCOW2_CLUSTER_ZERO_ALLOC:
qemu_iovec_memset(&hd_qiov, 0, 0, cur_bytes);
break;
@@ -2139,7 +2140,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
* too, as long as the bulk is allocated here). Therefore, using
* floating point arithmetic is fine. */
int64_t meta_size = 0;
uint64_t nreftablee, nrefblocke, nl1e, nl2e;
uint64_t nreftablee, nrefblocke, nl1e, nl2e, refblock_count;
int64_t aligned_total_size = align_offset(total_size, cluster_size);
int refblock_bits, refblock_size;
/* refcount entry size in bytes */
@@ -2182,11 +2183,12 @@ static int qcow2_create2(const char *filename, int64_t total_size,
nrefblocke = (aligned_total_size + meta_size + cluster_size)
/ (cluster_size - rces - rces * sizeof(uint64_t)
/ cluster_size);
meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
refblock_count = DIV_ROUND_UP(nrefblocke, refblock_size);
meta_size += refblock_count * cluster_size;
/* total size of refcount tables */
nreftablee = nrefblocke / refblock_size;
nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
nreftablee = align_offset(refblock_count,
cluster_size / sizeof(uint64_t));
meta_size += nreftablee * sizeof(uint64_t);
qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
@@ -2449,6 +2451,10 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
BlockDriverState *file;
int64_t res;
if (start + count > bs->total_sectors) {
count = bs->total_sectors - start;
}
if (!count) {
return true;
}
@@ -2467,6 +2473,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
uint32_t tail = (offset + count) % s->cluster_size;
trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, count);
if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
tail = 0;
}
if (head || tail) {
int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
@@ -2490,7 +2499,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
count = s->cluster_size;
nr = s->cluster_size;
ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
if (ret != QCOW2_CLUSTER_UNALLOCATED && ret != QCOW2_CLUSTER_ZERO) {
if (ret != QCOW2_CLUSTER_UNALLOCATED &&
ret != QCOW2_CLUSTER_ZERO_PLAIN &&
ret != QCOW2_CLUSTER_ZERO_ALLOC) {
qemu_co_mutex_unlock(&s->lock);
return -ENOTSUP;
}
@@ -2501,7 +2512,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);
/* Whatever is left can use real zero clusters */
ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS, flags);
ret = qcow2_cluster_zeroize(bs, offset, count, flags);
qemu_co_mutex_unlock(&s->lock);
return ret;
@@ -2524,8 +2535,8 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
}
qemu_co_mutex_lock(&s->lock);
ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
QCOW2_DISCARD_REQUEST, false);
ret = qcow2_cluster_discard(bs, offset, count, QCOW2_DISCARD_REQUEST,
false);
qemu_co_mutex_unlock(&s->lock);
return ret;
}
@@ -2832,9 +2843,8 @@ fail:
static int qcow2_make_empty(BlockDriverState *bs)
{
BDRVQcow2State *s = bs->opaque;
uint64_t start_sector;
int sector_step = (QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size) /
BDRV_SECTOR_SIZE);
uint64_t offset, end_offset;
int step = QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size);
int l1_clusters, ret = 0;
l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
@@ -2851,18 +2861,15 @@ static int qcow2_make_empty(BlockDriverState *bs)
/* This fallback code simply discards every active cluster; this is slow,
* but works in all cases */
for (start_sector = 0; start_sector < bs->total_sectors;
start_sector += sector_step)
{
end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
for (offset = 0; offset < end_offset; offset += step) {
/* As this function is generally used after committing an external
* snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
* default action for this kind of discard is to pass the discard,
* which will ideally result in an actually smaller image file, as
* is probably desired. */
ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
MIN(sector_step,
bs->total_sectors - start_sector),
QCOW2_DISCARD_SNAPSHOT, true);
ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
QCOW2_DISCARD_SNAPSHOT, true);
if (ret < 0) {
break;
}
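
Reviewer note: the `qcow2_make_empty()` hunk switches the fallback loop from sector units to byte units. The sketch below reproduces just the chunking arithmetic: a step of `INT_MAX` rounded down to a cluster boundary, and a final chunk clamped to the image end. The function name is hypothetical, and a non-zero cluster size is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Count the discard calls a byte-based loop would issue for an image of
 * @end_offset bytes, and report the length of the last one in @last_len.
 * @cluster_size must be non-zero.
 */
static uint64_t sk_count_discard_chunks(uint64_t end_offset,
                                        uint64_t cluster_size,
                                        uint64_t *last_len)
{
    /* Largest cluster-aligned length that still fits in an int. */
    uint64_t step = (uint64_t)INT_MAX / cluster_size * cluster_size;
    uint64_t offset;
    uint64_t calls = 0;

    *last_len = 0;
    for (offset = 0; offset < end_offset; offset += step) {
        uint64_t left = end_offset - offset;

        /* Same clamp as MIN(step, end_offset - offset) in the diff. */
        *last_len = left < step ? left : step;
        calls++;
    }
    return calls;
}
```

With 64 KiB clusters and a 32-bit `int`, the step is 2147418112 bytes, so a 5 GiB image takes three calls and the last one covers 1073872896 bytes.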

@@ -322,6 +322,9 @@ typedef struct QCowL2Meta
/** Number of newly allocated clusters */
int nb_clusters;
/** Do not free the old clusters */
bool keep_old_clusters;
/**
* Requests that overlap with this allocation and wait to be restarted
* when the allocating request has completed.
@@ -346,12 +349,13 @@ typedef struct QCowL2Meta
QLIST_ENTRY(QCowL2Meta) next_in_flight;
} QCowL2Meta;
enum {
typedef enum QCow2ClusterType {
QCOW2_CLUSTER_UNALLOCATED,
QCOW2_CLUSTER_ZERO_PLAIN,
QCOW2_CLUSTER_ZERO_ALLOC,
QCOW2_CLUSTER_NORMAL,
QCOW2_CLUSTER_COMPRESSED,
QCOW2_CLUSTER_ZERO
};
} QCow2ClusterType;
typedef enum QCow2MetadataOverlap {
QCOW2_OL_MAIN_HEADER_BITNR = 0,
@@ -440,12 +444,15 @@ static inline uint64_t qcow2_max_refcount_clusters(BDRVQcow2State *s)
return QCOW_MAX_REFTABLE_SIZE >> s->cluster_bits;
}
static inline int qcow2_get_cluster_type(uint64_t l2_entry)
static inline QCow2ClusterType qcow2_get_cluster_type(uint64_t l2_entry)
{
if (l2_entry & QCOW_OFLAG_COMPRESSED) {
return QCOW2_CLUSTER_COMPRESSED;
} else if (l2_entry & QCOW_OFLAG_ZERO) {
return QCOW2_CLUSTER_ZERO;
if (l2_entry & L2E_OFFSET_MASK) {
return QCOW2_CLUSTER_ZERO_ALLOC;
}
return QCOW2_CLUSTER_ZERO_PLAIN;
} else if (!(l2_entry & L2E_OFFSET_MASK)) {
return QCOW2_CLUSTER_UNALLOCATED;
} else {
@@ -544,10 +551,11 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
int compressed_size);
int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
int nb_sectors, enum qcow2_discard_type type, bool full_discard);
int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
int flags);
int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, enum qcow2_discard_type type,
bool full_discard);
int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, int flags);
int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb,
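
The qcow2_get_cluster_type() hunk above splits the old zero type into a "plain" and an "allocated" variant, depending on whether the L2 entry carries a host offset. The classification can be sketched in isolation as below. The bit positions and mask are assumptions made for illustration and should be checked against the real header; the final branch returning a normal cluster is also assumed, since the hunk is cut off before it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define SK_OFLAG_COMPRESSED (1ULL << 62)
#define SK_OFLAG_ZERO       (1ULL << 0)
#define SK_OFFSET_MASK      0x00fffffffffffe00ULL

typedef enum SkClusterType {
    SK_CLUSTER_UNALLOCATED,
    SK_CLUSTER_ZERO_PLAIN,   /* reads as zero, no host cluster behind it */
    SK_CLUSTER_ZERO_ALLOC,   /* reads as zero, host cluster still allocated */
    SK_CLUSTER_NORMAL,
    SK_CLUSTER_COMPRESSED,
} SkClusterType;

static SkClusterType sk_cluster_type(uint64_t l2_entry)
{
    if (l2_entry & SK_OFLAG_COMPRESSED) {
        return SK_CLUSTER_COMPRESSED;
    } else if (l2_entry & SK_OFLAG_ZERO) {
        if (l2_entry & SK_OFFSET_MASK) {
            return SK_CLUSTER_ZERO_ALLOC;
        }
        return SK_CLUSTER_ZERO_PLAIN;
    } else if (!(l2_entry & SK_OFFSET_MASK)) {
        return SK_CLUSTER_UNALLOCATED;
    } else {
        return SK_CLUSTER_NORMAL; /* assumed: not visible in the hunk */
    }
}
```

Returning a named enum type instead of int lets the compiler warn about switch statements that miss one of the five cases.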

View File

@@ -22,9 +22,17 @@
#include "qapi/error.h"
#include "replication.h"
typedef enum {
BLOCK_REPLICATION_NONE, /* block replication is not started */
BLOCK_REPLICATION_RUNNING, /* block replication is running */
BLOCK_REPLICATION_FAILOVER, /* failover is running in background */
BLOCK_REPLICATION_FAILOVER_FAILED, /* failover failed */
BLOCK_REPLICATION_DONE, /* block replication is done */
} ReplicationStage;
typedef struct BDRVReplicationState {
ReplicationMode mode;
int replication_state;
ReplicationStage stage;
BdrvChild *active_disk;
BdrvChild *hidden_disk;
BdrvChild *secondary_disk;
@@ -36,14 +44,6 @@ typedef struct BDRVReplicationState {
int error;
} BDRVReplicationState;
enum {
BLOCK_REPLICATION_NONE, /* block replication is not started */
BLOCK_REPLICATION_RUNNING, /* block replication is running */
BLOCK_REPLICATION_FAILOVER, /* failover is running in background */
BLOCK_REPLICATION_FAILOVER_FAILED, /* failover failed */
BLOCK_REPLICATION_DONE, /* block replication is done */
};
static void replication_start(ReplicationState *rs, ReplicationMode mode,
Error **errp);
static void replication_do_checkpoint(ReplicationState *rs, Error **errp);
@@ -141,10 +141,10 @@ static void replication_close(BlockDriverState *bs)
{
BDRVReplicationState *s = bs->opaque;
if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
if (s->stage == BLOCK_REPLICATION_RUNNING) {
replication_stop(s->rs, false, NULL);
}
if (s->replication_state == BLOCK_REPLICATION_FAILOVER) {
if (s->stage == BLOCK_REPLICATION_FAILOVER) {
block_job_cancel_sync(s->active_disk->bs->job);
}
@@ -174,7 +174,7 @@ static int64_t replication_getlength(BlockDriverState *bs)
static int replication_get_io_status(BDRVReplicationState *s)
{
switch (s->replication_state) {
switch (s->stage) {
case BLOCK_REPLICATION_NONE:
return -EIO;
case BLOCK_REPLICATION_RUNNING:
@@ -403,7 +403,7 @@ static void backup_job_completed(void *opaque, int ret)
BlockDriverState *bs = opaque;
BDRVReplicationState *s = bs->opaque;
if (s->replication_state != BLOCK_REPLICATION_FAILOVER) {
if (s->stage != BLOCK_REPLICATION_FAILOVER) {
/* The backup job is cancelled unexpectedly */
s->error = -EIO;
}
@@ -445,7 +445,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
aio_context_acquire(aio_context);
s = bs->opaque;
if (s->replication_state != BLOCK_REPLICATION_NONE) {
if (s->stage != BLOCK_REPLICATION_NONE) {
error_setg(errp, "Block replication is running or done");
aio_context_release(aio_context);
return;
@@ -545,7 +545,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
abort();
}
s->replication_state = BLOCK_REPLICATION_RUNNING;
s->stage = BLOCK_REPLICATION_RUNNING;
if (s->mode == REPLICATION_MODE_SECONDARY) {
secondary_do_checkpoint(s, errp);
@@ -581,7 +581,7 @@ static void replication_get_error(ReplicationState *rs, Error **errp)
aio_context_acquire(aio_context);
s = bs->opaque;
if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
if (s->stage != BLOCK_REPLICATION_RUNNING) {
error_setg(errp, "Block replication is not running");
aio_context_release(aio_context);
return;
@@ -601,7 +601,7 @@ static void replication_done(void *opaque, int ret)
BDRVReplicationState *s = bs->opaque;
if (ret == 0) {
s->replication_state = BLOCK_REPLICATION_DONE;
s->stage = BLOCK_REPLICATION_DONE;
/* refresh top bs's filename */
bdrv_refresh_filename(bs);
@@ -610,7 +610,7 @@ static void replication_done(void *opaque, int ret)
s->hidden_disk = NULL;
s->error = 0;
} else {
s->replication_state = BLOCK_REPLICATION_FAILOVER_FAILED;
s->stage = BLOCK_REPLICATION_FAILOVER_FAILED;
s->error = -EIO;
}
}
@@ -625,7 +625,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
aio_context_acquire(aio_context);
s = bs->opaque;
if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
if (s->stage != BLOCK_REPLICATION_RUNNING) {
error_setg(errp, "Block replication is not running");
aio_context_release(aio_context);
return;
@@ -633,7 +633,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
switch (s->mode) {
case REPLICATION_MODE_PRIMARY:
s->replication_state = BLOCK_REPLICATION_DONE;
s->stage = BLOCK_REPLICATION_DONE;
s->error = 0;
break;
case REPLICATION_MODE_SECONDARY:
@@ -648,12 +648,12 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
if (!failover) {
secondary_do_checkpoint(s, errp);
s->replication_state = BLOCK_REPLICATION_DONE;
s->stage = BLOCK_REPLICATION_DONE;
aio_context_release(aio_context);
return;
}
s->replication_state = BLOCK_REPLICATION_FAILOVER;
s->stage = BLOCK_REPLICATION_FAILOVER;
commit_active_start(NULL, s->active_disk->bs, s->secondary_disk->bs,
BLOCK_JOB_INTERNAL, 0, BLOCKDEV_ON_ERROR_REPORT,
NULL, replication_done, bs, true, errp);
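
The stage transitions visible in the replication hunks above (start, stop with or without failover, and completion of the failover job) can be summarised as a small state machine. This is a sketch using hypothetical `sk_` names; it models only the stage field and none of the block-layer work.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum SkStage {
    SK_STAGE_NONE,            /* replication not started */
    SK_STAGE_RUNNING,         /* replication running */
    SK_STAGE_FAILOVER,        /* failover running in background */
    SK_STAGE_FAILOVER_FAILED, /* failover failed */
    SK_STAGE_DONE,            /* replication done */
} SkStage;

/* Start is only accepted from NONE. */
static int sk_start(SkStage *stage)
{
    if (*stage != SK_STAGE_NONE) {
        return -1; /* "running or done" */
    }
    *stage = SK_STAGE_RUNNING;
    return 0;
}

/* Stop is only accepted from RUNNING. A primary goes straight to DONE;
 * a secondary goes to DONE without failover, or to FAILOVER with it. */
static int sk_stop(SkStage *stage, bool primary, bool failover)
{
    if (*stage != SK_STAGE_RUNNING) {
        return -1; /* "not running" */
    }
    *stage = (primary || !failover) ? SK_STAGE_DONE : SK_STAGE_FAILOVER;
    return 0;
}

/* Completion callback of the failover job. */
static void sk_failover_done(SkStage *stage, int ret)
{
    *stage = (ret == 0) ? SK_STAGE_DONE : SK_STAGE_FAILOVER_FAILED;
}
```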

View File

@@ -55,7 +55,7 @@
#include "sysemu/block-backend.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "qemu/coroutine.h"
#include "qemu/cutils.h"
#include "qemu/uuid.h"

View File

@@ -24,7 +24,7 @@
#include "qemu/crc32c.h"
#include "qemu/bswap.h"
#include "block/vhdx.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "qemu/uuid.h"
/* Options for VHDX creation */

View File

@@ -31,7 +31,7 @@
#include "qemu/error-report.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "qemu/cutils.h"
#include <zlib.h>

View File

@@ -28,7 +28,7 @@
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/module.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "qemu/bswap.h"
#include "qemu/uuid.h"

View File

@@ -28,7 +28,7 @@
#include "block/block_int.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "qapi/qmp/qint.h"
#include "qapi/qmp/qbool.h"
#include "qapi/qmp/qstring.h"

View File

@@ -2923,10 +2923,9 @@ void qmp_block_resize(bool has_device, const char *device,
goto out;
}
/* complete all in-flight operations before resizing the device */
bdrv_drain_all();
bdrv_drained_begin(bs);
ret = blk_truncate(blk, size, errp);
bdrv_drained_end(bs);
out:
blk_unref(blk);
@@ -3151,6 +3150,7 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
Error *local_err = NULL;
int flags;
int64_t size;
bool set_backing_hd = false;
if (!backup->has_speed) {
backup->speed = 0;
@@ -3201,6 +3201,8 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
}
if (backup->sync == MIRROR_SYNC_MODE_NONE) {
source = bs;
flags |= BDRV_O_NO_BACKING;
set_backing_hd = true;
}
size = bdrv_getlength(bs);
@@ -3227,7 +3229,9 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
}
if (backup->format) {
options = qdict_new();
if (!options) {
options = qdict_new();
}
qdict_put_str(options, "driver", backup->format);
}
@@ -3238,6 +3242,14 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
bdrv_set_aio_context(target_bs, aio_context);
if (set_backing_hd) {
bdrv_set_backing_hd(target_bs, source, &local_err);
if (local_err) {
bdrv_unref(target_bs);
goto out;
}
}
if (backup->has_bitmap) {
bmap = bdrv_find_dirty_bitmap(bs, backup->bitmap);
if (!bmap) {

View File

@@ -744,10 +744,7 @@ int main(int argc, char **argv)
qemu_init_cpu_list();
module_call_init(MODULE_INIT_QOM);
if ((envlist = envlist_create()) == NULL) {
(void) fprintf(stderr, "Unable to allocate envlist\n");
exit(1);
}
envlist = envlist_create();
/* add current environment into the list */
for (wrk = environ; *wrk != NULL; wrk++) {
@@ -785,10 +782,7 @@ int main(int argc, char **argv)
usage();
} else if (!strcmp(r, "ignore-environment")) {
envlist_free(envlist);
if ((envlist = envlist_create()) == NULL) {
(void) fprintf(stderr, "Unable to allocate envlist\n");
exit(1);
}
envlist = envlist_create();
} else if (!strcmp(r, "U")) {
r = argv[optind++];
if (envlist_unsetenv(envlist, r) != 0)
@@ -956,10 +950,10 @@ int main(int argc, char **argv)
}
for (wrk = target_environ; *wrk; wrk++) {
free(*wrk);
g_free(*wrk);
}
free(target_environ);
g_free(target_environ);
if (qemu_loglevel_mask(CPU_LOG_PAGE)) {
qemu_log("guest_base 0x%lx\n", guest_base);

configure

@@ -611,6 +611,7 @@ NetBSD)
audio_possible_drivers="oss sdl"
oss_lib="-lossaudio"
HOST_VARIANT_DIR="netbsd"
supported_os="yes"
;;
OpenBSD)
bsd="yes"
@@ -1334,7 +1335,7 @@ Advanced options (experts only):
--oss-lib path to OSS library
--cpu=CPU Build for host CPU [$cpu]
--with-coroutine=BACKEND coroutine backend. Supported options:
gthread, ucontext, sigaltstack, windows
ucontext, sigaltstack, windows
--enable-gcov enable test coverage analysis with gcov
--gcov=GCOV use specified gcov [$gcov_tool]
--disable-blobs disable installing provided firmware blobs
@@ -2014,7 +2015,7 @@ if test "$xen" != "no" ; then
else
xen_libs="-lxenstore -lxenctrl -lxenguest"
xen_stable_libs="-lxencall -lxenforeignmemory -lxengnttab -lxenevtchn"
xen_stable_libs="-lxenforeignmemory -lxengnttab -lxenevtchn"
# First we test whether Xen headers and libraries are available.
# If no, we are done and there is no Xen support.
@@ -4418,10 +4419,8 @@ fi
# check and set a backend for coroutine
# We prefer ucontext, but it's not always possible. The fallback
# is sigcontext. gthread is not selectable except explicitly, because
# it is not functional enough to run QEMU proper. (It is occasionally
# useful for debugging purposes.) On Windows the only valid backend
# is the Windows-specific one.
# is sigcontext. On Windows the only valid backend is the Windows
# specific one.
ucontext_works=no
if test "$darwin" != "yes"; then
@@ -4460,7 +4459,7 @@ else
feature_not_found "ucontext"
fi
;;
gthread|sigaltstack)
sigaltstack)
if test "$mingw32" = "yes"; then
error_exit "only the 'windows' coroutine backend is valid for Windows"
fi
@@ -4472,14 +4471,7 @@ else
fi
if test "$coroutine_pool" = ""; then
if test "$coroutine" = "gthread"; then
coroutine_pool=no
else
coroutine_pool=yes
fi
fi
if test "$coroutine" = "gthread" -a "$coroutine_pool" = "yes"; then
error_exit "'gthread' coroutine backend does not support pool (use --disable-coroutine-pool)"
coroutine_pool=yes
fi
if test "$debug_stack_usage" = "yes"; then
@@ -6110,12 +6102,14 @@ case "$target_name" in
ppc64)
TARGET_BASE_ARCH=ppc
TARGET_ABI_DIR=ppc
mttcg=yes
gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml power-vsx.xml"
;;
ppc64le)
TARGET_ARCH=ppc64
TARGET_BASE_ARCH=ppc
TARGET_ABI_DIR=ppc
mttcg=yes
gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml power-vsx.xml"
;;
ppc64abi32)

View File

@@ -1031,6 +1031,11 @@ vu_queue_get_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int *in_bytes,
idx = vq->last_avail_idx;
total_bufs = in_total = out_total = 0;
if (unlikely(dev->broken) ||
unlikely(!vq->vring.avail)) {
goto done;
}
while ((rc = virtqueue_num_heads(dev, vq, idx)) > 0) {
unsigned int max, num_bufs, indirect = 0;
struct vring_desc *desc;
@@ -1121,11 +1126,16 @@ vu_queue_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int in_bytes,
/* Fetch avail_idx from VQ memory only when we really need to know if
* guest has added some buffers. */
int
bool
vu_queue_empty(VuDev *dev, VuVirtq *vq)
{
if (unlikely(dev->broken) ||
unlikely(!vq->vring.avail)) {
return true;
}
if (vq->shadow_avail_idx != vq->last_avail_idx) {
return 0;
return false;
}
return vring_avail_idx(vq) == vq->last_avail_idx;
@@ -1174,7 +1184,8 @@ vring_notify(VuDev *dev, VuVirtq *vq)
void
vu_queue_notify(VuDev *dev, VuVirtq *vq)
{
if (unlikely(dev->broken)) {
if (unlikely(dev->broken) ||
unlikely(!vq->vring.avail)) {
return;
}
@@ -1291,7 +1302,8 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
struct vring_desc *desc;
int rc;
if (unlikely(dev->broken)) {
if (unlikely(dev->broken) ||
unlikely(!vq->vring.avail)) {
return NULL;
}
@@ -1445,7 +1457,8 @@ vu_queue_fill(VuDev *dev, VuVirtq *vq,
{
struct vring_used_elem uelem;
if (unlikely(dev->broken)) {
if (unlikely(dev->broken) ||
unlikely(!vq->vring.avail)) {
return;
}
@@ -1474,7 +1487,8 @@ vu_queue_flush(VuDev *dev, VuVirtq *vq, unsigned int count)
{
uint16_t old, new;
if (unlikely(dev->broken)) {
if (unlikely(dev->broken) ||
unlikely(!vq->vring.avail)) {
return;
}
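
The hunks above add the same guard to every queue entry point: a broken device or a queue whose avail ring is not yet mapped is treated as having nothing to do. A minimal sketch of the emptiness check with mock types follows; `MockDev`, `MockQueue`, and `mock_queue_empty` are hypothetical stand-ins, not the libvhost-user structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for the device and queue state used by the guard. */
typedef struct MockAvail { uint16_t idx; } MockAvail;
typedef struct MockDev { bool broken; } MockDev;
typedef struct MockQueue {
    MockAvail *avail;          /* NULL until the ring has been set up */
    uint16_t shadow_avail_idx; /* cached copy of the ring index */
    uint16_t last_avail_idx;   /* next entry the device will consume */
} MockQueue;

/* Returns true if the queue is empty or not ready. */
static bool mock_queue_empty(const MockDev *dev, const MockQueue *vq)
{
    if (dev->broken || !vq->avail) {
        return true; /* never dereference an unmapped ring */
    }
    if (vq->shadow_avail_idx != vq->last_avail_idx) {
        return false; /* cached index already shows pending buffers */
    }
    return vq->avail->idx == vq->last_avail_idx;
}
```

Reporting "empty" for a queue that is not ready means callers that poll before the ring is configured simply see no work, instead of reading through a NULL pointer.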

View File

@@ -327,13 +327,13 @@ void vu_queue_set_notification(VuDev *dev, VuVirtq *vq, int enable);
bool vu_queue_enabled(VuDev *dev, VuVirtq *vq);
/**
* vu_queue_enabled:
* vu_queue_empty:
* @dev: a VuDev context
* @vq: a VuVirtq queue
*
* Returns: whether the queue is empty.
* Returns: true if the queue is empty or not ready.
*/
int vu_queue_empty(VuDev *dev, VuVirtq *vq);
bool vu_queue_empty(VuDev *dev, VuVirtq *vq);
/**
* vu_queue_notify:

cpus.c

@@ -50,6 +50,7 @@
#include "qapi-event.h"
#include "hw/nmi.h"
#include "sysemu/replay.h"
#include "hw/boards.h"
#ifdef CONFIG_LINUX
@@ -1483,6 +1484,12 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
/* Ignore everything else? */
break;
}
} else if (cpu->unplug) {
qemu_tcg_destroy_vcpu(cpu);
cpu->created = false;
qemu_cond_signal(&qemu_cpu_cond);
qemu_mutex_unlock_iothread();
return NULL;
}
atomic_mb_set(&cpu->exit_request, 0);
@@ -1859,6 +1866,8 @@ void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
CpuInfoList *qmp_query_cpus(Error **errp)
{
MachineState *ms = MACHINE(qdev_get_machine());
MachineClass *mc = MACHINE_GET_CLASS(ms);
CpuInfoList *head = NULL, *cur_item = NULL;
CPUState *cpu;
@@ -1909,6 +1918,13 @@ CpuInfoList *qmp_query_cpus(Error **errp)
#else
info->value->arch = CPU_INFO_ARCH_OTHER;
#endif
info->value->has_props = !!mc->cpu_index_to_instance_props;
if (info->value->has_props) {
CpuInstanceProperties *props;
props = g_malloc0(sizeof(*props));
*props = mc->cpu_index_to_instance_props(ms, cpu->cpu_index);
info->value->props = props;
}
/* XXX: waiting for the qapi to support GSList */
if (!cur_item) {

View File

@@ -930,7 +930,13 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
tlb_addr = tlbe->addr_write;
}
/* Notice an IO access, or a notdirty page. */
/* Check notdirty */
if (unlikely(tlb_addr & TLB_NOTDIRTY)) {
tlb_set_dirty(ENV_GET_CPU(env), addr);
tlb_addr = tlb_addr & ~TLB_NOTDIRTY;
}
/* Notice an IO access */
if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
/* There's really nothing that can be done to
support this apart from stop-the-world. */

View File

@@ -473,9 +473,9 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
* then encrypted.
*/
rv = readfunc(block,
opaque,
slot->key_offset * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
splitkey, splitkeylen,
opaque,
errp);
if (rv < 0) {
goto cleanup;
@@ -676,9 +676,10 @@ qcrypto_block_luks_open(QCryptoBlock *block,
/* Read the entire LUKS header, minus the key material from
* the underlying device */
rv = readfunc(block, opaque, 0,
rv = readfunc(block, 0,
(uint8_t *)&luks->header,
sizeof(luks->header),
opaque,
errp);
if (rv < 0) {
ret = rv;
@@ -1245,7 +1246,7 @@ qcrypto_block_luks_create(QCryptoBlock *block,
QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
/* Reserve header space to match payload offset */
initfunc(block, opaque, block->payload_offset, &local_err);
initfunc(block, block->payload_offset, opaque, &local_err);
if (local_err) {
error_propagate(errp, local_err);
goto error;
@@ -1267,9 +1268,10 @@ qcrypto_block_luks_create(QCryptoBlock *block,
/* Write out the partition header and key slot headers */
writefunc(block, opaque, 0,
writefunc(block, 0,
(const uint8_t *)&luks->header,
sizeof(luks->header),
opaque,
&local_err);
/* Delay checking local_err until we've byte-swapped */
@@ -1295,10 +1297,11 @@ qcrypto_block_luks_create(QCryptoBlock *block,
/* Write out the master key material, starting at the
* sector immediately following the partition header. */
if (writefunc(block, opaque,
if (writefunc(block,
luks->header.key_slots[0].key_offset *
QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
splitkey, splitkeylen,
opaque,
errp) != splitkeylen) {
goto error;
}

View File

@@ -32,6 +32,8 @@
#include <gcrypt.h>
#endif
#include "crypto/random.h"
/* #define DEBUG_GNUTLS */
/*
@@ -146,5 +148,9 @@ int qcrypto_init(Error **errp)
gcry_control(GCRYCTL_INITIALIZATION_FINISHED, 0);
#endif
if (qcrypto_random_init(errp) < 0) {
return -1;
}
return 0;
}

View File

@@ -31,3 +31,5 @@ int qcrypto_random_bytes(uint8_t *buf,
gcry_randomize(buf, buflen, GCRY_STRONG_RANDOM);
return 0;
}
int qcrypto_random_init(Error **errp G_GNUC_UNUSED) { return 0; }

View File

@@ -41,3 +41,6 @@ int qcrypto_random_bytes(uint8_t *buf,
return 0;
}
int qcrypto_random_init(Error **errp G_GNUC_UNUSED) { return 0; }

View File

@@ -22,14 +22,16 @@
#include "crypto/random.h"
int qcrypto_random_bytes(uint8_t *buf G_GNUC_UNUSED,
size_t buflen G_GNUC_UNUSED,
Error **errp)
{
int fd;
int ret = -1;
int got;
#ifdef _WIN32
#include <wincrypt.h>
static HCRYPTPROV hCryptProv;
#else
static int fd; /* a file handle to either /dev/urandom or /dev/random */
#endif
int qcrypto_random_init(Error **errp)
{
#ifndef _WIN32
/* TBD perhaps also add support for BSD getentropy / Linux
* getrandom syscalls directly */
fd = open("/dev/urandom", O_RDONLY);
@@ -41,6 +43,25 @@ int qcrypto_random_bytes(uint8_t *buf G_GNUC_UNUSED,
error_setg(errp, "No /dev/urandom or /dev/random found");
return -1;
}
#else
if (!CryptAcquireContext(&hCryptProv, NULL, NULL, PROV_RSA_FULL,
CRYPT_SILENT | CRYPT_VERIFYCONTEXT)) {
error_setg_win32(errp, GetLastError(),
"Unable to create cryptographic provider");
return -1;
}
#endif
return 0;
}
int qcrypto_random_bytes(uint8_t *buf G_GNUC_UNUSED,
size_t buflen G_GNUC_UNUSED,
Error **errp)
{
#ifndef _WIN32
int ret = -1;
int got;
while (buflen > 0) {
got = read(fd, buf, buflen);
@@ -59,6 +80,14 @@ int qcrypto_random_bytes(uint8_t *buf G_GNUC_UNUSED,
ret = 0;
cleanup:
close(fd);
return ret;
#else
if (!CryptGenRandom(hCryptProv, buflen, buf)) {
error_setg_win32(errp, GetLastError(),
"Unable to read random bytes");
return -1;
}
return 0;
#endif
}
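
The change above moves the open of the random device into a one-time init function and keeps the descriptor for later reads. A POSIX-only sketch of that split is shown below. The names `sk_rng_init` and `sk_rng_bytes` are hypothetical, and the exact fallback condition to /dev/random is an assumption, since the hunk does not show it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Descriptor for /dev/urandom or /dev/random, opened once and kept. */
static int sk_rng_fd = -1;

static int sk_rng_init(void)
{
    if (sk_rng_fd >= 0) {
        return 0; /* already initialised */
    }
    sk_rng_fd = open("/dev/urandom", O_RDONLY);
    if (sk_rng_fd == -1 && errno == ENOENT) {
        /* Assumed fallback condition. */
        sk_rng_fd = open("/dev/random", O_RDONLY);
    }
    return sk_rng_fd < 0 ? -1 : 0;
}

static int sk_rng_bytes(uint8_t *buf, size_t len)
{
    assert(sk_rng_fd >= 0); /* sk_rng_init() must have succeeded */
    while (len > 0) {
        ssize_t got = read(sk_rng_fd, buf, len);
        if (got < 0) {
            if (errno == EINTR) {
                continue; /* interrupted, retry */
            }
            return -1;
        }
        if (got == 0) {
            return -1; /* unexpected end of file */
        }
        buf += got;
        len -= (size_t)got;
    }
    return 0; /* descriptor intentionally stays open */
}
```

Because the descriptor is shared and never closed by the read path, the read function no longer needs a cleanup label that closes it.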

View File

@@ -148,6 +148,7 @@ static void read_fstree(void *fdt, const char *dirname)
d = opendir(dirname);
if (!d) {
error_setg(&error_fatal, "%s cannot open %s", __func__, dirname);
return;
}
while ((de = readdir(d)) != NULL) {

View File

@@ -182,15 +182,13 @@ The appropriate DEVNAME depends on the machine type. For type "pc":
This lets you control I/O ports and IRQs.
* -usbdevice serial:vendorid=VID,productid=PRID becomes
-device usb-serial,vendorid=VID,productid=PRID
* -usbdevice serial::chardev becomes -device usb-serial,chardev=dev.
* -usbdevice braille doesn't support LEGACY-CHARDEV syntax. It always
uses "braille". With -device, this useful default is gone, so you
have to use something like
-device usb-braille,chardev=braille,vendorid=VID,productid=PRID
-chardev braille,id=braille
-device usb-braille,chardev=braille -chardev braille,id=braille
* -virtioconsole becomes
-device virtio-serial-pci,class=C,vectors=V,ioeventfd=IOEVENTFD,max_ports=N

exec.c

@@ -71,6 +71,8 @@
#include "qemu/mmap-alloc.h"
#endif
#include "monitor/monitor.h"
//#define DEBUG_SUBPAGE
#if !defined(CONFIG_USER_ONLY)
@@ -463,42 +465,12 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
}
/* Called from RCU critical section */
IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
bool is_write)
{
IOMMUTLBEntry iotlb = {0};
MemoryRegionSection *section;
MemoryRegion *mr;
for (;;) {
AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
section = address_space_lookup_region(d, addr, false);
addr = addr - section->offset_within_address_space
+ section->offset_within_region;
mr = section->mr;
if (!mr->iommu_ops) {
break;
}
iotlb = mr->iommu_ops->translate(mr, addr, is_write);
if (!(iotlb.perm & (1 << is_write))) {
iotlb.target_as = NULL;
break;
}
addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
| (addr & iotlb.addr_mask));
as = iotlb.target_as;
}
return iotlb;
}
/* Called from RCU critical section */
MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
hwaddr *xlat, hwaddr *plen,
bool is_write)
static MemoryRegionSection address_space_do_translate(AddressSpace *as,
hwaddr addr,
hwaddr *xlat,
hwaddr *plen,
bool is_write,
bool is_mmio)
{
IOMMUTLBEntry iotlb;
MemoryRegionSection *section;
@@ -506,7 +478,7 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
for (;;) {
AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
section = address_space_translate_internal(d, addr, &addr, plen, true);
section = address_space_translate_internal(d, addr, &addr, plen, is_mmio);
mr = section->mr;
if (!mr->iommu_ops) {
@@ -518,19 +490,84 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
| (addr & iotlb.addr_mask));
*plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
if (!(iotlb.perm & (1 << is_write))) {
mr = &io_mem_unassigned;
break;
goto translate_fail;
}
as = iotlb.target_as;
}
*xlat = addr;
return *section;
translate_fail:
return (MemoryRegionSection) { .mr = &io_mem_unassigned };
}
/* Called from RCU critical section */
IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
bool is_write)
{
MemoryRegionSection section;
hwaddr xlat, plen;
/* Try to get maximum page mask during translation. */
plen = (hwaddr)-1;
/* This can never be MMIO. */
section = address_space_do_translate(as, addr, &xlat, &plen,
is_write, false);
/* Illegal translation */
if (section.mr == &io_mem_unassigned) {
goto iotlb_fail;
}
/* Convert memory region offset into address space offset */
xlat += section.offset_within_address_space -
section.offset_within_region;
if (plen == (hwaddr)-1) {
/*
* We use default page size here. Logically it only happens
* for identity mappings.
*/
plen = TARGET_PAGE_SIZE;
}
/* Convert to address mask */
plen -= 1;
return (IOMMUTLBEntry) {
.target_as = section.address_space,
.iova = addr & ~plen,
.translated_addr = xlat & ~plen,
.addr_mask = plen,
/* IOTLBs are for DMAs, and DMA only allows on RAMs. */
.perm = IOMMU_RW,
};
iotlb_fail:
return (IOMMUTLBEntry) {0};
}
/* Called from RCU critical section */
MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
hwaddr *xlat, hwaddr *plen,
bool is_write)
{
MemoryRegion *mr;
MemoryRegionSection section;
/* This can be MMIO, so setup MMIO bit. */
section = address_space_do_translate(as, addr, xlat, plen, is_write, true);
mr = section.mr;
if (xen_enabled() && memory_access_is_direct(mr, is_write)) {
hwaddr page = ((addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE) - addr;
*plen = MIN(page, *plen);
}
*xlat = addr;
return mr;
}
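
The new address_space_get_iotlb_entry() above turns the translated length into an address mask and aligns both addresses to it. The arithmetic can be sketched on plain integers as follows. `SK_PAGE_SIZE` is an assumed page size, the struct and function names are hypothetical, and the sketch assumes the length is a power of two, which the hunk itself does not state.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096ULL /* assumed page size for the sketch */

typedef struct SkIotlb {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;
} SkIotlb;

static SkIotlb sk_make_entry(uint64_t addr, uint64_t xlat, uint64_t plen)
{
    uint64_t mask;

    if (plen == UINT64_MAX) {
        /* Length was never narrowed during translation: use one page. */
        plen = SK_PAGE_SIZE;
    }
    /* Assumption: plen is a non-zero power of two. */
    assert(plen != 0 && (plen & (plen - 1)) == 0);

    mask = plen - 1; /* convert length to address mask */
    return (SkIotlb) {
        .iova = addr & ~mask,
        .translated_addr = xlat & ~mask,
        .addr_mask = mask,
    };
}
```

For example, an untouched length with addr 0x12345 and xlat 0xabc345 gives a mask of 0xfff, an iova of 0x12000, and a translated address of 0xabc000.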
@@ -978,7 +1015,7 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t addr)
if (block && addr - block->offset < block->max_length) {
return block;
}
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
if (addr - block->offset < block->max_length) {
goto found;
}
@@ -1333,6 +1370,26 @@ void qemu_mutex_unlock_ramlist(void)
qemu_mutex_unlock(&ram_list.mutex);
}
void ram_block_dump(Monitor *mon)
{
RAMBlock *block;
char *psize;
rcu_read_lock();
monitor_printf(mon, "%24s %8s %18s %18s %18s\n",
"Block Name", "PSize", "Offset", "Used", "Total");
RAMBLOCK_FOREACH(block) {
psize = size_to_str(block->page_size);
monitor_printf(mon, "%24s %8s 0x%016" PRIx64 " 0x%016" PRIx64
" 0x%016" PRIx64 "\n", block->idstr, psize,
(uint64_t)block->offset,
(uint64_t)block->used_length,
(uint64_t)block->max_length);
g_free(psize);
}
rcu_read_unlock();
}
#ifdef __linux__
/*
* FIXME TOCTTOU: this iterates over memory backends' mem-path, which
@@ -1578,12 +1635,12 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
return 0;
}
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
ram_addr_t end, next = RAM_ADDR_MAX;
end = block->offset + block->max_length;
QLIST_FOREACH_RCU(next_block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(next_block) {
if (next_block->offset >= end) {
next = MIN(next, next_block->offset);
}
@@ -1609,7 +1666,7 @@ unsigned long last_ram_page(void)
ram_addr_t last = 0;
rcu_read_lock();
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
last = MAX(last, block->offset + block->max_length);
}
rcu_read_unlock();
@@ -1659,7 +1716,7 @@ void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
rcu_read_lock();
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
if (block != new_block &&
!strcmp(block->idstr, new_block->idstr)) {
fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
@@ -1693,7 +1750,7 @@ size_t qemu_ram_pagesize_largest(void)
RAMBlock *block;
size_t largest = 0;
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
largest = MAX(largest, qemu_ram_pagesize(block));
}
@@ -1839,7 +1896,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
* QLIST (which has an RCU-friendly variant) does not have insertion at
* tail, so save the last element in last_block.
*/
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
last_block = block;
if (block->max_length < new_block->max_length) {
break;
@@ -2021,7 +2078,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
int flags;
void *area, *vaddr;
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
offset = addr - block->offset;
if (offset < block->max_length) {
vaddr = ramblock_ptr(block, offset);
@@ -2084,10 +2141,10 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
* In that case just map until the end of the page.
*/
if (block->offset == 0) {
return xen_map_cache(addr, 0, 0);
return xen_map_cache(addr, 0, 0, false);
}
block->host = xen_map_cache(block->offset, block->max_length, 1);
block->host = xen_map_cache(block->offset, block->max_length, 1, false);
}
return ramblock_ptr(block, addr);
}
@@ -2117,10 +2174,10 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
* In that case just map the requested area.
*/
if (block->offset == 0) {
return xen_map_cache(addr, *size, 1);
return xen_map_cache(addr, *size, 1, true);
}
block->host = xen_map_cache(block->offset, block->max_length, 1);
block->host = xen_map_cache(block->offset, block->max_length, 1, true);
}
return ramblock_ptr(block, addr);
@@ -2167,7 +2224,7 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
goto found;
}
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
/* This case append when the block is not mapped. */
if (block->host == NULL) {
continue;
@@ -2200,7 +2257,7 @@ RAMBlock *qemu_ram_block_by_name(const char *name)
{
RAMBlock *block;
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
if (!strcmp(name, block->idstr)) {
return block;
}
@@ -3424,7 +3481,7 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
int ret = 0;
rcu_read_lock();
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
RAMBLOCK_FOREACH(block) {
ret = func(block->idstr, block->host, block->offset,
block->used_length, opaque);
if (ret) {
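
Most exec.c hunks above replace an open-coded list walk with RAMBLOCK_FOREACH(block). The macro definition is not part of this diff, so the following is only a sketch of the general idea: a domain-specific wrapper that hides the list head and link field. It uses a plain singly linked list and hypothetical `Sk` names, and it is not RCU-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SkBlock {
    uint64_t offset;
    uint64_t max_length;
    struct SkBlock *next;
} SkBlock;

/* Global list head, standing in for the real block list. */
static SkBlock *sk_block_list;

/* Generic walk over a singly linked list. */
#define SK_LIST_FOREACH(var, head) \
    for ((var) = (head); (var) != NULL; (var) = (var)->next)

/* Wrapper that fixes the list head, so call sites name only the cursor. */
#define SK_BLOCK_FOREACH(block) SK_LIST_FOREACH(block, sk_block_list)

/* Highest end address over all blocks, in the style of last_ram_page(). */
static uint64_t sk_last_end(void)
{
    SkBlock *block;
    uint64_t last = 0;

    SK_BLOCK_FOREACH(block) {
        uint64_t end = block->offset + block->max_length;
        if (end > last) {
            last = end;
        }
    }
    return last;
}
```

With such a wrapper, a later change to how the list is stored or traversed touches one macro rather than every loop.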

View File

@@ -785,6 +785,20 @@ STEXI
@item info dump
@findex dump
Display the latest dump status.
ETEXI
{
.name = "ramblock",
.args_type = "",
.params = "",
.help = "Display system ramblock information",
.cmd = hmp_info_ramblock,
},
STEXI
@item info ramblock
@findex ramblock
Dump all the ramblocks of the system.
ETEXI
{

15
hmp.c
View File

@@ -39,6 +39,7 @@
#include "qemu-io.h"
#include "qemu/cutils.h"
#include "qemu/error-report.h"
#include "exec/ramlist.h"
#include "hw/intc/intc.h"
#ifdef CONFIG_SPICE
@@ -1274,17 +1275,22 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict)
{
int saved_vm_running = runstate_is_running();
const char *name = qdict_get_str(qdict, "name");
Error *err = NULL;
vm_stop(RUN_STATE_RESTORE_VM);
if (load_vmstate(name) == 0 && saved_vm_running) {
if (load_vmstate(name, &err) == 0 && saved_vm_running) {
vm_start();
}
hmp_handle_error(mon, &err);
}
void hmp_savevm(Monitor *mon, const QDict *qdict)
{
save_vmstate(qdict_get_try_str(qdict, "name"));
Error *err = NULL;
save_vmstate(qdict_get_try_str(qdict, "name"), &err);
hmp_handle_error(mon, &err);
}
void hmp_delvm(Monitor *mon, const QDict *qdict)
@@ -2738,6 +2744,11 @@ void hmp_info_dump(Monitor *mon, const QDict *qdict)
qapi_free_DumpQueryResult(result);
}
void hmp_info_ramblock(Monitor *mon, const QDict *qdict)
{
ram_block_dump(mon);
}
void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict)
{
Error *err = NULL;

hmp.h

@@ -140,6 +140,7 @@ void hmp_rocker_ports(Monitor *mon, const QDict *qdict);
void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict *qdict);
void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
void hmp_info_dump(Monitor *mon, const QDict *qdict);
void hmp_info_ramblock(Monitor *mon, const QDict *qdict);
void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);

View File

@@ -452,6 +452,11 @@ static off_t local_telldir(FsContext *ctx, V9fsFidOpenState *fs)
return telldir(fs->dir.stream);
}
static bool local_is_mapped_file_metadata(FsContext *fs_ctx, const char *name)
{
return !strcmp(name, VIRTFS_META_DIR);
}
static struct dirent *local_readdir(FsContext *ctx, V9fsFidOpenState *fs)
{
struct dirent *entry;
@@ -465,8 +470,8 @@ again:
if (ctx->export_flags & V9FS_SM_MAPPED) {
entry->d_type = DT_UNKNOWN;
} else if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
if (!strcmp(entry->d_name, VIRTFS_META_DIR)) {
/* skp the meta data directory */
if (local_is_mapped_file_metadata(ctx, entry->d_name)) {
/* skip the meta data directory */
goto again;
}
entry->d_type = DT_UNKNOWN;
@@ -559,6 +564,12 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path,
int err = -1;
int dirfd;
if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(fs_ctx, name)) {
errno = EINVAL;
return -1;
}
dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
if (dirfd == -1) {
return -1;
@@ -605,6 +616,12 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path,
int err = -1;
int dirfd;
if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(fs_ctx, name)) {
errno = EINVAL;
return -1;
}
dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
if (dirfd == -1) {
return -1;
@@ -694,6 +711,12 @@ static int local_open2(FsContext *fs_ctx, V9fsPath *dir_path, const char *name,
int err = -1;
int dirfd;
if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(fs_ctx, name)) {
errno = EINVAL;
return -1;
}
/*
* Mark all the open to not follow symlinks
*/
@@ -752,6 +775,12 @@ static int local_symlink(FsContext *fs_ctx, const char *oldpath,
int err = -1;
int dirfd;
if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(fs_ctx, name)) {
errno = EINVAL;
return -1;
}
dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
if (dirfd == -1) {
return -1;
@@ -826,6 +855,12 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
int ret = -1;
int odirfd, ndirfd;
if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(ctx, name)) {
errno = EINVAL;
return -1;
}
odirfd = local_opendir_nofollow(ctx, odirpath);
if (odirfd == -1) {
goto out;
@@ -1096,6 +1131,12 @@ static int local_lremovexattr(FsContext *ctx, V9fsPath *fs_path,
static int local_name_to_path(FsContext *ctx, V9fsPath *dir_path,
const char *name, V9fsPath *target)
{
if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(ctx, name)) {
errno = EINVAL;
return -1;
}
if (dir_path) {
v9fs_path_sprintf(target, "%s/%s", dir_path->data, name);
} else if (strcmp(name, "/")) {
@@ -1116,6 +1157,13 @@ static int local_renameat(FsContext *ctx, V9fsPath *olddir,
int ret;
int odirfd, ndirfd;
if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
(local_is_mapped_file_metadata(ctx, old_name) ||
local_is_mapped_file_metadata(ctx, new_name))) {
errno = EINVAL;
return -1;
}
odirfd = local_opendir_nofollow(ctx, olddir->data);
if (odirfd == -1) {
return -1;
@@ -1206,6 +1254,12 @@ static int local_unlinkat(FsContext *ctx, V9fsPath *dir,
int ret;
int dirfd;
if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(ctx, name)) {
errno = EINVAL;
return -1;
}
dirfd = local_opendir_nofollow(ctx, dir->data);
if (dirfd == -1) {
return -1;
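
The guard repeated across the hunks above can be looked at in isolation. What follows is a minimal standalone sketch, not the QEMU code: `META_DIR` stands in for `VIRTFS_META_DIR` (its value here is an assumption), and `guarded_create` is a hypothetical wrapper that only illustrates the `errno = EINVAL; return -1;` convention.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Assumed stand-in for VIRTFS_META_DIR; the real definition is in QEMU's 9p code. */
#define META_DIR ".virtfs_metadata"

/* True when a path component names the reserved metadata directory. */
static bool is_mapped_file_metadata(const char *name)
{
    return strcmp(name, META_DIR) == 0;
}

/*
 * Hypothetical entry point: reject the reserved name before doing any work,
 * mirroring the checks added to mknod/mkdir/open2/symlink/link above.
 */
static int guarded_create(bool mapped_file_mode, const char *name)
{
    if (mapped_file_mode && is_mapped_file_metadata(name)) {
        errno = EINVAL;
        return -1;
    }
    return 0; /* a real backend would go on to create the object here */
}
```

The check is only applied in mapped-file mode, so in this sketch the same name is accepted when `mapped_file_mode` is false.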

@@ -23,7 +23,7 @@
#include "9p-xattr.h"
#include "coth.h"
#include "trace.h"
#include "migration/migration.h"
#include "migration/blocker.h"
int open_fd_hw;
int total_open_fd;

@@ -332,12 +332,14 @@ static int xen_9pfs_connect(struct XenDevice *xendev)
str = g_strdup_printf("ring-ref%u", i);
if (xenstore_read_fe_int(&xen_9pdev->xendev, str,
&xen_9pdev->rings[i].ref) == -1) {
g_free(str);
goto out;
}
g_free(str);
str = g_strdup_printf("event-channel-%u", i);
if (xenstore_read_fe_int(&xen_9pdev->xendev, str,
&xen_9pdev->rings[i].evtchn) == -1) {
g_free(str);
goto out;
}
g_free(str);
@@ -378,7 +380,7 @@ static int xen_9pfs_connect(struct XenDevice *xendev)
if (xen_9pdev->rings[i].evtchndev == NULL) {
goto out;
}
fcntl(xenevtchn_fd(xen_9pdev->rings[i].evtchndev), F_SETFD, FD_CLOEXEC);
qemu_set_cloexec(xenevtchn_fd(xen_9pdev->rings[i].evtchndev));
xen_9pdev->rings[i].local_port = xenevtchn_bind_interdomain
(xen_9pdev->rings[i].evtchndev,
xendev->dom,

@@ -24,6 +24,7 @@
#include "hw/acpi/aml-build.h"
#include "qemu/bswap.h"
#include "qemu/bitops.h"
#include "sysemu/numa.h"
static GArray *build_alloc_array(void)
{
@@ -1599,6 +1600,33 @@ build_rsdt(GArray *table_data, BIOSLinker *linker, GArray *table_offsets,
(void *)rsdt, "RSDT", rsdt_len, 1, oem_id, oem_table_id);
}
/* Build xsdt table */
void
build_xsdt(GArray *table_data, BIOSLinker *linker, GArray *table_offsets,
const char *oem_id, const char *oem_table_id)
{
int i;
unsigned xsdt_entries_offset;
AcpiXsdtDescriptorRev2 *xsdt;
const unsigned table_data_len = (sizeof(uint64_t) * table_offsets->len);
const unsigned xsdt_entry_size = sizeof(xsdt->table_offset_entry[0]);
const size_t xsdt_len = sizeof(*xsdt) + table_data_len;
xsdt = acpi_data_push(table_data, xsdt_len);
xsdt_entries_offset = (char *)xsdt->table_offset_entry - table_data->data;
for (i = 0; i < table_offsets->len; ++i) {
uint64_t ref_tbl_offset = g_array_index(table_offsets, uint32_t, i);
uint64_t xsdt_entry_offset = xsdt_entries_offset + xsdt_entry_size * i;
/* xsdt->table_offset_entry to be filled by Guest linker */
bios_linker_loader_add_pointer(linker,
ACPI_BUILD_TABLE_FILE, xsdt_entry_offset, xsdt_entry_size,
ACPI_BUILD_TABLE_FILE, ref_tbl_offset);
}
build_header(linker, table_data,
(void *)xsdt, "XSDT", xsdt_len, 1, oem_id, oem_table_id);
}
void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
uint64_t len, int node, MemoryAffinityFlags flags)
{
@@ -1609,3 +1637,28 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
numamem->base_addr = cpu_to_le64(base);
numamem->range_length = cpu_to_le64(len);
}
/*
* ACPI spec 5.2.17 System Locality Distance Information Table
* (Revision 2.0 or later)
*/
void build_slit(GArray *table_data, BIOSLinker *linker)
{
int slit_start, i, j;
slit_start = table_data->len;
acpi_data_push(table_data, sizeof(AcpiTableHeader));
build_append_int_noprefix(table_data, nb_numa_nodes, 8);
for (i = 0; i < nb_numa_nodes; i++) {
for (j = 0; j < nb_numa_nodes; j++) {
assert(numa_info[i].distance[j]);
build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
}
}
build_header(linker, table_data,
(void *)(table_data->data + slit_start),
"SLIT",
table_data->len - slit_start, 1, NULL, NULL);
}
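
The layout that `build_slit()` emits (table header, an 8-byte locality count, then an N x N matrix of 1-byte distances) can be summarised with two small helpers. This is a sketch under stated assumptions: `ACPI_HEADER_LEN` is the 36-byte standard ACPI table header, and both function names are hypothetical, not part of QEMU.

```c
#include <stddef.h>

/* Length of the standard ACPI system description table header, in bytes. */
#define ACPI_HEADER_LEN 36

/* Width of the "number of system localities" field that follows the header. */
#define SLIT_COUNT_LEN 8

/* Total SLIT size in bytes for n localities (one byte per matrix entry). */
static size_t slit_len(size_t n)
{
    return ACPI_HEADER_LEN + SLIT_COUNT_LEN + n * n;
}

/* Byte offset, from the start of the table, of the distance from node i to node j. */
static size_t slit_entry_offset(size_t n, size_t i, size_t j)
{
    return ACPI_HEADER_LEN + SLIT_COUNT_LEN + i * n + j;
}
```

With two nodes the table is 48 bytes long, and the entry for node 1 to node 0 sits at offset 46. The row-major order matches the nested `i`/`j` loops in the hunk above.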

@@ -503,7 +503,6 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
/* build Processor object for each processor */
for (i = 0; i < arch_ids->len; i++) {
int j;
Aml *dev;
Aml *uid = aml_int(i);
GArray *madt_buf = g_array_new(0, 1, 1);
@@ -557,9 +556,9 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
* as a result _PXM is required for all CPUs which might
* be hot-plugged. For simplicity, add it for all CPUs.
*/
j = numa_get_node_for_cpu(i);
if (j < nb_numa_nodes) {
aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
if (arch_ids->cpus[i].props.has_node_id) {
aml_append(dev, aml_name_decl("_PXM",
aml_int(arch_ids->cpus[i].props.node_id)));
}
aml_append(cpus_dev, dev);

@@ -385,7 +385,10 @@ static void piix4_device_plug_cb(HotplugHandler *hotplug_dev,
dev, errp);
}
} else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
acpi_pcihp_device_plug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev, errp);
if (!xen_enabled()) {
acpi_pcihp_device_plug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev,
errp);
}
} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
if (s->cpu_hotplug_legacy) {
legacy_acpi_cpu_plug_cb(hotplug_dev, &s->gpe_cpu, dev, errp);
@@ -408,8 +411,10 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
acpi_memory_unplug_request_cb(hotplug_dev, &s->acpi_memory_hotplug,
dev, errp);
} else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
acpi_pcihp_device_unplug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev,
errp);
if (!xen_enabled()) {
acpi_pcihp_device_unplug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev,
errp);
}
} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU) &&
!s->cpu_hotplug_legacy) {
acpi_cpu_unplug_request_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
@@ -700,7 +705,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void *data)
* Reason: part of PIIX4 southbridge, needs to be wired up,
* e.g. by mips_malta_init()
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->hotpluggable = false;
hc->plug = piix4_device_plug_cb;
hc->unplug_request = piix4_device_unplug_request_cb;

@@ -1076,7 +1076,7 @@ static void sl_nand_class_init(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_sl_nand_info;
dc->props = sl_nand_properties;
/* Reason: init() method uses drive_get() */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo sl_nand_info = {

@@ -364,12 +364,12 @@ static void acpi_dsdt_add_power_button(Aml *scope)
/* RSDP */
static GArray *
build_rsdp(GArray *rsdp_table, BIOSLinker *linker, unsigned rsdt_tbl_offset)
build_rsdp(GArray *rsdp_table, BIOSLinker *linker, unsigned xsdt_tbl_offset)
{
AcpiRsdpDescriptor *rsdp = acpi_data_push(rsdp_table, sizeof *rsdp);
unsigned rsdt_pa_size = sizeof(rsdp->rsdt_physical_address);
unsigned rsdt_pa_offset =
(char *)&rsdp->rsdt_physical_address - rsdp_table->data;
unsigned xsdt_pa_size = sizeof(rsdp->xsdt_physical_address);
unsigned xsdt_pa_offset =
(char *)&rsdp->xsdt_physical_address - rsdp_table->data;
bios_linker_loader_alloc(linker, ACPI_BUILD_RSDP_FILE, rsdp_table, 16,
true /* fseg memory */);
@@ -381,8 +381,8 @@ build_rsdp(GArray *rsdp_table, BIOSLinker *linker, unsigned rsdt_tbl_offset)
/* Address to be filled by Guest linker */
bios_linker_loader_add_pointer(linker,
ACPI_BUILD_RSDP_FILE, rsdt_pa_offset, rsdt_pa_size,
ACPI_BUILD_TABLE_FILE, rsdt_tbl_offset);
ACPI_BUILD_RSDP_FILE, xsdt_pa_offset, xsdt_pa_size,
ACPI_BUILD_TABLE_FILE, xsdt_tbl_offset);
/* Checksum to be filled by Guest linker */
bios_linker_loader_add_checksum(linker, ACPI_BUILD_RSDP_FILE,
@@ -486,30 +486,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
AcpiSystemResourceAffinityTable *srat;
AcpiSratProcessorGiccAffinity *core;
AcpiSratMemoryAffinity *numamem;
int i, j, srat_start;
int i, srat_start;
uint64_t mem_base;
uint32_t *cpu_node = g_malloc0(vms->smp_cpus * sizeof(uint32_t));
for (i = 0; i < vms->smp_cpus; i++) {
j = numa_get_node_for_cpu(i);
if (j < nb_numa_nodes) {
cpu_node[i] = j;
}
}
MachineClass *mc = MACHINE_GET_CLASS(vms);
const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
srat_start = table_data->len;
srat = acpi_data_push(table_data, sizeof(*srat));
srat->reserved1 = cpu_to_le32(1);
for (i = 0; i < vms->smp_cpus; ++i) {
for (i = 0; i < cpu_list->len; ++i) {
int node_id = cpu_list->cpus[i].props.has_node_id ?
cpu_list->cpus[i].props.node_id : 0;
core = acpi_data_push(table_data, sizeof(*core));
core->type = ACPI_SRAT_PROCESSOR_GICC;
core->length = sizeof(*core);
core->proximity = cpu_to_le32(cpu_node[i]);
core->proximity = cpu_to_le32(node_id);
core->acpi_processor_uid = cpu_to_le32(i);
core->flags = cpu_to_le32(1);
}
g_free(cpu_node);
mem_base = vms->memmap[VIRT_MEM].base;
for (i = 0; i < nb_numa_nodes; ++i) {
@@ -659,7 +654,7 @@ static void build_fadt(GArray *table_data, BIOSLinker *linker,
VirtMachineState *vms, unsigned dsdt_tbl_offset)
{
AcpiFadtDescriptorRev5_1 *fadt = acpi_data_push(table_data, sizeof(*fadt));
unsigned dsdt_entry_offset = (char *)&fadt->dsdt - table_data->data;
unsigned xdsdt_entry_offset = (char *)&fadt->x_dsdt - table_data->data;
uint16_t bootflags;
switch (vms->psci_conduit) {
@@ -685,7 +680,7 @@ static void build_fadt(GArray *table_data, BIOSLinker *linker,
/* DSDT address to be filled by Guest linker */
bios_linker_loader_add_pointer(linker,
ACPI_BUILD_TABLE_FILE, dsdt_entry_offset, sizeof(fadt->dsdt),
ACPI_BUILD_TABLE_FILE, xdsdt_entry_offset, sizeof(fadt->x_dsdt),
ACPI_BUILD_TABLE_FILE, dsdt_tbl_offset);
build_header(linker, table_data,
@@ -748,7 +743,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
{
VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
GArray *table_offsets;
unsigned dsdt, rsdt;
unsigned dsdt, xsdt;
GArray *tables_blob = tables->table_data;
table_offsets = g_array_new(false, true /* clear */,
@@ -788,12 +783,12 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
build_iort(tables_blob, tables->linker);
}
/* RSDT is pointed to by RSDP */
rsdt = tables_blob->len;
build_rsdt(tables_blob, tables->linker, table_offsets, NULL, NULL);
/* XSDT is pointed to by RSDP */
xsdt = tables_blob->len;
build_xsdt(tables_blob, tables->linker, table_offsets, NULL, NULL);
/* RSDP is in FSEG memory, so allocate it separately */
build_rsdp(tables->rsdp, tables->linker, rsdt);
build_rsdp(tables->rsdp, tables->linker, xsdt);
/* Cleanup memory that's no longer used. */
g_array_free(table_offsets, true);

@@ -338,7 +338,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
{
int cpu;
int addr_cells = 1;
unsigned int i;
const MachineState *ms = MACHINE(vms);
/*
* From Documentation/devicetree/bindings/arm/cpus.txt
@@ -369,6 +369,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
CPUState *cs = CPU(armcpu);
qemu_fdt_add_subnode(vms->fdt, nodename);
qemu_fdt_setprop_string(vms->fdt, nodename, "device_type", "cpu");
@@ -389,9 +390,9 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
armcpu->mp_affinity);
}
i = numa_get_node_for_cpu(cpu);
if (i < nb_numa_nodes) {
qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id", i);
if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id",
ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
}
g_free(nodename);
@@ -1194,10 +1195,35 @@ void virt_machine_done(Notifier *notifier, void *data)
virt_build_smbios(vms);
}
static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
{
uint8_t clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
if (!vmc->disallow_affinity_adjustment) {
/* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
* GIC's target-list limitations. 32-bit KVM hosts currently
* always create clusters of 4 CPUs, but that is expected to
* change when they gain support for gicv3. When KVM is enabled
* it will override the changes we make here, therefore our
* purposes are to make TCG consistent (with 64-bit KVM hosts)
* and to improve SGI efficiency.
*/
if (vms->gic_version == 3) {
clustersz = GICV3_TARGETLIST_BITS;
} else {
clustersz = GIC_TARGETLIST_BITS;
}
}
return arm_cpu_mp_affinity(idx, clustersz);
}
static void machvirt_init(MachineState *machine)
{
VirtMachineState *vms = VIRT_MACHINE(machine);
VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
MachineClass *mc = MACHINE_GET_CLASS(machine);
const CPUArchIdList *possible_cpus;
qemu_irq pic[NUM_IRQS];
MemoryRegion *sysmem = get_system_memory();
MemoryRegion *secure_sysmem = NULL;
@@ -1210,7 +1236,6 @@ static void machvirt_init(MachineState *machine)
CPUClass *cc;
Error *err = NULL;
bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
uint8_t clustersz;
if (!cpu_model) {
cpu_model = "cortex-a15";
@@ -1263,10 +1288,8 @@ static void machvirt_init(MachineState *machine)
*/
if (vms->gic_version == 3) {
virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / 0x20000;
clustersz = GICV3_TARGETLIST_BITS;
} else {
virt_max_cpus = GIC_NCPU;
clustersz = GIC_TARGETLIST_BITS;
}
if (max_cpus > virt_max_cpus) {
@@ -1324,21 +1347,35 @@ static void machvirt_init(MachineState *machine)
exit(1);
}
for (n = 0; n < smp_cpus; n++) {
Object *cpuobj = object_new(typename);
if (!vmc->disallow_affinity_adjustment) {
/* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
* GIC's target-list limitations. 32-bit KVM hosts currently
* always create clusters of 4 CPUs, but that is expected to
* change when they gain support for gicv3. When KVM is enabled
* it will override the changes we make here, therefore our
* purposes are to make TCG consistent (with 64-bit KVM hosts)
* and to improve SGI efficiency.
*/
uint8_t aff1 = n / clustersz;
uint8_t aff0 = n % clustersz;
object_property_set_int(cpuobj, (aff1 << ARM_AFF1_SHIFT) | aff0,
"mp-affinity", NULL);
possible_cpus = mc->possible_cpu_arch_ids(machine);
for (n = 0; n < possible_cpus->len; n++) {
Object *cpuobj;
CPUState *cs;
int node_id;
if (n >= smp_cpus) {
break;
}
cpuobj = object_new(typename);
object_property_set_int(cpuobj, possible_cpus->cpus[n].arch_id,
"mp-affinity", NULL);
cs = CPU(cpuobj);
cs->cpu_index = n;
node_id = possible_cpus->cpus[cs->cpu_index].props.node_id;
if (!possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
/* by default CPUState::numa_node was 0 if it's not set via CLI
* keep it this way for now but in future we probably should
* refuse to start up with incomplete numa mapping */
node_id = 0;
}
if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
cs->numa_node = node_id;
} else {
/* CPU isn't device_add compatible yet, this shouldn't happen */
error_setg(&error_abort, "user set node-id not implemented");
}
if (!vms->secure) {
@@ -1518,6 +1555,46 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
}
}
static CpuInstanceProperties
virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
{
MachineClass *mc = MACHINE_GET_CLASS(ms);
const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
assert(cpu_index < possible_cpus->len);
return possible_cpus->cpus[cpu_index].props;
}
static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
{
int n;
VirtMachineState *vms = VIRT_MACHINE(ms);
if (ms->possible_cpus) {
assert(ms->possible_cpus->len == max_cpus);
return ms->possible_cpus;
}
ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
sizeof(CPUArchId) * max_cpus);
ms->possible_cpus->len = max_cpus;
for (n = 0; n < ms->possible_cpus->len; n++) {
ms->possible_cpus->cpus[n].arch_id =
virt_cpu_mp_affinity(vms, n);
ms->possible_cpus->cpus[n].props.has_thread_id = true;
ms->possible_cpus->cpus[n].props.thread_id = n;
/* default distribution of CPUs over NUMA nodes */
if (nb_numa_nodes) {
/* preset values but do not enable them i.e. 'has_node_id = false',
* numa init code will enable them later if manual mapping wasn't
* present on CLI */
ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;
}
}
return ms->possible_cpus;
}
static void virt_machine_class_init(ObjectClass *oc, void *data)
{
MachineClass *mc = MACHINE_CLASS(oc);
@@ -1534,6 +1611,8 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
mc->pci_allow_0_address = true;
/* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
mc->minimum_page_bits = 12;
mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
}
static const TypeInfo virt_machine_info = {
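
Two small calculations carry most of this file's change: packing a linear CPU index into an MPIDR-style affinity value, and the default round-robin spread of CPUs over NUMA nodes. The sketch below follows the inline `aff1`/`aff0` computation that the diff removes from `machvirt_init()`. The body of `arm_cpu_mp_affinity()` is not shown in the diff, so treating it as equivalent is an assumption, as is `AFF1_SHIFT` being 8. Both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed value of ARM_AFF1_SHIFT: Aff1 occupies bits [15:8] of MPIDR. */
#define AFF1_SHIFT 8

/* Pack a linear CPU index into Aff1 (cluster number) and Aff0 (CPU within cluster). */
static uint64_t mp_affinity_sketch(int idx, uint8_t clustersz)
{
    uint32_t aff1 = idx / clustersz;
    uint32_t aff0 = idx % clustersz;

    return ((uint64_t)aff1 << AFF1_SHIFT) | aff0;
}

/* Default node for a CPU slot when no explicit mapping is given: round-robin. */
static int default_node_id(int idx, int nb_nodes)
{
    return idx % nb_nodes;
}
```

For a cluster size of 8, CPU 9 lands in cluster 1 as CPU 1, which gives `0x101`. With two nodes, even CPU indexes default to node 0 and odd ones to node 1. In the diff these defaults are only preset; `has_node_id` stays false until the NUMA init code enables them.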

@@ -292,7 +292,7 @@ static void mv88w8618_audio_class_init(ObjectClass *klass, void *data)
dc->vmsd = &mv88w8618_audio_vmsd;
dc->props = mv88w8618_audio_properties;
/* Reason: pointer property "wm8750" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo mv88w8618_audio_info = {

@@ -223,7 +223,7 @@ static void pcspk_class_initfn(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_spk;
dc->props = pcspk_properties;
/* Reason: realize sets global pcspk_state */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo pcspk_info = {

@@ -227,6 +227,29 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
return NVME_NO_COMPLETE;
}
static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
NvmeRequest *req)
{
NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
const uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
const uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
uint64_t slba = le64_to_cpu(rw->slba);
uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
uint64_t aio_slba = slba << (data_shift - BDRV_SECTOR_BITS);
uint32_t aio_nlb = nlb << (data_shift - BDRV_SECTOR_BITS);
if (slba + nlb > ns->id_ns.nsze) {
return NVME_LBA_RANGE | NVME_DNR;
}
req->has_sg = false;
block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
BLOCK_ACCT_WRITE);
req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, aio_slba, aio_nlb,
BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
return NVME_NO_COMPLETE;
}
static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
NvmeRequest *req)
{
@@ -279,6 +302,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
switch (cmd->opcode) {
case NVME_CMD_FLUSH:
return nvme_flush(n, ns, cmd, req);
case NVME_CMD_WRITE_ZEROS:
return nvme_write_zeros(n, ns, cmd, req);
case NVME_CMD_WRITE:
case NVME_CMD_READ:
return nvme_rw(n, ns, cmd, req);
@@ -895,6 +920,7 @@ static int nvme_init(PCIDevice *pci_dev)
id->sqes = (0x6 << 4) | 0x6;
id->cqes = (0x4 << 4) | 0x4;
id->nn = cpu_to_le32(n->num_namespaces);
id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROS);
id->psd[0].mp = cpu_to_le16(0x9c4);
id->psd[0].enlat = cpu_to_le32(0x10);
id->psd[0].exlat = cpu_to_le32(0x4);
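
The unit conversion at the top of `nvme_write_zeros()` can be tried on its own. The sketch below reproduces only the arithmetic: the zero-based NLB field, the range check against the namespace size, and the shift from logical blocks to 512-byte sectors. `SECTOR_BITS` is assumed to match `BDRV_SECTOR_BITS` (9), and `struct sector_range` and `lba_to_sectors` are hypothetical names. It makes no claim about which units the block-layer call itself expects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match BDRV_SECTOR_BITS: sectors are 2^9 = 512 bytes. */
#define SECTOR_BITS 9

/* Hypothetical result type for this sketch. */
struct sector_range {
    uint64_t start; /* first 512-byte sector */
    uint32_t count; /* number of 512-byte sectors */
};

/*
 * Convert a starting LBA and a zero-based NLB field into 512-byte sectors.
 * data_shift is log2 of the logical block size and must be >= SECTOR_BITS.
 * Returns false when the range runs past the namespace size (nsze, in blocks).
 */
static bool lba_to_sectors(uint64_t slba, uint16_t nlb_field,
                           uint8_t data_shift, uint64_t nsze,
                           struct sector_range *out)
{
    uint32_t nlb = (uint32_t)nlb_field + 1; /* NLB is zero-based on the wire */

    if (slba + nlb > nsze) {
        return false;
    }
    out->start = slba << (data_shift - SECTOR_BITS);
    out->count = nlb << (data_shift - SECTOR_BITS);
    return true;
}
```

With 4 KiB blocks (`data_shift` of 12), LBA 2 with an NLB field of 0 maps to sector 16 and a count of 8 sectors.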

@@ -179,6 +179,7 @@ enum NvmeIoCommands {
NVME_CMD_READ = 0x02,
NVME_CMD_WRITE_UNCOR = 0x04,
NVME_CMD_COMPARE = 0x05,
NVME_CMD_WRITE_ZEROS = 0x08,
NVME_CMD_DSM = 0x09,
};

@@ -42,9 +42,7 @@ static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
static void virtio_blk_free_request(VirtIOBlockReq *req)
{
if (req) {
g_free(req);
}
g_free(req);
}
static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)

@@ -137,20 +137,21 @@ static void generic_loader_realize(DeviceState *dev, Error **errp)
#endif
if (s->file) {
AddressSpace *as = s->cpu ? s->cpu->as : NULL;
if (!s->force_raw) {
size = load_elf_as(s->file, NULL, NULL, &entry, NULL, NULL,
big_endian, 0, 0, 0, s->cpu->as);
big_endian, 0, 0, 0, as);
if (size < 0) {
size = load_uimage_as(s->file, &entry, NULL, NULL, NULL, NULL,
s->cpu->as);
as);
}
}
if (size < 0 || s->force_raw) {
/* Default to the maximum size being the machine's ram size */
size = load_image_targphys_as(s->file, s->addr, ram_size,
s->cpu->as);
size = load_image_targphys_as(s->file, s->addr, ram_size, as);
} else {
s->addr = entry;
}

@@ -17,8 +17,10 @@
#include "qapi/visitor.h"
#include "hw/sysbus.h"
#include "sysemu/sysemu.h"
#include "sysemu/numa.h"
#include "qemu/error-report.h"
#include "qemu/cutils.h"
#include "sysemu/numa.h"
static char *machine_get_accel(Object *obj, Error **errp)
{
@@ -388,6 +390,102 @@ HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
return head;
}
/**
* machine_set_cpu_numa_node:
* @machine: machine object to modify
* @props: specifies which cpu objects to assign to
* numa node specified by @props.node_id
* @errp: if an error occurs, a pointer to an area to store the error
*
* Associate NUMA node specified by @props.node_id with cpu slots that
* match socket/core/thread-ids specified by @props. It's recommended to use
* query-hotpluggable-cpus.props values to specify affected cpu slots,
* which would lead to exact 1:1 mapping of cpu slots to NUMA node.
*
* However for CLI convenience it's possible to pass in subset of properties,
* which would affect all cpu slots that match it.
* Ex for pc machine:
* -smp 4,cores=2,sockets=2 -numa node,nodeid=0 -numa node,nodeid=1 \
* -numa cpu,node-id=0,socket_id=0 \
* -numa cpu,node-id=1,socket_id=1
* will assign all child cores of socket 0 to node 0 and
* of socket 1 to node 1.
*
* On attempt of reassigning (already assigned) cpu slot to another NUMA node,
* return error.
* Empty subset is disallowed and function will return with error in this case.
*/
void machine_set_cpu_numa_node(MachineState *machine,
const CpuInstanceProperties *props, Error **errp)
{
MachineClass *mc = MACHINE_GET_CLASS(machine);
bool match = false;
int i;
if (!mc->possible_cpu_arch_ids) {
error_setg(errp, "mapping of CPUs to NUMA node is not supported");
return;
}
/* disabling node mapping is not supported, forbid it */
assert(props->has_node_id);
/* force board to initialize possible_cpus if it hasn't been done yet */
mc->possible_cpu_arch_ids(machine);
for (i = 0; i < machine->possible_cpus->len; i++) {
CPUArchId *slot = &machine->possible_cpus->cpus[i];
/* reject properties not supported by the board */
if (props->has_thread_id && !slot->props.has_thread_id) {
error_setg(errp, "thread-id is not supported");
return;
}
if (props->has_core_id && !slot->props.has_core_id) {
error_setg(errp, "core-id is not supported");
return;
}
if (props->has_socket_id && !slot->props.has_socket_id) {
error_setg(errp, "socket-id is not supported");
return;
}
/* skip slots with explicit mismatch */
if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
continue;
}
if (props->has_core_id && props->core_id != slot->props.core_id) {
continue;
}
if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
continue;
}
/* reject assignment if slot is already assigned, for compatibility
* of legacy cpu_index mapping with SPAPR core based mapping do not
* error out if cpu thread and matched core have the same node-id */
if (slot->props.has_node_id &&
slot->props.node_id != props->node_id) {
error_setg(errp, "CPU is already assigned to node-id: %" PRId64,
slot->props.node_id);
return;
}
/* assign slot to node as it's matched '-numa cpu' key */
match = true;
slot->props.node_id = props->node_id;
slot->props.has_node_id = props->has_node_id;
}
if (!match) {
error_setg(errp, "no match found");
}
}
static void machine_class_init(ObjectClass *oc, void *data)
{
MachineClass *mc = MACHINE_CLASS(oc);
@@ -400,6 +498,7 @@ static void machine_class_init(ObjectClass *oc, void *data)
* On Linux, each node's border has to be 8MB aligned
*/
mc->numa_mem_align_shift = 23;
mc->numa_auto_assign_ram = numa_default_auto_assign_ram;
object_class_property_add_str(oc, "accel",
machine_get_accel, machine_set_accel, &error_abort);
@@ -580,6 +679,69 @@ bool machine_mem_merge(MachineState *machine)
return machine->mem_merge;
}
static char *cpu_slot_to_string(const CPUArchId *cpu)
{
GString *s = g_string_new(NULL);
if (cpu->props.has_socket_id) {
g_string_append_printf(s, "socket-id: %"PRId64, cpu->props.socket_id);
}
if (cpu->props.has_core_id) {
if (s->len) {
g_string_append_printf(s, ", ");
}
g_string_append_printf(s, "core-id: %"PRId64, cpu->props.core_id);
}
if (cpu->props.has_thread_id) {
if (s->len) {
g_string_append_printf(s, ", ");
}
g_string_append_printf(s, "thread-id: %"PRId64, cpu->props.thread_id);
}
return g_string_free(s, false);
}
static void machine_numa_validate(MachineState *machine)
{
int i;
GString *s = g_string_new(NULL);
MachineClass *mc = MACHINE_GET_CLASS(machine);
const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
assert(nb_numa_nodes);
for (i = 0; i < possible_cpus->len; i++) {
const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
/* at this point numa mappings are initialized by CLI options
* or with default mappings so it's sufficient to list
* all not yet mapped CPUs here */
/* TODO: make it hard error in future */
if (!cpu_slot->props.has_node_id) {
char *cpu_str = cpu_slot_to_string(cpu_slot);
g_string_append_printf(s, "%sCPU %d [%s]", s->len ? ", " : "", i,
cpu_str);
g_free(cpu_str);
}
}
if (s->len) {
error_report("warning: CPU(s) not present in any NUMA nodes: %s",
s->str);
error_report("warning: All CPU(s) up to maxcpus should be described "
"in NUMA config, ability to start up with partial NUMA "
"mappings is obsoleted and will be removed in future");
}
g_string_free(s, true);
}
void machine_run_board_init(MachineState *machine)
{
MachineClass *machine_class = MACHINE_GET_CLASS(machine);
if (nb_numa_nodes) {
machine_numa_validate(machine);
}
machine_class->init(machine);
}
static void machine_class_finalize(ObjectClass *klass, void *data)
{
MachineClass *mc = MACHINE_CLASS(klass);
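
The subset matching described in the `machine_set_cpu_numa_node()` comment can be sketched without any QEMU types. Everything here is an illustrative assumption: `struct cpu_props` is a trimmed-down stand-in for `CpuInstanceProperties` (thread-id omitted), and `assign_node` is a hypothetical name. Unlike the original, which reports errors through `Error **`, the sketch returns a count or -1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for CpuInstanceProperties. */
struct cpu_props {
    bool has_node_id, has_socket_id, has_core_id;
    int64_t node_id, socket_id, core_id;
};

/*
 * Assign key->node_id to every slot whose fields match the ones set in key.
 * Fields the key leaves unset act as wildcards.
 * Returns the number of slots assigned, or -1 if a matching slot is
 * already bound to a different node.
 */
static int assign_node(struct cpu_props *slots, size_t n,
                       const struct cpu_props *key)
{
    int matched = 0;

    for (size_t i = 0; i < n; i++) {
        struct cpu_props *s = &slots[i];

        /* skip slots with an explicit mismatch */
        if (key->has_socket_id && key->socket_id != s->socket_id) {
            continue;
        }
        if (key->has_core_id && key->core_id != s->core_id) {
            continue;
        }
        /* re-assigning to the same node is tolerated, a different one is not */
        if (s->has_node_id && s->node_id != key->node_id) {
            return -1;
        }
        s->node_id = key->node_id;
        s->has_node_id = true;
        matched++;
    }
    return matched;
}
```

A key that sets only `socket_id` binds every core of that socket, which is the `-numa cpu,node-id=1,socket_id=1` case from the comment. A return value of 0 corresponds to the "no match found" error in the original.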

@@ -91,7 +91,7 @@ static void or_irq_class_init(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_or_irq;
/* Reason: Needs to be wired up to work, e.g. see stm32f205_soc.c */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo or_irq_type_info = {

@@ -37,7 +37,7 @@
#include "hw/boards.h"
#include "hw/sysbus.h"
#include "qapi-event.h"
#include "migration/migration.h"
#include "migration/vmstate.h"
bool qdev_hotplug = false;
static bool qdev_hot_added = false;
@@ -861,6 +861,20 @@ static bool device_get_realized(Object *obj, Error **errp)
return dev->realized;
}
static bool check_only_migratable(Object *obj, Error **err)
{
DeviceClass *dc = DEVICE_GET_CLASS(obj);
if (!vmstate_check_only_migratable(dc->vmsd)) {
error_setg(err, "Device %s is not migratable, but "
"--only-migratable was specified",
object_get_typename(obj));
return false;
}
return true;
}
static void device_set_realized(Object *obj, bool value, Error **errp)
{
DeviceState *dev = DEVICE(obj);
@@ -870,7 +884,6 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
Error *local_err = NULL;
bool unattached_parent = false;
static int unattached_count;
int ret;
if (dev->hotplugged && !dc->hotpluggable) {
error_setg(errp, QERR_DEVICE_NO_HOTPLUG, object_get_typename(obj));
@@ -878,8 +891,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
}
if (value && !dev->realized) {
ret = check_migratable(obj, &local_err);
if (ret < 0) {
if (!check_only_migratable(obj, &local_err)) {
goto fail;
}
@@ -1118,6 +1130,7 @@ static void device_class_init(ObjectClass *class, void *data)
* should override it in their class_init()
*/
dc->hotpluggable = true;
dc->user_creatable = true;
}
void device_reset(DeviceState *dev)

@@ -288,7 +288,7 @@ static void register_class_init(ObjectClass *oc, void *data)
DeviceClass *dc = DEVICE_CLASS(oc);
/* Reason: needs to be wired up to work */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo register_info = {

@@ -326,6 +326,17 @@ static void sysbus_device_class_init(ObjectClass *klass, void *data)
DeviceClass *k = DEVICE_CLASS(klass);
k->init = sysbus_device_init;
k->bus_type = TYPE_SYSTEM_BUS;
/*
* device_add plugs devices into a suitable bus. For "real" buses,
* that actually connects the device. For sysbus, the connections
* need to be made separately, and device_add can't do that. The
* device would be left unconnected, and will probably not work
*
* However, a few machines can handle device_add/-device with
* a few specific sysbus devices. In those cases, the device
* subclass needs to override it and set user_creatable=true.
*/
k->user_creatable = false;
}
static const TypeInfo sysbus_device_type_info = {

@@ -94,7 +94,8 @@ static void cg3_update_display(void *opaque)
uint32_t dval;
int x, y, y_start;
unsigned int width, height;
ram_addr_t page, page_min, page_max;
ram_addr_t page;
DirtyBitmapSnapshot *snap = NULL;
if (surface_bits_per_pixel(surface) != 32) {
return;
@@ -103,29 +104,32 @@ static void cg3_update_display(void *opaque)
height = s->height;
y_start = -1;
page_min = -1;
page_max = 0;
page = 0;
pix = memory_region_get_ram_ptr(&s->vram_mem);
data = (uint32_t *)surface_data(surface);
memory_region_sync_dirty_bitmap(&s->vram_mem);
if (!s->full_update) {
memory_region_sync_dirty_bitmap(&s->vram_mem);
snap = memory_region_snapshot_and_clear_dirty(&s->vram_mem, 0x0,
memory_region_size(&s->vram_mem),
DIRTY_MEMORY_VGA);
}
for (y = 0; y < height; y++) {
int update = s->full_update;
int update;
page = (ram_addr_t)y * width;
update |= memory_region_get_dirty(&s->vram_mem, page, width,
DIRTY_MEMORY_VGA);
if (s->full_update) {
update = 1;
} else {
update = memory_region_snapshot_get_dirty(&s->vram_mem, snap, page,
width);
}
if (update) {
if (y_start < 0) {
y_start = y;
}
if (page < page_min) {
page_min = page;
}
if (page > page_max) {
page_max = page;
}
for (x = 0; x < width; x++) {
dval = *pix++;
@@ -134,7 +138,7 @@ static void cg3_update_display(void *opaque)
}
} else {
if (y_start >= 0) {
dpy_gfx_update(s->con, 0, y_start, s->width, y - y_start);
dpy_gfx_update(s->con, 0, y_start, width, y - y_start);
y_start = -1;
}
pix += width;
@@ -143,17 +147,14 @@ static void cg3_update_display(void *opaque)
}
s->full_update = 0;
if (y_start >= 0) {
dpy_gfx_update(s->con, 0, y_start, s->width, y - y_start);
}
if (page_max >= page_min) {
memory_region_reset_dirty(&s->vram_mem,
page_min, page_max - page_min, DIRTY_MEMORY_VGA);
dpy_gfx_update(s->con, 0, y_start, width, y - y_start);
}
/* vsync interrupt? */
if (s->regs[0] & CG3_CR_ENABLE_INTS) {
s->regs[1] |= CG3_SR_PENDING_INT;
qemu_irq_raise(s->irq);
}
g_free(snap);
}
static void cg3_invalidate_display(void *opaque)

@@ -227,13 +227,13 @@ static void jazz_led_invalidate_display(void *opaque)
static void jazz_led_text_update(void *opaque, console_ch_t *chardata)
{
LedState *s = opaque;
char buf[2];
char buf[3];
dpy_text_cursor(s->con, -1, -1);
qemu_console_resize(s->con, 2, 1);
/* TODO: draw the segments */
snprintf(buf, 2, "%02hhx\n", s->segments);
snprintf(buf, 3, "%02hhx", s->segments);
console_write_ch(chardata++, ATTR2CHTYPE(buf[0], QEMU_COLOR_BLUE,
QEMU_COLOR_BLACK, 1));
console_write_ch(chardata++, ATTR2CHTYPE(buf[1], QEMU_COLOR_BLUE,
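
The buffer-size change in this hunk matters because `snprintf` always reserves one byte of its size argument for the terminating NUL. The sketch below contrasts the two calls from the hunk. Both helper names are hypothetical; only standard C library behaviour is relied on.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * New behaviour: a 3-byte buffer holds both hex digits plus the NUL.
 */
static void format_segments(uint8_t segments, char out[3])
{
    snprintf(out, 3, "%02hhx", segments);
}

/*
 * Old behaviour: with a size of 2, snprintf stores one digit and the NUL,
 * so out[1] is '\0' rather than the second hex digit.
 */
static void format_segments_truncated(uint8_t segments, char out[2])
{
    snprintf(out, 2, "%02hhx\n", segments);
}
```

For a segment value of `0xab`, the first helper leaves `"ab"` in the buffer, while the second leaves only `"a"`, so reading `buf[1]` as a display character yields NUL.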

@@ -26,7 +26,7 @@
#include "qemu/queue.h"
#include "qemu/atomic.h"
#include "sysemu/sysemu.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "trace.h"
#include "qxl.h"

@@ -1414,6 +1414,7 @@ static void sm501_update_display(void *opaque)
{
SM501State *s = (SM501State *)opaque;
DisplaySurface *surface = qemu_console_surface(s->con);
DirtyBitmapSnapshot *snap;
int y, c_x = 0, c_y = 0;
int crt = (s->dc_crt_control & SM501_DC_CRT_CONTROL_SEL) ? 1 : 0;
int width = get_width(s, crt);
@@ -1425,9 +1426,7 @@ static void sm501_update_display(void *opaque)
draw_hwc_line_func *draw_hwc_line = NULL;
int full_update = 0;
int y_start = -1;
ram_addr_t page_min = ~0l;
ram_addr_t page_max = 0l;
ram_addr_t offset;
ram_addr_t offset = 0;
uint32_t *palette;
uint8_t hwc_palette[3 * 3];
uint8_t *hwc_src = NULL;
@@ -1479,17 +1478,17 @@ static void sm501_update_display(void *opaque)
/* draw each line according to conditions */
memory_region_sync_dirty_bitmap(&s->local_mem_region);
snap = memory_region_snapshot_and_clear_dirty(&s->local_mem_region,
offset, width * height * src_bpp, DIRTY_MEMORY_VGA);
for (y = 0, offset = 0; y < height; y++, offset += width * src_bpp) {
int update, update_hwc;
ram_addr_t page0 = offset;
ram_addr_t page1 = offset + width * src_bpp - 1;
/* check if hardware cursor is enabled and we're within its range */
update_hwc = draw_hwc_line && c_y <= y && y < c_y + SM501_HWC_HEIGHT;
update = full_update || update_hwc;
/* check dirty flags for each line */
update |= memory_region_get_dirty(&s->local_mem_region, page0,
page1 - page0, DIRTY_MEMORY_VGA);
update |= memory_region_snapshot_get_dirty(&s->local_mem_region, snap,
offset, width * src_bpp);
/* draw line and change status */
if (update) {
@@ -1507,12 +1506,6 @@ static void sm501_update_display(void *opaque)
if (y_start < 0) {
y_start = y;
}
if (page0 < page_min) {
page_min = page0;
}
if (page1 > page_max) {
page_max = page1;
}
} else {
if (y_start >= 0) {
/* flush to display */
@@ -1521,18 +1514,12 @@ static void sm501_update_display(void *opaque)
}
}
}
g_free(snap);
/* complete flush to display */
if (y_start >= 0) {
dpy_gfx_update(s->con, 0, y_start, width, y - y_start);
}
/* clear dirty flags */
if (page_min != ~0l) {
memory_region_reset_dirty(&s->local_mem_region,
page_min, page_max + TARGET_PAGE_SIZE,
DIRTY_MEMORY_VGA);
}
}
static const GraphicHwOps sm501_ops = {

@@ -104,36 +104,23 @@ static void tcx_set_dirty(TCXState *s, ram_addr_t addr, int len)
}
}
static int tcx_check_dirty(TCXState *s, ram_addr_t addr, int len)
static int tcx_check_dirty(TCXState *s, DirtyBitmapSnapshot *snap,
ram_addr_t addr, int len)
{
int ret;
ret = memory_region_get_dirty(&s->vram_mem, addr, len, DIRTY_MEMORY_VGA);
ret = memory_region_snapshot_get_dirty(&s->vram_mem, snap, addr, len);
if (s->depth == 24) {
ret |= memory_region_get_dirty(&s->vram_mem,
s->vram24_offset + addr * 4, len * 4,
DIRTY_MEMORY_VGA);
ret |= memory_region_get_dirty(&s->vram_mem,
s->cplane_offset + addr * 4, len * 4,
DIRTY_MEMORY_VGA);
ret |= memory_region_snapshot_get_dirty(&s->vram_mem, snap,
s->vram24_offset + addr * 4, len * 4);
ret |= memory_region_snapshot_get_dirty(&s->vram_mem, snap,
s->cplane_offset + addr * 4, len * 4);
}
return ret;
}
static void tcx_reset_dirty(TCXState *s, ram_addr_t addr, int len)
{
memory_region_reset_dirty(&s->vram_mem, addr, len, DIRTY_MEMORY_VGA);
if (s->depth == 24) {
memory_region_reset_dirty(&s->vram_mem, s->vram24_offset + addr * 4,
len * 4, DIRTY_MEMORY_VGA);
memory_region_reset_dirty(&s->vram_mem, s->cplane_offset + addr * 4,
len * 4, DIRTY_MEMORY_VGA);
}
}
static void update_palette_entries(TCXState *s, int start, int end)
{
DisplaySurface *surface = qemu_console_surface(s->con);
@@ -233,7 +220,8 @@ static void tcx_update_display(void *opaque)
{
TCXState *ts = opaque;
DisplaySurface *surface = qemu_console_surface(ts->con);
ram_addr_t page, page_min, page_max;
ram_addr_t page;
DirtyBitmapSnapshot *snap = NULL;
int y, y_start, dd, ds;
uint8_t *d, *s;
@@ -243,22 +231,20 @@ static void tcx_update_display(void *opaque)
page = 0;
y_start = -1;
page_min = -1;
page_max = 0;
d = surface_data(surface);
s = ts->vram;
dd = surface_stride(surface);
ds = 1024;
memory_region_sync_dirty_bitmap(&ts->vram_mem);
snap = memory_region_snapshot_and_clear_dirty(&ts->vram_mem, 0x0,
memory_region_size(&ts->vram_mem),
DIRTY_MEMORY_VGA);
for (y = 0; y < ts->height; y++, page += ds) {
if (tcx_check_dirty(ts, page, ds)) {
if (tcx_check_dirty(ts, snap, page, ds)) {
if (y_start < 0)
y_start = y;
if (page < page_min)
page_min = page;
if (page > page_max)
page_max = page;
tcx_draw_line32(ts, d, s, ts->width);
if (y >= ts->cursy && y < ts->cursy + 32 && ts->cursx < ts->width) {
@@ -280,17 +266,15 @@ static void tcx_update_display(void *opaque)
dpy_gfx_update(ts->con, 0, y_start,
ts->width, y - y_start);
}
/* reset modified pages */
if (page_max >= page_min) {
tcx_reset_dirty(ts, page_min, page_max - page_min);
}
g_free(snap);
}
static void tcx24_update_display(void *opaque)
{
TCXState *ts = opaque;
DisplaySurface *surface = qemu_console_surface(ts->con);
ram_addr_t page, page_min, page_max;
ram_addr_t page;
DirtyBitmapSnapshot *snap = NULL;
int y, y_start, dd, ds;
uint8_t *d, *s;
uint32_t *cptr, *s24;
@@ -301,8 +285,6 @@ static void tcx24_update_display(void *opaque)
page = 0;
y_start = -1;
page_min = -1;
page_max = 0;
d = surface_data(surface);
s = ts->vram;
s24 = ts->vram24;
@@ -311,14 +293,15 @@ static void tcx24_update_display(void *opaque)
ds = 1024;
memory_region_sync_dirty_bitmap(&ts->vram_mem);
snap = memory_region_snapshot_and_clear_dirty(&ts->vram_mem, 0x0,
memory_region_size(&ts->vram_mem),
DIRTY_MEMORY_VGA);
for (y = 0; y < ts->height; y++, page += ds) {
if (tcx_check_dirty(ts, page, ds)) {
if (tcx_check_dirty(ts, snap, page, ds)) {
if (y_start < 0)
y_start = y;
if (page < page_min)
page_min = page;
if (page > page_max)
page_max = page;
tcx24_draw_line32(ts, d, s, ts->width, cptr, s24);
if (y >= ts->cursy && y < ts->cursy+32 && ts->cursx < ts->width) {
tcx_draw_cursor32(ts, d, y, ts->width);
@@ -341,10 +324,7 @@ static void tcx24_update_display(void *opaque)
dpy_gfx_update(ts->con, 0, y_start,
ts->width, y - y_start);
}
/* reset modified pages */
if (page_max >= page_min) {
tcx_reset_dirty(ts, page_min, page_max - page_min);
}
g_free(snap);
}
static void tcx_invalidate_display(void *opaque)

@@ -1630,7 +1630,7 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
if (!full_update) {
vga_sync_dirty_bitmap(s);
snap = memory_region_snapshot_and_clear_dirty(&s->vram, addr1,
bwidth * height,
line_offset * height,
DIRTY_MEMORY_VGA);
}

@@ -19,7 +19,7 @@
#include "hw/virtio/virtio.h"
#include "hw/virtio/virtio-gpu.h"
#include "hw/virtio/virtio-bus.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "qemu/log.h"
#include "qapi/error.h"

@@ -601,7 +601,7 @@ static void i8257_class_init(ObjectClass *klass, void *data)
idc->schedule = i8257_dma_schedule;
idc->register_channel = i8257_dma_register_channel;
/* Reason: needs to be wired up by isa_bus_dma() to work */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo i8257_info = {

@@ -305,7 +305,7 @@ static void sparc32_dma_class_init(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_dma;
dc->props = sparc32_dma_properties;
/* Reason: pointer property "iommu_opaque" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo sparc32_dma_info = {

@@ -773,7 +773,7 @@ static void omap_gpio_class_init(ObjectClass *klass, void *data)
dc->reset = omap_gpif_reset;
dc->props = omap_gpio_properties;
/* Reason: pointer property "clk" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo omap_gpio_info = {
@@ -804,7 +804,7 @@ static void omap2_gpio_class_init(ObjectClass *klass, void *data)
dc->reset = omap2_gpif_reset;
dc->props = omap2_gpio_properties;
/* Reason: pointer properties "iclk", "fclk0", ..., "fclk5" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo omap2_gpio_info = {

@@ -491,7 +491,7 @@ static void omap_i2c_class_init(ObjectClass *klass, void *data)
dc->props = omap_i2c_properties;
dc->reset = omap_i2c_reset;
/* Reason: pointer properties "iclk", "fclk" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->realize = omap_i2c_realize;
}

@@ -123,7 +123,7 @@ static void smbus_eeprom_class_initfn(ObjectClass *klass, void *data)
sc->read_data = eeprom_read_data;
dc->props = smbus_eeprom_properties;
/* Reason: pointer property "data" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo smbus_eeprom_info = {

@@ -103,7 +103,7 @@ static void ich9_smb_class_init(ObjectClass *klass, void *data)
* Reason: part of ICH9 southbridge, needs to be wired up by
* pc_q35_init()
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
I2CBus *ich9_smb_init(PCIBus *bus, int devfn, uint32_t smb_io_base)

@@ -341,7 +341,7 @@ build_fadt(GArray *table_data, BIOSLinker *linker, AcpiPmInfo *pm,
AcpiFadtDescriptorRev3 *fadt = acpi_data_push(table_data, sizeof(*fadt));
unsigned fw_ctrl_offset = (char *)&fadt->firmware_ctrl - table_data->data;
unsigned dsdt_entry_offset = (char *)&fadt->dsdt - table_data->data;
unsigned xdsdt_entry_offset = (char *)&fadt->Xdsdt - table_data->data;
unsigned xdsdt_entry_offset = (char *)&fadt->x_dsdt - table_data->data;
/* FACS address to be filled by Guest linker */
bios_linker_loader_add_pointer(linker,
@@ -354,7 +354,7 @@ build_fadt(GArray *table_data, BIOSLinker *linker, AcpiPmInfo *pm,
ACPI_BUILD_TABLE_FILE, dsdt_entry_offset, sizeof(fadt->dsdt),
ACPI_BUILD_TABLE_FILE, dsdt_tbl_offset);
bios_linker_loader_add_pointer(linker,
ACPI_BUILD_TABLE_FILE, xdsdt_entry_offset, sizeof(fadt->Xdsdt),
ACPI_BUILD_TABLE_FILE, xdsdt_entry_offset, sizeof(fadt->x_dsdt),
ACPI_BUILD_TABLE_FILE, dsdt_tbl_offset);
build_header(linker, table_data,
@@ -2335,7 +2335,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
srat->reserved1 = cpu_to_le32(1);
for (i = 0; i < apic_ids->len; i++) {
int j = numa_get_node_for_cpu(i);
int node_id = apic_ids->cpus[i].props.has_node_id ?
apic_ids->cpus[i].props.node_id : 0;
uint32_t apic_id = apic_ids->cpus[i].arch_id;
if (apic_id < 255) {
@@ -2345,9 +2346,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
core->type = ACPI_SRAT_PROCESSOR_APIC;
core->length = sizeof(*core);
core->local_apic_id = apic_id;
if (j < nb_numa_nodes) {
core->proximity_lo = j;
}
core->proximity_lo = node_id;
memset(core->proximity_hi, 0, 3);
core->local_sapic_eid = 0;
core->flags = cpu_to_le32(1);
@@ -2358,9 +2357,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
core->type = ACPI_SRAT_PROCESSOR_x2APIC;
core->length = sizeof(*core);
core->x2apic_id = cpu_to_le32(apic_id);
if (j < nb_numa_nodes) {
core->proximity_domain = cpu_to_le32(j);
}
core->proximity_domain = cpu_to_le32(node_id);
core->flags = cpu_to_le32(1);
}
}
@@ -2707,6 +2704,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
if (pcms->numa_nodes) {
acpi_add_table(table_offsets, tables_blob);
build_srat(tables_blob, tables->linker, machine);
if (have_numa_distance) {
acpi_add_table(table_offsets, tables_blob);
build_slit(tables_blob, tables->linker);
}
}
if (acpi_get_mcfg(&mcfg)) {
acpi_add_table(table_offsets, tables_blob);

@@ -21,6 +21,7 @@
*/
#include "qemu/osdep.h"
#include "hw/i386/amd_iommu.h"
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "trace.h"
@@ -1137,7 +1138,19 @@ static void amdvi_realize(DeviceState *dev, Error **err)
int ret = 0;
AMDVIState *s = AMD_IOMMU_DEVICE(dev);
X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
PCIBus *bus = PC_MACHINE(qdev_get_machine())->bus;
MachineState *ms = MACHINE(qdev_get_machine());
MachineClass *mc = MACHINE_GET_CLASS(ms);
PCMachineState *pcms =
PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
PCIBus *bus;
if (!pcms) {
error_setg(err, "Machine-type '%s' not supported by amd-iommu",
mc->name);
return;
}
bus = pcms->bus;
s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
amdvi_uint64_equal, g_free, g_free);
@@ -1186,6 +1199,8 @@ static void amdvi_class_init(ObjectClass *klass, void* data)
dc->vmsd = &vmstate_amdvi;
dc->hotpluggable = false;
dc_class->realize = amdvi_realize;
/* Supported by the pc-q35-* machine types */
dc->user_creatable = true;
}
static const TypeInfo amdvi = {

@@ -2969,11 +2969,21 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
static void vtd_realize(DeviceState *dev, Error **errp)
{
PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
PCIBus *bus = pcms->bus;
MachineState *ms = MACHINE(qdev_get_machine());
MachineClass *mc = MACHINE_GET_CLASS(ms);
PCMachineState *pcms =
PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
PCIBus *bus;
IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
if (!pcms) {
error_setg(errp, "Machine-type '%s' not supported by intel-iommu",
mc->name);
return;
}
bus = pcms->bus;
VTD_DPRINTF(GENERAL, "");
x86_iommu->type = TYPE_INTEL;
@@ -3009,6 +3019,8 @@ static void vtd_class_init(ObjectClass *klass, void *data)
dc->hotpluggable = false;
x86_class->realize = vtd_realize;
x86_class->int_remap = vtd_int_remap;
/* Supported by the pc-q35-* machine types */
dc->user_creatable = true;
}
static const TypeInfo vtd_info = {

@@ -597,7 +597,7 @@ static void port92_class_initfn(ObjectClass *klass, void *data)
* wiring: its A20 output line needs to be wired up by
* port92_init().
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo port92_info = {
@@ -747,7 +747,9 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
{
FWCfgState *fw_cfg;
uint64_t *numa_fw_cfg;
int i, j;
int i;
const CPUArchIdList *cpus;
MachineClass *mc = MACHINE_GET_CLASS(pcms);
fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
@@ -782,12 +784,12 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
*/
numa_fw_cfg = g_new0(uint64_t, 1 + pcms->apic_id_limit + nb_numa_nodes);
numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
for (i = 0; i < max_cpus; i++) {
unsigned int apic_id = x86_cpu_apic_id_from_index(i);
cpus = mc->possible_cpu_arch_ids(MACHINE(pcms));
for (i = 0; i < cpus->len; i++) {
unsigned int apic_id = cpus->cpus[i].arch_id;
assert(apic_id < pcms->apic_id_limit);
j = numa_get_node_for_cpu(i);
if (j < nb_numa_nodes) {
numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
if (cpus->cpus[i].props.has_node_id) {
numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
}
}
for (i = 0; i < nb_numa_nodes; i++) {
@@ -1047,12 +1049,10 @@ static void load_linux(PCMachineState *pcms,
fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
if (fw_cfg_dma_enabled(fw_cfg)) {
option_rom[nb_option_roms].bootindex = 0;
option_rom[nb_option_roms].name = "linuxboot.bin";
if (pcmc->linuxboot_dma_enabled && fw_cfg_dma_enabled(fw_cfg)) {
option_rom[nb_option_roms].name = "linuxboot_dma.bin";
option_rom[nb_option_roms].bootindex = 0;
} else {
option_rom[nb_option_roms].name = "linuxboot.bin";
option_rom[nb_option_roms].bootindex = 0;
}
nb_option_roms++;
}
@@ -1893,6 +1893,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp)
{
int idx;
int node_id;
CPUState *cs;
CPUArchId *cpu_slot;
X86CPUTopoInfo topo;
@@ -1982,6 +1983,22 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
cs = CPU(cpu);
cs->cpu_index = idx;
node_id = cpu_slot->props.node_id;
if (!cpu_slot->props.has_node_id) {
/* by default CPUState::numa_node was 0 if it's not set via CLI
* keep it this way for now but in future we probably should
* refuse to start up with incomplete numa mapping */
node_id = 0;
}
if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
cs->numa_node = node_id;
} else if (cs->numa_node != node_id) {
error_setg(errp, "node-id %d must match numa node specified "
"with -numa option for cpu-index %d",
cs->numa_node, cs->cpu_index);
return;
}
}
static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
@@ -2243,12 +2260,14 @@ static void pc_machine_reset(void)
}
}
static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
static CpuInstanceProperties
pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
{
X86CPUTopoInfo topo;
x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
&topo);
return topo.pkg_id;
MachineClass *mc = MACHINE_GET_CLASS(ms);
const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
assert(cpu_index < possible_cpus->len);
return possible_cpus->cpus[cpu_index].props;
}
static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
@@ -2280,6 +2299,15 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
ms->possible_cpus->cpus[i].props.has_thread_id = true;
ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
/* default distribution of CPUs over NUMA nodes */
if (nb_numa_nodes) {
/* preset values but do not enable them i.e. 'has_node_id = false',
* numa init code will enable them later if manual mapping wasn't
* present on CLI */
ms->possible_cpus->cpus[i].props.node_id =
topo.pkg_id % nb_numa_nodes;
}
}
return ms->possible_cpus;
}
@@ -2321,8 +2349,9 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
* to be used at the moment, 32K should be enough for a while. */
pcmc->acpi_data_size = 0x20000 + 0x8000;
pcmc->save_tsc_khz = true;
pcmc->linuxboot_dma_enabled = true;
mc->get_hotplug_handler = pc_get_hotpug_handler;
mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
mc->has_hotpluggable_cpus = true;
mc->default_boot_order = "cad";

@@ -54,6 +54,7 @@
#endif
#include "migration/migration.h"
#include "kvm_i386.h"
#include "sysemu/numa.h"
#define MAX_IDE_BUS 2
@@ -437,11 +438,23 @@ static void pc_i440fx_machine_options(MachineClass *m)
m->default_display = "std";
}
static void pc_i440fx_2_9_machine_options(MachineClass *m)
static void pc_i440fx_2_10_machine_options(MachineClass *m)
{
pc_i440fx_machine_options(m);
m->alias = "pc";
m->is_default = 1;
m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
}
DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
pc_i440fx_2_10_machine_options);
static void pc_i440fx_2_9_machine_options(MachineClass *m)
{
pc_i440fx_2_10_machine_options(m);
m->is_default = 0;
m->alias = NULL;
SET_MACHINE_COMPAT(m, PC_COMPAT_2_9);
}
DEFINE_I440FX_MACHINE(v2_9, "pc-i440fx-2.9", NULL,
@@ -450,8 +463,6 @@ DEFINE_I440FX_MACHINE(v2_9, "pc-i440fx-2.9", NULL,
static void pc_i440fx_2_8_machine_options(MachineClass *m)
{
pc_i440fx_2_9_machine_options(m);
m->is_default = 0;
m->alias = NULL;
SET_MACHINE_COMPAT(m, PC_COMPAT_2_8);
}
@@ -474,6 +485,7 @@ static void pc_i440fx_2_6_machine_options(MachineClass *m)
PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
pc_i440fx_2_7_machine_options(m);
pcmc->legacy_cpu_hotplug = true;
pcmc->linuxboot_dma_enabled = false;
SET_MACHINE_COMPAT(m, PC_COMPAT_2_6);
}

@@ -47,6 +47,7 @@
#include "hw/usb.h"
#include "qemu/error-report.h"
#include "migration/migration.h"
#include "sysemu/numa.h"
/* ICH9 AHCI has 6 ports */
#define MAX_SATA_PORTS 6
@@ -301,10 +302,21 @@ static void pc_q35_machine_options(MachineClass *m)
m->max_cpus = 288;
}
static void pc_q35_2_9_machine_options(MachineClass *m)
static void pc_q35_2_10_machine_options(MachineClass *m)
{
pc_q35_machine_options(m);
m->alias = "q35";
m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
}
DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
pc_q35_2_10_machine_options);
static void pc_q35_2_9_machine_options(MachineClass *m)
{
pc_q35_2_10_machine_options(m);
m->alias = NULL;
SET_MACHINE_COMPAT(m, PC_COMPAT_2_9);
}
DEFINE_Q35_MACHINE(v2_9, "pc-q35-2.9", NULL,
@@ -313,7 +325,6 @@ DEFINE_Q35_MACHINE(v2_9, "pc-q35-2.9", NULL,
static void pc_q35_2_8_machine_options(MachineClass *m)
{
pc_q35_2_9_machine_options(m);
m->alias = NULL;
SET_MACHINE_COMPAT(m, PC_COMPAT_2_8);
}
@@ -335,6 +346,7 @@ static void pc_q35_2_6_machine_options(MachineClass *m)
PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
pc_q35_2_7_machine_options(m);
pcmc->legacy_cpu_hotplug = true;
pcmc->linuxboot_dma_enabled = false;
SET_MACHINE_COMPAT(m, PC_COMPAT_2_6);
}

@@ -62,6 +62,7 @@ typedef struct MapCacheRev {
hwaddr paddr_index;
hwaddr size;
QTAILQ_ENTRY(MapCacheRev) next;
bool dma;
} MapCacheRev;
typedef struct MapCache {
@@ -202,7 +203,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
}
static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
uint8_t lock)
uint8_t lock, bool dma)
{
MapCacheEntry *entry, *pentry = NULL;
hwaddr address_index;
@@ -289,6 +290,7 @@ tryagain:
if (lock) {
MapCacheRev *reventry = g_malloc0(sizeof(MapCacheRev));
entry->lock++;
reventry->dma = dma;
reventry->vaddr_req = mapcache->last_entry->vaddr_base + address_offset;
reventry->paddr_index = mapcache->last_entry->paddr_index;
reventry->size = entry->size;
@@ -300,12 +302,12 @@ tryagain:
}
uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
uint8_t lock)
uint8_t lock, bool dma)
{
uint8_t *p;
mapcache_lock();
p = xen_map_cache_unlocked(phys_addr, size, lock);
p = xen_map_cache_unlocked(phys_addr, size, lock, dma);
mapcache_unlock();
return p;
}
@@ -426,8 +428,11 @@ void xen_invalidate_map_cache(void)
mapcache_lock();
QTAILQ_FOREACH(reventry, &mapcache->locked_entries, next) {
DPRINTF("There should be no locked mappings at this time, "
"but "TARGET_FMT_plx" -> %p is present\n",
if (!reventry->dma) {
continue;
}
fprintf(stderr, "Locked DMA mapping while invalidating mapcache!"
" "TARGET_FMT_plx" -> %p is present\n",
reventry->paddr_index, reventry->vaddr_req);
}

@@ -286,7 +286,7 @@ static void vmmouse_class_initfn(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_vmmouse;
dc->props = vmmouse_properties;
/* Reason: pointer property "ps2_mouse" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo vmmouse_info = {

@@ -501,7 +501,7 @@ static void apic_common_class_init(ObjectClass *klass, void *data)
* Reason: APIC and CPU need to be wired up by
* x86_cpu_apic_create()
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo apic_common_type = {

@@ -24,7 +24,7 @@
#include "qemu-common.h"
#include "cpu.h"
#include "hw/sysbus.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "sysemu/kvm.h"
#include "kvm_arm.h"
#include "gic_internal.h"

@@ -24,7 +24,7 @@
#include "sysemu/sysemu.h"
#include "sysemu/kvm.h"
#include "kvm_arm.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#define TYPE_KVM_ARM_ITS "arm-its-kvm"
#define KVM_ARM_ITS(obj) OBJECT_CHECK(GICv3ITSState, (obj), TYPE_KVM_ARM_ITS)

@@ -28,7 +28,7 @@
#include "kvm_arm.h"
#include "gicv3_internal.h"
#include "vgic_common.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#ifdef DEBUG_GICV3_KVM
#define DPRINTF(fmt, ...) \

@@ -173,7 +173,7 @@ static void etraxfs_pic_class_init(ObjectClass *klass, void *data)
dc->props = etraxfs_pic_properties;
/*
* Note: pointer property "interrupt_vector" may remain null, thus
* no need for dc->cannot_instantiate_with_device_add_yet = true;
* no need for dc->user_creatable = false;
*/
}

@@ -360,7 +360,7 @@ static void grlib_irqmp_class_init(ObjectClass *klass, void *data)
dc->reset = grlib_irqmp_reset;
dc->props = grlib_irqmp_properties;
/* Reason: pointer properties "set_pil_in", "set_pil_in_opaque" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->realize = grlib_irqmp_realize;
}

@@ -144,7 +144,7 @@ static void pic_common_class_init(ObjectClass *klass, void *data)
* wiring of the slave to the master is hard-coded in device model
* code.
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo pic_common_type = {

@@ -80,7 +80,7 @@ static void altera_iic_class_init(ObjectClass *klass, void *data)
DeviceClass *dc = DEVICE_CLASS(klass);
/* Reason: needs to be wired up, e.g. by nios2_10m50_ghrd_init() */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->realize = altera_iic_realize;
}

@@ -401,7 +401,7 @@ static void omap_intc_class_init(ObjectClass *klass, void *data)
dc->reset = omap_inth_reset;
dc->props = omap_intc_properties;
/* Reason: pointer property "clk" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->realize = omap_intc_realize;
}
@@ -656,7 +656,7 @@ static void omap2_intc_class_init(ObjectClass *klass, void *data)
dc->reset = omap_inth_reset;
dc->props = omap2_intc_properties;
/* Reason: pointer property "iclk", "fclk" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->realize = omap2_intc_realize;
}

@@ -213,6 +213,7 @@ static void ics_get_kvm_state(ICSState *ics)
irq->priority = irq->saved_priority;
}
irq->status = 0;
if (state & KVM_XICS_PENDING) {
if (state & KVM_XICS_LEVEL_SENSITIVE) {
irq->status |= XICS_STATUS_ASSERTED;
@@ -228,6 +229,12 @@ static void ics_get_kvm_state(ICSState *ics)
| XICS_STATUS_REJECTED;
}
}
if (state & KVM_XICS_PRESENTED) {
irq->status |= XICS_STATUS_PRESENTED;
}
if (state & KVM_XICS_QUEUED) {
irq->status |= XICS_STATUS_QUEUED;
}
}
}
@@ -265,6 +272,12 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
state |= KVM_XICS_PENDING;
}
}
if (irq->status & XICS_STATUS_PRESENTED) {
state |= KVM_XICS_PRESENTED;
}
if (irq->status & XICS_STATUS_QUEUED) {
state |= KVM_XICS_QUEUED;
}
ret = ioctl(kernel_xics_fd, KVM_SET_DEVICE_ATTR, &attr);
if (ret != 0) {

@@ -805,7 +805,7 @@ static void ich9_lpc_class_init(ObjectClass *klass, void *data)
* Reason: part of ICH9 southbridge, needs to be wired up by
* pc_q35_init()
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
hc->plug = ich9_pm_device_plug_cb;
hc->unplug_request = ich9_pm_device_unplug_request_cb;
hc->unplug = ich9_pm_device_unplug_cb;

@@ -123,7 +123,7 @@ static void piix4_class_init(ObjectClass *klass, void *data)
* Reason: part of PIIX4 southbridge, needs to be wired up,
* e.g. by mips_malta_init()
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->hotpluggable = false;
}

@@ -494,7 +494,7 @@ static void via_class_init(ObjectClass *klass, void *data)
* Reason: part of VIA VT82C686 southbridge, needs to be wired up,
* e.g. by mips_fulong2e_init()
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo via_info = {

@@ -189,7 +189,7 @@ void microblaze_load_kernel(MicroBlazeCPU *cpu, hwaddr ddr_base,
ram_size - initrd_offset);
}
if (initrd_size < 0) {
error_report("qemu: could not load initrd '%s'",
error_report("could not load initrd '%s'",
initrd_filename);
exit(EXIT_FAILURE);
}

@@ -1224,7 +1224,7 @@ static void gt64120_pci_class_init(ObjectClass *klass, void *data)
* PCI-facing part of the host bridge, not usable without the
* host-facing part, which can't be device_add'ed, yet.
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo gt64120_pci_info = {

@@ -25,7 +25,7 @@
#include "hw/pci/msi.h"
#include "hw/pci/msix.h"
#include "sysemu/kvm.h"
#include "migration/migration.h"
#include "migration/blocker.h"
#include "qemu/error-report.h"
#include "qemu/event_notifier.h"
#include "qom/object_interfaces.h"
