Compare commits

..

556 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
77bfcf28f1 console: Remove unused QEMU_BIG_ENDIAN_FLAG
If we need to, we should use the pixman formats instead but for
now this is unused except in commented out code so take it out
to avoid further confusion about surface endianness.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-05 15:38:04 +02:00
Gerd Hoffmann
43c7d8bd44 console: add qemu_pixman_linebuf_copy
Helper function for copying data from linebuf to framebuffer using
pixman, possibly converting in case src and dst formats differ.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-05 13:27:11 +02:00
Gerd Hoffmann
4c38762fb5 console: add dpy_gfx_update_dirty
Calls dpy_gfx_update for all dirty scanlines. Works for
DisplaySurfaces backed by guest memory (i.e. the ones created
using qemu_create_displaysurface_guestmem).

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-05 13:27:11 +02:00
Gerd Hoffmann
a77549b3ff console: add qemu_create_displaysurface_guestmem
This patch adds a qemu_create_displaysurface_guestmem helper function.
Works simliar to qemu_create_displaysurface_from, but accepts a
guest address instead of a host pointer and it handles
cpu_physical_memory_{map,unmap} for you.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-05 13:27:11 +02:00
Gerd Hoffmann
30f1e661b6 console: stop using PixelFormat
With this patch the qemu console core stops using PixelFormat and pixman
format codes side-by-side, pixman format code is the primary way to
specify the DisplaySurface format:

 * DisplaySurface stops carrying a PixelFormat field.
 * qemu_create_displaysurface_from() expects a pixman format now.

Functions to convert PixelFormat to pixman_format_code_t (and back)
exist for those who still use PixelFormat.   As PixelFormat allows
easy access to masks and shifts it will probably continue to exist.

[ xenfb added by Benjamin Herrenschmidt ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-05 13:27:11 +02:00
Gerd Hoffmann
56bd9ea1a3 console: reimplement qemu_default_pixelformat
Use the new qemu_pixelformat_from_pixman and qemu_default_pixman_format
functions to reimplement qemu_default_pixelformat
(qemu_different_endianness_pixelformat too).

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-05 13:27:11 +02:00
Gerd Hoffmann
1527a25ec9 console: add qemu_default_pixman_format
Function returning the default pixman format for a given depth.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-05 13:27:11 +02:00
Gerd Hoffmann
a93a3af9ec console: add qemu_pixelformat_from_pixman
Function to convert pixman format codes to qemu PixelFormat.

[ Benjamin Herrenschmidt: fix BGRA+RGBA shifts ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-05 13:27:11 +02:00
Peter Maydell
fd884c0765 Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-peter' into staging
QOM infrastructure fixes and device conversions

* Cleanups for recursive device unrealization

# gpg: Signature made Thu 04 Sep 2014 18:17:35 BST using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <afaerber@suse.de>"
# gpg:                 aka "Andreas Färber <afaerber@suse.com>"

* remotes/afaerber/tags/qom-devices-for-peter:
  qdev: Add cleanup logic in device_set_realized() to avoid resource leak
  qdev: Use NULL instead of local_err for qbus_child unrealize
  qdev: Use error_abort instead of using local_err
  memory: Remove object_property_add_child_array()
  qom: Add automatic arrayification to object_property_add()
  machine: Clean up -machine handling
  qom: Make object_child_foreach() safe for objects removal

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-04 19:41:15 +01:00
Peter Maydell
bbb6a1e872 Merge remote-tracking branch 'remotes/kvaneesh/for-upstream' into staging
* remotes/kvaneesh/for-upstream:
  hw/9pfs: Don't return type from host in readdir on local 9p filesystem
  hw/9pfs: Use little-endian format for xattr values

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-04 18:34:28 +01:00
Gonglei
1d45a705fc qdev: Add cleanup logic in device_set_realized() to avoid resource leak
At present, this function doesn't have partial cleanup implemented,
which will cause resource leaks in some scenarios.

Example:

1. Assume that "dc->realize(dev, &local_err)" executes successful
   and local_err == NULL;
2. device hotplug in hotplug_handler_plug() executes but fails
   (it is prone to occur). Then local_err != NULL;
3. error_propagate(errp, local_err) and return. But the resources
   which have been allocated in dc->realize() will be leaked.
Simple backtrace:
  dc->realize()
   |->device_realize
            |->pci_qdev_init()
                |->do_pci_register_device()
                |->etc.

Add fuller cleanup logic which assures that function can
goto appropriate error label as local_err population is
detected at each relevant point.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-09-04 19:15:54 +02:00
Gonglei
cd4520adca qdev: Use NULL instead of local_err for qbus_child unrealize
Forcefully unrealize all children regardless of errors in earlier
iterations (if any). We should keep going with cleanup operation
rather than report an error immediately. Therefore store the first
child unrealization failure and propagate it at the end. We also
forcefully unregister vmsd and unrealize actual object, too.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-09-04 19:15:06 +02:00
Peter Maydell
8cf8c92e77 Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging
Net patches

# gpg: Signature made Thu 04 Sep 2014 17:32:44 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/net-pull-request:
  virtio-net: purge outstanding packets when starting vhost
  net: complete all queued packets on VM stop
  net: invoke callback when purging queue
  virtio: don't call device on !vm_running
  virtio-net: don't run bh on vm stopped
  net: Forbid dealing with packets when VM is not running

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-04 17:39:07 +01:00
Michael S. Tsirkin
086abc1ccd virtio-net: purge outstanding packets when starting vhost
whenever we start vhost, virtio could have outstanding packets
queued, when they complete later we'll modify the ring
while vhost is processing it.

To prevent this, purge outstanding packets on vhost start.

Cc: qemu-stable@nongnu.org
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-04 17:19:09 +01:00
Michael S. Tsirkin
ca77d85e1d net: complete all queued packets on VM stop
This completes all packets, ensuring that callbacks
will not run when VM is stopped.

Cc: qemu-stable@nongnu.org
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-04 17:19:09 +01:00
Michael S. Tsirkin
07d8084624 net: invoke callback when purging queue
devices rely on packet callbacks eventually running,
but we violate this rule whenever we purge the queue.
To fix, invoke callbacks on all packets on purge.
Set length to 0, this way callers can detect that
this happened and re-queue if necessary.

Cc: qemu-stable@nongnu.org
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-04 17:19:09 +01:00
Michael S. Tsirkin
269bd822e7 virtio: don't call device on !vm_running
On vm stop, virtio changes vm_running state
too soon, so callbacks can get envoked with
vm_running = false;

Cc: qemu-stable@nongnu.org
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-04 17:19:09 +01:00
Michael S. Tsirkin
e8bcf84200 virtio-net: don't run bh on vm stopped
commit 783e770693
    virtio-net: stop/start bh when appropriate

is incomplete: BH might execute within the same main loop iteration but
after vmstop, so in theory, we might trigger an assertion.
I was unable to reproduce this in practice,
but it seems clear enough that the potential is there, so worth fixing.

Cc: qemu-stable@nongnu.org
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-04 17:19:09 +01:00
Bastian Blank
840a1bf283 hw/9pfs: Don't return type from host in readdir on local 9p filesystem
When using mapped mode in 9pfs, readdir implementation
should not return file type in d_type from the host
readdir, instead, it should use the type stored in
the extended attributes.  Since d_type is optional
and reading ext attrs for every readdir is expensive,
it should be sufficient to just set d_type to DT_UNKNOWN,
so guest will know to look it up separately.

This is a -stable material.

Signed-off-by: Bastian Blank <waldi@debian.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2014-09-04 10:51:13 -05:00
Gonglei
d578029e71 qdev: Use error_abort instead of using local_err
This error can not happen normally. If it happens, it indicates
something very wrong, we should abort QEMU. Moreover, the
user can only refer to /machine/peripheral or /objects, not
/machine/unattached.

While at it, remove superfluous check about local_err.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-09-04 16:14:47 +02:00
Peter Crosthwaite
843ef73a69 memory: Remove object_property_add_child_array()
Obsoleted by automatic object_property_add() arrayification.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-09-04 16:14:47 +02:00
Peter Crosthwaite
339659041f qom: Add automatic arrayification to object_property_add()
If "[*]" is given as the last part of a QOM property name, treat that
as an array property. The added property is given the first available
name, replacing the * with a decimal number counting from 0.

First add with name "foo[*]" will be "foo[0]". Second "foo[1]" and so
on.

Callers may inspect the ObjectProperty * return value to see what
number the added property was given.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-09-04 16:14:47 +02:00
Andreas Färber
d2659e27e1 machine: Clean up -machine handling
Since commit c4090f8, -object options are no longer handled through
object_set_property(), so clean up -object leftovers by renaming the
function and dropping special-casing of qom-type and id properties.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-09-04 16:14:47 +02:00
Alexey Kardashevskiy
8af734ca31 qom: Make object_child_foreach() safe for objects removal
Current object_child_foreach() uses QTAILQ_FOREACH() to walk
through children and that makes children removal from the callback
impossible.

This makes object_child_foreach() use QTAILQ_FOREACH_SAFE().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-09-04 16:14:47 +02:00
zhanghailiang
e1d64c084b net: Forbid dealing with packets when VM is not running
For all NICs(except virtio-net) emulated by qemu,
Such as e1000, rtl8139, pcnet and ne2k_pci,
Qemu can still receive packets when VM is not running.

If this happened in *migration's* last PAUSE VM stage, but
before the end of the migration, the new receiving packets will possibly dirty
parts of RAM which has been cached in *iovec*(will be sent asynchronously) and
dirty parts of new RAM which will be missed.
This will lead serious network fault in VM.

To avoid this, we forbid receiving packets in generic net code when
VM is not running.

Bug reproduction steps:
(1) Start a VM which configured at least one NIC
(2) In VM, open several Terminal and do *Ping IP -i 0.1*
(3) Migrate the VM repeatedly between two Hosts
And the *PING* command in VM will very likely fail with message:
'Destination HOST Unreachable', the NIC in VM will stay unavailable unless you
run 'service network restart'

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-04 14:31:54 +01:00
Peter Maydell
01eb313907 Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-2014-09-03' into staging
trivial patches for 2014-09-03

# gpg: Signature made Wed 03 Sep 2014 06:53:42 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 6F67 E18E 7C91 C5B1 5514  66A7 BEE5 9D74 A4C3 D7DB

* remotes/mjt/tags/trivial-patches-2014-09-03:
  slirp: Honour vlan/stack in hostfwd_remove commands
  hmp: fix MemdevList memory leak
  qom/object.c, hmp.c: fix string_output_get_string() memory leak
  query-memdev: fix potential memory leaks
  MAINTAINERS: Add VMWare devices maintainer
  device_tree.c: dump all err mesages with error_report
  device_tree.c: redirect load_device_tree err message to stderr
  scripts: Remove scripts/qtest
  Fix debug print warning
  curl: The macro that you have to uncomment to get debugging is DEBUG_CURL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-04 13:33:53 +01:00
Peter Maydell
b27e37d4ce Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc fixes, features

A bunch of bugfixes - these will make sense for 2.1.1

Initial Intel IOMMU support.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Wed 03 Sep 2014 14:41:23 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  acpi-build: Set FORCE_APIC_CLUSTER_MODEL bit for FADT flags
  vhost-scsi: init backend features earlier
  vhost_net: init acked_features to backend_features
  vhost_net: start/stop guest notifiers properly

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-04 12:20:41 +01:00
Peter Maydell
4771b02512 Revert "vhost_net: start/stop guest notifiers properly"
This reverts commit aad4dce934.

I accidentally merged the wrong version of a pull request
which had a buggy version of this patch. Reverting the
buggy version means we can then cleanly merge in the correct
pull with the corrected change.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-04 12:19:37 +01:00
zhanghailiang
07b81ed937 acpi-build: Set FORCE_APIC_CLUSTER_MODEL bit for FADT flags
If we start Windows 2008 R2 DataCenter with number of cpu less than 8,
The system will use APIC Flat Logical destination mode as default configuration,
Which has an upper limit of 8 CPUs.

The fault is that VM can not show all processors within Task Manager if
we hot-add cpus when the number of cpus in VM extends the limit of 8.

If we use cluster destination model, the problem will be solved.

Note:
This flag was introduced later than ACPI v1.0 specification while QEMU
generates v1.0 tables only, but...

linux kernel ignores this flag, so patch has no influence on it.

Tested with Win[XPsp3|Srv2003EE|Srv2008DC|Srv2008R2|Srv2012R2], there
isn't BSODs and guests boot just fine. In cases guest doesn't support
cpu-hotplug, cpu becomes visible after reboot and in case the guest
supports cpu-hotplug, it works as expected with this patch.

Cc: qemu-stable@nongnu.org
Signed-off-by: huangzhichao <huangzhichao@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-By: Igor Mammedov <imammedo@redhat.com>
2014-09-03 16:41:05 +03:00
Michael S. Tsirkin
3a1655fc53 vhost-scsi: init backend features earlier
As vhost core can use backend_features during init, clear it earlier to
avoid using uninitialized memory.
This use would be harmless since vhost scsi ignores the result
anyway, but initializing earlier will help prevent valgrind errors,
and make scsi and net behave similarly.

Cc: qemu-stable@nongnu.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-09-03 16:41:05 +03:00
Jason Wang
b49ae9138d vhost_net: init acked_features to backend_features
commit 2e6d46d77e (vhost: add
vhost_get_features and vhost_ack_features) removes the step that
initializes the acked_features to backend_features.

As this field is now uninitialized, vhost initialization will sometimes
fail.

To fix, initialize acked_features on each ack.

Tested-by: Andrey Korolyov <andrey@xdel.ru>
Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-09-03 16:41:05 +03:00
Jason Wang
cd7d1d26b0 vhost_net: start/stop guest notifiers properly
commit a9f98bb5eb "vhost: multiqueue
support" changed the order of stopping the device. Previously
vhost_dev_stop would disable backend and only afterwards, unset guest
notifiers. We now unset guest notifiers while vhost is still
active. This can lose interrupts causing guest networking to fail. In
particular, this has been observed during migration.

To fix this, several other changes are needed:
- remove the hdev->started assertion in vhost.c since we may want to
start the guest notifiers before vhost starts and stop the guest
notifiers after vhost is stopped.
- introduce the vhost_net_set_vq_index() and call it before setting
guest notifiers. This is to guarantee vhost_net has the correct
virtqueue index when setting guest notifiers.

MST: fix up error handling.

Cc: qemu-stable@nongnu.org
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Andrey Korolyov <andrey@xdel.ru>
Reported-by: "Zhangjie (HZ)" <zhangjie14@huawei.com>
Tested-by: William Dauchy <william@gandi.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2014-09-03 16:40:44 +03:00
Aneesh Kumar K.V
f8ad4a89e9 hw/9pfs: Use little-endian format for xattr values
With security_model=mapped-xattr, we encode the uid,gid and other file
attributes as extended attributes of the file. We save them under
user.virtfs.* namespace.

Use little-endian encoding for on-disk values. This enables us to export
the same directory from both little-endian and big-endian hosts.

NOTE: This will break big-endian host that have virtFS exports
using security model mapped-xattr. They will have to use external tools
to convert the xattr to little-endian format.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2014-09-02 16:02:33 -05:00
Peter Maydell
70381662aa slirp: Honour vlan/stack in hostfwd_remove commands
The hostfwd_add and hostfwd_remove monitor commands allow the user
to optionally specify a vlan/stack tuple. hostfwd_add honours this,
but hostfwd_remove does not (it looks up the tuple but then ignores
the SlirpState it has looked up and always uses the first stack
in the list anyway). Correct this to honour what the user requested.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Chen Fan
ecaf54a052 hmp: fix MemdevList memory leak
the memdev_list in hmp_info_memdev() is never freed.
so we use existent method qapi_free_MemdevList() to free it.
and also we can use qapi_free_MemdevList() to replace list loops
to clean up the memdev list in error path.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Chen Fan
976620ac40 qom/object.c, hmp.c: fix string_output_get_string() memory leak
string_output_get_string() uses g_string_free(str, false) to
transfer the 'str' pointer to callers and never free it.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Chen Fan
b0e90181e4 query-memdev: fix potential memory leaks
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Dmitry Fleytman
622fb504c4 MAINTAINERS: Add VMWare devices maintainer
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Li Liu
508e221f2c device_tree.c: dump all err mesages with error_report
Signed-off-by: Li Liu <john.liuli@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Li Liu
db013f81b2 device_tree.c: redirect load_device_tree err message to stderr
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Li Liu <john.liuli@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Fam Zheng
7d2ff422ca scripts: Remove scripts/qtest
This is a dummy file with no user, drop it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Gonglei
c5539cb426 Fix debug print warning
Steps:

1.enable qemu debug print, using simply scprit as below:
 grep "//#define DEBUG" * -rl | xargs sed -i "s/\/\/#define DEBUG/#define DEBUG/g"
2. make -j
3. get some warning:
hw/i2c/pm_smbus.c: In function 'smb_ioport_writeb':
hw/i2c/pm_smbus.c:142: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/i2c/pm_smbus.c:142: warning: format '%02x' expects type 'unsigned int', but argument 3 has type 'uint64_t'
hw/i2c/pm_smbus.c: In function 'smb_ioport_readb':
hw/i2c/pm_smbus.c:209: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/intc/i8259.c: In function 'pic_ioport_read':
hw/intc/i8259.c:373: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/input/pckbd.c: In function 'kbd_write_command':
hw/input/pckbd.c:232: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'uint64_t'
hw/input/pckbd.c: In function 'kbd_write_data':
hw/input/pckbd.c:333: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'uint64_t'
hw/isa/apm.c: In function 'apm_ioport_writeb':
hw/isa/apm.c:44: warning: format '%x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/isa/apm.c:44: warning: format '%02x' expects type 'unsigned int', but argument 3 has type 'uint64_t'
hw/isa/apm.c: In function 'apm_ioport_readb':
hw/isa/apm.c:67: warning: format '%x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/timer/mc146818rtc.c: In function 'cmos_ioport_write':
hw/timer/mc146818rtc.c:394: warning: format '%02x' expects type 'unsigned int', but argument 3 has type 'uint64_t'
hw/i386/pc.c: In function 'port92_write':
hw/i386/pc.c:479: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'uint64_t'

Fix them.

Cc: qemu-trivial@nongnu.org
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Richard W.M. Jones
41c2346716 curl: The macro that you have to uncomment to get debugging is DEBUG_CURL.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Peter Maydell
f2426947de Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc fixes, features

A bunch of bugfixes - these will make sense for 2.1.1

Initial Intel IOMMU support.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Tue 02 Sep 2014 16:05:04 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  vhost_net: start/stop guest notifiers properly
  pci: avoid losing config updates to MSI/MSIX cap regs
  virtio-net: don't run bh on vm stopped
  ioh3420: remove unused ioh3420_init() declaration
  vhost_net: cleanup start/stop condition
  intel-iommu: add IOTLB using hash table
  intel-iommu: add context-cache to cache context-entry
  intel-iommu: add supports for queued invalidation interface
  intel-iommu: fix coding style issues around in q35.c and machine.c
  intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "iommu" as a switch
  intel-iommu: add DMAR table to ACPI tables
  intel-iommu: introduce Intel IOMMU (VT-d) emulation
  iommu: add is_write as a parameter to the translate function of MemoryRegionIOMMUOps

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-02 16:07:31 +01:00
Jason Wang
aad4dce934 vhost_net: start/stop guest notifiers properly
commit a9f98bb5eb vhost: multiqueue
support changed the order of stopping the device. Previously
vhost_dev_stop would disable backend and only afterwards, unset guest
notifiers. We now unset guest notifiers while vhost is still
active. This can lose interrupts causing guest networking to fail. In
particular, this has been observed during migration.

To adapt this, several other changes are needed:
- remove the hdev->started assertion in vhost.c since we may want to
start the guest notifiers before vhost starts and stop the guest
notifiers after vhost is stopped.
- introduce the vhost_net_set_vq_index() and call it before setting
guest notifiers. This is used to guarantee vhost_net has the correct
virtqueue index when setting guest notifiers.

Cc: qemu-stable@nongnu.org
Reported-by: "Zhangjie (HZ)" <zhangjie14@huawei.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-09-02 17:33:37 +03:00
Knut Omang
d7efb7e08e pci: avoid losing config updates to MSI/MSIX cap regs
Since
commit 95d6580024
    msi: Invoke msi/msix_write_config from PCI core
msix config writes are lost, the value written is always 0.

Fix pci_default_write_config to avoid this.

Cc: qemu-stable@nongnu.org
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-09-02 17:28:26 +03:00
Michael S. Tsirkin
0187c7989a virtio-net: don't run bh on vm stopped
commit 783e770693
    virtio-net: stop/start bh when appropriate

is incomplete: BH might execute within the same main loop iteration but
after vmstop, so in theory, we might trigger an assertion.
I was unable to reproduce this in practice,
but it seems clear enough that the potential is there, so worth fixing.

Cc: qemu-stable@nongnu.org
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-09-02 17:28:26 +03:00
Gonglei
fc8342f758 ioh3420: remove unused ioh3420_init() declaration
commit 0f9b1771cc
    ioh3420: Remove obsoleted, unused ioh3420_init function
removed the implementation of ioh3420_init

Drop the declaration from the header file as well.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-09-02 17:28:26 +03:00
Michael S. Tsirkin
2d2507ef23 vhost_net: cleanup start/stop condition
Checking vhost device internal state in vhost_net looks like
a layering violation since vhost_net does not
set this flag: it is set and tested by vhost.c.
There seems to be no reason to check this:
caller in virtio net uses its own flag,
vhost_started, to ensure vhost is started/stopped
as appropriate.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
2014-09-02 17:28:25 +03:00
Peter Maydell
30eaca3acd Merge remote-tracking branch 'remotes/spice/tags/pull-spice-20140902-1' into staging
sanity check for qxl, minor spice display channel tweak.

# gpg: Signature made Tue 02 Sep 2014 09:53:39 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/spice/tags/pull-spice-20140902-1:
  spice: use console index as display id
  qxl-render: add more sanity checks

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-02 10:26:10 +01:00
Xin Tong
88e89a57f9 implementing victim TLB for QEMU system emulated TLB
QEMU system mode page table walks are expensive. Taken by running QEMU
qemu-system-x86_64 system mode on Intel PIN , a TLB miss and walking a
4-level page tables in guest Linux OS takes ~450 X86 instructions on
average.

QEMU system mode TLB is implemented using a directly-mapped hashtable.
This structure suffers from conflict misses. Increasing the
associativity of the TLB may not be the solution to conflict misses as
all the ways may have to be walked in serial.

A victim TLB is a TLB used to hold translations evicted from the
primary TLB upon replacement. The victim TLB lies between the main TLB
and its refill path. Victim TLB is of greater associativity (fully
associative in this patch). It takes longer to lookup the victim TLB,
but its likely better than a full page table walk. The memory
translation path is changed as follows :

Before Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. TLB refill.
5. Do the memory access.
6. Return to code cache.

After Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. Victim TLB lookup.
5. If victim TLB misses, TLB refill
6. Do the memory access.
7. Return to code cache

The advantage is that victim TLB can offer more associativity to a
directly mapped TLB and thus potentially fewer page table walks while
still keeping the time taken to flush within reasonable limits.
However, placing a victim TLB before the refill path increase TLB
refill path as the victim TLB is consulted before the TLB refill. The
performance results demonstrate that the pros outweigh the cons.

some performance results taken on SPECINT2006 train
datasets and kernel boot and qemu configure script on an
Intel(R) Xeon(R) CPU  E5620  @ 2.40GHz Linux machine are shown in the
Google Doc link below.

https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing

In summary, victim TLB improves the performance of qemu-system-x86_64 by
11% on average on SPECINT2006, kernelboot and qemu configscript and with
highest improvement of in 26% in 456.hmmer. And victim TLB does not result
in any performance degradation in any of the measured benchmarks. Furthermore,
the implemented victim TLB is architecture independent and is expected to
benefit other architectures in QEMU as well.

Although there are measurement fluctuations, the performance
improvement is very significant and by no means in the range of
noises.

Signed-off-by: Xin Tong <trent.tong@gmail.com>
Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 17:43:06 +01:00
Bastian Koppelmann
44ea34309e target-tricore: Add instructions of SR opcode format
Add instructions of SR opcode format.
Add micro-op generator functions for saturate.
Add helper return from exception (rfe).

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-id: 1409572800-4116-16-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
5a7634a28c target-tricore: Add instructions of SLR, SSRO and SRO opcode format
Add instructions of SLR, SSRO and SRO opcode format.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1409572800-4116-15-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
5de93515f9 target-tricore: Add instructions of SC opcode format
Add instructions of SC opcode format.
Add helper for begin interrupt service routine.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-id: 1409572800-4116-14-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
a47b50db60 target-tricore: Add instructions of SBR opcode format
Add instructions of SBR opcode format.
Add gen_loop micro-op generator function.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1409572800-4116-13-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
70b0226250 target-tricore: Add instructions of SBC and SBRN opcode format
Add instructions of SBC and SBRN opcode format.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1409572800-4116-12-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
9a31922b08 target-tricore: Add instructions of SB opcode format
Add instructions of SB opcode format.
Add helper call/ret.
Add micro-op generator functions for branches.
Add makro to generate helper functions.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-id: 1409572800-4116-11-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
d279821074 target-tricore: Add instructions of SRRS and SLRO opcode format
Add instructions of SSRS and SLRO opcode format.
Add micro-op generator functions for offset loads.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1409572800-4116-10-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
46aa848ffb target-tricore: Add instructions of SSR opcode format
Add instructions of SSR opcode format.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1409572800-4116-9-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
2692802a37 target-tricore: Add instructions of SRR opcode format
Add instructions of SRR opcode format.
Add helper for add/sub_ssov.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-id: 1409572800-4116-8-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:21 +01:00
Bastian Koppelmann
0707ec1bea target-tricore: Add instructions of SRC opcode format
Add instructions of SRC opcode format.
Add micro-op generator functions for add, conditional add/sub and shi/shai.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1409572800-4116-7-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:20 +01:00
Bastian Koppelmann
7c87d074d2 target-tricore: Add masks and opcodes for decoding
Add masks and opcodes for decoding TriCore instructions.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1409572800-4116-6-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:20 +01:00
Bastian Koppelmann
0aaeb118b3 target-tricore: Add initialization for translation and activate target
Add tcg and cpu model initialization.
Add gen_intermediate_code function.
Activate target in configure and add softmmu config.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-id: 1409572800-4116-5-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:20 +01:00
Bastian Koppelmann
2d30267e8e target-tricore: Add softmmu support
Add basic softmmu support for TriCore

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-id: 1409572800-4116-4-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:20 +01:00
Bastian Koppelmann
e2d0501103 target-tricore: Add board for systemmode
Add basic board to allow systemmode emulation

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-id: 1409572800-4116-3-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:20 +01:00
Bastian Koppelmann
48e06fe0ed target-tricore: Add target stubs and qom-cpu
Add TriCore target stubs, and QOM cpu, and Maintainer

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-id: 1409572800-4116-2-git-send-email-kbastian@mail.uni-paderborn.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 14:49:20 +01:00
Peter Maydell
5cd1475d28 Merge remote-tracking branch 'remotes/borntraeger/tags/kvm-s390-20140901' into staging
s390x/kvm: Several updates/fixes/features

1. s390x/kvm: avoid synchronize_rcu's in kernel
----------------------------------------------
The first patches change s390x/kvm code to issue VCPU specific ioctls
from the VCPU thread. This will avoid unnecessary synchronize_rcu in
the kernel, which caused a noticably slowdown with many guest CPUs.
It speeds up all start/restart/reset operations involving cpus
drastically.

2. s390-ccw.img: block size and DASD format support
---------------------------------------------------
The second part changes the s390-ccw bios to IPL (boot)  more disk
formats than before. Furthermore a small fix is made to the console
output of the bios.

3. s390: Support for Hotplug of Standby Memory
----------------------------------------------
The third part adds support in s390 for a pool of standby memory,
which can be set online/offline by the guest (ie, via chmem).
The standby pool of memory is allocated as the difference between
the initial memory setting and the maxmem setting.
As part of this work, additional results are provided for the
Read SCP Information SCLP, and new implentation is added for the
Read Storage Element Information, Attach Storage Element,
Assign Storage and Unassign Storage SCLPs, which enables the s390
guest to manipulate the standby memory pool.

This patchset is based on work originally done by Jeng-Fang (Nick)
Wang.

Sample qemu command snippet:

qemu -machine s390-ccw-virtio  -m 1024M,maxmem=2048M,slots=32 -enable-kvm

This will allocate 1024M of active memory, and another 1024M
of standby memory.  Example output from s390-tools lsmem:
=============================================================================
0x0000000000000000-0x000000000fffffff        256  online   no         0-127
0x0000000010000000-0x000000001fffffff        256  online   yes        128-255
0x0000000020000000-0x000000003fffffff        512  online   no         256-511
0x0000000040000000-0x000000007fffffff       1024  offline  -          512-1023

Memory device size  : 2 MB
Memory block size   : 256 MB
Total online memory : 1024 MB
Total offline memory: 1024 MB

The guest can dynamically enable part or all of the standby pool
via the s390-tools chmem, for example:

chmem -e 512M

And can attempt to dynamically disable:

chmem -d 512M

4. s390x/gdb: various fixes
---------------------------
* Patch 1 fixes a bug where the cc was changed accidentally.
* Patch 2 adds the gdb feature XML files for s390x
* Patch 3 Define acr and fpr registers as coprocessor registers. This allows us
   to reuse the feature XML files.
* Patch 4 whitespace fixes

# gpg: Signature made Mon 01 Sep 2014 12:53:39 BST using RSA key ID B5A61C7C
# gpg: Can't check signature: public key not found

* remotes/borntraeger/tags/kvm-s390-20140901:
  s390x/gdb: coding style fixes
  s390x/gdb: generate target.xml and handle fp/ac as coprocessors
  s390x/gdb: add the feature xml files for s390x
  s390x/gdb: don't touch the cc if tcg is not enabled
  sclp-s390: Add memory hotplug SCLPs
  s390-virtio: Apply same memory boundaries as virtio-ccw
  virtio-ccw: Include standby memory when calculating storage increment
  sclp-s390: Add device to manage s390 memory hotplug
  pc-bios/s390-ccw.img binary update
  pc-bios/s390-ccw: Do proper console setup
  pc-bios/s390-ccw: IPL from DASD with format variations
  pc-bios/s390-ccw Really big EAV ECKD DASD handling
  pc-bios/s390-ccw Improve ECKD informational message
  pc-bios/s390-ccw: handle more ECKD DASD block sizes
  pc-bios/s390-ccw: support all virtio block size
  s390x/kvm: execute the first cpu reset on the vcpu thread
  s390x/kvm: execute "system reset" cpu resets on the vcpu thread
  s390x/kvm: execute sigp orders on the target vcpu thread
  s390x/kvm: run guest triggered resets on the target vcpu thread

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 13:57:46 +01:00
Gerd Hoffmann
cd56cc6b07 spice: use console index as display id
... instead of maintaining our own numbering.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-09-01 10:19:03 +02:00
Gerd Hoffmann
503b3b33fe qxl-render: add more sanity checks
Damn, the dirty rectangle values are signed integers.  So the checks
added by commit 788fbf042f are not good
enough, we also have to make sure they are not negative.

[ Note: There must be something broken in spice-server so we get
  negative values in the first place.  Bug opened:
  https://bugzilla.redhat.com/show_bug.cgi?id=1135372 ]

Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2014-09-01 10:19:03 +02:00
David Hildenbrand
218829db23 s390x/gdb: coding style fixes
This patch cleanes up two coding style issues (missing whitespaces).

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:45:19 +02:00
David Hildenbrand
73d510c9d3 s390x/gdb: generate target.xml and handle fp/ac as coprocessors
This patch reduces the core registers to the psw and the general purpose
registers. The fpc and ac registers are handled as coprocessors registers by gdb.
This allows to reuse the feature xml files taken from gdb without further
modification and is what other architectures do.

The target.xml is now generated and provided to the gdb client. Therefore, the
client doesn't have to guess which registers are available at which logical
register number.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:45:19 +02:00
David Hildenbrand
6117afac34 s390x/gdb: add the feature xml files for s390x
This patch adds the relevant s390x feature xml files taken from gdb.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:45:19 +02:00
David Hildenbrand
97fa52f097 s390x/gdb: don't touch the cc if tcg is not enabled
When reading/writing the psw mask, the condition code may only be touched if
running on tcg.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:45:19 +02:00
Matthew Rosato
1def6656b6 sclp-s390: Add memory hotplug SCLPs
Add memory information to read SCP info and add handlers for
Read Storage Element Information, Attach Storage Element,
Assign Storage and Unassign Storage.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:25:32 +02:00
Matthew Rosato
e7f1314f97 s390-virtio: Apply same memory boundaries as virtio-ccw
Although s390-virtio won't support memory hotplug, it should
enforce the same memory boundaries so that it can use shared codepaths
(like read_SCP_info).

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:25:32 +02:00
Matthew Rosato
b6fe01248e virtio-ccw: Include standby memory when calculating storage increment
When determining the memory increment size, use the maxmem size if
it was specified.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:25:32 +02:00
Matthew Rosato
0844df77fd sclp-s390: Add device to manage s390 memory hotplug
Add sclpMemoryHotplugDev to contain associated data structures, etc.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:25:32 +02:00
Eugene (jno) Dvurechenski
f360221988 pc-bios/s390-ccw.img binary update
Rebuild of s390-ccw.img containing these patches:

  pc-bios/s390-ccw: Do proper console setup
  pc-bios/s390-ccw: support all virtio block size
  pc-bios/s390-ccw: handle more ECKD DASD block sizes
  pc-bios/s390-ccw Improve ECKD informational message
  pc-bios/s390-ccw Really big EAV ECKD DASD handling
  pc-bios/s390-ccw: IPL from DASD with format variations

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
Christian Borntraeger
1aa7f4c6aa pc-bios/s390-ccw: Do proper console setup
The final newline/return must happen before we reset the sclp via
diag 308.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
Eugene (jno) Dvurechenski
14f56a2e35 pc-bios/s390-ccw: IPL from DASD with format variations
There are two known cases of DASD format where signatures are
incomplete or absent:

1. result of <dasdfmt -d ldl -L ...> (ECKD_LDL_UNLABELED)
2. CDL with zero keys in IPL1 and IPL2 records

Now the code attempts to
1. find zIPL and use SCSI layout
2. find IPL1 and use CDL layout
3. find CMS1 and use LDL layout
3. find LNX1 and use LDL layout
4. find zIPL and use unlabeled LDL layout
5. find zIPL and use CDL layout
6. die
in this sequence.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
Eugene (jno) Dvurechenski
f04db28b86 pc-bios/s390-ccw Really big EAV ECKD DASD handling
For EAV ECKD DASD, the cylinder count will have the magic value
0xfffeU. Therefore, use the block number to test for valid eckd
addresses instead.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
Eugene (jno) Dvurechenski
b0885f7599 pc-bios/s390-ccw Improve ECKD informational message
Add block size display to ECKD scheme report.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
Eugene (jno) Dvurechenski
00a47e7e71 pc-bios/s390-ccw: handle more ECKD DASD block sizes
Using dasdfmt(8) to format a DASD allows to choose a block size.
There are four supported values: 512, 1024, 2048, and 4096 bytes
per block. Each block size leads to selection of new count of
sectors per track. The head count remains always the same: 15.

This empiric knowledge is used to detect ECKD DASD to IPL from.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
Eugene (jno) Dvurechenski
92cb05574b pc-bios/s390-ccw: support all virtio block size
The block size value may be given "as is" OR as a base value and
a shift count (exponent). So, we have to use calculation to get
the proper number in the code.

The main expression reads as
        (blk_cfg.blk_size << blk_cfg.physical_block_exp)

E.g., various combinations between blk_size=1/physical_block_exp=12
and blk_size=4096/physical_block_exp=0 are valid for 4K blocks.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
David Hildenbrand
159855f098 s390x/kvm: execute the first cpu reset on the vcpu thread
As all full cpu resets currently call into the kernel to do initial cpu reset,
let's run this reset (triggered by cpu_s390x_init()) on the proper vcpu thread.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
David Hildenbrand
1fad8b3be3 s390x/kvm: execute "system reset" cpu resets on the vcpu thread
Let's execute resets triggered by qemu system resets on the target vcpu thread.
This will avoid synchronize_rcu's in the kernel.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
David Hildenbrand
6e6ad8db11 s390x/kvm: execute sigp orders on the target vcpu thread
All sigp orders that can result in ioctls on the target vcpu should be executed
on the associated vcpu thread.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
David Hildenbrand
85ca3371f6 s390x/kvm: run guest triggered resets on the target vcpu thread
Currently, load_normal_reset() and modified_clear_reset() as triggered
by a guest vcpu will initiate cpu resets on the current vcpu thread for
all cpus. The reset should happen on the individual vcpu thread
instead, so let's use run_on_cpu() for this.

This avoids calls to synchronize_rcu() in the kernel.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-09-01 09:23:02 +02:00
Peter Maydell
988f463614 Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Block pull request

# gpg: Signature made Fri 29 Aug 2014 17:25:58 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request: (35 commits)
  quorum: Fix leak of opts in quorum_open
  blkverify: Fix leak of opts in blkverify_open
  nfs: Fix leak of opts in nfs_file_open
  curl: Don't deref NULL pointer in call to aio_poll.
  curl: Allow a cookie or cookies to be sent with http/https requests.
  virtio-blk: allow drive_del with dataplane
  block: acquire AioContext in do_drive_del()
  linux-aio: avoid deadlock in nested aio_poll() calls
  qemu-iotests: add multiwrite test cases
  block: fix overlapping multiwrite requests
  nbd: Follow the BDS' AIO context
  block: Add AIO context notifiers
  nbd: Drop nbd_can_read()
  sheepdog: fix a core dump while do auto-reconnecting
  aio-win32: add support for sockets
  qemu-coroutine-io: fix for Win32
  AioContext: introduce aio_prepare
  aio-win32: add aio_set_dispatching optimization
  test-aio: test timers on Windows too
  AioContext: export and use aio_dispatch
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 18:40:04 +01:00
Fam Zheng
8df3abfcee quorum: Fix leak of opts in quorum_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 17:10:18 +01:00
Fam Zheng
3158593126 blkverify: Fix leak of opts in blkverify_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 17:10:18 +01:00
Fam Zheng
810f4f86b7 nfs: Fix leak of opts in nfs_file_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 17:10:18 +01:00
Richard W.M. Jones
a2f468e48f curl: Don't deref NULL pointer in call to aio_poll.
In commit 63f0f45f2e the following
mechanical change was made:

         if (!state) {
-            qemu_aio_wait();
+            aio_poll(state->s->aio_context, true);
         }

The new code now checks if state is NULL and then dereferences it
('state->s') which is obviously incorrect.

This commit replaces state->s->aio_context with
bdrv_get_aio_context(bs), fixing this problem.  The two other hunks
are concerned with getting the BlockDriverState pointer bs to where it
is needed.

The original bug causes a segfault when using libguestfs to access a
VMware vCenter Server and doing any kind of complex read-heavy
operations.  With this commit the segfault goes away.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 16:19:01 +01:00
Richard W.M. Jones
a94f83d94f curl: Allow a cookie or cookies to be sent with http/https requests.
In order to access VMware ESX efficiently, we need to send a session
cookie.  This patch is very simple and just allows you to send that
session cookie.  It punts on the question of how you get the session
cookie in the first place, but in practice you can just run a `curl'
command against the server and extract the cookie that way.

To use it, add file.cookie to the curl URL.  For example:

$ qemu-img info 'json: {
    "file.driver":"https",
    "file.url":"https://vcenter/folder/Windows%202003/Windows%202003-flat.vmdk?dcPath=Datacenter&dsName=datastore1",
    "file.sslverify":"off",
    "file.cookie":"vmware_soap_session=\"52a01262-bf93-ccce-d379-8dabb3e55560\""}'
image: [...]
file format: raw
virtual size: 8.0G (8589934592 bytes)
disk size: unavailable

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 16:11:14 +01:00
Stefan Hajnoczi
3255d1c21f virtio-blk: allow drive_del with dataplane
Now that drive_del acquires the AioContext we can safely allow deleting
the drive.  As with non-dataplane mode, all I/Os submitted by the guest
after drive_del will return EIO.

This patch makes hot unplug work with virtio-blk dataplane.  Previously
drive_del reported an error because the device was busy.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 16:01:48 +01:00
Stefan Hajnoczi
8ad4202bf6 block: acquire AioContext in do_drive_del()
Make drive_del safe for dataplane where another thread may be running
the BlockDriverState's AioContext.

Note the assumption that AioContext's lifetime exceeds DriveInfo and
BlockDriverState.  We release AioContext after DriveInfo and
BlockDriverState are potentially freed.

This is clearly safe with the global AioContext but also with -object
iothread and implicit iothreads created by -device
virtio-blk-pci,x-data-plane=on (their lifetime is tied to DeviceState,
not BlockDriverState).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 16:01:10 +01:00
Stefan Hajnoczi
2cdff7f620 linux-aio: avoid deadlock in nested aio_poll() calls
If two Linux AIO request completions are fetched in the same
io_getevents() call, QEMU will deadlock if request A's callback waits
for request B to complete using an aio_poll() loop.  This was reported
to happen with the mirror blockjob.

This patch moves completion processing into a BH and makes it resumable.
Nested event loops can resume completion processing so that request B
will complete and the deadlock will not occur.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Marcin Gibuła <m.gibula@beyond.pl>
Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Marcin Gibuła <m.gibula@beyond.pl>
2014-08-29 15:59:17 +01:00
Peter Maydell
8b3030114a Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20140829' into staging
target-arm queue:
 * support PMCCNTR in ARMv8
 * various GIC fixes and cleanups
 * Correct Cortex-A57 ISAR5 and AA64ISAR0 ID register values
 * Fix regression that disabled VFP for ARMv5 CPUs
 * Update to upstream VIXL 1.5

# gpg: Signature made Fri 29 Aug 2014 15:34:47 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"

* remotes/pmaydell/tags/pull-target-arm-20140829:
  target-arm: Implement pmccfiltr_write function
  target-arm: Remove old code and replace with new functions
  target-arm: Implement pmccntr_sync function
  target-arm: Add arm_ccnt_enabled function
  target-arm: Implement PMCCNTR_EL0 and related registers
  arm: Implement PMCCNTR 32b read-modify-write
  target-arm: Make the ARM PMCCNTR register 64-bit
  hw/intc/arm_gic: honor target mask in gic_update()
  aarch64: raise max_cpus to 8
  arm_gic: Use GIC_NR_SGIS constant
  arm_gic: Do not force PPIs to edge-triggered mode
  arm_gic: GICD_ICFGR: Write model only for pre v1 GICs
  arm_gic: Fix read of GICD_ICFGR
  target-arm: Correct Cortex-A57 ISAR5 and AA64ISAR0 ID register values
  target-arm: Fix regression that disabled VFP for ARMv5 CPUs
  disas/libvixl: Update to upstream VIXL 1.5

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:48:15 +01:00
Alistair Francis
0614601cec target-arm: Implement pmccfiltr_write function
This is the function that is called when writing to the
PMCCFILTR_EL0 register

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: 73da3da6404855b17d5ae82975a32ff3a4dcae3d.1409025949.git.peter.crosthwaite@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:30 +01:00
Alistair Francis
942a155b20 target-arm: Remove old code and replace with new functions
Remove the old PMCCNTR code and replace it with calls to the new
pmccntr_sync() and arm_ccnt_enabled() functions.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: 693a6e437d915c2195fd3dc7303f384ca538b7bf.1409025949.git.peter.crosthwaite@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:30 +01:00
Alistair Francis
ec7b4ce4c7 target-arm: Implement pmccntr_sync function
This is used to synchronise the PMCCNTR counter and swap its
state between enabled and disabled if required. It must always
be called twice, both before and after any logic that could
change the state of the PMCCNTR counter.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: 62811d4c0f7b1384f7aab62ea2fcfda3dcb0db50.1409025949.git.peter.crosthwaite@xilinx.com
[PMM: fixed minor typos in pmccntr_sync doc comment]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:29 +01:00
Alistair Francis
87124fdea4 target-arm: Add arm_ccnt_enabled function
Include a helper function to determine if the CCNT counter
is enabled.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: e1a64f17a756e06c8bda8238ad4826d705049f7a.1409025949.git.peter.crosthwaite@xilinx.com
[ PC changes
  * Remove EL based checks
]
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:29 +01:00
Alistair Francis
8521466b39 target-arm: Implement PMCCNTR_EL0 and related registers
This patch adds support for the ARMv8 version of the PMCCNTR and
related registers. It also starts to implement the PMCCFILTR_EL0
register.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: b5d1094764a5416363ee95216799b394ecd011e8.1409025949.git.peter.crosthwaite@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:29 +01:00
Peter Crosthwaite
421c7ebd93 arm: Implement PMCCNTR 32b read-modify-write
The register is now 64bit, however a 32 bit write to the register
should leave the higher bits unchanged. The open coded write handler
does not implement this, so we need to read-modify-write accordingly.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Alistair Francis <alistair23@gmail.com>
Message-id: ec350573424bb2adc1701c3b9278d26598e2f2d1.1409025949.git.peter.crosthwaite@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:29 +01:00
Alistair Francis
c92c06872a target-arm: Make the ARM PMCCNTR register 64-bit
This makes the PMCCNTR register 64-bit to allow for the
64-bit ARMv8 version.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 6c5bac5fd0ea54963b1fc0e7f9464909f2e19a73.1409025949.git.peter.crosthwaite@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:29 +01:00
Sergey Fedorov
b52b81e44f hw/intc/arm_gic: honor target mask in gic_update()
Take IRQ target mask into account when determining the highest priority
pending interrupt.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Message-id: 1407947471-26981-1-git-send-email-serge.fdrv@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:29 +01:00
Joel Schopp
d3579f362f aarch64: raise max_cpus to 8
I'm running on a system with 8 cpus and it would be nice to have qemu
support all of them.  The attached patch does that and has been tested.

That said, I'm not sure if 8 is enough or if we want to bump this even higher
now before systems with many more cpus come along. 255 anyone?

Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Joel Schopp <joel.schopp@amd.com>
Message-id: 20140819213304.19537.2834.stgit@joelaarch64.amd.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:29 +01:00
Adam Lackorzynski
93b5f6f1a6 arm_gic: Use GIC_NR_SGIS constant
Use constant rather than a plain number.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Message-id: 1408372255-12358-5-git-send-email-adam@os.inf.tu-dresden.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:29 +01:00
Adam Lackorzynski
de7a900f0c arm_gic: Do not force PPIs to edge-triggered mode
Only SGIs must be WI, done by forcing them to their default
(edge-triggered).

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Message-id: 1408372255-12358-4-git-send-email-adam@os.inf.tu-dresden.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:28 +01:00
Adam Lackorzynski
24b790df43 arm_gic: GICD_ICFGR: Write model only for pre v1 GICs
Setting the model is only available in pre-v1 GIC models.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Message-id: 1408372255-12358-3-git-send-email-adam@os.inf.tu-dresden.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:28 +01:00
Adam Lackorzynski
71a62046ae arm_gic: Fix read of GICD_ICFGR
The GICD_ICFGR register covers 4 interrupts per byte.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Message-id: 1408372255-12358-2-git-send-email-adam@os.inf.tu-dresden.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:28 +01:00
Peter Maydell
c379621451 target-arm: Correct Cortex-A57 ISAR5 and AA64ISAR0 ID register values
We implement the crypto extensions but were incorrectly reporting
ID register values for the Cortex-A57 which did not advertise
crypto. Use the correct values as described in the TRM.
With this fix Linux correctly detects presence of the crypto
features and advertises them in /proc/cpuinfo.

Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1408718660-7295-1-git-send-email-peter.maydell@linaro.org
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:28 +01:00
Peter Maydell
ed1f13d607 target-arm: Fix regression that disabled VFP for ARMv5 CPUs
Commit 2c7ffc414 added support for honouring the CPACR coprocessor
access control register bits which may disable access to VFP
and Neon instructions. However it failed to account for the
fact that the CPACR is only present starting from the ARMv6
architecture version, so it accidentally disabled VFP completely
for ARMv5 CPUs like the ARM926. Linux would detect this as
"no VFP present" and probably fall back to its own emulation,
but other guest OSes might crash or misbehave.

This fixes bug LP:1359930.

Reported-by: Jakub Jermar <jakub@jermar.eu>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1408714940-7192-1-git-send-email-peter.maydell@linaro.org
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 15:00:28 +01:00
Peter Maydell
508280f566 disas/libvixl: Update to upstream VIXL 1.5
Update our copy of libvixl to upstream's 1.5 release.
This includes the upstream versions of the fixes we
were carrying locally (commit ffebe899).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1407162987-4659-1-git-send-email-peter.maydell@linaro.org
2014-08-29 15:00:27 +01:00
Stefan Hajnoczi
12ade76090 qemu-iotests: add multiwrite test cases
This test case covers the basic bdrv_aio_multiwrite() scenarios:
1. Single request
2. Sequential requests (AABB)
3. Superset overlapping requests (AABBAA)
4. Subset overlapping requests (BBAABB)
5. Head overlapping requests (AABB)
6. Tail overlapping requests (BBAA)
7. Disjoint requests (AA BB)

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-29 14:10:15 +01:00
Stefan Hajnoczi
391827eb10 block: fix overlapping multiwrite requests
When request A is a strict superset of request B:

  AAAAAAAA
    BBBB

multiwrite_merge() merges them as follows:

  AABBBB

The tail of request A should have been included:

  AABBBBAA

This patch fixes data loss but this code path is probably rare.  Since
guests cannot assume ordering between in-flight requests, few
applications submit overlapping write requests.

Reported-by: Slava Pestov <sviatoslav.pestov@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-29 14:09:43 +01:00
Peter Maydell
d9aa688557 Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20140829-1' into staging
usb: bugfix collection.
usb: add cleanup functions for host adapters,
     in preparation for hotplug support.
usb: add simple qtests for uhci,ohci,xhci.

# gpg: Signature made Fri 29 Aug 2014 12:56:20 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-usb-20140829-1:
  tests: add xHCI qtest
  tests: add UHCI qtest
  tests: add OHCI qtest
  usb: add usb host adapters exit trace
  usb-xhci: add exit function
  usb-ehci: add ehci-pci device exit function
  usb-ehci: add ehci unrealize funciton
  usb-ehci: add vmstate properity for EHCIState
  usb-uhci: clean up uhci resource when pci-uhci exit
  usb-ohci: add exit function
  usb-ohci: Fix memory leak for ohci timer
  usb: add usb_bus_release function
  Revert "xhci: Fix number of streams allocated when using streams"
  xhci: use (1u << i)
  Fix OHCI ISO TD state never being written back.
  xhci: fix debug print compiling error
  usb: Fix bootindex for portnr > 9

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 13:08:04 +01:00
Gonglei
25e89ec5d2 tests: add xHCI qtest
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:53:47 +02:00
Gonglei
44ced58e3a tests: add UHCI qtest
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:53:47 +02:00
Gonglei
28edfce0f3 tests: add OHCI qtest
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:53:47 +02:00
Gonglei
d733f74c33 usb: add usb host adapters exit trace
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:52:14 +02:00
Gonglei
53c30545fb usb-xhci: add exit function
clean up xhci resource when xhci pci device exit.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:52:14 +02:00
Gonglei
96e14926c6 usb-ehci: add ehci-pci device exit function
clean up ehci resource when ehci pci device exit.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:52:14 +02:00
Gonglei
4e130cf6a8 usb-ehci: add ehci unrealize funciton
cleanup ehci controller resource, both pci and sysbus
if they're necessary.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:52:14 +02:00
Gonglei
05a36991c5 usb-ehci: add vmstate properity for EHCIState
since hotunplug the ehci host adapter, we should
delete vm_change_state_handler also, so the
VMChangeStateEntry should be saved in EHCIState.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:52:14 +02:00
Gonglei
3a3464b000 usb-uhci: clean up uhci resource when pci-uhci exit
clean up uhci resource when uhci pci device exit.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:52:13 +02:00
Gonglei
07832c38d3 usb-ohci: add exit function
clean up ohci resource when ohci pci device exit.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:52:13 +02:00
Gonglei
80be63df5a usb-ohci: Fix memory leak for ohci timer
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:51:44 +02:00
Gonglei
e5a9bece9b usb: add usb_bus_release function
add global variables releasing logic when the usb buses
were removed or hot-unpluged.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:51:44 +02:00
Gerd Hoffmann
f90e160b50 Revert "xhci: Fix number of streams allocated when using streams"
This reverts commit d063c3112c.

"2 << x" is the same as "2 ^ (x + 1)", so the old code is correct.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:51:44 +02:00
Gerd Hoffmann
3d80365b55 xhci: use (1u << i)
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-29 12:51:43 +02:00
Jack Un
cae7f29c47 Fix OHCI ISO TD state never being written back.
There appears to be typo in OHCI with isochronous transfers
resulting in isoch. transfer descriptor state never being written back.
The'put_words' function is in a OR statement hence it is never called.

Signed-off-by: Jack Un <jack.un@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:51:43 +02:00
Gonglei
8c244210d8 xhci: fix debug print compiling error
after commit 003e15a180
the DPRINTF will broke compiling, adjust its location.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:51:43 +02:00
Markus Armbruster
830cd54fca usb: Fix bootindex for portnr > 9
We identify devices by their Open Firmware device paths.  The encoding
of the host controller and hub port numbers is incorrect:
usb_get_fw_dev_path() formats them in decimal, while SeaBIOS uses
hexadecimal.  When some port number > 9, SeaBIOS will miss the
bootindex (lucky case), or apply it to another device (unlucky case).

The relevant spec[*] agrees with SeaBIOS (and OVMF, for that matter).
Change %d to %x.

Bug can bite only with host controllers or hubs sporting more than ten
ports.  I'm not aware of any.

[*] Open Firmware Recommended Practice: Universal Serial Bus,
Version 1, Section 3.2.1 Device Node Address Representation
http://www.openfirmware.org/1275/bindings/usb/usb-1_0.ps

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Note: xhci can be configured with up to 15 ports (default is 4 ports).

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-29 12:51:43 +02:00
Max Reitz
f21492817b nbd: Follow the BDS' AIO context
Keep the NBD server always in the same AIO context as the exported BDS
by calling bdrv_add_aio_context_notifier() and implementing the required
callbacks.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:48:45 +01:00
Max Reitz
33384421b3 block: Add AIO context notifiers
If a long-running operation on a BDS wants to always remain in the same
AIO context, it somehow needs to keep track of the BDS changing its
context. This adds a function for registering callbacks on a BDS which
are called whenever the BDS is attached or detached from an AIO context.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:48:45 +01:00
Max Reitz
958c717df9 nbd: Drop nbd_can_read()
There is no variant of aio_set_fd_handler() like qemu_set_fd_handler2(),
so we cannot give a can_read() callback function. Instead, unregister
the nbd_read() function whenever we cannot read and re-register it as
soon as we can read again.

All this is hidden behind the functions nbd_set_handlers() (which
registers all handlers for the AIO context and file descriptor belonging
to the given client), nbd_unset_handlers() (which unregisters them) and
nbd_update_can_read() (which checks whether NBD can read for the given
client and acts accordingly).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:48:45 +01:00
Liu Yuan
a780dea045 sheepdog: fix a core dump while do auto-reconnecting
We should reinit local_err as NULL inside the while loop or g_free() will report
corrupption and abort the QEMU when sheepdog driver tries reconnecting.

This was broken in commit 356b4ca.

qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: (null)
[xcb] Unknown sequence number while awaiting reply
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
qemu-system-x86_64: ../../src/xcb_io.c:298: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Aborted (core dumped)

Cc: qemu-devel@nongnu.org
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
b493317d34 aio-win32: add support for sockets
Uses the same select/WSAEventSelect scheme as main-loop.c.
WSAEventSelect() is edge-triggered, so it cannot be used
directly, but it is still used as a way to exit from a
blocking g_poll().

Before g_poll() is called, we poll sockets with a non-blocking
select() to achieve the level-triggered semantics we require:
if a socket is ready, the g_poll() is made non-blocking too.

Based on a patch from Or Goshen.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
79d9b6566b qemu-coroutine-io: fix for Win32
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
a3462c6561 AioContext: introduce aio_prepare
This will be used to implement socket polling on Windows.
On Windows, select() and g_poll() are completely different;
sockets are polled with select() before calling g_poll,
and the g_poll must be nonblocking if select() says a
socket is ready.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
0a9dd1664a aio-win32: add aio_set_dispatching optimization
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
363285d4b3 test-aio: test timers on Windows too
Use EventNotifier instead of a pipe, which makes it trivial to test
timers on Windows.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
e4c7e2d12d AioContext: export and use aio_dispatch
So far, aio_poll's scheme was dispatch/poll/dispatch, where
the first dispatch phase was used only in the GSource case in
order to avoid a blocking poll.  Earlier patches changed it to
dispatch/prepare/poll/dispatch, where prepare is aio_compute_timeout.

By making aio_dispatch public, we can remove the first dispatch
phase altogether, so that both aio_poll and the GSource use the same
prepare/poll/dispatch scheme.

This patch breaks the invariant that aio_poll(..., true) will not block
the first time it returns false.  This used to be fundamental for
qemu_aio_flush's implementation as "while (qemu_aio_wait()) {}" but
no code in QEMU relies on this invariant anymore.  The return value
of aio_poll() is now comparable with that of g_main_context_iteration.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
3672fa5083 AioContext: run bottom halves after polling
Make the dispatching phase the same before blocking and afterwards.
The next patch will make aio_dispatch public and use it directly
for the GSource case, instead of aio_poll.  aio_poll can then be
simplified heavily.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
a398dea34c aio-win32: Factor out duplicate code into aio_dispatch_handlers
Later, the call to aio_dispatch will move int the GSource wrapper, while the
standalone case will still be call the component functions aio_bh_poll,
aio_dispatch_handlers and timerlistgroup_run_timers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
d397ec99bb aio-win32: Evaluate timers after handles
This is similar to what aio_poll does in the stand-alone case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
845ca10dd0 AioContext: take bottom halves into account when computing aio_poll timeout
Right now, QEMU invokes aio_bh_poll before the "poll" phase
of aio_poll.  It is simpler to do it afterwards and skip the
"poll" phase altogether when the OS-dependent parts of AioContext
are invoked from GSource.  This way, AioContext behaves more
similarly when used as a GSource vs. when used as stand-alone.

As a start, take bottom halves into account when computing the
poll timeout.  If a bottom half is ready, do a non-blocking
poll.  As a side effect, this makes idle bottom halves work
with aio_poll; an improvement, but not really an important
one since they are deprecated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Stefan Hajnoczi
3cbbe9fd1f blockdev: fix drive-mirror 'granularity' error message
Name the 'granularity' parameter and give its expected value range.
Previously the device name was mistakenly reported as the parameter
name.

Note that the error class is unchanged from ERROR_CLASS_GENERIC_ERROR.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
2014-08-29 10:46:58 +01:00
Fam Zheng
0b9caf9b31 coroutine: Drop co_sleep_ns
block_job_sleep_ns is the only user. Since we are moving towards
AioContext aware code, it's better to use the explicit version and drop
the old one.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Liu Yuan
a9db86b223 block/quorum: add simple read pattern support
This patch adds single read pattern to quorum driver and quorum vote is default
pattern.

For now we do a quorum vote on all the reads, it is designed for unreliable
underlying storage such as non-redundant NFS to make sure data integrity at the
cost of the read performance.

For some use cases as following:

        VM
  --------------
  |            |
  v            v
  A            B

Both A and B has hardware raid storage to justify the data integrity on its own.
So it would help performance if we do a single read instead of on all the nodes.
Further, if we run VM on either of the storage node, we can make a local read
request for better performance.

This patch generalize the above 2 nodes case in the N nodes. That is,

vm -> write to all the N nodes, read just one of them. If single read fails, we
try to read next node in FIFO order specified by the startup command.

The 2 nodes case is very similar to DRBD[1] though lack of auto-sync
functionality in the single device/node failure for now. But compared with DRBD
we still have some advantages over it:

- Suppose we have 20 VMs running on one(assume A) of 2 nodes' DRBD backed
storage. And if A crashes, we need to restart all the VMs on node B. But for
practice case, we can't because B might not have enough resources to setup 20 VMs
at once. So if we run our 20 VMs with quorum driver, and scatter the replicated
images over the data center, we can very likely restart 20 VMs without any
resource problem.

After all, I think we can build a more powerful replicated image functionality
on quorum and block jobs(block mirror) to meet various High Availibility needs.

E.g, Enable single read pattern on 2 children,

-drive driver=quorum,children.0.file.filename=0.qcow2,\
children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1

[1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device

[Dropped \n from an error_setg() error message
--Stefan]

Cc: Benoit Canet <benoit@irqsave.net>
Cc: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Liu Yuan
62c6031f96 qapi: add read-pattern enum for quorum
Cc: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Hitoshi Mitake
38890b246d sheepdog: improve error handling for a case of failed lock
Recently, sheepdog revived its VDI locking functionality. This patch
updates sheepdog driver of QEMU for this feature. It changes an error
code for a case of failed locking. -EBUSY is a suitable one.

Reported-by: Valerio Pachera <sirio81@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Liu Yuan <namei.unix@gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:57 +01:00
Hitoshi Mitake
1dbfafed7f sheepdog: adopting protocol update for VDI locking
The update is required for supporting iSCSI multipath. It doesn't
affect behavior of QEMU driver but adding a new field to vdi request
struct is required.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Liu Yuan <namei.unix@gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:57 +01:00
Stefan Hajnoczi
40ed35a3c4 qemu-img: always goto out in img_snapshot() error paths
The out label has the qemu_progress_end() and other cleanup calls.
Always goto out in error paths so the cleanup happens.  These error
paths now return 1 instead of -1.

Note that bdrv_unref(NULL) is safe.  We just need to initialize bs to
NULL at the top of the function.

We can now remove the obsolete bs_old_backing = NULL and bs_new_backing
= NULL for safe mode.  Originally it was necessary in commit 3e85c6fd
("qemu-img rebase") but became useless in commit c2abcce ("qemu-img:
avoid calling exit(1) to release resources properly") because the
variables are already initialized during declaration.

Reported-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-08-29 10:46:57 +01:00
Stefan Hajnoczi
cbda016d94 qemu-img: fix img_compare() flags error path
If img_compare() fails to parse the cache flags the goto out3 code path
will call qemu_progress_end().  Make sure we actually call
qemu_progress_init() first.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-08-29 10:46:57 +01:00
Stefan Hajnoczi
a3981eb978 qemu-img: fix img_commit() error return value
The img_commit() return value is a process exit code.  Use 1 for failure
instead of -1.  The other failure paths in this function already use 1.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-08-29 10:46:57 +01:00
Daniel Henrique Barboza
212aefaa53 block.curl: adding 'timeout' option
The curl hardcoded timeout (5 seconds) sometimes is not long
enough depending on the remote server configuration and network
traffic. The user should be able to set how much long he is
willing to wait for the connection.

Adding a new option to set this timeout gives the user this
flexibility. The previous default timeout of 5 seconds will be
used if this option is not present.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Benoit Canet <benoit.canet@nodalink.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:57 +01:00
Markus Armbruster
28fa7133b8 ide: Fix bootindex for bus_id > 9
We identify devices by their Open Firmware device paths.  The encoding
of bus numbers is incorrect: idebus_get_fw_dev_path() formats them in
decimal, while SeaBIOS uses hexadecimal.  With bus number > 9, SeaBIOS
will miss the bootindex (lucky case), or apply it to another device
(unlucky case).

Bug can't bite right now: ich9-ahci has six ports, and the sysbus-ahci
created by Calxeda Highbank has just one.

Fix it anyway, by changing %d to %x.

I couldn't find an Open Firmware spec covering this.  For what it's
worth, OVMF agrees with SeaBIOS.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:57 +01:00
Le Tan
b5a280c008 intel-iommu: add IOTLB using hash table
Add IOTLB to cache information about the translation of input-addresses. IOTLB
use a GHashTable as cache. The key of the hash table is the logical-OR of gfn
and source id after left-shifting.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Le Tan
d92fa2dc6e intel-iommu: add context-cache to cache context-entry
Add context-cache to cache context-entry encountered on a page-walk. Each
VTDAddressSpace has a member of VTDContextCacheEntry which represents an entry
in the context-cache. Since devices with different bus_num and devfn have their
respective VTDAddressSpace, this will be a good way to reference the cached
entries.
Each VTDContextCacheEntry will have a context_cache_gen and the cached entry
is valid only when context_cache_gen equals IntelIOMMUState.context_cache_gen.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Le Tan
ed7b8fbcfb intel-iommu: add supports for queued invalidation interface
Add supports for queued invalidation interface, an expended invalidation
interface with extended capabilities.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Le Tan
ac40aa1540 intel-iommu: fix coding style issues around in q35.c and machine.c
Fix coding style issues around in hw/pci-host/q35.c and hw/core/machine.c.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Le Tan
a52a7fdfa7 intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "iommu" as a switch
Add Intel IOMMU emulation to q35 chipset and expose it to the guest.
1. Add a machine option. Users can use "-machine iommu=on|off" in the command
line to enable/disable Intel IOMMU. The default is off.
2. Accroding to the machine option, q35 will initialize the Intel IOMMU and
use pci_setup_iommu() to setup q35_host_dma_iommu() as the IOMMU function for
the pci bus.
3. q35_host_dma_iommu() will return different address space according to the
bus_num and devfn of the device.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Le Tan
d4eb911935 intel-iommu: add DMAR table to ACPI tables
Expose Intel IOMMU to the BIOS. If object of TYPE_INTEL_IOMMU_DEVICE exists,
add DMAR table to ACPI RSDT table. For now the DMAR table indicates that there
is only one hardware unit without INTR_REMAP capability on the platform.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Le Tan
1da12ec4c8 intel-iommu: introduce Intel IOMMU (VT-d) emulation
Add support for emulating Intel IOMMU according to the VT-d specification for
the q35 chipset machine. Implement the logics for DMAR (DMA remapping) without
PASID support. The emulation supports register-based invalidation and primary
fault logging.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Le Tan
8d7b8cb9c2 iommu: add is_write as a parameter to the translate function of MemoryRegionIOMMUOps
Add a bool variable is_write as a parameter to the translate function of
MemoryRegionIOMMUOps to indicate the operation of the access. It can be
used for correct fault reporting from within the callback.
Change the interface of related functions.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Peter Maydell
a6aebb38ba Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
SCSI patches include bug fixes from Fam and Peter, improved error
reporting from Fam and a fix for DPRINTF bitrot.  Memory patches try
again to initialize name from the QOM name.

# gpg: Signature made Thu 28 Aug 2014 15:10:31 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg:                 aka "Paolo Bonzini <bonzini@gnu.org>"

* remotes/bonzini/tags/for-upstream:
  memory: Lazy init name from QOM name as needed
  xen: hvm: Abstract away memory region name ref
  xen-hvm: Constify string
  virtio-scsi: Report error if num_queues is 0 or too large
  scsi-generic: remove superfluous DPRINTF avoid to break compiling
  block/iscsi: fix memory corruption on iscsi resize
  scsi-bus: Convert DeviceClass init to realize
  block: Pass errp in blkconf_geometry

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-28 17:08:13 +01:00
Peter Maydell
38a01e55d2 Merge remote-tracking branch 'remotes/kvm/tags/for-upstream' into staging
Mostly bugfixes + Alexey's interface-based implementation
of the NMI monitor command.

# gpg: Signature made Thu 28 Aug 2014 15:07:22 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg:                 aka "Paolo Bonzini <bonzini@gnu.org>"

* remotes/kvm/tags/for-upstream:
  mc146818rtc: reinitialize irq_reinject_on_ack_count on reset
  target-i386: Add "tsc_adjust" CPU feature name
  target-i386: Add "mpx" CPU feature name
  vl: process -object after other backend options
  checkpatch.pl: adjust typedef definition to QEMU coding style
  x86: Clear MTRRs on vCPU reset
  x86: kvm: Add MTRR support for kvm_get|put_msrs()
  x86: Use common variable range MTRR counts
  target-i386: Don't forbid NX bit on PAE PDEs and PTEs
  spapr: Add support for new NMI interface
  s390x: Migrate to new NMI interface
  s390x: Convert QEMUMachine to MachineClass
  cpus: Define callback for QEMU "nmi" command
  kvm: run cpu state synchronization on target vcpu thread

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-28 16:07:23 +01:00
Peter Crosthwaite
d1dd32af6f memory: Lazy init name from QOM name as needed
To support name retrieval of MemoryRegions that were created
dynamically (that is, not via memory_region_init and friends). We
cache the name in MemoryRegion's state as
object_get_canonical_path_component mallocs the returned value
so it's not suitable for direct return to callers. Memory already
frees the name field, so this will be garbage collected along with
the MR object.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-28 16:09:44 +02:00
Peter Crosthwaite
3e1f50867b xen: hvm: Abstract away memory region name ref
The mr->name field is removed. This slipped through compile testing.
Fix.

Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-28 16:09:44 +02:00
Peter Crosthwaite
dc6c4fe837 xen-hvm: Constify string
It's constant, and sourced from existing const strings. Avoid dodgy
casts by converting to const.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-28 16:09:44 +02:00
Peter Maydell
795c050e37 Merge remote-tracking branch 'remotes/stefanha/tags/fix-buildbot-12082014-pull-request' into staging
Pull request

# gpg: Signature made Thu 28 Aug 2014 13:43:00 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/fix-buildbot-12082014-pull-request:
  Revert "qemu-img: sort block formats in help message"
  block: sort formats alphabetically in bdrv_iterate_format()
  mirror: fix uninitialized variable delay_ns warnings
  trace: avoid Python 2.5 all() in tracetool
  libqtest: launch QEMU with QEMU_AUDIO_DRV=none
  qapi.py: avoid Python 2.5+ any() function

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-28 14:51:12 +01:00
Stefan Hajnoczi
00c6d403a3 Revert "qemu-img: sort block formats in help message"
This reverts commit 1a443c1b8b and the
later commit 395071a763.

GSequence was introduced in glib 2.14.  RHEL 5 fails to compile since it
uses glib 2.12.3.

Now that bdrv_iterate_format() invokes the iteration callback in sorted
order these commits are unnecessary.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
2014-08-28 13:42:25 +01:00
Stefan Hajnoczi
ada4240103 block: sort formats alphabetically in bdrv_iterate_format()
Format names are best consumed in alphabetical order.  This makes
human-readable output easy to produce.

bdrv_iterate_format() already has an array of format strings.  Sort them
before invoking the iteration callback.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
2014-08-28 13:42:25 +01:00
Stefan Hajnoczi
6d0de8eb21 mirror: fix uninitialized variable delay_ns warnings
The gcc 4.1.2 compiler warns that delay_ns may be uninitialized in
mirror_iteration().

There are two break statements in the do ... while loop that skip over
the delay_ns assignment.  These are probably the cause of the warning.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
2014-08-28 13:42:25 +01:00
Stefan Hajnoczi
73735f7218 trace: avoid Python 2.5 all() in tracetool
Red Hat Enterprise Linux 5 ships Python 2.4.3.  The all() function was
added in Python 2.5 so we cannot use it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
2014-08-28 13:42:25 +01:00
Stefan Hajnoczi
6b02921605 libqtest: launch QEMU with QEMU_AUDIO_DRV=none
No test case actually uses the audio backend.  Disable audio to prevent
warnings on hosts with no sound hardware present:

  GTESTER check-qtest-aarch64
  sdl: SDL_OpenAudio failed
  sdl: Reason: No available audio device
  sdl: SDL_OpenAudio failed
  sdl: Reason: No available audio device
  audio: Failed to create voice `lm4549.out'

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
2014-08-28 13:42:25 +01:00
Stefan Hajnoczi
7ac9a9d6e1 qapi.py: avoid Python 2.5+ any() function
There is one instance of any() in qapi.py that breaks builds on older
distros that ship Python 2.4 (like RHEL5):

  GEN   qmp-commands.h
Traceback (most recent call last):
  File "build/scripts/qapi-commands.py", line 445, in ?
    exprs = parse_schema(input_file)
  File "build/scripts/qapi.py", line 329, in parse_schema
    schema = QAPISchema(open(input_file, "r"))
  File "build/scripts/qapi.py", line 110, in __init__
    if any(include_path == elem[1]
NameError: global name 'any' is not defined

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
2014-08-28 13:42:25 +01:00
Paolo Bonzini
172dbc52b3 mc146818rtc: reinitialize irq_reinject_on_ack_count on reset
This field was forgotten, and it makes the state after reset
non-deterministic.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-27 17:54:52 +02:00
Peter Maydell
0265361a72 Merge remote-tracking branch 'remotes/mcayland/qemu-openbios' into staging
* remotes/mcayland/qemu-openbios:
  Update OpenBIOS images

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-26 14:18:40 +01:00
Eduardo Habkost
7b458bfd12 target-i386: Add "tsc_adjust" CPU feature name
tsc_adjust migration support is already implemented (commit
f28558d3d3), so we can add it to the list
of known feature names.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 14:52:43 +02:00
Eduardo Habkost
5bd8ff07e6 target-i386: Add "mpx" CPU feature name
Migration support for MPX is already implemented (commit
79e9ebebbf), so we can add it to the list
of known feature names.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 14:52:38 +02:00
Mark Cave-Ayland
d2a68f3a30 Update OpenBIOS images
Update OpenBIOS images to SVN r1316 built from submodule.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2014-08-26 13:52:15 +01:00
Paolo Bonzini
7b71758d79 vl: process -object after other backend options
QOM backends can refer to chardevs, but not vice versa.  So
process -chardev and -fsdev options before -object

This fixes the rng-egd backend to virtio-rng.

Reported-by: Amos Kong <akong@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 13:44:39 +02:00
Paolo Bonzini
a6859deb69 checkpatch.pl: adjust typedef definition to QEMU coding style
Most QEMU typedefs are camelcase, starting with one uppercase letter
and containing at least one lowercase letter.  There are a few
all-uppercase types, add the most common too.

This fixes recognition of types in lines such as

    static __attribute__((unused)) inline void tcg_out8(TCGContext *s, uint8_t v)

(Example provided by Peter Maydell).

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 13:44:28 +02:00
Fam Zheng
c9f6552803 virtio-scsi: Report error if num_queues is 0 or too large
No cmd vq surprises guest (Linux panics in virtscsi_probe), too many
queues abort qemu (in the following virtio_add_queue).

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 13:20:44 +02:00
Gonglei
f93d2c15d6 scsi-generic: remove superfluous DPRINTF avoid to break compiling
variables lun and tag had been eliminated, break compiling
when enable debug switch. Meanwhile traces provide the same
information with this DPRINTF, so remove it.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 13:20:44 +02:00
Peter Lieven
9db693f764 block/iscsi: fix memory corruption on iscsi resize
bs->total_sectors is not yet updated at this point. resulting
in memory corruption if the volume has grown and data is written
to the newly availble areas.

CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 13:20:44 +02:00
Fam Zheng
a818a4b69d scsi-bus: Convert DeviceClass init to realize
Replace "init/destroy" with "realize/unrealize" in SCSIDeviceClass,
which has errp as a parameter. So all the implementations now use
error_setg instead of error_report for reporting error.

Also in scsi_bus_legacy_handle_cmdline, report the error when
initializing the if=scsi devices, before returning it, because in the
callee, error_report is changed to error_setg. And the callers don't
have the right locations (e.g. "-drive if=scsi").

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 13:20:44 +02:00
Fam Zheng
5ff5efb46c block: Pass errp in blkconf_geometry
This allows us to pass error information to caller.

Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-26 13:20:44 +02:00
Peter Maydell
c47c61be8d Merge remote-tracking branch 'remotes/awilliam/tags/vfio-pci-for-qemu-20140825.0' into staging
VFIO: Enable primary NVIDIA quirk regardless of VGA support

# gpg: Signature made Mon 25 Aug 2014 20:29:37 BST using RSA key ID 3BB08B22
# gpg: Can't check signature: public key not found

* remotes/awilliam/tags/vfio-pci-for-qemu-20140825.0:
  vfio: Enable NVIDIA 88000 region quirk regardless of VGA

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-26 10:42:06 +01:00
Alex Williamson
fe08275db9 vfio: Enable NVIDIA 88000 region quirk regardless of VGA
If we make use of OVMF for the BIOS then we can use GPUs without VGA
space access, but we still need this quirk.  Disassociate it from the
x-vga option and enable it on all NVIDIA VGA display class devices.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-08-25 12:10:15 -06:00
Peter Maydell
a44a12b78a Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc fixes, features

A bunch of bugfixes - these will make sense for 2.1.1

ACPI support for TPM and partial ARI support for PCIE.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Sun 24 Aug 2014 23:16:35 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  pcie: fix trailing whitespace
  ioh3420: Enable ARI forwarding
  ioh3420: Remove obsoleted, unused ioh3420_init function
  pcie: Rename the pcie_cap_ari_* functions to pcie_cap_arifwd_*
  pcie: Fix incorrect write to the ari capability next function field
  ssdt-tpm: add generated hex file to git
  Add ACPI tables for TPM
  pc: reserve more memory for ACPI for new machine types
  pcihp: fix possible array out of bounds
  pci_bridge: manually destroy memory regions within PCIBridgeWindows
  hostmem: set MPOL_MF_MOVE

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-25 18:49:25 +01:00
Alex Williamson
9db2efd95e x86: Clear MTRRs on vCPU reset
The SDM specifies (June 2014 Vol3 11.11.5):

    On a hardware reset, the P6 and more recent processors clear the
    valid flags in variable-range MTRRs and clear the E flag in the
    IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the
    MTRRs are undefined.

We currently do none of that, so whatever MTRR settings you had prior
to reset is what you have after reset.  Usually this doesn't matter
because KVM often ignores the guest mappings and uses write-back
anyway.  However, if you have an assigned device and an IOMMU that
allows NoSnoop for that device, KVM defers to the guest memory
mappings which are now stale after reset.  The result is that OVMF
rebooting on such a configuration takes a full minute to LZMA
decompress the firmware volume, a process that is nearly instant on
the initial boot.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25 18:53:42 +02:00
Alex Williamson
d1ae67f626 x86: kvm: Add MTRR support for kvm_get|put_msrs()
The MTRR state in KVM currently runs completely independent of the
QEMU state in CPUX86State.mtrr_*.  This means that on migration, the
target loses MTRR state from the source.  Generally that's ok though
because KVM ignores it and maps everything as write-back anyway.  The
exception to this rule is when we have an assigned device and an IOMMU
that doesn't promote NoSnoop transactions from that device to be cache
coherent.  In that case KVM trusts the guest mapping of memory as
configured in the MTRR.

This patch updates kvm_get|put_msrs() so that we retrieve the actual
vCPU MTRR settings and therefore keep CPUX86State synchronized for
migration.  kvm_put_msrs() is also used on vCPU reset and therefore
allows future modificaitons of MTRR state at reset to be realized.

Note that the entries array used by both functions was already
slightly undersized for holding every possible MSR, so this patch
increases it beyond the 28 new entries necessary for MTRR state.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25 18:53:42 +02:00
Alex Williamson
d8b5c67b05 x86: Use common variable range MTRR counts
We currently define the number of variable range MTRR registers as 8
in the CPUX86State structure and vmstate, but use MSR_MTRRcap_VCNT
(also 8) to report to guests the number available.  Change this to
use MSR_MTRRcap_VCNT consistently.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25 18:53:42 +02:00
William Grant
1844e68eca target-i386: Don't forbid NX bit on PAE PDEs and PTEs
Commit e8f6d00c30 ("target-i386: raise
page fault for reserved physical address bits") added a check that the
NX bit is not set on PAE PDPEs, but it also added it to rsvd_mask for
the rest of the function. This caused any PDEs or PTEs with NX set to be
erroneously rejected, making PAE guests with NX support unusable.

Signed-off-by: William Grant <wgrant@ubuntu.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25 18:53:42 +02:00
Peter Maydell
3dd359c2d3 Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-2014-08-24' into staging
trivial patches for 2014-08-24

# gpg: Signature made Sun 24 Aug 2014 14:28:49 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 6F67 E18E 7C91 C5B1 5514  66A7 BEE5 9D74 A4C3 D7DB

* remotes/mjt/tags/trivial-patches-2014-08-24:
  vmxnet3: Pad short frames to minimum size (60 bytes)
  libdecnumber: Fix warnings from smatch (missing static, boolean operations)
  linux-user: fix file descriptor leaks
  po: Fix Makefile rules for in-tree builds without configuration
  slirp/misc: Use the GLib memory allocation APIs
  configure: no need to mkdir QMP
  dma: axidma: Variablise repeated s->streams[i] sub-expr
  microblaze: ml605: Get rid of ddr_base variable
  tests/bios-tables-test: check the value returned by fopen()
  tcg: dump op count into qemu log
  util/path: Use the GLib memory allocation routines

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-25 17:34:30 +01:00
Alexey Kardashevskiy
3431648272 spapr: Add support for new NMI interface
This implements an NMI interface POWERPC SPAPR machine.
This enables an "nmi" HMP/QMP command supported on SPAPR.

This calls POWERPC_EXCP_RESET (vector 0x100) in the guest to deliver NMI
to every CPU. The expected result is XMON (in-kernel debugger) invocation.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25 13:25:16 +02:00
Alexey Kardashevskiy
3dd7852f19 s390x: Migrate to new NMI interface
This implements an NMI interface for s390 and s390-ccw machines.

This removes #ifdef s390 branch in qmp_inject_nmi so new s390's
nmi_monitor_handler() callback is going to be used for NMI.

Since nmi_monitor_handler()-calling code is platform independent,
CPUState::cpu_index is used instead of S390CPU::env.cpu_num.
There should not be any change in behaviour as both @cpu_index and
@cpu_num are global CPU numbers.

Note that s390_cpu_restart() already takes care of the specified cpu,
so we don't need to schedule via async_run_on_cpu().

Since the only error s390_cpu_restart() can return is ENOSYS, convert
it to QERR_UNSUPPORTED.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25 13:25:16 +02:00
Alexey Kardashevskiy
d07aa7c7bb s390x: Convert QEMUMachine to MachineClass
This converts s390-virtio and s390-ccw-virtio machines to QOM MachineClass.
This brings ability to add interfaces to the machine classes. The first
interface for addition will be NMI.

The patch is mechanical so no change in behavior is expected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25 13:25:16 +02:00
Alexey Kardashevskiy
9cb805fd26 cpus: Define callback for QEMU "nmi" command
This introduces an NMI (Non Maskable Interrupt) interface with
a single nmi_monitor_handler() method. A machine or a device can
implement it. This searches for an QOM object with this interface
and if it is implemented, calls it. The callback implements an action
required to cause debug crash dump on in-kernel debugger invocation.
The callback returns Error**.

This adds a nmi_monitor_handle() helper which walks through
all objects to find the interface. The interface method is called
for all found instances.

This adds support for it in qmp_inject_nmi(). Since no architecture
supports it at the moment, there is no change in behaviour.

This changes inject-nmi command description for HMP and QMP.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-25 13:25:16 +02:00
Michael S. Tsirkin
187de915e8 pcie: fix trailing whitespace
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:07 +02:00
Knut Omang
a74b870270 ioh3420: Enable ARI forwarding
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:06 +02:00
Knut Omang
0f9b1771cc ioh3420: Remove obsoleted, unused ioh3420_init function
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:06 +02:00
Knut Omang
821be9dbb2 pcie: Rename the pcie_cap_ari_* functions to pcie_cap_arifwd_*
Rename helper functions to make a clearer distinction between
the PCIe capability/control register feature ARI forwarding and a
device that supports the ARI feature via an ARI extended PCIe capability.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:06 +02:00
Knut Omang
ec70b46bab pcie: Fix incorrect write to the ari capability next function field
PCI_ARI_CAP_NFN, a macro for reading next function was used instead of
the intended write.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:06 +02:00
Michael S. Tsirkin
cec391d752 ssdt-tpm: add generated hex file to git
Needed for systems without IASL.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:06 +02:00
Stefan Berger
711b20b479 Add ACPI tables for TPM
Add an SSDT ACPI table for the TPM device.
Add a TCPA table for BIOS logging area when a TPM is being used.

The latter follows this spec here:

http://www.trustedcomputinggroup.org/files/static_page_files/DCD4188E-1A4B-B294-D050A155FB6F7385/TCG_ACPIGeneralSpecification_PublicReview.pdf

This patch has Michael Tsirkin's patches folded in.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:06 +02:00
Michael S. Tsirkin
927766c7d3 pc: reserve more memory for ACPI for new machine types
commit 868270f23d
    acpi-build: tweak acpi migration limits
broke kernel loading with -kernel/-initrd: it doubled
the size of ACPI tables but did not reserve
enough memory.

As a result, issues on boot and halt are observed.

Fix this up by doubling reserved memory for new machine types.

Cc: qemu-stable@nongnu.org
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:06 +02:00
Gonglei
fa365d7cd1 pcihp: fix possible array out of bounds
Prevent out-of-bounds array access on
acpi_pcihp_pci_status.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2014-08-25 00:16:06 +02:00
Paolo Bonzini
9f6b2f1c64 pci_bridge: manually destroy memory regions within PCIBridgeWindows
The regions are destroyed and recreated on configuration space accesses.
We need to destroy them before the containing PCIBridgeWindows object
is freed.

Reported-by: Gonglei <arei.gonglei@huawei.com>
Reported-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-25 00:16:06 +02:00
Ben Draper
40a87c6c9b vmxnet3: Pad short frames to minimum size (60 bytes)
When running VMware ESXi under qemu-kvm the guest discards frames
that are too short. Short ARP Requests will be dropped, this prevents
guests on the same bridge as VMware ESXi from communicating. This patch
simply adds the padding on the network device itself.

Signed-off-by: Ben Draper <ben@xrsa.net>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 17:11:08 +04:00
Stefan Weil
d072cdf3ba libdecnumber: Fix warnings from smatch (missing static, boolean operations)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:21:06 +04:00
zhanghailiang
680dfde919 linux-user: fix file descriptor leaks
Handle variable "fd_orig" going out of scope leaks the handle.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:18:28 +04:00
Stefan Weil
bcc55f327c po: Fix Makefile rules for in-tree builds without configuration
Adding 'update' to the phony targets fixes this error:

$ LANG=C make -C po update
make: Entering directory `/qemu/po'
  LINK  update
/qemu/po/de_DE.po: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status
make: *** [update] Error 1
make: Leaving directory `/qemu/po'

Some other phony targets (build, install) were also added, and the
existing .PHONY statement was moved to a more prominent position at
the beginning of the Makefile.

The patch also fixes a 2nd bug. The default target should be 'all',
but instead 'modules' (from rules.mak) was the default. Fix this by
adding 'all' as a target before any include statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:16:42 +04:00
zhanghailiang
2fd5d86409 slirp/misc: Use the GLib memory allocation APIs
Here we don't check the return value of malloc() which may fail.
Use the g_new() instead, which will abort the program when
there is not enough memory.

Also, use g_strdup instead of strdup and remove the unnecessary
strdup function.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:16:32 +04:00
Liming Wang
58ab9e6807 configure: no need to mkdir QMP
commit 7537fe04 QMP: QMP/ -> docs/qmp/

Above commit has moved last QMP files to docs/qmp and it's not necessary
to create QMP directory. So remove it from configure.

Signed-off-by: Liming Wang <liming.wang@canonical.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:16:32 +04:00
Peter Crosthwaite
6a07a695b0 dma: axidma: Variablise repeated s->streams[i] sub-expr
This have 6 inline usages. Make it a bit more readable by using a local
variable.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:16:32 +04:00
Peter Crosthwaite
f55f885267 microblaze: ml605: Get rid of ddr_base variable
It's a constant based on a macro. Just use the macro in place.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:16:32 +04:00
zhanghailiang
c39a28a43d tests/bios-tables-test: check the value returned by fopen()
The function fopen() may fail, so check its return value.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Liu <john.liuli@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:16:32 +04:00
zhanghailiang
d70724cec8 tcg: dump op count into qemu log
fopen() may fail and it does not check its return vaule here,
it is better to dump op count to the normal log file.

Signed-off-by: Li Liu <john.liuli@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:16:32 +04:00
zhanghailiang
bc5008a832 util/path: Use the GLib memory allocation routines
In this file, we don't check the return value of malloc/strdup/realloc which may fail.
Instead of using these routines, we use the GLib memory APIs g_malloc/g_strdup/g_realloc.
They will exit on allocation failure, so there is no need to test for failure,
which would be fine for setup.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-24 13:16:32 +04:00
Peter Maydell
33886ebeec Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches

# gpg: Signature made Fri 22 Aug 2014 14:47:53 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (29 commits)
  qemu-img: Allow cache mode specification for amend
  qemu-img: Allow source cache mode specification
  vmdk: Use bdrv_nb_sectors() where sectors, not bytes are wanted
  blkdebug: Delete BH in bdrv_aio_cancel
  qemu-iotests: add test case 101 for short file I/O
  raw-posix: fix O_DIRECT short reads
  block/iscsi: fix memory corruption on iscsi resize
  block/vvfat.c: remove debugging code to reinit stderr if NULL
  iotests: Add test for image filename construction
  quorum: Implement bdrv_refresh_filename()
  nbd: Implement bdrv_refresh_filename()
  blkverify: Implement bdrv_refresh_filename()
  blkdebug: Implement bdrv_refresh_filename()
  block: Add bdrv_refresh_filename()
  virtio-blk: fix reference a pointer which might be freed
  virtio-blk: allow block_resize with dataplane
  block: acquire AioContext in qmp_block_resize()
  qemu-iotests: Fix 028 reference output for qed
  test-coroutine: test cost introduced by coroutine
  iotests: Add test for qcow2's cache options
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-22 16:12:51 +01:00
Peter Maydell
43fe62757b Merge remote-tracking branch 'remotes/riku/linux-user-for-upstream' into staging
* remotes/riku/linux-user-for-upstream: (22 commits)
  linux-user: check return value of malloc()
  linux-user: writev Partial Writes
  linux-user: Support target-to-host translation of mlockall argument
  linux-user: clock_nanosleep errno Handling on PPC
  linux-user: Minimum Sig Handler Stack Size for PPC64 ELF V2
  linux-user: Move get_ppc64_abi
  linux-user: Detect fault in sched_rr_get_interval
  linux-user: Handle NULL sched_param argument to sched_*
  linux-user: Detect Negative Message Sizes in msgsnd System Call
  linux-user: Conditionally Pass Attribute Pointer to mq_open()
  linux-user: Make ipc syscall's third argument an abi_long
  linux-user: Properly Handle semun Structure In Cross-Endian Situations
  linux-user: Dereference Pointer Argument to ipc/semctl Sys Call
  linux-user: PPC64 semid_ds Doesnt Include _unused1 and _unused2
  linux-user: add setns and unshare
  linux-user: support ioprio_{get, set} syscalls
  linux-user: support timerfd_{create, gettime, settime} syscalls
  linux-user: fix readlink handling with magic exe symlink
  linux-user: Fix conversion of sigevent argument to timer_create
  linux-user: Fix syscall instruction usermode emulation on X86_64
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-22 14:39:53 +01:00
Max Reitz
bd39e6ed0b qemu-img: Allow cache mode specification for amend
qemu-img amend may extensively modify the target image, depending on the
options to be amended (e.g. conversion to qcow2 compat level 0.10 from
1.1 for an image with many unallocated zero clusters). Therefore it
makes sense to allow the user to specify the cache mode to be used.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 14:54:48 +02:00
Max Reitz
40055951a7 qemu-img: Allow source cache mode specification
Many qemu-img subcommands only read the source file(s) once. For these
use cases, a full write-back cache is unnecessary and mainly clutters
host cache memory. Though this is generally no concern as cache memory
is freely available and can be scaled by the host OS, it may become a
concern with thin provisioning.

For these cases, it makes sense to allow users to freely specify the
source cache mode (e.g. use no cache at all).

This commit adds a new switch (-T) for the qemu-img subcommands check,
compare, convert and rebase to specify the cache to be used for source
images (the backing file in case of rebase).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 14:54:44 +02:00
zhanghailiang
29e03fcb62 linux-user: check return value of malloc()
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Acked-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
29560a6cb7 linux-user: writev Partial Writes
Although not technically not required by POSIX, the writev system call will
typically write out its buffers individually.  That is, if the first buffer
is written successfully, but the second buffer pointer is invalid, then
the first chuck will be written and its size is returned.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
6f6a40328b linux-user: Support target-to-host translation of mlockall argument
The argument to the mlockall system call is not necessarily the same on
all platforms and thus may require translation prior to passing to the
host.

For example, PowerPC 64 bit platforms define values for MCL_CURRENT
(0x2000) and MCL_FUTURE (0x4000) which are different from Intel platforms
(0x1 and 0x2, respectively)

Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
8fbe8fdfbc linux-user: clock_nanosleep errno Handling on PPC
The clock_nanosleep syscall is unusual in that it returns positive
numbers in error handling situations, versus returning -1 and setting
errno, or returning a negative errno value.  On POWER, the kernel will
set the SO bit of CR0 to indicate failure in a syscall.  QEMU has
generic handling to do this for syscalls with standard return values.

Add special case code for clock_nanosleep to handle CR0 properly.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
0903c8be9e linux-user: Minimum Sig Handler Stack Size for PPC64 ELF V2
The ELF V2 ABI for PPC64 defines MINSIGSTKSZ as 4096 bytes whereas it was
2048 previously.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
67d6d829cd linux-user: Move get_ppc64_abi
The get_ppc64_abi is used to determine the ELF ABI (i.e. V1 or V2). This
routine is currently implemented in the linux-user/elfload.c file but
is useful in other scenarios.  Move the routine to a more generally
available location (linux-user/ppc/target_cpu.h).

Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
d4290c40a4 linux-user: Detect fault in sched_rr_get_interval
Properly detect a fault when attempting to store into an invalid
struct timespec pointer.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
a1d5c5b25d linux-user: Handle NULL sched_param argument to sched_*
The sched_getparam, sched_setparam and sched_setscheduler system
calls take a pointer argument to a sched_param structure.  When
this pointer is null, errno should be set to EINVAL.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
edcc5f9dc3 linux-user: Detect Negative Message Sizes in msgsnd System Call
The msgsnd system call takes an argument that describes the message
size (msgsz) and is of type size_t.  The system call should set
errno to EINVAL in the event that a negative message size is passed.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:35 +03:00
Tom Musta
b6ce1f6b90 linux-user: Conditionally Pass Attribute Pointer to mq_open()
The mq_open system call takes an optional struct mq_attr pointer
argument in the fourth position.  This pointer is used when O_CREAT
is specified in the flags (second) argument.  It may be NULL, in
which case the queue is created with implementation defined attributes.

Change the code to properly handle the case when NULL is passed in the
arg4 position.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:34 +03:00
Tom Musta
37ed09560c linux-user: Make ipc syscall's third argument an abi_long
For those target ABIs that use the ipc system call (e.g. POWER),
the third argument is used in the shmat path as a pointer.  It
therefore must be declared as an abi_long (versus int) so that
the address bits are not lost in truncation.  In fact, all arguments
to do_ipc should be declared as abit_long.

In fact, it makes more sense for all of the arguments to be declaried
as abi_long (except call).

Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:34 +03:00
Tom Musta
5464baecf5 linux-user: Properly Handle semun Structure In Cross-Endian Situations
The semun union used in the semctl system call contains both an int (val) and
pointers.  In cross-endian situations on 64 bit targets, the value passed to
semctl is an 8 byte (abi_long) value and thus does not have the 4-byte val
field in the correct location.  In order to rectify this, the other half
of the union must be accessed.  This is achieved in code by performing
a byte swap on the entire 8 byte union, followed by a 4-byte swap of the
first half.

Also, eliminate an extraneous (dead) line of code that sets target_su.val in
the IPC_SET/IPC_GET case.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:34 +03:00
Tom Musta
5d2fa8ebb4 linux-user: Dereference Pointer Argument to ipc/semctl Sys Call
When the ipc system call is used to wrap a semctl system call,
the ptr argument to ipc needs to be dereferenced prior to passing
it to the semctl handler.  This is because the fourth argument to
semctl is a union and not a pointer to a union.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:34 +03:00
Tom Musta
035273440b linux-user: PPC64 semid_ds Doesnt Include _unused1 and _unused2
The 64 bit PowerPC platforms eliminate the _unused1 and _unused2
elements of the semid_ds structure from <sys/sem.h>.  So eliminate
these from the target_semid_ds structure.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:34 +03:00
Riku Voipio
9af5c906d1 linux-user: add setns and unshare
Add support for the setns and unshare syscalls, trivially passed through to
the host. Based on patches by Paul Burton, added configure check.

Signed-off-by: Paul Burton <paul@archlinuxmips.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:34 +03:00
Paul Burton
ab31cda327 linux-user: support ioprio_{get, set} syscalls
Add support for the ioprio_get & ioprio_set syscalls, allowing their
use by target programs.

Signed-off-by: Paul Burton <paul@archlinuxmips.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:34 +03:00
Riku Voipio
518343413f linux-user: support timerfd_{create, gettime, settime} syscalls
Adds support for the timerfd_create, timerfd_gettime & timerfd_settime
syscalls, allowing use of timerfds by target programs.

v2: By Riku - added configure check for timerfd and ifdefs
for benefit of old distributions like RHEL5.

Signed-off-by: Paul Burton <paul@archlinuxmips.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:33 +03:00
Mike Frysinger
f17f4989fa linux-user: fix readlink handling with magic exe symlink
The current code always returns the length of the path when it should
be returning the number of bytes it wrote to the output string.

Further, readlink is not supposed to append a NUL byte, but the current
snprintf logic will always do just that.

Even further, if you pass in a length of 0, you're suppoesd to get back
an error (EINVAL), but the current logic just returns 0.

Further still, if there was an error reading the symlink, we should not
go ahead and try to read the target buffer as it is garbage.

Simple test for the first two issues:
$ cat test.c
int main() {
    char buf[50];
    size_t len;
    for (len = 0; len < 10; ++len) {
        memset(buf, '!', sizeof(buf));
        ssize_t ret = readlink("/proc/self/exe", buf, len);
        buf[20] = '\0';
        printf("readlink(/proc/self/exe, {%s}, %zu) = %zi\n", buf, len, ret);
    }
    return 0;
}

Now compare the output of the native:
$ gcc test.c -o /tmp/x
$ /tmp/x
$ strace /tmp/x

With what qemu does:
$ armv7a-cros-linux-gnueabi-gcc test.c -o /tmp/x -static
$ qemu-arm /tmp/x
$ qemu-arm -strace /tmp/x

Signed-off-by: Mike Frysinger <vapier@chromium.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:33 +03:00
Peter Maydell
c065976f2b linux-user: Fix conversion of sigevent argument to timer_create
There were a number of bugs in the conversion of the sigevent
argument to timer_create from target to host format:
 * signal number not converted from target to host
 * thread ID not copied across
 * sigev_value not copied across
 * we never unlocked the struct when we were done

Between them, these problems meant that SIGEV_THREAD_ID
timers (and the glibc-implemented SIGEV_THREAD timers which
depend on them) didn't work.

Fix these problems and clean up the code a little by pulling
the struct conversion out into its own function, in line with
how we convert various other structs. This allows the test
program in bug LP:1042388 to run.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:33 +03:00
Jincheng Miao
47575997be linux-user: Fix syscall instruction usermode emulation on X86_64
Currently syscall instruction is buggy on user mode X86_64,
the EIP is updated after do_syscall(), that is too late for
clone(). Because clone() will create a thread at the env->EIP
(the address of syscall insn), and then child thread enters
do_syscall() again, that is not expected. Sometimes it is tragic.

User mode syscall insn emulation is not used MSR, so the
action should be same to INT 0x80. INT 0x80 will update EIP in
do_interrupt(), ditto for syscall() for consistency.

Signed-off-by: Jincheng Miao <jmiao@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:33 +03:00
Riku Voipio
0b2effd744 linux-user: redirect openat calls
While Mikhail fixed /proc/self/maps, it was noticed openat calls are
not redirected currently. Some archs don't have open at all, so
openat needs to be redirected.

Fix this by consolidating open/openat code to do_openat - open
is implemented using openat(AT_FDCWD, ... ), which according
to open(2) man page is identical.

Since all targets now have openat, remove the ifdef around sys_openat
and openat: case in do_syscall.

Cc: Mikhail Ilin <m.ilin@samsung.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:33 +03:00
Mikhail Ilyin
d67f4aaae8 linux-user: /proc/self/maps content
Build /proc/self/maps doing a match against guest memory translation table.
Output only that map records which are valid for guest memory layout.

Signed-off-by: Mikhail Ilyin <m.ilin@samsung.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2014-08-22 15:06:33 +03:00
Markus Armbruster
0a156f7c75 vmdk: Use bdrv_nb_sectors() where sectors, not bytes are wanted
Instead of bdrv_getlength().

Commit 57322b7 did this all over block, but one more bdrv_getlength()
has crept in since.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 11:10:12 +02:00
Fam Zheng
cbf95a0b11 blkdebug: Delete BH in bdrv_aio_cancel
Otherwise error_callback_bh will access the already released acb.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 11:07:00 +02:00
Stefan Hajnoczi
8d9eb33ca0 qemu-iotests: add test case 101 for short file I/O
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 11:01:12 +02:00
Stefan Hajnoczi
61ed73cff4 raw-posix: fix O_DIRECT short reads
The following O_DIRECT read from a <512 byte file fails:

  $ truncate -s 320 test.img
  $ qemu-io -n -c 'read -P 0 0 512' test.img
  qemu-io: can't open device test.img: Could not read image for determining its format: Invalid argument

Note that qemu-io completes successfully without the -n (O_DIRECT)
option.

This patch fixes qemu-iotests ./check -nocache -vmdk 059.

Cc: qemu-stable@nongnu.org
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 11:00:56 +02:00
Peter Lieven
d832fb4d66 block/iscsi: fix memory corruption on iscsi resize
bs->total_sectors is not yet updated at this point. resulting
in memory corruption if the volume has grown and data is written
to the newly availble areas.

CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 10:55:22 +02:00
Peter Maydell
fd3cced366 Merge remote-tracking branch 'remotes/otubo/seccomp' into staging
* remotes/otubo/seccomp:
  seccomp: add semctl() to the syscall whitelist

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-21 12:48:44 +01:00
Michael Tokarev
13b552c2f4 block/vvfat.c: remove debugging code to reinit stderr if NULL
Just log to stderr unconditionally, like other similar code does.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-21 10:36:29 +02:00
Paul Moore
b22876cc2f seccomp: add semctl() to the syscall whitelist
QEMU needs to call semctl() for correct operation.  This particular
problem was identified on shutdown with the following commandline:

 # qemu -sandbox on -monitor stdio \
   -device intel-hda -device hda-duplex -vnc :0

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2014-08-21 10:29:16 +02:00
Michael S. Tsirkin
288d332202 hostmem: set MPOL_MF_MOVE
When memory is allocated on a wrong node, MPOL_MF_STRICT
doesn't move it - it just fails the allocation.
A simple way to reproduce the failure is with mlock=on
realtime feature.

The code comment actually says: "ensure policy won't be ignored"
so setting MPOL_MF_MOVE seems like a better way to do this.

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-20 21:15:56 +02:00
David Hildenbrand
c8e2085d8e kvm: run cpu state synchronization on target vcpu thread
As already done for kvm_cpu_synchronize_state(), let's trigger
kvm_arch_put_registers() via run_on_cpu() for kvm_cpu_synchronize_post_reset()
and kvm_cpu_synchronize_post_init().

This way, we make sure that the register synchronizing ioctls are
called from the proper vcpu thread; this avoids calls to
synchronize_rcu() in the kernel.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-20 15:21:00 +02:00
Max Reitz
911864c6e5 iotests: Add test for image filename construction
Testing a real in-use protocol such as NBD is hard; testing blkdebug and
blkverify in its stead is easier and tests basically the same
functionality.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:33:42 +02:00
Max Reitz
fafcfe228d quorum: Implement bdrv_refresh_filename()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Max Reitz
2019d68b3b nbd: Implement bdrv_refresh_filename()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Max Reitz
74b36b2eb8 blkverify: Implement bdrv_refresh_filename()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Max Reitz
2c31b04c94 blkdebug: Implement bdrv_refresh_filename()
Because blkdebug cannot simply create a configuration file, simply
refuse to reconstruct a plain filename and only generate an options
QDict from the rules instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Max Reitz
91af701412 block: Add bdrv_refresh_filename()
Some block devices may not have a filename in their BDS; and for some,
there may not even be a normal filename at all. To work around this, add
a function which tries to construct a valid filename for the
BDS.filename field.

If a filename exists or a block driver is able to reconstruct a valid
filename (which is placed in BDS.exact_filename), this can directly be
used.

If no filename can be constructed, we can still construct an options
QDict which is then converted to a JSON object and prefixed with the
"json:" pseudo protocol prefix. The QDict is placed in
BDS.full_open_options.

For most block drivers, this process can be done automatically; those
that need special handling may define a .bdrv_refresh_filename() method
to fill BDS.exact_filename and BDS.full_open_options themselves.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
zhanghailiang
1bdb176ac5 virtio-blk: fix reference a pointer which might be freed
In function virtio_blk_handle_request, it may freed memory pointed by req,
So do not access member of req after calling this function.

Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:57:05 +02:00
Stefan Hajnoczi
466560b9fc virtio-blk: allow block_resize with dataplane
Now that block_resize acquires the AioContext we can safely allow
resizing the disk.

Reported-by: Andrey Korolyov <andrey@xdel.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:53:52 +02:00
Stefan Hajnoczi
927e0e769f block: acquire AioContext in qmp_block_resize()
Make block_resize safe for dataplane where another thread may be running
the BlockDriverState's AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:53:42 +02:00
Kevin Wolf
6ffb4cb6fd qemu-iotests: Fix 028 reference output for qed
We need to filter out driver-specific options in the "Formatting..."
string printed by qemu when creating the backup image.

Reported-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Peter Wu <peter@lekensteyn.nl>
2014-08-20 11:51:28 +02:00
Ming Lei
61ff8cfbec test-coroutine: test cost introduced by coroutine
This test runs dummy function with coroutine by using
two enter and one yield since which is a common usage.

So we can see the cost introduced by corouting for running
one function, for example:

	Run operation 20000000 iterations 4.841071 s, 4131K operations/s
	242ns per coroutine

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Max Reitz
a1cb48a3bf iotests: Add test for qcow2's cache options
Add a test which tests various combinations of qcow2's cache options
(some of which are valid, some of which are not).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Max Reitz
6c1c8d5d3e qcow2: Add runtime options for cache sizes
Add options for specifying the size of the metadata caches. This can
either be done directly for each cache (if only one is given, the other
will be derived according to a default ratio) or combined for both.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Max Reitz
02004bd4ba qcow2: Use g_try_new0() for cache array
With a variable cache size, the number given to qcow2_cache_create() may
be huge. Therefore, use g_try_new0().

While at it, use g_new0() instead of g_malloc0() for allocating the
Qcow2Cache object.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Max Reitz
440ba08aea qcow2: Constant cache size in bytes
Specifying the metadata cache sizes in clusters results in less clusters
(and much less bytes) covered for small cluster sizes and vice versa.
Using a constant byte size reduces this difference, and makes it
possible to manually specify the cache size in an easily comprehensible
unit.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Maria Kustova
18a7d0c56e runner: Kill a program under test by time-out
If a program under test get frozen, the test should finish and report about its
failure.
In such cases the runner waits for 10 minutes until the program ends its
execution. After this time-out the program will be terminated and the test will
be marked as failed.

For current limitation of test image size to 10 MB as a maximum an execution of
each command takes about several seconds in general, so 10 minutes is enough to
discriminate freeze, but not drastically increase an overall test duration.

Signed-off-by: Maria Kustova <maria.k@catit.be>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Maria Kustova
9d256ca616 runner: Add an argument for test duration
After the specified duration the runner stops executing new tests, but it
doesn't interrupt running ones.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Maria Kustova <maria.k@catit.be>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Markus Armbruster
d4df3dbc02 block: Drop some superfluous casts from void *
They clutter the code.  Unfortunately, I can't figure out how to make
Coccinelle drop all of them, so I have to settle for common special
cases:

    @@
    type T;
    T *pt;
    void *pv;
    @@
    - pt = (T *)pv;
    + pt = pv;
    @@
    type T;
    @@
    - (T *)
      (\(g_malloc\|g_malloc0\|g_realloc\|g_new\|g_new0\|g_renew\|
	 g_try_malloc\|g_try_malloc0\|g_try_realloc\|
	 g_try_new\|g_try_new0\|g_try_renew\)(...))

Topped off with minor manual style cleanups.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Markus Armbruster
08193dd52b qemu-io-cmds: g_renew() can't fail, bury dead error handling
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Markus Armbruster
02c4f26b15 block: Use g_new() & friends to avoid multiplying sizes
g_new(T, n) is safer than g_malloc(sizeof(*v) * n) for two reasons.
One, it catches multiplication overflowing size_t.  Two, it returns
T * rather than void *, which lets the compiler catch more type
errors.

Perhaps a conversion to g_malloc_n() would be neater in places, but
that's merely four years old, and we can't use such newfangled stuff.

This commit only touches allocations with size arguments of the form
sizeof(T), plus two that use 4 instead of sizeof(uint32_t).  We can
make the others safe by converting to g_malloc_n() when it becomes
available to us in a couple of years.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Markus Armbruster
5839e53bbc block: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

Patch created with Coccinelle, with two manual changes on top:

* Add const to bdrv_iterate_format() to keep the types straight

* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
  inexplicably misses

Coccinelle semantic patch:

    @@
    type T;
    @@
    -g_malloc(sizeof(T))
    +g_new(T, 1)
    @@
    type T;
    @@
    -g_try_malloc(sizeof(T))
    +g_try_new(T, 1)
    @@
    type T;
    @@
    -g_malloc0(sizeof(T))
    +g_new0(T, 1)
    @@
    type T;
    @@
    -g_try_malloc0(sizeof(T))
    +g_try_new0(T, 1)
    @@
    type T;
    expression n;
    @@
    -g_malloc(sizeof(T) * (n))
    +g_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc(sizeof(T) * (n))
    +g_try_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_malloc0(sizeof(T) * (n))
    +g_new0(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc0(sizeof(T) * (n))
    +g_try_new0(T, n)
    @@
    type T;
    expression p, n;
    @@
    -g_realloc(p, sizeof(T) * (n))
    +g_renew(T, p, n)
    @@
    type T;
    expression p, n;
    @@
    -g_try_realloc(p, sizeof(T) * (n))
    +g_try_renew(T, p, n)

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Peter Maydell
2656eb7c59 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20140819' into staging
target-arm:
 * fix preferred return address for A64 BRK insn
 * implement AArch64 single-stepping
 * support loading gzip compressed AArch64 kernels
 * use correct PSCI function IDs in the DT when KVM uses PSCI 0.2
 * minor cleanups

# gpg: Signature made Tue 19 Aug 2014 19:04:09 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"

* remotes/pmaydell/tags/pull-target-arm-20140819:
  arm: stellaris: Remove misleading address_space_mem var
  arm: armv7m: Rename address_space_mem -> system_memory
  aarch64: Allow -kernel option to take a gzip-compressed kernel.
  loader: Add load_image_gzipped function.
  arm: cortex-a9: Fix cache-line size and associativity
  arm/virt: Use PSCI v0.2 function IDs in the DT when KVM uses PSCI v0.2
  target-arm: Rename QEMU PSCI v0.1 definitions
  target-arm: Implement MDSCR_EL1 as having state
  target-arm: Implement ARMv8 single-stepping for AArch32 code
  target-arm: Implement ARMv8 single-step handling for A64 code
  target-arm: A64: Avoid duplicate exit_tb(0) in non-linked goto_tb
  target-arm: Set PSTATE.SS correctly on exception return from AArch64
  target-arm: Correctly handle PSTATE.SS when taking exception to AArch32
  target-arm: Don't allow AArch32 to access RES0 CPSR bits
  target-arm: Adjust debug ID registers per-CPU
  target-arm: Provide both 32 and 64 bit versions of debug registers
  target-arm: Allow STATE_BOTH reginfo descriptions for more than cp14
  target-arm: Collect up the debug cp register definitions
  target-arm: Fix return address for A64 BRK instructions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-20 09:55:42 +01:00
Peter Maydell
302fa28378 Revert "memory: Use canonical path component as the name"
This reverts commit b0225c2c0d
(which breaks building with Xen enabled and also leaks memory).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 20:05:46 +01:00
Peter Crosthwaite
14a906f755 arm: stellaris: Remove misleading address_space_mem var
It's a MemoryRegion and not an AddressSpace. But since it's single use,
just inline the get_system_memory() call to the only usage to remove it.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: d6914047e10b956514cfaa5f391ef56c7d851b34.1408347860.git.peter.crosthwaite@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 19:02:40 +01:00
Peter Crosthwaite
6e9322dea3 arm: armv7m: Rename address_space_mem -> system_memory
This argument is a MemoryRegion and not an AddressSpace.

"Address space" means something quite different to "memory region"
in QEMU parlance so rename the variable to reduce confusion.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: f666cf7f2318d9b461b1e320a45bf0d82da9b7dd.1408347860.git.peter.crosthwaite@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 19:02:40 +01:00
Richard W.M. Jones
6f5d3cbe88 aarch64: Allow -kernel option to take a gzip-compressed kernel.
On aarch64 it is the bootloader's job to uncompress the kernel.  UEFI
and u-boot bootloaders do this automatically when the kernel is
gzip-compressed.

However the qemu -kernel option does not do this.  The following
command does not work:

  qemu-system-aarch64 [...] -kernel /boot/vmlinuz

because it tries to execute the gzip-compressed data.

This commit lets gzip-compressed kernels be uncompressed
transparently.

Currently this is only done when emulating aarch64.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1407831259-2115-3-git-send-email-rjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 19:02:40 +01:00
Richard W.M. Jones
235e74afcb loader: Add load_image_gzipped function.
As the name suggests this lets you load a ROM/disk image that is
gzipped.  It is uncompressed before storing it in guest memory.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1407831259-2115-2-git-send-email-rjones@redhat.com
[PMM: removed stray space before ')']
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 19:02:40 +01:00
Peter Crosthwaite
f7838b5290 arm: cortex-a9: Fix cache-line size and associativity
For A9, The cache associativity is 4 and the lines size is 32B.
Self identify in CCSIDR accordingly. Cache size remains at 16k.

QEMU doesn't emulate caches, but we should still report the correct
cache-line size to the guest. Some guests (like u-boot) complain if
the cache-line size mismatches a requested flush or invalidate
operation.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: 1de6bd40155a1d2f2e93e24b1b1d1d677a432641.1408346233.git.peter.crosthwaite@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 19:02:40 +01:00
Christoffer Dall
863714ba6c arm/virt: Use PSCI v0.2 function IDs in the DT when KVM uses PSCI v0.2
The current code supplies the PSCI v0.1 function IDs in the DT even when
KVM uses PSCI v0.2.

This will break guest kernels that only support PSCI v0.1 as they will
use the IDs provided in the DT.  Guest kernels with PSCI v0.2 support
are not affected by this patch, because they ignore the function IDs in
the device tree and rely on the architecture definition.

Define QEMU versions of the constants and check that they correspond to
the Linux defines on Linux build hosts.  After this patch, both guest
kernels with PSCI v0.1 support and guest kernels with PSCI v0.2 should
work.

Tested on TC2 for 32-bit and APM Mustang for 64-bit (aarch64 guest
only).  Both cases tested with 3.14 and linus/master and verified I
could bring up 2 cpus with both guest kernels.  Also tested 32-bit with
a 3.14 host kernel with only PSCI v0.1 and both guests booted here as
well.

Cc: qemu-stable@nongnu.org
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 19:02:25 +01:00
Christoffer Dall
a65c9c17ce target-arm: Rename QEMU PSCI v0.1 definitions
The function IDs for PSCI v0.1 are exported by KVM and defined as
KVM_PSCI_FN_<something>.  To build using these defines in non-KVM code,
QEMU defines these IDs locally and check their correctness against the
KVM headers when those are available.

However, the naming scheme used for QEMU (almost) clashes with the PSCI
v0.2 definitions from Linux so to avoid unfortunate naming when we
introduce local PSCI v0.2 defines, rename the current local defines with
QEMU_ prependend and clearly identify the PSCI version as v0.1 in the
defines.

Cc: qemu-stable@nongnu.org
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 19:02:03 +01:00
Peter Maydell
0e5e8935bb target-arm: Implement MDSCR_EL1 as having state
Now that all the new code to support single-stepping is in
place, wire up the guest-visible MDSCR_EL1, so the guest
can enable single-stepping.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
50225ad0c1 target-arm: Implement ARMv8 single-stepping for AArch32 code
ARMv8 single-stepping requires the exception level that controls
the single-stepping to be in AArch64 execution state, but the
code being stepped may be in AArch64 or AArch32. Implement the
necessary support code for single-stepping AArch32 code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
7ea47fe7be target-arm: Implement ARMv8 single-step handling for A64 code
Implement ARMv8 software single-step handling for A64 code:
correctly update the single-step state machine and generate
debug exceptions when stepping A64 code.

This patch has no behavioural change since MDSCR_EL1.SS can't
be set by the guest yet.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
cc9c1ed14e target-arm: A64: Avoid duplicate exit_tb(0) in non-linked goto_tb
If gen_goto_tb() decides not to link the two TBs, then the
fallback path generates unnecessary code:
 * if singlestep is enabled then we generate unreachable code
   after the gen_exception_internal(EXCP_DEBUG)
 * if singlestep is disabled then we will generate exit_tb(0)
   twice, once in gen_goto_tb() and once coming out of the
   main loop with is_jmp set to DISAS_JUMP

Correct these deficiencies by only emitting exit_tb() in the
non-singlestep case, in which case we can use DISAS_TB_JUMP
to suppress the main-loop exit_tb().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
3a2982038a target-arm: Set PSTATE.SS correctly on exception return from AArch64
Set the PSTATE.SS bit correctly on exception returns from AArch64,
as required by the debug single-step functionality.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
662cefb775 target-arm: Correctly handle PSTATE.SS when taking exception to AArch32
When an exception is taken to AArch32, we must clear the PSTATE.SS
bit for the exception handler, and must also ensure that the SS bit
is not set in the value saved to SPSR_<mode>. Achieve both of these
aims by clearing the bit in uncached_cpsr before saving it to the SPSR.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
4051e12c5d target-arm: Don't allow AArch32 to access RES0 CPSR bits
The CPSR has a new-in-v8 execution state bit (IL), and
also some state which has effects in AArch32 but appears
only in the SPSR format (SS) but is RES0 in the CPSR.

Add the IL bit to CPSR_EXEC, and enforce that guest direct
reads and writes to CPSR can't read or write the RES0
bits, so the guest can't get at the SS bit which we store
in uncached_cpsr. This includes not permitting exception
returns to copy reserved bits from an SPSR into CPSR.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
48eb3ae64b target-arm: Adjust debug ID registers per-CPU
Allow each CPU type to specify the value for the debug ID
registers, by putting them in the ARMCPU struct, and use
the resulting information to only expose the correct number
of watchpoint and breakpoint registers for the CPU.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
10aae1049f target-arm: Provide both 32 and 64 bit versions of debug registers
Bring the 32 bit and 64 bit views of the debug registers into
line by providing the same set of registers in both cases.
(This still isn't a complete set, but it is consistent.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
58a1d8ceab target-arm: Allow STATE_BOTH reginfo descriptions for more than cp14
Currently the STATE_BOTH shorthand for allowing a single reginfo struct
to define handling for both AArch32 and AArch64 views of a register
only permits this where the AArch32 view is in cp15. It turns out that
the debug registers in cp14 also have neatly lined up encodings;
allow these also to share reginfo structs by permitting a STATE_BOTH
reginfo to specify the .cp field (and continue to default to 15 if
it is not specified).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
503006983a target-arm: Collect up the debug cp register definitions
At the moment we have a mixed set of mostly dummy register
definitions for various debug related registers which have
been added piecemeal in order to get Linux kernels to boot.
In preparation for actually implementing debug support,
bring them all together into one place.

This commit doesn't change behaviour: we still expose
exactly the same registers and behaviour to the guest
in all configurations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-08-19 19:02:03 +01:00
Peter Maydell
229a138d74 target-arm: Fix return address for A64 BRK instructions
When we take an exception resulting from a BRK instruction,
the architecture requires that the "preferred return address"
reported to the exception handler is the address of the BRK
itself, not the following instruction (like undefined
insns, and in contrast with SVC, HVC and SMC). Follow this,
rather than incorrectly reporting the address of the following
insn.

(We do get this correct for the A32/T32 BKPT insns.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
2014-08-19 18:56:24 +01:00
Peter Maydell
0e4a773705 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
SCSI changes that enable sending vendor-specific commands via virtio-scsi.

Memory changes for QOMification and automatic tracking of MR lifetime.

# gpg: Signature made Mon 18 Aug 2014 13:03:09 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg:                 aka "Paolo Bonzini <bonzini@gnu.org>"

* remotes/bonzini/tags/for-upstream:
  mtree: remove write-only field
  memory: Use canonical path component as the name
  memory: Use memory_region_name for name access
  memory: constify memory_region_name
  exec: Abstract away ref to memory region names
  loader: Abstract away ref to memory region names
  tpm_tis: remove instance_finalize callback
  memory: remove memory_region_destroy
  memory: convert memory_region_destroy to object_unparent
  ioport: split deletion and destruction
  nic: do not destroy memory regions in cleanup functions
  vga: do not dynamically allocate chain4_alias
  sysbus: remove unused function sysbus_del_io
  qom: object: move unparenting to the child property's release callback
  qom: object: delete properties before calling instance_finalize
  virtio-scsi: implement parse_cdb
  scsi-block, scsi-generic: implement parse_cdb
  scsi-block: extract scsi_block_is_passthrough
  scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo
  scsi-bus: prepare scsi_req_new for introduction of parse_cdb

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 13:00:57 +01:00
Peter Maydell
8e6e2c2ae7 Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging
* remotes/qmp-unstable/queue/qmp:
  monitor: fix use after free
  dump.c: Fix memory leak issue in cleanup processing for dump_init()
  monitor: Remove hardcoded watchdog event names

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 10:30:36 +01:00
Michael S. Tsirkin
b3dd1b8c29 monitor: fix use after free
The function monitor_fdset_dup_fd_find_remove() references member of
'mon_fdset' which - when remove flag is set - may be freed in function
monitor_fdset_cleanup().
remove is set by monitor_fdset_dup_fd_remove which in practice
does not need the returned value, so make it void,
and return -1 from monitor_fdset_dup_fd_find_remove.

Reported-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-08-18 14:39:10 -04:00
Chen Gang
2928207ac1 dump.c: Fix memory leak issue in cleanup processing for dump_init()
In dump_init(), when failure occurs, need notice about 'fd' and memory
mapping. So call dump_cleanup() for it (need let all initializations at
front).

Also simplify dump_cleanup(): remove redundant 'ret' and redundant 'fd'
checking.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-08-18 14:39:10 -04:00
Hani Benhabiles
4bb08af34e monitor: Remove hardcoded watchdog event names
Signed-off-by: Hani Benhabiles <hani@linux.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-08-18 14:39:10 -04:00
Peter Maydell
073fd73e56 Merge remote-tracking branch 'remotes/amit/for-2.2' into staging
* remotes/amit/for-2.2:
  virtio-serial: search for duplicate port names before adding new ports
  virtio-serial: create a linked list of all active devices

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-18 18:24:38 +01:00
Amit Shah
d0a0bfe672 virtio-serial: search for duplicate port names before adding new ports
Before adding new ports to VirtIOSerial devices, check if there's a
conflict in the 'name' parameter.  This ensures two virtserialports with
identical names are not initialized.

Reported-by: <mazhang@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2014-08-18 22:42:49 +05:30
Amit Shah
a1857ad1ac virtio-serial: create a linked list of all active devices
To ensure two virtserialports don't get added to the system with the
same 'name' parameter, we need to access all the ports on all the
devices added, and compare the names.

We currently don't have a list of all VirtIOSerial devices added to the
system.  This commit adds a simple linked list in which devices are put
when they're initialized, and removed when they go away.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2014-08-18 22:42:37 +05:30
Peter Maydell
08ab59770d Merge remote-tracking branch 'remotes/mcayland/qemu-sparc' into staging
* remotes/mcayland/qemu-sparc:
  target-sparc64: implement Short Floating-Point Store Instructions
  apb: add IOMMU flush register implementation
  sun4u: switch second PCI-ebus bridge BAR over to PCI IO space

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-18 12:55:02 +01:00
Peter Maydell
da398fcc25 Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Block pull request

# gpg: Signature made Fri 15 Aug 2014 18:04:23 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request: (55 commits)
  qcow2: fix new_blocks double-free in alloc_refcount_block()
  image-fuzzer: Reduce number of generator functions in __init__
  image-fuzzer: Add generators of L1/L2 tables
  image-fuzzer: Add fuzzing functions for L1/L2 table entries
  docs: Expand the list of supported image elements with L1/L2 tables
  image-fuzzer: Public API for image-fuzzer/runner/runner.py
  image-fuzzer: Generator of fuzzed qcow2 images
  image-fuzzer: Fuzzing functions for qcow2 images
  image-fuzzer: Tool for fuzz tests execution
  docs: Specification for the image fuzzer
  ide: only constrain read/write requests to drive size, not other types
  virtio-blk: Correct bug in support for flexible descriptor layout
  libqos: Change free function called in malloc
  libqos: Correct mask to align size to PAGE_SIZE in malloc-pc
  libqtest: add QTEST_LOG for debugging qtest testcases
  ide: Fix segfault when flushing a device that doesn't exist
  qemu-options: add missing -drive discard option to cmdline help
  parallels: 2TB+ parallels images support
  parallels: split check for parallels format in parallels_open
  parallels: replace tabs with spaces in block/parallels.c
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-18 11:59:27 +01:00
Paolo Bonzini
f54bb15f9d mtree: remove write-only field
ml->printed is never set to true.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Peter Crosthwaite
b0225c2c0d memory: Use canonical path component as the name
Rather than having the name as separate state. This prepares support
for creating a MemoryRegion dynamically (i.e. without
memory_region_init() and friends) and the MemoryRegion still getting
a usable name.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Peter Crosthwaite
3fb18b4da7 memory: Use memory_region_name for name access
Despite being local to memory.c, use the helper function. This prepares
support for fully QOMifiying the name field of MR (which will remove
this state from MR completely).

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Peter Crosthwaite
5d546d4b65 memory: constify memory_region_name
It doesn't change the MR and some prospective call sites will have
const MRs at hand.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Peter Crosthwaite
83234bf2fa exec: Abstract away ref to memory region names
Use the function provided rather than spying on the struct.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Peter Crosthwaite
401cf7fdc4 loader: Abstract away ref to memory region names
Use the function provided rather than spying on the struct.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Paolo Bonzini
c54779f962 tpm_tis: remove instance_finalize callback
It is never used, since ISA device are not hot-unpluggable.

Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Paolo Bonzini
469b046ead memory: remove memory_region_destroy
The function is empty after the previous patch, so remove it.

Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Paolo Bonzini
d8d9581460 memory: convert memory_region_destroy to object_unparent
Explicitly call object_unparent in the few places where we
will re-create the memory region.  If the memory region is
simply being destroyed as part of device teardown, let QOM
handle it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:20 +02:00
Paolo Bonzini
e3fb0ade83 ioport: split deletion and destruction
Of the two functions portio_list_del and portio_list_destroy,
the latter is just freeing a memory area.  However, portio_list_del
is the logical equivalent of memory_region_del_subregion so
destruction of memory regions does not belong there.

Actually, neither of these APIs are in use; portio is mostly used by
ISA devices or VGAs, and neither of these is currently hot-unpluggable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-17 23:25:24 +02:00
Paolo Bonzini
eed7930950 nic: do not destroy memory regions in cleanup functions
The memory regions should be destroyed in the unrealize function;
since these NICs are not even qdev-ified, they cannot be unplugged
and they do not have to do anything to destroy their memory regions.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-17 23:25:24 +02:00
Paolo Bonzini
ad37168cbd vga: do not dynamically allocate chain4_alias
Instead, add a boolean variable to indicate the presence of the region.
This avoids a repeated malloc/free (later we can also avoid the
add_child/unparent by changing the offset/size of the alias).

Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-17 23:25:24 +02:00
Paolo Bonzini
1dd79a237e sysbus: remove unused function sysbus_del_io
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-17 23:25:24 +02:00
Paolo Bonzini
bffc687d66 qom: object: move unparenting to the child property's release callback
This ensures that the unparent callback is called automatically
when the parent object is finalized.

Note that there's no need to keep a reference neither in
object_unparent nor in object_finalize_child_property.  The
reference held by the child property itself will do.

Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-17 23:25:24 +02:00
Paolo Bonzini
76a6e1cc7c qom: object: delete properties before calling instance_finalize
This ensures that the children's unparent callback will still
have a usable parent.

Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-17 23:25:24 +02:00
Artyom Tarasenko
2a5fade753 target-sparc64: implement Short Floating-Point Store Instructions
Implement Short Floating-Point Store Instructions as described
in the chapter 13.5.2 of UltraSPARC-IIi User's Manual.

Particularly this instructions are used by NetBSD 4.0.1+ /sparc64

Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2014-08-17 13:24:27 +01:00
Mark Cave-Ayland
b87b0644bc apb: add IOMMU flush register implementation
The IOMMU flush register is a write-only register used to remove entries from the
hardware TLB. Allow guest writes to this register as a no-op, and return a value
of 0 for reads.

This fixes IOMMU DMA operations under NetBSD SPARC64.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2014-08-17 13:13:01 +01:00
Mark Cave-Ayland
a1cf8be550 sun4u: switch second PCI-ebus bridge BAR over to PCI IO space
The ebus is the sun4u equivalent of the old ISA bus which is already mapped at
the beginning of PCI IO space within QEMU. NetBSD attempts to find the physical
addresses of devices connected to the ebus by parsing the BARs of the PCI-ebus
bridge and using the base address found by matching both the address space
type and range for a particular ebus address.

Since the second PCI-ebus bridge BAR is already aliased onto IO space, switch
the BAR over to match and reduce the size to 0x1000 which is enough to cover
all the legacy ioport devices whilst leaving the remaining IO space for other
PCI devices. This allows NetBSD SPARC64 to correctly detect and access devices
on the ebus.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2014-08-17 13:12:52 +01:00
Peter Maydell
142f4ac5d5 Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-2014-08-15' into staging
trivial patches for 2014-08-15

# gpg: Signature made Fri 15 Aug 2014 16:13:03 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 6F67 E18E 7C91 C5B1 5514  66A7 BEE5 9D74 A4C3 D7DB

* remotes/mjt/tags/trivial-patches-2014-08-15:
  ivshmem: check the value returned by fstat()
  l2cap: fix access to freed memory
  intc: i8259: Convert Array allocation to g_new0
  ppc: convert g_new(qemu_irq usages to g_new0
  ssi: xilinx_spi: Initialise CS GPIOs as NULL
  vl: free err
  qemu-options.hx: fix typo about l2tpv3
  vmxnet3: don't use 'Yoda conditions'
  vl: don't use 'Yoda conditions'
  spice: don't use 'Yoda conditions'
  don't use 'Yoda conditions'
  isa-bus: don't use 'Yoda conditions'
  audio: don't use 'Yoda conditions'
  usb: don't use 'Yoda conditions'
  CODING_STYLE: Section about conditional statement
  pci-host: update uncorresponding description
  pci-host: update obsolete reference about piix_pci.c
  qemu-options.hx: fix a typo of chardev
  memory: Update obsolete comment about AddrRange field type
  apic: Fix reported DFR content

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-15 18:44:48 +01:00
Stefan Hajnoczi
39ba3bf69c qcow2: fix new_blocks double-free in alloc_refcount_block()
Commit de82815db1 ("qcow2: Handle failure
for potentially large allocations") introduced a double-free of
new_blocks in the alloc_refcount_block() error path.

The qemu-iotests qcow2 026 test case was failing because qemu-io
segfaulted.

Make sure new_blocks is NULL after we free it the first time.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:26 +01:00
Maria Kustova
94c83a24c1 image-fuzzer: Reduce number of generator functions in __init__
Some issues can be found only when a fuzzed image has a partial structure,
e.g. has L1/L2 tables but no refcount ones. Generation of an entirely
defined image limits these cases. Now the Image constructor creates only
a header and a backing file name (if any), other image elements are generated
in the 'create_image' API.

Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Maria Kustova
38eb193b8b image-fuzzer: Add generators of L1/L2 tables
Entries in L1/L2 entries are based on a portion of random guest clusters.
L2 entries contain offsets to host image clusters filled with random data.
Clusters for L1/L2 tables and guest data are selected randomly.

Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Maria Kustova
eeadd92487 image-fuzzer: Add fuzzing functions for L1/L2 table entries
Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Maria Kustova
489cb4d7f9 docs: Expand the list of supported image elements with L1/L2 tables
Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Maria Kustova
071e649194 image-fuzzer: Public API for image-fuzzer/runner/runner.py
__init__.py provides the public API required by the test runner

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Maria Kustova
e123232331 image-fuzzer: Generator of fuzzed qcow2 images
The layout submodule of the qcow2 package creates a random valid image,
randomly selects some amount of its fields, fuzzes them and write the fuzzed
image to the file. Fuzzing process can be controlled by an external
configuration.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Maria Kustova
6d5e9372f6 image-fuzzer: Fuzzing functions for qcow2 images
The fuzz submodule of the qcow2 image generator contains fuzzing functions for
image fields.
Each fuzzing function contains a list of constraints and a call of a helper
function that randomly selects a fuzzed value satisfied to one of constraints.
For now constraints include only known as invalid or potentially dangerous
values. But after investigation of code coverage by fuzz tests they will be
expanded by heuristic values based on inner checks and flows of a program
under test.

Now fuzzing of a header, header extensions and a backing file name is
supported.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Maria Kustova
ad724dd728 image-fuzzer: Tool for fuzz tests execution
The purpose of the test runner is to prepare the test environment (e.g. create
a work directory, a test image, etc), execute a program under test with
parameters, indicate a test failure if the program was killed during the test
execution and collect core dumps, logs and other test artifacts.

The test runner doesn't depend on an image format, so it can be used with any
external image generator.

[Fixed path to qcow2 format module "qcow2" instead of "../qcow2" since
runner.py is no longer in a sub-directory.
--Stefan]

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Maria Kustova
d6dc10aad8 docs: Specification for the image fuzzer
'Overall fuzzer requirements' chapter contains the current product vision and
features done and to be done. This chapter is still in progress.

Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Michael Tokarev
d66168ed68 ide: only constrain read/write requests to drive size, not other types
Commit 58ac321135 introduced a check to ide dma processing which
constrains all requests to drive size.  However, apparently, some
valid requests (like TRIM) does not fit in this constraint, and
fails in 2.1.  So check the range only for reads and writes.

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Marc Marí
a83ceea8ff virtio-blk: Correct bug in support for flexible descriptor layout
Without this correction, only a three descriptor layout is accepted, and
requests with just two descriptors are not completed and no error message is
displayed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Marc Marí <marc.mari.barcelo@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Marc Marí
220c1a2fad libqos: Change free function called in malloc
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Marc Marí <marc.mari.barcelo@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Marc Marí
f75ffc5857 libqos: Correct mask to align size to PAGE_SIZE in malloc-pc
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Marí <marc.mari.barcelo@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Marc Marí
ae74f18782 libqtest: add QTEST_LOG for debugging qtest testcases
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Marí <marc.mari.barcelo@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:14 +01:00
Kevin Wolf
f7f3ff1da0 ide: Fix segfault when flushing a device that doesn't exist
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Peter Lieven
2f7133b2e5 qemu-options: add missing -drive discard option to cmdline help
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Denis V. Lunev
d25d598020 parallels: 2TB+ parallels images support
Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case all 64 bits of header->nb_sectors are used for image size.

This patch implements support of this for qemu-img and also adds specific
check for an incorrect image. Images with block size greater than
INT_MAX/513 are not supported. The biggest available Parallels image
cluster size in the field is 1 Mb. Thus this limit will not hurt
anyone.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Denis V. Lunev
418a7adb77 parallels: split check for parallels format in parallels_open
and rework error path a bit. There is no difference at the moment, but
the code will be definitely shorter when additional processing will
be required for WithouFreSpacExt

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Denis V. Lunev
f08e2f8465 parallels: replace tabs with spaces in block/parallels.c
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Denis V. Lunev
8c27d54fa0 parallels: extend parallels format header with actual data values
Parallels image format has several additional fields inside:
- nb_sectors is actually 64 bit wide. Upper 32bits are not used for
  images with signature "WithoutFreeSpace" and must be explicitly
  zeroed according to Parallels. They will be used for images with
  signature "WithouFreSpacExt"
- inuse is magic which means that the image is currently opened for
  read/write or was not closed correctly, the magic is 0x746f6e59
- data_off is the location of the first data block. It can be zero
  and in this case data starts just beyond the header aligned to
  512 bytes. Though this field does not matter for read-only driver

This patch adds these values to struct parallels_header and adds
proper handling of nb_sectors for currently supported WithoutFreeSpace
images.

WithouFreSpacExt will be covered in next patches.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Cornelia Huck
2f5f70fa5f dataplane: stop trying on notifier error
If we fail to set up guest or host notifiers, there's no use trying again
every time the guest kicks, so disable dataplane in that case.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Cornelia Huck
f9907ebc4c dataplane: fail notifier setting gracefully
The dataplane code is currently doing a hard exit if it fails to set
up either guest or host notifiers. In practice, this may mean that a
guest suddenly dies after a dataplane device failed to come up (e.g.,
when a file descriptor limit is hit for tne nth device).

Let's just try to unwind the setup instead and return.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Cornelia Huck
267e1a204c dataplane: print why starting failed
Setting up guest or host notifiers may fail, but the user will have
no idea why: Let's print the error returned by the callback.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Gonglei
16b38080e3 channel-posix: using qemu_set_nonblock() instead of fcntl(O_NONBLOCK)
Technically, fcntl(soc, F_SETFL, O_NONBLOCK)
is incorrect since it clobbers all other file flags.
We can use F_GETFL to get the current flags, set or
clear the O_NONBLOCK flag, then use F_SETFL to set the flags.

Using the qemu_set_nonblock() wrapper.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Wangxin <wangxinxin.wang@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Gonglei
4ff12bdb1d qemu-char: using qemu_set_nonblock() instead of fcntl(O_NONBLOCK)
Technically, fcntl(soc, F_SETFL, O_NONBLOCK)
is incorrect since it clobbers all other file flags.
We can use F_GETFL to get the current flags, set or
clear the O_NONBLOCK flag, then use F_SETFL to set the flags.

Using the qemu_set_nonblock() wrapper.

Signed-off-by: Wangxin <wangxinxin.wang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Mark Cave-Ayland
271dddd133 cmd646: synchronise UDMA interrupt status with DMA interrupt status
Make sure that both registers are synchronised when being accessed through
PCI configuration space.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Mark Cave-Ayland
1d113ef874 cmd646: allow MRDMODE interrupt status bits clearing from PCI config space
Make sure that we also update the normal DMA interrupt status bits at the
same time, and alter the IRQ if being cleared accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Mark Cave-Ayland
dab91a1e13 cmd646: switch cmd646_update_irq() to accept PCIDevice instead of PCIIDEState
This is in preparation for adding configuration space accessors which accept
PCIDevice as a parameter.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Mark Cave-Ayland
5bbc0a703d cmd646: synchronise DMA interrupt status with UDMA interrupt status
Make sure that the standard DMA interrupt status bits reflect any changes made
to the UDMA interrupt status bits. The CMD646U2 datasheet claims that these
bits are equivalent, and they must be synchronised for guests that manipulate
both registers.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Mark Cave-Ayland
58f16a7b47 cmd646: add constants for CNTRL register access
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
John Snow
e42de189e8 qtest/ide: Fix small memory leak
For libqos debugging purposes, it's nice to
be able to assert that tests and associated libraries
have no memory leaks. To that end, free up the
trivial cmdline leak.

The remaining leaks caused by pc_alloc_init are fixed
instead by my first-fit pc_alloc implementation already
on the qemu-devel mailing list.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
John Snow
6ce7100e7f libqos: allow qpci_iomap to return BAR mapping size
This patch allows qpci_iomap to return the size of the
BAR mapping that it created, to allow driver applications
(e.g, ahci-test) to make determinations about the suitability
or the mapping size, or in the specific case of AHCI, how
many ports are supported by the HBA.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
John Snow
7f2a5ae6c1 libqos: Fixes a small memory leak.
Allow users the chance to clean up the QPCIBusPC structure
by adding a small cleanup routine. Helps clear up small
memory leaks during setup/teardown, to allow for cleaner
debug output messages.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
John Snow
a7afc6b8c1 libqtest: Correct small memory leak.
Fixes a small memory leak inside of libqtest.
After we produce a test path and glib copies the string
for itself, we should clean up our temporary copy.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
John Snow
f3cdcbaee1 libqos: Correct memory leak
Fix a small memory leak inside of libqos, in the pc_alloc_init routine.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
John Snow
86298845e1 qtest: Adding qtest_memset and qmemset.
Currently, libqtest allows for memread and memwrite, but
does not offer a simple way to zero out regions of memory.
This patch adds a simple function to do so.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
John Snow
552b48f44d q35: Enable the ioapic device to be seen by qtest.
Currently, the ioapic device can not be found in a qtest environment
when requesting "irq_interrupt_in ioapic" via the qtest socket.

By mirroring how the ioapic is added in i44ofx (hw/i440/pc_piix.c),
as a child of "q35," the device is able to be seen by qtest.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
088415202b ahci: construct PIO Setup FIS for PIO commands
PIO commands should put a PIO Setup FIS in the receive area when data
transfer ends.  Currently QEMU does not do this and only places the
D2H FIS at the end of the operation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
c7e73adb48 ide: make all commands go through cmd_done
AHCI has code to fill in the D2H FIS trigger the IRQ all over the place.
Centralize this in a single cmd_done callback by generalizing the existing
async_cmd_done callback.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
08ee9e3368 ide: stop PIO transfer on errors
This will provide a hook for sending the result of the command via the
FIS receive area.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
1f88f77348 ahci: remove duplicate PORT_IRQ_* constants
These are defined twice, just use one set consistently.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
fd648f10af ide: move retry constants out of BM_STATUS_* namespace
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
7e2648df86 ide: move BM_STATUS bits to pci.[ch]
They are not used by AHCI, and should not be even available there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
0e7ce54cf5 ide: fold add_status callback into set_inactive
It is now called only after the set_inactive callback.  Put the two together.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
0def37baf9 ide: remove wrong setting of BM_STATUS_INT
Similar to the case removed in commit 69c38b8 (ide/core: Remove explicit
setting of BM_STATUS_INT, 2011-05-19), the only remaining use of
add_status(..., BM_STATUS_INT) is for short PRDs.  The flag should
not be raised in this case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
4855b57639 ide: wrap start_dma callback
Make it optional and prepare for the next patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
446351236b ide: simplify start_transfer callbacks
Drop the unused return value and make the callback optional.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
c039cb1e5a ide: simplify async_cmd_done callbacks
Drop the unused return value.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
829b933b70 ide: simplify set_inactive callbacks
Drop the unused return value and make the callback optional.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
1374bec063 ide: simplify reset callbacks
Drop the unused return value and make the callback optional.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
69f72a2221 ide: stash aiocb for flushes
This ensures that operations are completed after a reset

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
14a92e5fe1 ide-test: add test for werror=stop
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:12 +01:00
Paolo Bonzini
7c282cb4c5 libqtest: add QTEST_LOG for debugging qtest testcases
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:11 +01:00
Paolo Bonzini
9e52c53b8c blkdebug: report errors on flush too
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:11 +01:00
Peter Maydell
f2c85a2f36 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
post-2.1 bugfixes

A bunch of fixes that missed 2.1 by a small margin.
If we do 2.1.1, some of these would be good candidates,
added Cc qemu-stable as appropriate.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 14 Aug 2014 17:07:25 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  pc: Get rid of pci-info leftovers
  e1000: use symbolic constants to init phy ctrl & status registers
  e1000: correctly handle phy_ctrl reserved & self-clearing bits
  ivshmem: fix building when debug mode is enabled
  acpi: align RSDP
  numa: show hex number in error message for consistency and prefix them with 0x
  pc-dimm: fix up error message
  pc-dimm: validate node property
  hw:i386: typo fix: MEMORY_HOPTLUG_DEVICE -> MEMORY_HOTPLUG_DEVICE
  hw/audio/intel-hda: Fix MSI capability address
  pc: Create 2.2 machine type
  pci: Use bus master address space for delivering MSI/MSI-X messages

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-15 17:43:51 +01:00
Peter Maydell
5c6b3c50cc Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
Tracing pull request

* remotes/stefanha/tags/tracing-pull-request:
  virtio-rng: add some trace events
  trace: add some tcg tracing support
  trace: teach lttng backend to use format strings
  trace: [tcg] Include TCG-tracing header on all targets
  trace: [tcg] Include event definitions in "trace.h"
  trace: [tcg] Generate TCG tracing routines
  trace: [tcg] Include TCG-tracing helpers
  trace: [tcg] Define TCG tracing helper routine wrappers
  trace: [tcg] Define TCG tracing helper routines
  trace: [tcg] Declare TCG tracing helper routines
  trace: [tcg] Add 'tcg' event property
  trace: [tcg] Argument type transformation machinery
  trace: [tcg] Argument type transformation rules
  trace: [tcg] Add documentation
  trace: install simpletrace SystemTap tapset
  simpletrace: add simpletrace.py --no-header option
  trace: add tracetool simpletrace_stap format
  trace: extract stap_escape() function for reuse

Conflicts:
	Makefile.objs
2014-08-15 16:37:17 +01:00
zhanghailiang
5edbdbcdf8 ivshmem: check the value returned by fstat()
The function fstat() may fail, so check its return value.

Acked-by: Levente Kurusa <lkurusa@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 19:12:58 +04:00
zhanghailiang
2c145d7a73 l2cap: fix access to freed memory
Pointer 'ch' will be used in function 'l2cap_channel_open_req_msg' after
it was previously freed in 'l2cap_channel_open'.
Assigned it to NULL after it is freed.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 19:12:48 +04:00
Peter Crosthwaite
8945c7f754 intc: i8259: Convert Array allocation to g_new0
To be more array friendly and to indicate the IRQs are initially
disconnected.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:55 +04:00
Peter Crosthwaite
aa2ac1dac3 ppc: convert g_new(qemu_irq usages to g_new0
To indicate the IRQs are initially disconnected.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:50 +04:00
Peter Crosthwaite
c75f3c041a ssi: xilinx_spi: Initialise CS GPIOs as NULL
To properly indicate they are unconnected.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:40 +04:00
Hu Tao
3a9cbfe009 vl: free err
err is not freed after use, thus causing memory leak. This patch fixes
it.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Cc: qemu-trivial@nongnu.org
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:07 +04:00
Gonglei
3952651a75 qemu-options.hx: fix typo about l2tpv3
two duplicate destport description.

s/destport/srcport/, s/destination/source/

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:07 +04:00
Gonglei
f7472ca405 vmxnet3: don't use 'Yoda conditions'
imitate nearby code about using '!value' or 'value == NULL'

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:07 +04:00
Gonglei
28de2f883c vl: don't use 'Yoda conditions'
imitate nearby code about using '!value' or 'value == NULL'

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:07 +04:00
Gonglei
fe8e8327f1 spice: don't use 'Yoda conditions'
imitate nearby code about using '!value' or 'value == NULL'

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:07 +04:00
Gonglei
8108fd3e26 don't use 'Yoda conditions'
imitate nearby code about using '!value' or 'value == NULL'

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:07 +04:00
Gonglei
337a3e5c7d isa-bus: don't use 'Yoda conditions'
imitate nearby code about using '!value' or 'value == NULL'

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Gonglei
2ab5bf67b7 audio: don't use 'Yoda conditions'
imitate nearby code about using '!value' or 'value == NULL'

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Gonglei
d0657b2aab usb: don't use 'Yoda conditions'
imitate nearby code about using '!value' or 'value == NULL'

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Gonglei
2bb0020cf9 CODING_STYLE: Section about conditional statement
Yoda conditions lack readability, and QEMU has a
strict compiler configuration for checking a common
mistake like "if (dev = NULL)". Make it a written rule.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Gonglei
30dc600bbf pci-host: update uncorresponding description
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Gonglei
ef9f7b587d pci-host: update obsolete reference about piix_pci.c
piix_pci.c has been renamed into piix.c at commit
c0907c9e64

update the obsolete reference.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Liming Wang
38a24c8b74 qemu-options.hx: fix a typo of chardev
Change host to port.

Signed-off-by: Liming Wang <liming.wang@canonical.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Fam Zheng
c9cdaa3ab9 memory: Update obsolete comment about AddrRange field type
We are not 64 bit any more since

08dafab4 memory: use 128-bit integers for sizes and intermediates

but the comment is forgotten to be updated.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Jan Kiszka
d6c140a771 apic: Fix reported DFR content
IA-32 SDM, Figure 10-14: Bits 27:0 are reserved as 1.

Fixes Jailhouse hypervisor start with in-kernel irqchips off.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15 18:54:06 +04:00
Peter Maydell
f2fb1da941 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches

# gpg: Signature made Fri 15 Aug 2014 14:07:42 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (59 commits)
  block: Catch !bs->drv in bdrv_check()
  iotests: Add test for image header overlap
  qcow2: Catch !*host_offset for data allocation
  qcow2: Return useful error code in refcount_init()
  mirror: Handle failure for potentially large allocations
  vpc: Handle failure for potentially large allocations
  vmdk: Handle failure for potentially large allocations
  vhdx: Handle failure for potentially large allocations
  vdi: Handle failure for potentially large allocations
  rbd: Handle failure for potentially large allocations
  raw-win32: Handle failure for potentially large allocations
  raw-posix: Handle failure for potentially large allocations
  qed: Handle failure for potentially large allocations
  qcow2: Handle failure for potentially large allocations
  qcow1: Handle failure for potentially large allocations
  parallels: Handle failure for potentially large allocations
  nfs: Handle failure for potentially large allocations
  iscsi: Handle failure for potentially large allocations
  dmg: Handle failure for potentially large allocations
  curl: Handle failure for potentially large allocations
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-15 14:49:50 +01:00
Max Reitz
908bcd540f block: Catch !bs->drv in bdrv_check()
qemu-img check calls bdrv_check() twice if the first run repaired some
inconsistencies. If the first run however again triggered corruption
prevention (on qcow2) due to very bad inconsistencies, bs->drv may be
NULL afterwards. Thus, bdrv_check() should check whether bs->drv is set.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:16 +02:00
Max Reitz
a42f8a3d05 iotests: Add test for image header overlap
Add a test for an image with an unallocated image header; instead of an
assertion, this should result in the image being marked corrupt.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:16 +02:00
Max Reitz
ff52aab2df qcow2: Catch !*host_offset for data allocation
qcow2_alloc_cluster_offset() uses host_offset == 0 as "no preferred
offset" for the (data) cluster range to be allocated. However, this
offset is actually valid and may be allocated on images with a corrupted
refcount table or first refcount block.

In this case, the corruption prevention should normally catch that
write anyway (because it would overwrite the image header). But since 0
is a special value here, the function assumes that nothing has been
allocated at all which it asserts against.

Because this condition is not qemu's fault but rather that of a broken
image, it shouldn't throw an assertion but rather mark the image corrupt
and show an appropriate message, which this patch does by calling the
corruption check earlier than it would be called normally (before the
assertion).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:16 +02:00
Max Reitz
8fcffa9853 qcow2: Return useful error code in refcount_init()
If bdrv_pread() returns an error, it is very unlikely that it was
ENOMEM. In this case, the return value should be passed along; as
bdrv_pread() will always either return the number of bytes read or a
negative value (the error code), the condition for checking whether
bdrv_pread() failed can be simplified (and clarified) as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
7504edf477 mirror: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the mirror block job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
5fb09cd586 vpc: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vpc block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
d6e5993197 vmdk: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vmdk block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
a67e128a4f vhdx: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vhdx block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
17cce73578 vdi: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vdi block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
0f7a02379b rbd: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the rbd block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:16 +02:00
Kevin Wolf
4b6af3d58a raw-win32: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the raw-win32 block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:16 +02:00
Kevin Wolf
50d4a858e6 raw-posix: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the raw-posix block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
4f4896db5f qed: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the qed block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
de82815db1 qcow2: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the qcow2 block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
0df93305f2 qcow1: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the qcow1 block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
f7b593d937 parallels: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the parallels block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
2347dd7b68 nfs: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the nfs block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
4d5a3f888c iscsi: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the iscsi block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
b546a94474 dmg: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the dmg block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
8dc7a7725b curl: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the curl block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
4ae7a52e43 cloop: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the cloop block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
7bf665ee35 bochs: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the bochs block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
857d4f46c3 block: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses bounce buffer allocations in block.c. While at it,
convert bdrv_commit() from plain g_malloc() to qemu_try_blockalign().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
7d2a35cc92 block: Introduce qemu_try_blockalign()
This function returns NULL instead of aborting when an allocation fails.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Jeff Cody
23d20b5b4f block: iotest - update 084 to test static VDI image creation
This updates the VDI corruption test to also test static VDI image
creation, as well as the default dynamic image creation.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:15 +02:00
Jeff Cody
fef6070eff block: vpc - use block layer ops in vpc_create, instead of posix calls
Use the block layer to create, and write to, the image file in the VPC
.bdrv_create() operation.

This has a couple of benefits: Images can now be created over protocols,
and hacks such as NOCOW are not needed in the image format driver, and
the underlying file protocol appropriate for the host OS can be relied
upon.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:15 +02:00
Jeff Cody
dddc7750d6 block: use the standard 'ret' instead of 'result'
Most QEMU code uses 'ret' for function return values. The VDI driver
uses a mix of 'result' and 'ret'.  This cleans that up, switching over
to the standard 'ret' usage.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:15 +02:00
Jeff Cody
70747862f1 block: vdi - use block layer ops in vdi_create, instead of posix calls
Use the block layer to create, and write to, the image file in the
VDI .bdrv_create() operation.

This has a couple of benefits: Images can now be created over protocols,
and hacks such as NOCOW are not needed in the image format driver, and
the underlying file protocol appropriate for the host OS can be relied
upon.

Also some minor cleanup for error handling.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Jeff Cody
9a4d5ca607 block: allow bdrv_unref() to be passed NULL pointers
If bdrv_unref() is passed a NULL BDS pointer, it is safe to
exit with no operation.  This will allow cleanup code to blindly
call bdrv_unref() on a BDS that has been initialized to NULL.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Paolo Bonzini
58803ce74f test-coroutine: add baseline test that times the cost of function calls
This can be used to compute the cost of coroutine operations.  In the
end the cost of the function call is a few clock cycles, so it's pretty
cheap for now, but it may become more relevant as the coroutine code
is optimized.

For example, here are the results on my machine:

   Function call 100000000 iterations: 0.173884 s
   Yield 100000000 iterations: 8.445064 s
   Lifecycle 1000000 iterations: 0.098445 s
   Nesting 10000 iterations of 1000 depth each: 7.406431 s

One yield takes 83 nanoseconds, one enter takes 97 nanoseconds,
one coroutine allocation takes (roughly, since some of the allocations
in the nesting test do hit the pool) 739 nanoseconds:

   (8.445064 - 0.173884) * 10^9 / 100000000 = 82.7
   (0.098445 * 100 - 0.173884) * 10^9 / 100000000 = 96.7
   (7.406431 * 10 - 0.173884) * 10^9 / 100000000 = 738.9

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Jeff Cody
4f75b52a07 block: VHDX endian fixes
This patch contains several changes for endian conversion fixes for
VHDX, particularly for big-endian machines (multibyte values in VHDX are
all on disk in LE format).

Tests were done with existing qemu-iotests on an IBM POWER7 (8406-71Y).
This includes sample images created by Hyper-V, both with dirty logs and
without.

In addition, VHDX image files created (and written to) on a BE machine
were tested on a LE machine, and vice-versa.

Reported-by: Markus Armburster <armbru@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Jeff Cody
349592e0b9 block: vhdx - add error check
This add an error check for an invalid descriptor entry signature,
when flushing the log descriptor entries.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Stefan Hajnoczi
3c80ca158c thread-pool: avoid deadlock in nested aio_poll() calls
The thread pool has a race condition if two elements complete before
thread_pool_completion_bh() runs:

  If element A's callback waits for element B using aio_poll() it will
  deadlock since pool->completion_bh is not marked scheduled when the
  nested aio_poll() runs.

Fix this by marking the BH scheduled while thread_pool_completion_bh()
is executing.  This way any nested aio_poll() loops will enter
thread_pool_completion_bh() and complete the remaining elements.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Stefan Hajnoczi
c2e50e3d11 thread-pool: avoid per-thread-pool EventNotifier
EventNotifier is implemented using an eventfd or pipe.  It therefore
consumes file descriptors, which can be limited by rlimits and should
therefore be used sparingly.

Switch from EventNotifier to QEMUBH in thread-pool.c.  Originally
EventNotifier was used because qemu_bh_schedule() was not thread-safe
yet.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Stefan Hajnoczi
2a87151fb2 block: bump coroutine pool size for drives
When a BlockDriverState is associated with a storage controller
DeviceState we expect guest I/O.  Use this opportunity to bump the
coroutine pool size by 64.

This patch ensures that the coroutine pool size scales with the number
of drives attached to the guest.  It should increase coroutine pool
usage (which makes qemu_coroutine_create() fast) without hogging too
much memory when fewer drives are attached.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-08-15 15:07:14 +02:00
Stefan Hajnoczi
ac2662a913 coroutine: make pool size dynamic
Allow coroutine users to adjust the pool size.  For example, if the
guest has multiple emulated disk drives we should keep around more
coroutines.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-08-15 15:07:14 +02:00
Chrysostomos Nanakos
746ebfa77a qemu-iotests: add support for Archipelago protocol
Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Chrysostomos Nanakos
b1de5f439d QMP: Add support for Archipelago
Introduce new enum BlockdevOptionsArchipelago.

@volume:              #Name of the Archipelago volume image

@mport:               #'mport' is the port number on which mapperd is
                      listening. This is optional and if not specified,
                      QEMU will make Archipelago to use the default port.

@vport:               #'vport' is the port number on which vlmcd is
                      listening. This is optional and if not specified,
                      QEMU will make Archipelago to use the default port.

@segment:             #optional The name of the shared memory segment
                      Archipelago stack is using. This is optional
                      and if not specified, QEMU will make Archipelago
                      use the default value, 'archipelago'.

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Chrysostomos Nanakos
76d3d83a37 block/archipelago: Add support for creating images
qemu-img archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>]
 [:segment=<segment_name>]] [size]

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Chrysostomos Nanakos
70537a8506 block/archipelago: Implement bdrv_parse_filename()
VM Image on Archipelago volume can also be specified like this:

file=archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>][:
segment=<segment_name>]]

Examples:

file=archipelago:my_vm_volume
file=archipelago:my_vm_volume/mport=123
file=archipelago:my_vm_volume/mport=123:vport=1234
file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Chrysostomos Nanakos
c9a12e751b block: Support Archipelago as a QEMU block backend
VM Image on Archipelago volume is specified like this:

file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
file.vport=<vlmcd_port>][,file.segment=<segment_name>]]

'archipelago' is the protocol.

'mport' is the port number on which mapperd is listening. This is optional
and if not specified, QEMU will make Archipelago to use the default port.

'vport' is the port number on which vlmcd is listening. This is optional
and if not specified, QEMU will make Archipelago to use the default port.

'segment' is the name of the shared memory segment Archipelago stack is using.
This is optional and if not specified, QEMU will make Archipelago to use the
default value, 'archipelago'.

Examples:

file.driver=archipelago,file.volume=my_vm_volume
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234,file.segment=my_segment

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Chunyan Liu
000c4dfff4 qemu-img info: show nocow info
Add nocow info in 'qemu-img info' output to show whether the file
currently has NOCOW flag set or not.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Fam Zheng
c6ac36e145 vmdk: Optimize cluster allocation
This drops the unnecessary bdrv_truncate() from, and also improves,
cluster allocation code path.

Before, when we need a new cluster, get_cluster_offset truncates the
image to bdrv_getlength() + cluster_size, and returns the offset of
added area, i.e. the image length before truncating.

This is not efficient, so it's now rewritten as:

  - Save the extent file length when opening.

  - When allocating cluster, use the saved length as cluster offset.

  - Don't truncate image, because we'll anyway write data there: just
    write any data at the EOF position, in descending priority:

    * New user data (cluster allocation happens in a write request).

    * Filling data in the beginning and/or ending of the new cluster, if
      not covered by user data: either backing file content (COW), or
      zero for standalone images.

One major benifit of this change is, on host mounted NFS images, even
over a fast network, ftruncate is slow (see the example below). This
change significantly speeds up cluster allocation. Comparing by
converting a cirros image (296M) to VMDK on an NFS mount point, over
1Gbe LAN:

    $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk

    Before:
        real    0m21.796s
        user    0m0.130s
        sys     0m0.483s

    After:
        real    0m2.017s
        user    0m0.047s
        sys     0m0.190s

We also get rid of unchecked bdrv_getlength() and bdrv_truncate(), and
get a little more documentation in function comments.

Tested that this passes qemu-iotests for all VMDK subformats.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Fam Zheng
a8d8a1a06c qemu-iotests: Add data pattern in version3 VMDK sample image in 059
It's possible that we diverge from the specification with our
implementation.  Having a reference image in the test cases may detect
such problems when we introduce a bug that can read what it creates, but
can't handle a real VMDK.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Stefan Hajnoczi
ef523587da qdev-monitor: include QOM properties in -device FOO, help output
Update -device FOO,help to include QOM properties in addition to qdev
properties.  Devices are gradually adding more QOM properties that are
not reflected as qdev properties.

It is important to report all device properties since management tools
like libvirt use this information (and device-list-properties QMP) to
detect the presence of QEMU features.

This patch reuses the device-list-properties QMP machinery to avoid code
duplication.

Reported-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Cole Robinson <crobinso@redhat.com>
2014-08-15 15:07:13 +02:00
Stefan Hajnoczi
4115dd6527 qmp: hide "hotplugged" device property from device-list-properties
The "hotplugged" device property was not reported before commit
f4eb32b590 ("qmp: show QOM properties in
device-list-properties").  Fix this difference.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-08-15 15:07:13 +02:00
Stefan Hajnoczi
ef558696b5 docs/multiple-iothreads.txt: add documentation on IOThread programming
This document explains how IOThreads and the main loop are related,
especially how to write code that can run in an IOThread.  Currently
only virtio-blk-data-plane uses these techniques.  The next obvious
target is virtio-scsi; there has also been work on virtio-net.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-08-15 15:07:13 +02:00
Gonglei (Arei)
8cced12143 xen_disk: fix possible null-ptr dereference
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Hu Tao
8efc936336 configure: explicitly state version requirements to devel packages
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Maria Kustova
8e436ec1f3 docs: Make the recommendation for the backing file name position a requirement
The current version of the qcow2 specification recommends to save the backing
file name in the end of the first cluster. It follows that the backing file
name can be saved somewhere in the image, but the first cluster, which
contradicts the current QEMU implementation.

The patch makes the backing file name required to be placed after the header
extensions in the first image cluster.

Signed-off-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
52bf1e722d block: Avoid bdrv_get_geometry() where errors should be detected
bdrv_get_geometry() hides errors.  Use bdrv_nb_sectors() or
bdrv_getlength() instead where that's obviously inappropriate.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
d739f1c410 qemu-img: Make img_convert() get image size just once per image
Chiefly so I don't have to do the error checking in quadruplicate in
the next commit.  Moreover, replacing the frequently updated
bs_sectors by an array assigned just once makes the code easier to
understand.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
75d3d21f9e block: Drop superfluous aligning of bdrv_getlength()'s value
It returns a multiple of the sector size.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
57322b7811 block: Use bdrv_nb_sectors() where sectors, not bytes are wanted
Instead of bdrv_getlength().

Aside: a few of these callers don't handle errors.  I didn't
investigate whether they should.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
43716fa805 block: Use bdrv_nb_sectors() in img_convert()
Instead of bdrv_getlength().  Replace variable output_length by
output_sectors.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
30a7f2fc91 block: Use bdrv_nb_sectors() in bdrv_co_get_block_status()
Instead of bdrv_getlength().

Replace variables length, length2 by total_sectors, nb_sectors2.
Bonus: use total_sectors instead of the slightly unclean
bs->total_sectors.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
4049082c4b block: Use bdrv_nb_sectors() in bdrv_aligned_preadv()
Instead of bdrv_getlength().  Eliminate variable len.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
d32f7c101b block: Use bdrv_nb_sectors() in bdrv_make_zero()
Instead of bdrv_getlength().

Variable target_size is initially in bytes, then changes meaning to
sectors.  Ugh.  Replace by target_sectors.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
65a9bb25d6 block: New bdrv_nb_sectors()
A call to retrieve the image size converts between bytes and sectors
several times:

* BlockDriver method bdrv_getlength() returns bytes.

* refresh_total_sectors() converts to sectors, rounding up, and stores
  in total_sectors.

* bdrv_getlength() converts total_sectors back to bytes (now rounded
  up to a multiple of the sector size).

* Callers wanting sectors rather bytes convert it right back.
  Example: bdrv_get_geometry().

bdrv_nb_sectors() provides a way to omit the last two conversions.
It's exactly bdrv_getlength() with the conversion to bytes omitted.
It's functionally like bdrv_get_geometry() without its odd error
handling.

Reimplement bdrv_getlength() and bdrv_get_geometry() on top of
bdrv_nb_sectors().

The next patches will convert some users of bdrv_getlength() to
bdrv_nb_sectors().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:12 +02:00
Peter Maydell
f083201667 Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-2014-08-09' into staging
trivial patches for 2014-08-09

# gpg: Signature made Fri 08 Aug 2014 21:36:44 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 6F67 E18E 7C91 C5B1 5514  66A7 BEE5 9D74 A4C3 D7DB

* remotes/mjt/tags/trivial-patches-2014-08-09:
  build-sys: Move qapi-{types, visit, event}.o into util-obj-y
  po: Add Chinese translation
  qemu-img: Check getchar() return value in read_password() for WIN32
  hw/timer: Move extern declaration from .c to .h file
  virtio: Move extern declaration to header file
  Show length mismatch error is hex
  target-i386/cpu.c: Fix two error output indentation
  l2tpv3 (configure): it is linux-specific
  hw/timer/imx_*: fix TIMER_MAX clash with system symbol

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-15 13:41:55 +01:00
Markus Armbruster
260cb1c409 pc: Get rid of pci-info leftovers
pc_fw_cfg_guest_info() never does anything, because has_pci_info is
always false.

Introduced in commit f8c457b "pc: pass PCI hole ranges to Guests",
disabled in commit 9604f70 "pc: disable pci-info for 1.6", and hasn't
been enabled since.  Obviously a dead end.  Get of it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:22:25 +02:00
Gabriel L. Somlo
9616c29045 e1000: use symbolic constants to init phy ctrl & status registers
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:22:25 +02:00
Gabriel L. Somlo
1195fed9e6 e1000: correctly handle phy_ctrl reserved & self-clearing bits
Make phyreg_writeops responsible for actually writing their
respective phy registers, rather than rely on set_mdic() to
do it on their behalf.

The only current instance of phyreg_writeops is set_phy_ctrl();
modify it to write the register on its own, while also correctly
handling reserved and self-clearing bits.

have_autoneg() does not need to check for MII_CR_RESTART_AUTO_NEG,
since the only time the flag comes into play is during set_phy_ctrl(),
and, following this patch, never actually gets written to the phy
control register.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:22:25 +02:00
Levente Kurusa
7f9efb6b80 ivshmem: fix building when debug mode is enabled
ivsmem_offset was removed, however this debug statement was not updated.
Modify the statement to fit the new mechanic.

Signed-off-by: Levente Kurusa <lkurusa@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:22:25 +02:00
Michael S. Tsirkin
d67aadccfa acpi: align RSDP
RSDP should be aligned at a 16-byte boundary.
This would by chance at the moment, fix up acpi build
to make it robust.

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-08-14 13:22:16 +02:00
Hu Tao
c68233aee8 numa: show hex number in error message for consistency and prefix them with 0x
The error messages before and after patch are:

before:
qemu-system-x86_64: total memory for NUMA nodes (134217728) should equal RAM size (20000000)

after:
qemu-system-x86_64: total memory for NUMA nodes (0x8000000) should equal RAM size (0x20000000)

Cc: qemu-stable@nongnu.org
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:22:07 +02:00
Michael S. Tsirkin
988eba0f68 pc-dimm: fix up error message
- int should be printed using %d
- print actual wrong value for property

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:22:00 +02:00
Hu Tao
cfe0ffd027 pc-dimm: validate node property
If user specifies a node number that exceeds the available numa nodes in
emulated system for pc-dimm device, the device will report an invalid _PXM
to OSPM. Fix this by checking the node property value.

Cc: qemu-stable@nongnu.org
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:20:59 +02:00
Hu Tao
41d2f71376 hw:i386: typo fix: MEMORY_HOPTLUG_DEVICE -> MEMORY_HOTPLUG_DEVICE
Cc: qemu-stable@nongnu.org
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:20:49 +02:00
Jan Kiszka
d209c7440a hw/audio/intel-hda: Fix MSI capability address
According to ICH9 spec, the MSI capability is located at 0x60. This is
important for guest drivers that do not parse the capability chain and
use absolute addresses instead.

CC: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:20:49 +02:00
Jan Kiszka
f9f218730c pc: Create 2.2 machine type
Yet identical to 2.1.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:20:49 +02:00
Jan Kiszka
cc943c36fa pci: Use bus master address space for delivering MSI/MSI-X messages
The spec says (and real HW confirms this) that, if the bus master bit
is 0, the device will not generate any PCI accesses. MSI and MSI-X
messages fall among these, so we should use the corresponding address
space to deliver them. This will prevent delivery if bus master support
is disabled.

Cc: qemu-stable@nongnu.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:20:33 +02:00
Amit Shah
4ac4458076 virtio-rng: add some trace events
Add some trace events to virtio-rng for easier debugging

Signed-off-by: Amit Shah <amit.shah@redhat.com>

Reviewed-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:29:55 +01:00
Alex Bennée
6db8b53866 trace: add some tcg tracing support
This adds a couple of tcg specific trace-events which are useful for
tracing execution though tcg generated blocks. It's been tested with
lttng user space tracing but is generic enough for all systems. The tcg
events are:

  * translate_block - when a subject block is translated
  * exec_tb - when a translated block is entered
  * exec_tb_exit - when we exit the translated code
  * exec_tb_nocache - special case translations

Of course we can only trace the entrance to the first block of a chain
as each block will jump directly to the next when it can. See the -d
nochain patch to allow more complete tracing at the expense of
performance.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Alex Bennée
41ef7b00ab trace: teach lttng backend to use format strings
This makes the UST backend pay attention to the format string arguments
that are defined when defining payload data. With this you can now
ensure integers are reported in hex mode if you want.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Lluís Vilanova
a7e30d84ce trace: [tcg] Include TCG-tracing header on all targets
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Lluís Vilanova
85d8bf2f36 trace: [tcg] Include event definitions in "trace.h"
Otherwise the user has to explicitly include an auto-generated header.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Lluís Vilanova
465830fbd9 trace: [tcg] Generate TCG tracing routines
Generate header "trace/generated-tcg-tracers.h" with the necessary routines for
tracing events in guest code:

* trace_${event}_tcg

  Convenience wrapper that calls the translation-time tracer
  'trace_${event}_trans', and calls 'gen_helper_trace_${event}_exec to
  generate the TCG code to later trace the event at execution time.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Lluís Vilanova
76b53aa324 trace: [tcg] Include TCG-tracing helpers
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Lluís Vilanova
f4654226d4 trace: [tcg] Define TCG tracing helper routine wrappers
Generates header "trace/generated-helpers-wrappers.h" with definitions for TCG
helper wrappers.

These wrappers ('gen_helper_trace_${event}_exec_wrapper') transform mixed native
and TCG argument types to TCG types and call the actual TCG helpers
('gen_helper_trace_${event}_exec_proxy').

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Lluís Vilanova
341ea69185 trace: [tcg] Define TCG tracing helper routines
Generates file "trace/generated-helpers.c" with TCG helper definitions to trace
events in guest code at execution time.

The helpers ('helper_trace_${event}_exec_proxy') cast the TCG-compatible native
argument types to their original types (as defined in "trace-events") and call
the tracing routine ('trace_${event}_exec').

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Lluís Vilanova
707c8a98e4 trace: [tcg] Declare TCG tracing helper routines
Generates file "trace/generated-helpers.h" with TCG helper declarations to trace
events in guest code at execution time ('trace_${event}_exec_proxy').

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:12 +01:00
Lluís Vilanova
b2b36c22bd trace: [tcg] Add 'tcg' event property
Transforms event:

  tcg name(...) "...", "..."

into two internal events:

  tcg-trans name_trans(...) "..."
  tcg-exec name_exec(...) "..."

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:11 +01:00
Lluís Vilanova
b55835ac10 trace: [tcg] Argument type transformation machinery
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:11 +01:00
Lluís Vilanova
e6d6c4bebf trace: [tcg] Argument type transformation rules
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:11 +01:00
Lluís Vilanova
0bb403b0ae trace: [tcg] Add documentation
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:11 +01:00
Stefan Hajnoczi
e0b2fd0efb trace: install simpletrace SystemTap tapset
The simpletrace SystemTap tapset outputs simpletrace binary traces for
SystemTap probes.  This is useful because SystemTap has no default way
to format or store traces.  The simpletrace SystemTap tapset provides an
easy way to store traces.

The simpletrace.py tool or custom Python scripts using the
simpletrace.py API can analyze SystemTap these traces:

  $ ./configure --enable-trace-backends=dtrace ...
  $ make && make install
  $ stap -e 'probe qemu.system.x86_64.simpletrace.* {}' \
         -c qemu-system-x86_64 >/tmp/trace.out
  $ scripts/simpletrace.py --no-header trace-events /tmp/trace.out
  g_malloc 4.531 pid=15519 size=0xb ptr=0x7f8639c10470
  g_malloc 3.264 pid=15519 size=0x300 ptr=0x7f8639c10490
  g_free 5.155 pid=15519 ptr=0x7f8639c0f7b0

Note that, unlike qemu-system-x86_64.stp and
qemu-system-x86_64.stp-installed, only one file is needed since the
simpletrace SystemTap tapset does not reference the QEMU binary by path.
Therefore it doesn't matter whether the QEMU binary is installed or not.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:11 +01:00
Stefan Hajnoczi
15327c3df0 simpletrace: add simpletrace.py --no-header option
It can be useful to read simpletrace files that have no header.  For
example, a ring buffer may not have a header record but can still be
processed if the user is sure the file format version is compatible.

  $ scripts/simpletrace.py --no-header trace-events trace-file

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:11 +01:00
Stefan Hajnoczi
3f8b112d6b trace: add tracetool simpletrace_stap format
This new tracetool "format" generates a SystemTap .stp file that outputs
simpletrace binary trace data.

In contrast to simpletrace or ftrace, SystemTap does not define its own
trace format.  All output from SystemTap is generated by .stp files.
This patch lets us generate a .stp file that outputs in the simpletrace
binary format.

This makes it possible to reuse simpletrace.py to analyze traces
recorded using SystemTap.  The simpletrace binary format is especially
useful for long-running traces like flight-recorder mode where string
formatting can be expensive.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:11 +01:00
Stefan Hajnoczi
a76ccf3c1c trace: extract stap_escape() function for reuse
SystemTap reserved words sometimes conflict with QEMU variable names.
We escape them to prevent conflicts.

Move escaping into its own function so the next patch can reuse it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-12 14:26:11 +01:00
Fam Zheng
169a24aea4 build-sys: Move qapi-{types, visit, event}.o into util-obj-y
These three objects are repeated in multiple times in Makefiles. Let's
just add them to libqemuutil.a, and don't list explicitly elsewhere.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:09:17 +04:00
Fam Zheng
90bda0823a po: Add Chinese translation
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Reviewed-by: Dongsheng Song <songdongsheng@live.cn>
Reviewed-by: Wei Huang <wehuang@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:06:41 +04:00
Chen Gang
fdcf6e65bc qemu-img: Check getchar() return value in read_password() for WIN32
getchar() is a standard c library function which may return with failure
(e.g. -1), so like another platforms, also need check it under WIN32.

And make the related code match current qemu code styles, too.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:06:32 +04:00
Stefan Weil
f13bef9592 hw/timer: Move extern declaration from .c to .h file
This fixes a warning from smatch (static code analyser).

Fix also the comment with the renamed source file name.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

 hw/timer/tusb6010.c |    3 ---
 include/hw/usb.h    |    7 ++++++-
 2 files changed, 6 insertions(+), 4 deletions(-)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:06:32 +04:00
Stefan Weil
0f03fb6094 virtio: Move extern declaration to header file
This fixes a warning from smatch (static code analyser).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:06:32 +04:00
Alex Bligh
a3f1f040d2 Show length mismatch error is hex
When live migrate fails due to a section length mismatch we currently
see an error message like:

Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 10000 in != 20000

The section lengths are in fact in hex, so this should read

Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x10000 in != 0x20000

Correct the error string to reflect this.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:06:32 +04:00
chenfan
5bb4c35dca target-i386/cpu.c: Fix two error output indentation
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:06:32 +04:00
Michael Tokarev
bff6cb7296 l2tpv3 (configure): it is linux-specific
Some non-linux systems, for example a system with
FreeBSD kernel and glibc, may declare struct mmsghdr
(in glibc) but may not have linux-specific header
file linux/ip.h.  The actual implementation in qemu
includes this linux-specific header file unconditionally,
so compilation fails if it is not present.  Include
this header in the configure test too.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:06:32 +04:00
Michael Tokarev
203d65a470 hw/timer/imx_*: fix TIMER_MAX clash with system symbol
The symbol TIMER_MAX used in imx_epit.c and imx_gpt.c
clashes with system symbol with the same name.  Because
all qemu source files includes qemu-common.h which, in
turn, includes limits.h, which is not unusual to define
it.  Rename local symbol to have a reasonable prefix.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-09 00:06:32 +04:00
Peter Maydell
2d591ce2ae Merge remote-tracking branch 'remotes/mdroth/qga-pull-2014-08-08' into staging
* remotes/mdroth/qga-pull-2014-08-08:
  qga: Disable unsupported commands by default
  qga: Add guest-get-fsinfo command
  qga: Add guest-fsfreeze-freeze-list command

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-08 14:16:05 +01:00
Tomoki Sekiyama
1281c08a46 qga: Disable unsupported commands by default
Currently management softwares cannot know whether a qemu-ga command is
supported or not on the running platform until they actually execute it.
This patch disables unsupported commands at launch time of qemu-ga, so that
management softwares can check whether they are supported from 'enabled'
property of the result from 'guest-info' command.

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2014-08-07 17:15:53 -05:00
Tomoki Sekiyama
46d4c5723e qga: Add guest-get-fsinfo command
Add command to get mounted filesystems information in the guest.
The returned value contains a list of mountpoint paths and
corresponding disks info such as disk bus type, drive address,
and the disk controllers' PCI addresses, so that management layer
such as libvirt can resolve the disk backends.

For example, when `lsblk' result is:

    NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sdb              8:16   0    1G  0 disk
    `-sdb1           8:17   0 1024M  0 part
      `-vg0-lv0    253:1    0  1.4G  0 lvm  /mnt/test
    sdc              8:32   0    1G  0 disk
    `-sdc1           8:33   0  512M  0 part
      `-vg0-lv0    253:1    0  1.4G  0 lvm  /mnt/test
    vda            252:0    0   25G  0 disk
    `-vda1         252:1    0   25G  0 part /

where sdb is a SCSI disk with PCI controller 0000:00:0a.0 and ID=1,
      sdc is an IDE disk with PCI controller 0000:00:01.1, and
      vda is a virtio-blk disk with PCI device 0000:00:06.0,

guest-get-fsinfo command will return the following result:

    {"return":
     [{"name":"dm-1",
       "mountpoint":"/mnt/test",
       "disk":[
        {"bus-type":"scsi","bus":0,"unit":1,"target":0,
         "pci-controller":{"bus":0,"slot":10,"domain":0,"function":0}},
        {"bus-type":"ide","bus":0,"unit":0,"target":0,
         "pci-controller":{"bus":0,"slot":1,"domain":0,"function":1}}],
       "type":"xfs"},
      {"name":"vda1", "mountpoint":"/",
       "disk":[
        {"bus-type":"virtio","bus":0,"unit":0,"target":0,
         "pci-controller":{"bus":0,"slot":6,"domain":0,"function":0}}],
       "type":"ext4"}]}

In Linux guest, the disk information is resolved from sysfs. So far,
it only supports virtio-blk, virtio-scsi, IDE, SATA, SCSI disks on x86
hosts, and "disk" parameter may be empty for unsupported disk types.

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>

*updated schema to report 2.2 as initial supported version

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2014-08-07 17:15:14 -05:00
Tomoki Sekiyama
e99bce2021 qga: Add guest-fsfreeze-freeze-list command
If an array of mount point paths is specified as 'mountpoints' argument
of guest-fsfreeze-freeze-list, qemu-ga will only freeze the file systems
mounted on specified paths in Linux guests. Otherwise, it works as the
same way as guest-fsfreeze-freeze.
This would be useful when the host wants to create partial disk snapshots.

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

*updated schema to report 2.2 as initial supported version

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2014-08-07 17:13:10 -05:00
Peter Maydell
2ee55b8351 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
KVM changes include a MIPS patch and the testdev backend used by the
ARM kvm-unit-tests.  icount include the first part of reverse execution
and Sebastian Tanase's patches to slow down -icount execution to the
desired speed of the target.

v1->v2: fix dump_drift_info to print nothing outside icount mode,
        and to compile on 32-bit architectures

# gpg: Signature made Thu 07 Aug 2014 14:09:58 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg:                 aka "Paolo Bonzini <bonzini@gnu.org>"

* remotes/bonzini/tags/for-upstream:
  target-mips: Ignore unassigned accesses with KVM
  monitor: Add drift info to 'info jit'
  cpu-exec: Print to console if the guest is late
  cpu-exec: Add sleeping algorithm
  icount: Add align option to icount
  icount: Add QemuOpts for icount
  icount: Fix virtual clock start value on ARM
  timer: add cpu_icount_to_ns function.
  migration: migrate icount fields.
  icount: put icount variables into TimerState.
  backends: Introduce chr-testdev

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-07 14:54:47 +01:00
James Hogan
eddedd546a target-mips: Ignore unassigned accesses with KVM
MIPS registers an unassigned access handler which raises a guest bus
error exception. However this causes QEMU to crash when KVM is enabled
as it isn't called from the main execution loop so longjmp() gets called
without a corresponding setjmp().

Until the KVM API can be updated to trigger a guest exception in
response to an MMIO exit, prevent the bus error exception being raised
from mips_cpu_unassigned_access() if KVM is enabled.

The check is at run time since the do_unassigned_access callback is
initialised before it is known whether KVM will be enabled.

The problem can be triggered with Malta emulation by making the guest
write to the reset region at physical address 0x1bf00000, since it is
marked read-only which is treated as unassigned for writes.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-07 15:09:48 +02:00
Sebastian Tanase
27498bef35 monitor: Add drift info to 'info jit'
Show in 'info jit' the current delay between the host clock
and the guest clock. In addition, print the maximum advance
and delay of the guest compared to the host.

Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Tested-by: Camille Bégué <camille.begue@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-07 15:09:48 +02:00
Peter Maydell
9d8bb35574 Merge remote-tracking branch 'remotes/awilliam/tags/vfio-pci-for-qemu-20140805.0' into staging
VFIO patches: Fix MSI-X vector expansion, remove MSI/X message caching

# gpg: Signature made Tue 05 Aug 2014 20:25:57 BST using RSA key ID 3BB08B22
# gpg: Can't check signature: public key not found

* remotes/awilliam/tags/vfio-pci-for-qemu-20140805.0:
  vfio: Don't cache MSIMessage
  vfio: Fix MSI-X vector expansion

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-07 11:30:38 +01:00
Sebastian Tanase
7f7bc144ed cpu-exec: Print to console if the guest is late
If the align option is enabled, we print to the user whenever
the guest clock is behind the host clock in order for he/she
to have a hint about the actual performance. The maximum
print interval is 2s and we limit the number of messages to 100.
If desired, this can be changed in cpu-exec.c

Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Tested-by: Camille Bégué <camille.begue@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:07 +02:00
Sebastian Tanase
c2aa5f8199 cpu-exec: Add sleeping algorithm
The goal is to sleep qemu whenever the guest clock
is in advance compared to the host clock (we use
the monotonic clocks). The amount of time to sleep
is calculated in the execution loop in cpu_exec.

At first, we tried to approximate at each for loop the real time elapsed
while searching for a TB (generating or retrieving from cache) and
executing it. We would then approximate the virtual time corresponding
to the number of virtual instructions executed. The difference between
these 2 values would allow us to know if the guest is in advance or delayed.
However, the function used for measuring the real time
(qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive.
We had an added overhead of 13% of the total run time.

Therefore, we modified the algorithm and only take into account the
difference between the 2 clocks at the begining of the cpu_exec function.
During the for loop we try to reduce the advance of the guest only by
computing the virtual time elapsed and sleeping if necessary. The overhead
is thus reduced to 3%. Even though this method still has a noticeable
overhead, it no longer is a bottleneck in trying to achieve a better
guest frequency for which the guest clock is faster than the host one.

As for the the alignement of the 2 clocks, with the first algorithm
the guest clock was oscillating between -1 and 1ms compared to the host clock.
Using the second algorithm we notice that the guest is 5ms behind the host, which
is still acceptable for our use case.

The tests where conducted using fio and stress. The host machine in an i5 CPU at
3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an arm versatile-pb
built with buildroot.

Currently, on our test machine, the lowest icount we can achieve that is suitable for
aligning the 2 clocks is 6. However, we observe that the IO tests (using fio) are
slower than the cpu tests (using stress).

Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Tested-by: Camille Bégué <camille.begue@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:07 +02:00
Sebastian Tanase
a8bfac3708 icount: Add align option to icount
The align option is used for activating the align algorithm
in order to synchronise the host clock and the guest clock.

Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Tested-by: Camille Bégué <camille.begue@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:07 +02:00
Sebastian Tanase
1ad9580bd7 icount: Add QemuOpts for icount
Make icount parameter use QemuOpts style options in order
to easily add other suboptions.

Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Tested-by: Camille Bégué <camille.begue@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:07 +02:00
Sebastian Tanase
7146839505 icount: Fix virtual clock start value on ARM
When using the icount option on ARM, the virtual
clock starts counting at realtime clock but it
should start at 0.

The reason why the virtual clock starts at realtime clock
is because the first time we call qemu_clock_warp (which
calls icount_warp_rt) in tcg_exec_all, qemu_icount_bias
(which is part of the virtual time computation mechanism)
will increment by realtime - vm_clock_warp_start, with
vm_clock_warp_start being 0 (see icount_warp_rt in cpus.c).

By changing the value of vm_clock_warp_start from 0 to -1,
the first time we call qemu_clock_warp which calls
icount_warp_rt, we will return immediatly because
icount_warp_rt first checks if vm_clock_warp_start is -1
and if it's the case it returns. Therefore, qemu_icount_bias
will first be incremented by the value of a virtual timer
deadline when the virtual cpu goes from active to inactive.

The virtual time will start at 0 and increment based
on the instruction counter when the vcpu is active or
the qemu_icount_bias value when inactive.

Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:07 +02:00
KONRAD Frederic
3f03131390 timer: add cpu_icount_to_ns function.
This adds cpu_icount_to_ns function which is needed for reverse execution.

It returns the time for a specific instruction.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:07 +02:00
KONRAD Frederic
d09eae3726 migration: migrate icount fields.
This fixes a bug where qemu_icount and qemu_icount_bias are not migrated.
It adds a subsection "timer/icount" to vmstate_timers so icount is migrated only
when needed.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:07 +02:00
KONRAD Frederic
c96778bb84 icount: put icount variables into TimerState.
This puts qemu_icount and qemu_icount_bias into TimerState structure to allow
them to be migrated.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:07 +02:00
Paolo Bonzini
5692399f0a backends: Introduce chr-testdev
From: Paolo Bonzini <pbonzini@redhat.com>

chr-testdev enables a virtio serial channel to be used for guest
initiated qemu exits. hw/misc/debugexit already enables guest
initiated qemu exits, but only for PC targets. chr-testdev supports
any virtio-capable target. kvm-unit-tests/arm is already making use
of this backend.

Currently there is a single command implemented, "q".  It takes a
(prefix) argument for the exit code, thus an exit is implemented by
writing, e.g. "1q", to the virtio-serial port.

It can be used as:
   $QEMU ... \
     -device virtio-serial-device \
     -device virtserialport,chardev=ctd -chardev testdev,id=ctd

or, use:
   $QEMU ... \
     -device virtio-serial-device \
     -device virtconsole,chardev=ctd -chardev testdev,id=ctd

to bind it to virtio-serial port0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-06 17:53:05 +02:00
Alex Williamson
9b3af4c0e4 vfio: Don't cache MSIMessage
Commit 40509f7f added a test to avoid updating KVM MSI routes when the
MSIMessage is unchanged and f4d45d47 switched to relying on this
rather than doing our own comparison.  Our cached msg is effectively
unused now.  Remove it.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-08-05 13:05:57 -06:00
Alex Williamson
c048be5cc9 vfio: Fix MSI-X vector expansion
When new MSI-X vectors are enabled we need to disable MSI-X and
re-enable it with the correct number of vectors.  That means we need
to reprogram the eventfd triggers for each vector.  Prior to f4d45d47
vector->use tracked whether a vector was masked or unmasked and we
could always pick the KVM path when available for unmasked vectors.
Now vfio doesn't track mask state itself and vector->use and virq
remains configured even for masked vectors.  Therefore we need to ask
the MSI-X code whether a vector is masked in order to select the
correct signaling path.  As noted in the comment, MSI relies on
hardware to handle masking.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org # QEMU 2.1
2014-08-05 13:05:52 -06:00
Peter Maydell
69f87f7130 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20140804' into staging
target-arm queue:
 * Set PC correctly when loading AArch64 ELF files
 * sdhci: Fix ADMA dma_memory_read access
 * some more foundational work for EL2/EL3 support
 * fix bugs which reveal themselves if the TARGET_PAGE_SIZE
   is not set to 1K

# gpg: Signature made Mon 04 Aug 2014 14:51:34 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"

* remotes/pmaydell/tags/pull-target-arm-20140804:
  target-arm: A64: fix TLB flush instructions
  target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault
  target-arm: Fix bit test in sp_el0_access
  target-arm: Add FAR_EL2 and 3
  target-arm: Add ESR_EL2 and 3
  target-arm: Make far_el1 an array
  target-arm: A64: Respect SPSEL when taking exceptions
  target-arm: A64: Respect SPSEL in ERET SP restore
  target-arm: A64: Break out aarch64_save/restore_sp
  sd: sdhci: Fix ADMA dma_memory_read access
  hw/arm/virt: formatting: memory map
  hw/arm/boot: Set PC correctly when loading AArch64 ELF files

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 15:01:38 +01:00
Alex Bennée
dbb1fb277c target-arm: A64: fix TLB flush instructions
According to the ARM ARM we weren't correctly flushing the TLB entries
where bits 63:56 didn't match bit 55 of the virtual address. This
exposed a problem when we switched QEMU's internal TARGET_PAGE_BITS to
12 for aarch64.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1406733627-24255-3-git-send-email-alex.bennee@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:56 +01:00
Alex Bennée
dcd82c118c target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault
Otherwise we break quickly when we change TARGET_PAGE_SIZE.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1406733627-24255-2-git-send-email-alex.bennee@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:55 +01:00
Stefan Weil
cdcf14057d target-arm: Fix bit test in sp_el0_access
Static code analyzers complain about a dubious & operation used for a
boolean value. The code does not test the PSTATE_SP bit as it should.

Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1406359601-25583-1-git-send-email-sw@weilnetz.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:55 +01:00
Edgar E. Iglesias
63b60551a7 target-arm: Add FAR_EL2 and 3
Reviewed-by: Greg Bellows <greg.bellows@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1402994746-8328-7-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:55 +01:00
Edgar E. Iglesias
f2c30f42f5 target-arm: Add ESR_EL2 and 3
Reviewed-by: Greg Bellows <greg.bellows@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1402994746-8328-6-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:55 +01:00
Edgar E. Iglesias
2f0180c51b target-arm: Make far_el1 an array
No functional change.
Prepares for future additions of the EL2 and 3 versions of this reg.

Reviewed-by: Greg Bellows <greg.bellows@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1402994746-8328-5-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:54 +01:00
Edgar E. Iglesias
f151b123a3 target-arm: A64: Respect SPSEL when taking exceptions
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Greg Bellows <greg.bellows@linaro.org>
Message-id: 1402994746-8328-4-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:54 +01:00
Edgar E. Iglesias
98ea5615ab target-arm: A64: Respect SPSEL in ERET SP restore
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Greg Bellows <greg.bellows@linaro.org>
Message-id: 1402994746-8328-3-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:54 +01:00
Edgar E. Iglesias
9208b9617f target-arm: A64: Break out aarch64_save/restore_sp
Break out code to save/restore AArch64 SP into functions.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Greg Bellows <greg.bellows@linaro.org>
Message-id: 1402994746-8328-2-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:54 +01:00
Peter Crosthwaite
9db11cef8c sd: sdhci: Fix ADMA dma_memory_read access
This dma_memory_read was giving too big a size when begin was non-zero.
This could cause segfaults in some circumstances. Fix.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:54 +01:00
Andrew Jones
fab4693239 hw/arm/virt: formatting: memory map
Add some spacing and zeros to make it easier to read and
modify the map. This patch has no functional changes. The
review looks ugly, but it's actually pretty easy to confirm
all the addresses are as they should be - thanks to the new
formatting ;-)

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:53 +01:00
Peter Maydell
a9047ec3f6 hw/arm/boot: Set PC correctly when loading AArch64 ELF files
The code in do_cpu_reset() correctly handled AArch64 CPUs
when running Linux kernels, but was missing code in the
branch of the if() that deals with loading ELF files.
Correctly jump to the ELF entry point on reset rather than
leaving the reset PC at zero.

Reported-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Christopher Covington <cov@codeaurora.org>
Cc: qemu-stable@nongnu.org
2014-08-04 14:41:53 +01:00
Peter Maydell
cc11a0623a Merge remote-tracking branch 'remotes/amit-migration/for-2.2' into staging
* remotes/amit-migration/for-2.2:
  checker: ignore fields marked unused
  vmstate static checker: whitelist additions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 14:41:19 +01:00
Peter Maydell
924c09db51 Merge remote-tracking branch 'remotes/amit-virtio-rng/for-2.2' into staging
* remotes/amit-virtio-rng/for-2.2:
  virtio-rng: replace error_set calls with error_setg
  virtio-rng: Move error-checking forward to prevent memory leak

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 13:07:02 +01:00
Peter Maydell
7b13ff3f15 Merge remote-tracking branch 'remotes/sstabellini/xen-20140801' into staging
* remotes/sstabellini/xen-20140801:
  qemu: support xen hvm direct kernel boot
  tap-bsd: implement a FreeBSD only version of tap_open
  xen: fix usage of ENODATA

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-04 11:17:24 +01:00
Amit Shah
32ce1b4817 checker: ignore fields marked unused
While comparing qemu-1.0 json output with qemu-2.1, a few fields got
marked unused.  These need to be skipped over, and not flagged as
mismatches.

For handling unused fields, the exact number of bytes need to be skipped
over as the size of the unused field.

Currently, only the term "unused" is matched.  When more field names
turn up, this will have to be updated based on the whitelist matching
method to match more such terms.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2014-08-04 15:02:37 +05:30
John Snow
c617dd3b7e virtio-rng: replace error_set calls with error_setg
Under recommendation from Luiz Capitulino, we are changing
the error_set calls to error_setg while we are fixing up
the error handling pathways of virtio-rng.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2014-08-04 14:50:11 +05:30
John Snow
1efd6e072c virtio-rng: Move error-checking forward to prevent memory leak
This patch pushes the error-checking forward and the virtio
initialization backward in the device realization function
in order to prevent memory leaks for hot plug scenarios.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2014-08-04 14:49:53 +05:30
Peter Maydell
c79805802b Open 2.2 development tree
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-01 18:30:08 +01:00
Chunyan Liu
b33a5bbfba qemu: support xen hvm direct kernel boot
qemu side patch to support xen HVM direct kernel boot:
if -kernel exists, calls xen_load_linux(), which will read kernel/initrd
and add a linuxboot.bin or multiboot.bin option rom. The
linuxboot.bin/multiboot.bin will load kernel/initrd and jump to execute
kernel directly. It's working when xen uses seabios.

During this work, found the 'kvmvapic' is in option_rom list, it should
not be there in xen case. Set s->vapic_control = 0 in xen_apic_realize()
to handle that.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-01 15:58:12 +00:00
Roger Pau Monne
8677de2b4d tap-bsd: implement a FreeBSD only version of tap_open
The current behaviour of tap_open for BSD systems differ greatly from
it's Linux counterpart. Since FreeBSD supports interface renaming and
tap device cloning by opening /dev/tap, implement a FreeBSD specific
version of tap_open that behaves like it's Linux counterpart.

This is specially important for toolstacks that use Qemu (like Xen
libxl), in order to have a unified behaviour across suported
platforms.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-01 15:57:48 +00:00
Roger Pau Monne
74bc41511a xen: fix usage of ENODATA
ENODATA doesn't exist on FreeBSD, so ENODATA errors returned by the
hypervisor are translated to ENOENT.

Also, the error code is returned in errno if the call returns -1, so
compare the error code with the value in errno instead of the value
returned by the function.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: xen-devel@lists.xenproject.org
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
2014-08-01 15:57:28 +00:00
Paolo Bonzini
33cbb2c546 virtio-scsi: implement parse_cdb
Enable passthrough of vendor-specific commands.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-29 17:36:38 +02:00
Paolo Bonzini
3e7e180ab3 scsi-block, scsi-generic: implement parse_cdb
The callback lets the bus provide the direction and transfer count
for passthrough commands, enabling passthrough of vendor-specific
commands.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-29 17:36:33 +02:00
Paolo Bonzini
592c3b289f scsi-block: extract scsi_block_is_passthrough
This will be used for both scsi_block_new_request and the scsi-block
implementation of parse_cdb.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-29 17:36:29 +02:00
Paolo Bonzini
ff34c32ccc scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo
These callbacks will let devices do their own request parsing, or
defer it to the bus.  If the bus does not provide an implementation,
in turn, fall back to the default parsing routine.

Swap the first two arguments to scsi_req_parse, and rename it to
scsi_req_parse_cdb, for consistency.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-29 17:36:25 +02:00
Paolo Bonzini
769998a1db scsi-bus: prepare scsi_req_new for introduction of parse_cdb
The per-SCSIDevice parse_cdb callback must not be called if the
request will go through special SCSIReqOps, so detect the special
cases early enough.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-29 17:36:09 +02:00
Amit Shah
bb9c3636d9 vmstate static checker: whitelist additions
Comparing json outputs from qemu-1.0 with qemu-2.1 turned up a few
description name changes; whitelist them here.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2014-07-22 17:06:54 +05:30
489 changed files with 20146 additions and 3019 deletions

4
.gitignore vendored
View File

@@ -11,6 +11,10 @@
/trace/generated-tracers.dtrace
/trace/generated-events.h
/trace/generated-events.c
/trace/generated-helpers-wrappers.h
/trace/generated-helpers.h
/trace/generated-helpers.c
/trace/generated-tcg-tracers.h
/trace/generated-ust-provider.h
/trace/generated-ust.c
/libcacard/trace/generated-tracers.c

View File

@@ -91,3 +91,17 @@ Mixed declarations (interleaving statements and declarations within blocks)
are not allowed; declarations should be at the beginning of blocks. In other
words, the code should not generate warnings if using GCC's
-Wdeclaration-after-statement option.
6. Conditional statements
When comparing a variable for (in)equality with a constant, list the
constant on the right, as in:
if (a == 1) {
/* Reads like: "If a equals 1" */
do_something();
}
Rationale: Yoda conditions (as in 'if (1 == a)') are awkward to read.
Besides, good compilers already warn users when '==' is mis-typed as '=',
even when the constant is on the right.

View File

@@ -161,6 +161,12 @@ S: Maintained
F: target-xtensa/
F: hw/xtensa/
TriCore
M: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
S: Maintained
F: target-tricore/
F: hw/tricore/
Guest CPU Cores (KVM):
----------------------
@@ -614,7 +620,7 @@ USB
M: Gerd Hoffmann <kraxel@redhat.com>
S: Maintained
F: hw/usb/*
F: tests/usb-hcd-ehci-test.c
F: tests/usb-*-test.c
VFIO
M: Alex Williamson <alex.williamson@redhat.com>
@@ -678,6 +684,12 @@ S: Maintained
F: hw/*/xilinx_*
F: include/hw/xilinx.h
Vmware
M: Dmitry Fleytman <dmitry@daynix.com>
S: Maintained
F: hw/net/vmxnet*
F: hw/scsi/vmw_pvscsi*
Subsystems
----------
Audio
@@ -1000,3 +1012,9 @@ SSH
M: Richard W.M. Jones <rjones@redhat.com>
S: Supported
F: block/ssh.c
ARCHIPELAGO
M: Chrysostomos Nanakos <cnanakos@grnet.gr>
M: Chrysostomos Nanakos <chris@include.gr>
S: Maintained
F: block/archipelago.c

View File

@@ -57,6 +57,12 @@ GENERATED_HEADERS += trace/generated-tracers-dtrace.h
endif
GENERATED_SOURCES += trace/generated-tracers.c
GENERATED_HEADERS += trace/generated-tcg-tracers.h
GENERATED_HEADERS += trace/generated-helpers-wrappers.h
GENERATED_HEADERS += trace/generated-helpers.h
GENERATED_SOURCES += trace/generated-helpers.c
ifeq ($(findstring ust,$(TRACE_BACKENDS)),ust)
GENERATED_HEADERS += trace/generated-ust-provider.h
GENERATED_SOURCES += trace/generated-ust.c
@@ -202,7 +208,7 @@ Makefile: $(version-obj-y) $(version-lobj-y)
# Build libraries
libqemustub.a: $(stub-obj-y)
libqemuutil.a: $(util-obj-y) qapi-types.o qapi-visit.o qapi-event.o
libqemuutil.a: $(util-obj-y)
block-modules = $(foreach o,$(block-obj-m),"$(basename $(subst /,-,$o))",) NULL
util/module.o-cflags = -D'CONFIG_BLOCK_MODULES=$(block-modules)'

View File

@@ -1,7 +1,7 @@
#######################################################################
# Common libraries for tools and emulators
stub-obj-y = stubs/
util-obj-y = util/ qobject/ qapi/ trace/
util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
#######################################################################
# block-obj-y is code used by both qemu system emulation and qemu-img
@@ -12,7 +12,6 @@ block-obj-y += main-loop.o iohandler.o qemu-timer.o
block-obj-$(CONFIG_POSIX) += aio-posix.o
block-obj-$(CONFIG_WIN32) += aio-win32.o
block-obj-y += block/
block-obj-y += qapi-types.o qapi-visit.o qapi-event.o
block-obj-y += qemu-io-cmds.o
block-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
@@ -88,11 +87,6 @@ common-obj-y += qmp-marshal.o
common-obj-y += qmp.o hmp.o
endif
######################################################################
# some qapi visitors are used by both system and user emulation:
common-obj-y += qapi-visit.o qapi-types.o
#######################################################################
# Target-independent parts used in system and user emulation
common-obj-y += qemu-log.o
@@ -106,10 +100,15 @@ common-obj-y += disas/
version-obj-$(CONFIG_WIN32) += $(BUILD_DIR)/version.o
version-lobj-$(CONFIG_WIN32) += $(BUILD_DIR)/version.lo
######################################################################
# tracing
util-obj-y += trace/
target-obj-y += trace/
######################################################################
# guest agent
# FIXME: a few definitions from qapi-types.o/qapi-visit.o are needed
# by libqemuutil.a. These should be moved to a separate .json schema.
qga-obj-y = qga/ qapi-types.o qapi-visit.o
qga-obj-y = qga/
qga-vss-dll-obj-y = qga/

View File

@@ -38,7 +38,7 @@ config-target.h: config-target.h-timestamp
config-target.h-timestamp: config-target.mak
ifdef CONFIG_TRACE_SYSTEMTAP
stap: $(QEMU_PROG).stp-installed $(QEMU_PROG).stp
stap: $(QEMU_PROG).stp-installed $(QEMU_PROG).stp $(QEMU_PROG)-simpletrace.stp
ifdef CONFIG_USER_ONLY
TARGET_TYPE=user
@@ -64,6 +64,13 @@ $(QEMU_PROG).stp: $(SRC_PATH)/trace-events
--target-type=$(TARGET_TYPE) \
< $< > $@," GEN $(TARGET_DIR)$(QEMU_PROG).stp")
$(QEMU_PROG)-simpletrace.stp: $(SRC_PATH)/trace-events
$(call quiet-command,$(TRACETOOL) \
--format=simpletrace-stap \
--backends=$(TRACE_BACKENDS) \
--probe-prefix=qemu.$(TARGET_TYPE).$(TARGET_NAME) \
< $< > $@," GEN $(TARGET_DIR)$(QEMU_PROG)-simpletrace.stp")
else
stap:
endif
@@ -152,15 +159,20 @@ endif # CONFIG_SOFTMMU
dummy := $(call unnest-vars,,obj-y)
all-obj-y := $(obj-y)
target-obj-y :=
block-obj-y :=
common-obj-y :=
include $(SRC_PATH)/Makefile.objs
dummy := $(call unnest-vars,,target-obj-y)
target-obj-y-save := $(target-obj-y)
dummy := $(call unnest-vars,.., \
block-obj-y \
block-obj-m \
common-obj-y \
common-obj-m)
target-obj-y := $(target-obj-y-save)
all-obj-y += $(common-obj-y)
all-obj-y += $(target-obj-y)
all-obj-$(CONFIG_SOFTMMU) += $(block-obj-y)
# build either PROG or PROGW
@@ -191,6 +203,7 @@ endif
ifdef CONFIG_TRACE_SYSTEMTAP
$(INSTALL_DIR) "$(DESTDIR)$(qemu_datadir)/../systemtap/tapset"
$(INSTALL_DATA) $(QEMU_PROG).stp-installed "$(DESTDIR)$(qemu_datadir)/../systemtap/tapset/$(QEMU_PROG).stp"
$(INSTALL_DATA) $(QEMU_PROG)-simpletrace.stp "$(DESTDIR)$(qemu_datadir)/../systemtap/tapset/$(QEMU_PROG)-simpletrace.stp"
endif
GENERATED_HEADERS += config-target.h

View File

@@ -1 +1 @@
2.1.0
2.1.50

View File

@@ -100,6 +100,11 @@ void aio_set_event_notifier(AioContext *ctx,
(IOHandler *)io_read, NULL, notifier);
}
bool aio_prepare(AioContext *ctx)
{
return false;
}
bool aio_pending(AioContext *ctx)
{
AioHandler *node;
@@ -119,11 +124,20 @@ bool aio_pending(AioContext *ctx)
return false;
}
static bool aio_dispatch(AioContext *ctx)
bool aio_dispatch(AioContext *ctx)
{
AioHandler *node;
bool progress = false;
/*
* If there are callbacks left that have been queued, we need to call them.
* Do not call select in this case, because it is possible that the caller
* does not need a complete flush (as is the case for aio_poll loops).
*/
if (aio_bh_poll(ctx)) {
progress = true;
}
/*
* We have to walk very carefully in case aio_set_fd_handler is
* called while we're walking.
@@ -184,22 +198,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
/* aio_notify can avoid the expensive event_notifier_set if
* everything (file descriptors, bottom halves, timers) will
* be re-evaluated before the next blocking poll(). This happens
* in two cases:
*
* 1) when aio_poll is called with blocking == false
*
* 2) when we are called after poll(). If we are called before
* poll(), bottom halves will not be re-evaluated and we need
* aio_notify() if blocking == true.
*
* The first aio_dispatch() only does something when AioContext is
* running as a GSource, and in that case aio_poll is used only
* with blocking == false, so this optimization is already quite
* effective. However, the code is ugly and should be restructured
* to have a single aio_dispatch() call. To do this, we need to
* reorganize aio_poll into a prepare/poll/dispatch model like
* glib's.
* be re-evaluated before the next blocking poll(). This is
* already true when aio_poll is called with blocking == false;
* if blocking == true, it is only true after poll() returns.
*
* If we're in a nested event loop, ctx->dispatching might be true.
* In that case we can restore it just before returning, but we
@@ -207,26 +208,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
*/
aio_set_dispatching(ctx, !blocking);
/*
* If there are callbacks left that have been queued, we need to call them.
* Do not call select in this case, because it is possible that the caller
* does not need a complete flush (as is the case for aio_poll loops).
*/
if (aio_bh_poll(ctx)) {
blocking = false;
progress = true;
}
/* Re-evaluate condition (1) above. */
aio_set_dispatching(ctx, !blocking);
if (aio_dispatch(ctx)) {
progress = true;
}
if (progress && !blocking) {
goto out;
}
ctx->walking_handlers++;
g_array_set_size(ctx->pollfds, 0);
@@ -249,7 +230,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
/* wait until next event */
ret = qemu_poll_ns((GPollFD *)ctx->pollfds->data,
ctx->pollfds->len,
blocking ? timerlistgroup_deadline_ns(&ctx->tlg) : 0);
blocking ? aio_compute_timeout(ctx) : 0);
/* if we have any readable fds, dispatch event */
if (ret > 0) {
@@ -268,7 +249,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
progress = true;
}
out:
aio_set_dispatching(ctx, was_dispatching);
return progress;
}

View File

@@ -22,12 +22,80 @@
struct AioHandler {
EventNotifier *e;
IOHandler *io_read;
IOHandler *io_write;
EventNotifierHandler *io_notify;
GPollFD pfd;
int deleted;
void *opaque;
QLIST_ENTRY(AioHandler) node;
};
void aio_set_fd_handler(AioContext *ctx,
int fd,
IOHandler *io_read,
IOHandler *io_write,
void *opaque)
{
/* fd is a SOCKET in our case */
AioHandler *node;
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
if (node->pfd.fd == fd && !node->deleted) {
break;
}
}
/* Are we deleting the fd handler? */
if (!io_read && !io_write) {
if (node) {
/* If the lock is held, just mark the node as deleted */
if (ctx->walking_handlers) {
node->deleted = 1;
node->pfd.revents = 0;
} else {
/* Otherwise, delete it for real. We can't just mark it as
* deleted because deleted nodes are only cleaned up after
* releasing the walking_handlers lock.
*/
QLIST_REMOVE(node, node);
g_free(node);
}
}
} else {
HANDLE event;
if (node == NULL) {
/* Alloc and insert if it's not already there */
node = g_malloc0(sizeof(AioHandler));
node->pfd.fd = fd;
QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
}
node->pfd.events = 0;
if (node->io_read) {
node->pfd.events |= G_IO_IN;
}
if (node->io_write) {
node->pfd.events |= G_IO_OUT;
}
node->e = &ctx->notifier;
/* Update handler with latest information */
node->opaque = opaque;
node->io_read = io_read;
node->io_write = io_write;
event = event_notifier_get_handle(&ctx->notifier);
WSAEventSelect(node->pfd.fd, event,
FD_READ | FD_ACCEPT | FD_CLOSE |
FD_CONNECT | FD_WRITE | FD_OOB);
}
aio_notify(ctx);
}
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *e,
EventNotifierHandler *io_notify)
@@ -76,6 +144,43 @@ void aio_set_event_notifier(AioContext *ctx,
aio_notify(ctx);
}
bool aio_prepare(AioContext *ctx)
{
static struct timeval tv0;
AioHandler *node;
bool have_select_revents = false;
fd_set rfds, wfds;
/* fill fd sets */
FD_ZERO(&rfds);
FD_ZERO(&wfds);
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
if (node->io_read) {
FD_SET ((SOCKET)node->pfd.fd, &rfds);
}
if (node->io_write) {
FD_SET ((SOCKET)node->pfd.fd, &wfds);
}
}
if (select(0, &rfds, &wfds, NULL, &tv0) > 0) {
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
node->pfd.revents = 0;
if (FD_ISSET(node->pfd.fd, &rfds)) {
node->pfd.revents |= G_IO_IN;
have_select_revents = true;
}
if (FD_ISSET(node->pfd.fd, &wfds)) {
node->pfd.revents |= G_IO_OUT;
have_select_revents = true;
}
}
}
return have_select_revents;
}
bool aio_pending(AioContext *ctx)
{
AioHandler *node;
@@ -84,47 +189,37 @@ bool aio_pending(AioContext *ctx)
if (node->pfd.revents && node->io_notify) {
return true;
}
if ((node->pfd.revents & G_IO_IN) && node->io_read) {
return true;
}
if ((node->pfd.revents & G_IO_OUT) && node->io_write) {
return true;
}
}
return false;
}
bool aio_poll(AioContext *ctx, bool blocking)
static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
{
AioHandler *node;
HANDLE events[MAXIMUM_WAIT_OBJECTS + 1];
bool progress;
int count;
int timeout;
progress = false;
bool progress = false;
/*
* If there are callbacks left that have been queued, we need to call then.
* Do not call select in this case, because it is possible that the caller
* does not need a complete flush (as is the case for aio_poll loops).
*/
if (aio_bh_poll(ctx)) {
blocking = false;
progress = true;
}
/* Run timers */
progress |= timerlistgroup_run_timers(&ctx->tlg);
/*
* Then dispatch any pending callbacks from the GSource.
*
* We have to walk very carefully in case aio_set_fd_handler is
* called while we're walking.
*/
node = QLIST_FIRST(&ctx->aio_handlers);
while (node) {
AioHandler *tmp;
int revents = node->pfd.revents;
ctx->walking_handlers++;
if (node->pfd.revents && node->io_notify) {
if (!node->deleted &&
(revents || event_notifier_get_handle(node->e) == event) &&
node->io_notify) {
node->pfd.revents = 0;
node->io_notify(node->e);
@@ -134,6 +229,28 @@ bool aio_poll(AioContext *ctx, bool blocking)
}
}
if (!node->deleted &&
(node->io_read || node->io_write)) {
node->pfd.revents = 0;
if ((revents & G_IO_IN) && node->io_read) {
node->io_read(node->opaque);
progress = true;
}
if ((revents & G_IO_OUT) && node->io_write) {
node->io_write(node->opaque);
progress = true;
}
/* if the next select() will return an event, we have progressed */
if (event == event_notifier_get_handle(&ctx->notifier)) {
WSANETWORKEVENTS ev;
WSAEnumNetworkEvents(node->pfd.fd, event, &ev);
if (ev.lNetworkEvents) {
progress = true;
}
}
}
tmp = node;
node = QLIST_NEXT(node, node);
@@ -145,10 +262,47 @@ bool aio_poll(AioContext *ctx, bool blocking)
}
}
if (progress && !blocking) {
return true;
return progress;
}
bool aio_dispatch(AioContext *ctx)
{
bool progress;
progress = aio_bh_poll(ctx);
progress |= aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE);
progress |= timerlistgroup_run_timers(&ctx->tlg);
return progress;
}
bool aio_poll(AioContext *ctx, bool blocking)
{
AioHandler *node;
HANDLE events[MAXIMUM_WAIT_OBJECTS + 1];
bool was_dispatching, progress, have_select_revents, first;
int count;
int timeout;
if (aio_prepare(ctx)) {
blocking = false;
have_select_revents = true;
}
was_dispatching = ctx->dispatching;
progress = false;
/* aio_notify can avoid the expensive event_notifier_set if
* everything (file descriptors, bottom halves, timers) will
* be re-evaluated before the next blocking poll(). This is
* already true when aio_poll is called with blocking == false;
* if blocking == true, it is only true after poll() returns.
*
* If we're in a nested event loop, ctx->dispatching might be true.
* In that case we can restore it just before returning, but we
* have to clear it now.
*/
aio_set_dispatching(ctx, !blocking);
ctx->walking_handlers++;
/* fill fd sets */
@@ -160,64 +314,42 @@ bool aio_poll(AioContext *ctx, bool blocking)
}
ctx->walking_handlers--;
first = true;
/* wait until next event */
while (count > 0) {
HANDLE event;
int ret;
timeout = blocking ?
qemu_timeout_ns_to_ms(timerlistgroup_deadline_ns(&ctx->tlg)) : 0;
timeout = blocking
? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0;
ret = WaitForMultipleObjects(count, events, FALSE, timeout);
aio_set_dispatching(ctx, true);
if (first && aio_bh_poll(ctx)) {
progress = true;
}
first = false;
/* if we have any signaled events, dispatch event */
if ((DWORD) (ret - WAIT_OBJECT_0) >= count) {
event = NULL;
if ((DWORD) (ret - WAIT_OBJECT_0) < count) {
event = events[ret - WAIT_OBJECT_0];
} else if (!have_select_revents) {
break;
}
have_select_revents = false;
blocking = false;
/* we have to walk very carefully in case
* aio_set_fd_handler is called while we're walking */
node = QLIST_FIRST(&ctx->aio_handlers);
while (node) {
AioHandler *tmp;
ctx->walking_handlers++;
if (!node->deleted &&
event_notifier_get_handle(node->e) == events[ret - WAIT_OBJECT_0] &&
node->io_notify) {
node->io_notify(node->e);
/* aio_notify() does not count as progress */
if (node->e != &ctx->notifier) {
progress = true;
}
}
tmp = node;
node = QLIST_NEXT(node, node);
ctx->walking_handlers--;
if (!ctx->walking_handlers && tmp->deleted) {
QLIST_REMOVE(tmp, node);
g_free(tmp);
}
}
progress |= aio_dispatch_handlers(ctx, event);
/* Try again, but only call each handler once. */
events[ret - WAIT_OBJECT_0] = events[--count];
}
if (blocking) {
/* Run the timers a second time. We do this because otherwise aio_wait
* will not note progress - and will stop a drain early - if we have
* a timer that was not ready to run entering g_poll but is ready
* after g_poll. This will only do anything if a timer has expired.
*/
progress |= timerlistgroup_run_timers(&ctx->tlg);
}
progress |= timerlistgroup_run_timers(&ctx->tlg);
aio_set_dispatching(ctx, was_dispatching);
return progress;
}

View File

@@ -104,6 +104,8 @@ int graphic_depth = 32;
#define QEMU_ARCH QEMU_ARCH_XTENSA
#elif defined(TARGET_UNICORE32)
#define QEMU_ARCH QEMU_ARCH_UNICORE32
#elif defined(TARGET_TRICORE)
#define QEMU_ARCH QEMU_ARCH_TRICORE
#endif
const uint32_t arch_type = QEMU_ARCH;
@@ -1072,8 +1074,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
QTAILQ_FOREACH(block, &ram_list.blocks, next) {
if (!strncmp(id, block->idstr, sizeof(id))) {
if (block->length != length) {
error_report("Length mismatch: %s: " RAM_ADDR_FMT
" in != " RAM_ADDR_FMT, id, length,
error_report("Length mismatch: %s: 0x" RAM_ADDR_FMT
" in != 0x" RAM_ADDR_FMT, id, length,
block->length);
ret = -EINVAL;
}

39
async.c
View File

@@ -152,39 +152,48 @@ void qemu_bh_delete(QEMUBH *bh)
bh->deleted = 1;
}
static gboolean
aio_ctx_prepare(GSource *source, gint *timeout)
int64_t
aio_compute_timeout(AioContext *ctx)
{
AioContext *ctx = (AioContext *) source;
int64_t deadline;
int timeout = -1;
QEMUBH *bh;
int deadline;
/* We assume there is no timeout already supplied */
*timeout = -1;
for (bh = ctx->first_bh; bh; bh = bh->next) {
if (!bh->deleted && bh->scheduled) {
if (bh->idle) {
/* idle bottom halves will be polled at least
* every 10ms */
*timeout = 10;
timeout = 10000000;
} else {
/* non-idle bottom halves will be executed
* immediately */
*timeout = 0;
return true;
return 0;
}
}
}
deadline = qemu_timeout_ns_to_ms(timerlistgroup_deadline_ns(&ctx->tlg));
deadline = timerlistgroup_deadline_ns(&ctx->tlg);
if (deadline == 0) {
*timeout = 0;
return true;
return 0;
} else {
*timeout = qemu_soonest_timeout(*timeout, deadline);
return qemu_soonest_timeout(timeout, deadline);
}
}
static gboolean
aio_ctx_prepare(GSource *source, gint *timeout)
{
AioContext *ctx = (AioContext *) source;
/* We assume there is no timeout already supplied */
*timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
if (aio_prepare(ctx)) {
*timeout = 0;
}
return false;
return *timeout == 0;
}
static gboolean
@@ -209,7 +218,7 @@ aio_ctx_dispatch(GSource *source,
AioContext *ctx = (AioContext *) source;
assert(callback == NULL);
aio_poll(ctx, false);
aio_dispatch(ctx);
return true;
}

View File

@@ -1,7 +1,7 @@
common-obj-y += rng.o rng-egd.o
common-obj-$(CONFIG_POSIX) += rng-random.o
common-obj-y += msmouse.o
common-obj-y += msmouse.o testdev.o
common-obj-$(CONFIG_BRLAPI) += baum.o
baum.o-cflags := $(SDL_CFLAGS)

View File

@@ -257,15 +257,6 @@ static void host_memory_backend_init(Object *obj)
host_memory_backend_set_policy, NULL, NULL, NULL);
}
static void host_memory_backend_finalize(Object *obj)
{
HostMemoryBackend *backend = MEMORY_BACKEND(obj);
if (memory_region_size(&backend->mr)) {
memory_region_destroy(&backend->mr);
}
}
MemoryRegion *
host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
{
@@ -304,7 +295,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
/* ensure policy won't be ignored in case memory is preallocated
* before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
* this doesn't catch hugepage case. */
unsigned flags = MPOL_MF_STRICT;
unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
/* check for invalid host-nodes and policies and give more verbose
* error messages than mbind(). */
@@ -360,7 +351,6 @@ static const TypeInfo host_memory_backend_info = {
.class_init = host_memory_backend_class_init,
.instance_size = sizeof(HostMemoryBackend),
.instance_init = host_memory_backend_init,
.instance_finalize = host_memory_backend_finalize,
.interfaces = (InterfaceInfo[]) {
{ TYPE_USER_CREATABLE },
{ }

131
backends/testdev.c Normal file
View File

@@ -0,0 +1,131 @@
/*
* QEMU Char Device for testsuite control
*
* Copyright (c) 2014 Red Hat, Inc.
*
* Author: Paolo Bonzini <pbonzini@redhat.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu-common.h"
#include "sysemu/char.h"
#define BUF_SIZE 32
typedef struct {
CharDriverState *chr;
uint8_t in_buf[32];
int in_buf_used;
} TestdevCharState;
/* Try to interpret a whole incoming packet */
static int testdev_eat_packet(TestdevCharState *testdev)
{
const uint8_t *cur = testdev->in_buf;
int len = testdev->in_buf_used;
uint8_t c;
int arg;
#define EAT(c) do { \
if (!len--) { \
return 0; \
} \
c = *cur++; \
} while (0)
EAT(c);
while (isspace(c)) {
EAT(c);
}
arg = 0;
while (isdigit(c)) {
arg = arg * 10 + c - '0';
EAT(c);
}
while (isspace(c)) {
EAT(c);
}
switch (c) {
case 'q':
exit((arg << 1) | 1);
break;
default:
break;
}
return cur - testdev->in_buf;
}
/* The other end is writing some data. Store it and try to interpret */
static int testdev_write(CharDriverState *chr, const uint8_t *buf, int len)
{
TestdevCharState *testdev = chr->opaque;
int tocopy, eaten, orig_len = len;
while (len) {
/* Complete our buffer as much as possible */
tocopy = MIN(len, BUF_SIZE - testdev->in_buf_used);
memcpy(testdev->in_buf + testdev->in_buf_used, buf, tocopy);
testdev->in_buf_used += tocopy;
buf += tocopy;
len -= tocopy;
/* Interpret it as much as possible */
while (testdev->in_buf_used > 0 &&
(eaten = testdev_eat_packet(testdev)) > 0) {
memmove(testdev->in_buf, testdev->in_buf + eaten,
testdev->in_buf_used - eaten);
testdev->in_buf_used -= eaten;
}
}
return orig_len;
}
static void testdev_close(struct CharDriverState *chr)
{
TestdevCharState *testdev = chr->opaque;
g_free(testdev);
}
CharDriverState *chr_testdev_init(void)
{
TestdevCharState *testdev;
CharDriverState *chr;
testdev = g_malloc0(sizeof(TestdevCharState));
testdev->chr = chr = g_malloc0(sizeof(CharDriverState));
chr->opaque = testdev;
chr->chr_write = testdev_write;
chr->chr_close = testdev_close;
return chr;
}
static void register_types(void)
{
register_char_driver_qapi("testdev", CHARDEV_BACKEND_KIND_TESTDEV, NULL);
}
type_init(register_types);

View File

@@ -186,7 +186,7 @@ static int bmds_aio_inflight(BlkMigDevState *bmds, int64_t sector)
{
int64_t chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
if ((sector << BDRV_SECTOR_BITS) < bdrv_getlength(bmds->bs)) {
if (sector < bdrv_nb_sectors(bmds->bs)) {
return !!(bmds->aio_bitmap[chunk / (sizeof(unsigned long) * 8)] &
(1UL << (chunk % (sizeof(unsigned long) * 8))));
} else {
@@ -223,8 +223,7 @@ static void alloc_aio_bitmap(BlkMigDevState *bmds)
BlockDriverState *bs = bmds->bs;
int64_t bitmap_size;
bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS) +
BDRV_SECTORS_PER_DIRTY_CHUNK * 8 - 1;
bitmap_size = bdrv_nb_sectors(bs) + BDRV_SECTORS_PER_DIRTY_CHUNK * 8 - 1;
bitmap_size /= BDRV_SECTORS_PER_DIRTY_CHUNK * 8;
bmds->aio_bitmap = g_malloc0(bitmap_size);
@@ -284,7 +283,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
nr_sectors = total_sectors - cur_sector;
}
blk = g_malloc(sizeof(BlkMigBlock));
blk = g_new(BlkMigBlock, 1);
blk->buf = g_malloc(BLOCK_SIZE);
blk->bmds = bmds;
blk->sector = cur_sector;
@@ -350,12 +349,12 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs)
int64_t sectors;
if (!bdrv_is_read_only(bs)) {
sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
sectors = bdrv_nb_sectors(bs);
if (sectors <= 0) {
return;
}
bmds = g_malloc0(sizeof(BlkMigDevState));
bmds = g_new0(BlkMigDevState, 1);
bmds->bs = bs;
bmds->bulk_completed = 0;
bmds->total_sectors = sectors;
@@ -466,7 +465,7 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
} else {
nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
}
blk = g_malloc(sizeof(BlkMigBlock));
blk = g_new(BlkMigBlock, 1);
blk->buf = g_malloc(BLOCK_SIZE);
blk->bmds = bmds;
blk->sector = sector;
@@ -799,7 +798,7 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
if (bs != bs_prev) {
bs_prev = bs;
total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
total_sectors = bdrv_nb_sectors(bs);
if (total_sectors <= 0) {
error_report("Error getting length of block device %s",
device_name);

365
block.c
View File

@@ -57,6 +57,8 @@ struct BdrvDirtyBitmap {
#define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
#define COROUTINE_POOL_RESERVATION 64 /* number of coroutines to reserve */
static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load);
static BlockDriverAIOCB *bdrv_aio_readv_em(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
@@ -349,7 +351,7 @@ BlockDriverState *bdrv_new(const char *device_name, Error **errp)
return NULL;
}
bs = g_malloc0(sizeof(BlockDriverState));
bs = g_new0(BlockDriverState, 1);
QLIST_INIT(&bs->dirty_bitmaps);
pstrcpy(bs->device_name, sizeof(bs->device_name), device_name);
if (device_name[0] != '\0') {
@@ -701,6 +703,7 @@ static int find_image_format(BlockDriverState *bs, const char *filename,
/**
* Set the current 'total_sectors' value
* Return 0 on success, -errno on error.
*/
static int refresh_total_sectors(BlockDriverState *bs, int64_t hint)
{
@@ -961,6 +964,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
} else {
bs->filename[0] = '\0';
}
pstrcpy(bs->exact_filename, sizeof(bs->exact_filename), bs->filename);
bs->drv = drv;
bs->opaque = g_malloc0(drv->instance_size);
@@ -1313,7 +1317,6 @@ int bdrv_append_temp_snapshot(BlockDriverState *bs, int flags, Error **errp)
error_setg_errno(errp, -total_size, "Could not get image size");
goto out;
}
total_size &= BDRV_SECTOR_MASK;
/* Create the temporary image */
ret = get_tmp_filename(tmp_filename, PATH_MAX + 1);
@@ -1503,6 +1506,8 @@ int bdrv_open(BlockDriverState **pbs, const char *filename,
}
}
bdrv_refresh_filename(bs);
/* For snapshot=on, create a temporary qcow2 overlay. bs points to the
* temporary snapshot afterwards. */
if (snapshot_flags) {
@@ -1814,6 +1819,8 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
void bdrv_close(BlockDriverState *bs)
{
BdrvAioNotifier *ban, *ban_next;
if (bs->job) {
block_job_cancel_sync(bs->job);
}
@@ -1843,6 +1850,8 @@ void bdrv_close(BlockDriverState *bs)
bs->zero_beyond_eof = false;
QDECREF(bs->options);
bs->options = NULL;
QDECREF(bs->full_open_options);
bs->full_open_options = NULL;
if (bs->file != NULL) {
bdrv_unref(bs->file);
@@ -1856,6 +1865,11 @@ void bdrv_close(BlockDriverState *bs)
if (bs->io_limits_enabled) {
bdrv_io_limits_disable(bs);
}
QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_next) {
g_free(ban);
}
QLIST_INIT(&bs->aio_notifiers);
}
void bdrv_close_all(void)
@@ -2107,6 +2121,9 @@ int bdrv_attach_dev(BlockDriverState *bs, void *dev)
}
bs->dev = dev;
bdrv_iostatus_reset(bs);
/* We're expecting I/O from the device so bump up coroutine pool size */
qemu_coroutine_adjust_pool_size(COROUTINE_POOL_RESERVATION);
return 0;
}
@@ -2126,6 +2143,7 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev)
bs->dev_ops = NULL;
bs->dev_opaque = NULL;
bs->guest_block_size = 512;
qemu_coroutine_adjust_pool_size(-COROUTINE_POOL_RESERVATION);
}
/* TODO change to return DeviceState * when all users are qdevified */
@@ -2203,6 +2221,9 @@ bool bdrv_dev_is_medium_locked(BlockDriverState *bs)
*/
int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix)
{
if (bs->drv == NULL) {
return -ENOMEDIUM;
}
if (bs->drv->bdrv_check == NULL) {
return -ENOTSUP;
}
@@ -2269,7 +2290,14 @@ int bdrv_commit(BlockDriverState *bs)
}
total_sectors = length >> BDRV_SECTOR_BITS;
buf = g_malloc(COMMIT_BUF_SECTORS * BDRV_SECTOR_SIZE);
/* qemu_try_blockalign() for bs will choose an alignment that works for
* bs->backing_hd as well, so no need to compare the alignment manually. */
buf = qemu_try_blockalign(bs, COMMIT_BUF_SECTORS * BDRV_SECTOR_SIZE);
if (buf == NULL) {
ret = -ENOMEM;
goto ro_cleanup;
}
for (sector = 0; sector < total_sectors; sector += n) {
ret = bdrv_is_allocated(bs, sector, COMMIT_BUF_SECTORS, &n);
@@ -2307,7 +2335,7 @@ int bdrv_commit(BlockDriverState *bs)
ret = 0;
ro_cleanup:
g_free(buf);
qemu_vfree(buf);
if (ro) {
/* ignoring error return here */
@@ -2612,7 +2640,7 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
* into our deletion queue, until we hit the 'base'
*/
while (intermediate) {
intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
intermediate_state = g_new0(BlkIntermediateStates, 1);
intermediate_state->bs = intermediate;
QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
@@ -2827,18 +2855,16 @@ int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
*/
int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
{
int64_t target_size;
int64_t ret, nb_sectors, sector_num = 0;
int64_t target_sectors, ret, nb_sectors, sector_num = 0;
int n;
target_size = bdrv_getlength(bs);
if (target_size < 0) {
return target_size;
target_sectors = bdrv_nb_sectors(bs);
if (target_sectors < 0) {
return target_sectors;
}
target_size /= BDRV_SECTOR_SIZE;
for (;;) {
nb_sectors = target_size - sector_num;
nb_sectors = target_sectors - sector_num;
if (nb_sectors <= 0) {
return 0;
}
@@ -2968,7 +2994,12 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
cluster_sector_num, cluster_nb_sectors);
iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
if (bounce_buffer == NULL) {
ret = -ENOMEM;
goto err;
}
qemu_iovec_init_external(&bounce_qiov, &iov, 1);
ret = drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
@@ -3056,15 +3087,14 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
} else {
/* Read zeros after EOF of growable BDSes */
int64_t len, total_sectors, max_nb_sectors;
int64_t total_sectors, max_nb_sectors;
len = bdrv_getlength(bs);
if (len < 0) {
ret = len;
total_sectors = bdrv_nb_sectors(bs);
if (total_sectors < 0) {
ret = total_sectors;
goto out;
}
total_sectors = DIV_ROUND_UP(len, BDRV_SECTOR_SIZE);
max_nb_sectors = ROUND_UP(MAX(0, total_sectors - sector_num),
align >> BDRV_SECTOR_BITS);
if (max_nb_sectors > 0) {
@@ -3253,7 +3283,11 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
/* Fall back to bounce buffer if write zeroes is unsupported */
iov.iov_len = num * BDRV_SECTOR_SIZE;
if (iov.iov_base == NULL) {
iov.iov_base = qemu_blockalign(bs, num * BDRV_SECTOR_SIZE);
iov.iov_base = qemu_try_blockalign(bs, num * BDRV_SECTOR_SIZE);
if (iov.iov_base == NULL) {
ret = -ENOMEM;
goto fail;
}
memset(iov.iov_base, 0, num * BDRV_SECTOR_SIZE);
}
qemu_iovec_init_external(&qiov, &iov, 1);
@@ -3273,6 +3307,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
nb_sectors -= num;
}
fail:
qemu_vfree(iov.iov_base);
return ret;
}
@@ -3536,11 +3571,12 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
}
/**
* Length of a file in bytes. Return < 0 if error or unknown.
* Return number of sectors on success, -errno on error.
*/
int64_t bdrv_getlength(BlockDriverState *bs)
int64_t bdrv_nb_sectors(BlockDriverState *bs)
{
BlockDriver *drv = bs->drv;
if (!drv)
return -ENOMEDIUM;
@@ -3550,19 +3586,26 @@ int64_t bdrv_getlength(BlockDriverState *bs)
return ret;
}
}
return bs->total_sectors * BDRV_SECTOR_SIZE;
return bs->total_sectors;
}
/**
* Return length in bytes on success, -errno on error.
* The length is always a multiple of BDRV_SECTOR_SIZE.
*/
int64_t bdrv_getlength(BlockDriverState *bs)
{
int64_t ret = bdrv_nb_sectors(bs);
return ret < 0 ? ret : ret * BDRV_SECTOR_SIZE;
}
/* return 0 as number of sectors if no device present or error */
void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
{
int64_t length;
length = bdrv_getlength(bs);
if (length < 0)
length = 0;
else
length = length >> BDRV_SECTOR_BITS;
*nb_sectors_ptr = length;
int64_t nb_sectors = bdrv_nb_sectors(bs);
*nb_sectors_ptr = nb_sectors < 0 ? 0 : nb_sectors;
}
void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
@@ -3708,11 +3751,17 @@ const char *bdrv_get_format_name(BlockDriverState *bs)
return bs->drv ? bs->drv->format_name : NULL;
}
static int qsort_strcmp(const void *a, const void *b)
{
return strcmp(a, b);
}
void bdrv_iterate_format(void (*it)(void *opaque, const char *name),
void *opaque)
{
BlockDriver *drv;
int count = 0;
int i;
const char **formats = NULL;
QLIST_FOREACH(drv, &bdrv_drivers, list) {
@@ -3724,12 +3773,18 @@ void bdrv_iterate_format(void (*it)(void *opaque, const char *name),
}
if (!found) {
formats = g_realloc(formats, (count + 1) * sizeof(char *));
formats = g_renew(const char *, formats, count + 1);
formats[count++] = drv->format_name;
it(opaque, drv->format_name);
}
}
}
qsort(formats, count, sizeof(formats[0]), qsort_strcmp);
for (i = 0; i < count; i++) {
it(opaque, formats[i]);
}
g_free(formats);
}
@@ -3945,21 +4000,21 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
int64_t sector_num,
int nb_sectors, int *pnum)
{
int64_t length;
int64_t total_sectors;
int64_t n;
int64_t ret, ret2;
length = bdrv_getlength(bs);
if (length < 0) {
return length;
total_sectors = bdrv_nb_sectors(bs);
if (total_sectors < 0) {
return total_sectors;
}
if (sector_num >= (length >> BDRV_SECTOR_BITS)) {
if (sector_num >= total_sectors) {
*pnum = 0;
return 0;
}
n = bs->total_sectors - sector_num;
n = total_sectors - sector_num;
if (n < nb_sectors) {
nb_sectors = n;
}
@@ -3994,8 +4049,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
ret |= BDRV_BLOCK_ZERO;
} else if (bs->backing_hd) {
BlockDriverState *bs2 = bs->backing_hd;
int64_t length2 = bdrv_getlength(bs2);
if (length2 >= 0 && sector_num >= (length2 >> BDRV_SECTOR_BITS)) {
int64_t nb_sectors2 = bdrv_nb_sectors(bs2);
if (nb_sectors2 >= 0 && sector_num >= nb_sectors2) {
ret |= BDRV_BLOCK_ZERO;
}
}
@@ -4498,6 +4553,12 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
// Add the second request
qemu_iovec_concat(qiov, reqs[i].qiov, 0, reqs[i].qiov->size);
// Add tail of first request, if necessary
if (qiov->size < reqs[outidx].qiov->size) {
qemu_iovec_concat(qiov, reqs[outidx].qiov, qiov->size,
reqs[outidx].qiov->size - qiov->size);
}
reqs[outidx].nb_sectors = qiov->size >> 9;
reqs[outidx].qiov = qiov;
@@ -4607,8 +4668,9 @@ static void bdrv_aio_bh_cb(void *opaque)
{
BlockDriverAIOCBSync *acb = opaque;
if (!acb->is_write)
if (!acb->is_write && acb->ret >= 0) {
qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
}
qemu_vfree(acb->bounce);
acb->common.cb(acb->common.opaque, acb->ret);
qemu_bh_delete(acb->bh);
@@ -4630,10 +4692,12 @@ static BlockDriverAIOCB *bdrv_aio_rw_vector(BlockDriverState *bs,
acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
acb->is_write = is_write;
acb->qiov = qiov;
acb->bounce = qemu_blockalign(bs, qiov->size);
acb->bounce = qemu_try_blockalign(bs, qiov->size);
acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
if (is_write) {
if (acb->bounce == NULL) {
acb->ret = -ENOMEM;
} else if (is_write) {
qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
acb->ret = bs->drv->bdrv_write(bs, sector_num, acb->bounce, nb_sectors);
} else {
@@ -5247,6 +5311,19 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
return qemu_memalign(bdrv_opt_mem_align(bs), size);
}
void *qemu_try_blockalign(BlockDriverState *bs, size_t size)
{
size_t align = bdrv_opt_mem_align(bs);
/* Ensure that NULL is never returned on success */
assert(align > 0);
if (size == 0) {
size = align;
}
return qemu_try_memalign(align, size);
}
/*
* Check if all memory in this vector is sector aligned.
*/
@@ -5277,14 +5354,13 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, int granularity,
granularity >>= BDRV_SECTOR_BITS;
assert(granularity);
bitmap_size = bdrv_getlength(bs);
bitmap_size = bdrv_nb_sectors(bs);
if (bitmap_size < 0) {
error_setg_errno(errp, -bitmap_size, "could not get length of device");
errno = -bitmap_size;
return NULL;
}
bitmap_size >>= BDRV_SECTOR_BITS;
bitmap = g_malloc0(sizeof(BdrvDirtyBitmap));
bitmap = g_new0(BdrvDirtyBitmap, 1);
bitmap->bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
QLIST_INSERT_HEAD(&bs->dirty_bitmaps, bitmap, list);
return bitmap;
@@ -5310,8 +5386,8 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
BlockDirtyInfoList **plist = &list;
QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
BlockDirtyInfo *info = g_malloc0(sizeof(BlockDirtyInfo));
BlockDirtyInfoList *entry = g_malloc0(sizeof(BlockDirtyInfoList));
BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
info->count = bdrv_get_dirty_count(bs, bm);
info->granularity =
((int64_t) BDRV_SECTOR_SIZE << hbitmap_granularity(bm->bitmap));
@@ -5371,6 +5447,9 @@ void bdrv_ref(BlockDriverState *bs)
* deleted. */
void bdrv_unref(BlockDriverState *bs)
{
if (!bs) {
return;
}
assert(bs->refcnt > 0);
if (--bs->refcnt == 0) {
bdrv_delete(bs);
@@ -5402,7 +5481,7 @@ void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason)
BdrvOpBlocker *blocker;
assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
blocker = g_malloc0(sizeof(BdrvOpBlocker));
blocker = g_new0(BdrvOpBlocker, 1);
blocker->reason = reason;
QLIST_INSERT_HEAD(&bs->op_blockers[op], blocker, list);
}
@@ -5591,7 +5670,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
if (size == -1) {
if (backing_file) {
BlockDriverState *bs;
uint64_t size;
int64_t size;
int back_flags;
/* backing files always opened read-only */
@@ -5609,8 +5688,13 @@ void bdrv_img_create(const char *filename, const char *fmt,
local_err = NULL;
goto out;
}
bdrv_get_geometry(bs, &size);
size *= 512;
size = bdrv_getlength(bs);
if (size < 0) {
error_setg_errno(errp, -size, "Could not get size of '%s'",
backing_file);
bdrv_unref(bs);
goto out;
}
qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size);
@@ -5658,10 +5742,16 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs)
void bdrv_detach_aio_context(BlockDriverState *bs)
{
BdrvAioNotifier *baf;
if (!bs->drv) {
return;
}
QLIST_FOREACH(baf, &bs->aio_notifiers, list) {
baf->detach_aio_context(baf->opaque);
}
if (bs->io_limits_enabled) {
throttle_detach_aio_context(&bs->throttle_state);
}
@@ -5681,6 +5771,8 @@ void bdrv_detach_aio_context(BlockDriverState *bs)
void bdrv_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
{
BdrvAioNotifier *ban;
if (!bs->drv) {
return;
}
@@ -5699,6 +5791,10 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
if (bs->io_limits_enabled) {
throttle_attach_aio_context(&bs->throttle_state, new_context);
}
QLIST_FOREACH(ban, &bs->aio_notifiers, list) {
ban->attached_aio_context(new_context, ban->opaque);
}
}
void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
@@ -5715,6 +5811,43 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
aio_context_release(new_context);
}
void bdrv_add_aio_context_notifier(BlockDriverState *bs,
void (*attached_aio_context)(AioContext *new_context, void *opaque),
void (*detach_aio_context)(void *opaque), void *opaque)
{
BdrvAioNotifier *ban = g_new(BdrvAioNotifier, 1);
*ban = (BdrvAioNotifier){
.attached_aio_context = attached_aio_context,
.detach_aio_context = detach_aio_context,
.opaque = opaque
};
QLIST_INSERT_HEAD(&bs->aio_notifiers, ban, list);
}
void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
void (*attached_aio_context)(AioContext *,
void *),
void (*detach_aio_context)(void *),
void *opaque)
{
BdrvAioNotifier *ban, *ban_next;
QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_next) {
if (ban->attached_aio_context == attached_aio_context &&
ban->detach_aio_context == detach_aio_context &&
ban->opaque == opaque)
{
QLIST_REMOVE(ban, list);
g_free(ban);
return;
}
}
abort();
}
void bdrv_add_before_write_notifier(BlockDriverState *bs,
NotifierWithReturn *notifier)
{
@@ -5840,3 +5973,133 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
bdrv_flush_io_queue(bs->file);
}
}
static bool append_open_options(QDict *d, BlockDriverState *bs)
{
const QDictEntry *entry;
bool found_any = false;
for (entry = qdict_first(bs->options); entry;
entry = qdict_next(bs->options, entry))
{
/* Only take options for this level and exclude all non-driver-specific
* options */
if (!strchr(qdict_entry_key(entry), '.') &&
strcmp(qdict_entry_key(entry), "node-name"))
{
qobject_incref(qdict_entry_value(entry));
qdict_put_obj(d, qdict_entry_key(entry), qdict_entry_value(entry));
found_any = true;
}
}
return found_any;
}
/* Updates the following BDS fields:
* - exact_filename: A filename which may be used for opening a block device
* which (mostly) equals the given BDS (even without any
* other options; so reading and writing must return the same
* results, but caching etc. may be different)
* - full_open_options: Options which, when given when opening a block device
* (without a filename), result in a BDS (mostly)
* equalling the given one
* - filename: If exact_filename is set, it is copied here. Otherwise,
* full_open_options is converted to a JSON object, prefixed with
* "json:" (for use through the JSON pseudo protocol) and put here.
*/
void bdrv_refresh_filename(BlockDriverState *bs)
{
BlockDriver *drv = bs->drv;
QDict *opts;
if (!drv) {
return;
}
/* This BDS's file name will most probably depend on its file's name, so
* refresh that first */
if (bs->file) {
bdrv_refresh_filename(bs->file);
}
if (drv->bdrv_refresh_filename) {
/* Obsolete information is of no use here, so drop the old file name
* information before refreshing it */
bs->exact_filename[0] = '\0';
if (bs->full_open_options) {
QDECREF(bs->full_open_options);
bs->full_open_options = NULL;
}
drv->bdrv_refresh_filename(bs);
} else if (bs->file) {
/* Try to reconstruct valid information from the underlying file */
bool has_open_options;
bs->exact_filename[0] = '\0';
if (bs->full_open_options) {
QDECREF(bs->full_open_options);
bs->full_open_options = NULL;
}
opts = qdict_new();
has_open_options = append_open_options(opts, bs);
/* If no specific options have been given for this BDS, the filename of
* the underlying file should suffice for this one as well */
if (bs->file->exact_filename[0] && !has_open_options) {
strcpy(bs->exact_filename, bs->file->exact_filename);
}
/* Reconstructing the full options QDict is simple for most format block
* drivers, as long as the full options are known for the underlying
* file BDS. The full options QDict of that file BDS should somehow
* contain a representation of the filename, therefore the following
* suffices without querying the (exact_)filename of this BDS. */
if (bs->file->full_open_options) {
qdict_put_obj(opts, "driver",
QOBJECT(qstring_from_str(drv->format_name)));
QINCREF(bs->file->full_open_options);
qdict_put_obj(opts, "file", QOBJECT(bs->file->full_open_options));
bs->full_open_options = opts;
} else {
QDECREF(opts);
}
} else if (!bs->full_open_options && qdict_size(bs->options)) {
/* There is no underlying file BDS (at least referenced by BDS.file),
* so the full options QDict should be equal to the options given
* specifically for this block device when it was opened (plus the
* driver specification).
* Because those options don't change, there is no need to update
* full_open_options when it's already set. */
opts = qdict_new();
append_open_options(opts, bs);
qdict_put_obj(opts, "driver",
QOBJECT(qstring_from_str(drv->format_name)));
if (bs->exact_filename[0]) {
/* This may not work for all block protocol drivers (some may
* require this filename to be parsed), but we have to find some
* default solution here, so just include it. If some block driver
* does not support pure options without any filename at all or
* needs some special format of the options QDict, it needs to
* implement the driver-specific bdrv_refresh_filename() function.
*/
qdict_put_obj(opts, "filename",
QOBJECT(qstring_from_str(bs->exact_filename)));
}
bs->full_open_options = opts;
}
if (bs->exact_filename[0]) {
pstrcpy(bs->filename, sizeof(bs->filename), bs->exact_filename);
} else if (bs->full_open_options) {
QString *json = qobject_to_json(QOBJECT(bs->full_open_options));
snprintf(bs->filename, sizeof(bs->filename), "json:%s",
qstring_get_str(json));
QDECREF(json);
}
}

View File

@@ -10,15 +10,14 @@ block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
ifeq ($(CONFIG_POSIX),y)
block-obj-y += nbd.o nbd-client.o sheepdog.o
block-obj-$(CONFIG_LIBISCSI) += iscsi.o
block-obj-$(CONFIG_LIBNFS) += nfs.o
block-obj-$(CONFIG_CURL) += curl.o
block-obj-$(CONFIG_RBD) += rbd.o
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
block-obj-$(CONFIG_LIBSSH2) += ssh.o
endif
common-obj-y += stream.o
common-obj-y += commit.o
@@ -35,5 +34,6 @@ gluster.o-cflags := $(GLUSTERFS_CFLAGS)
gluster.o-libs := $(GLUSTERFS_LIBS)
ssh.o-cflags := $(LIBSSH2_CFLAGS)
ssh.o-libs := $(LIBSSH2_LIBS)
archipelago.o-libs := $(ARCHIPELAGO_LIBS)
qcow.o-libs := -lz
linux-aio.o-libs := -laio

1069
block/archipelago.c Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -26,6 +26,10 @@
#include "qemu/config-file.h"
#include "block/block_int.h"
#include "qemu/module.h"
#include "qapi/qmp/qbool.h"
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qint.h"
#include "qapi/qmp/qstring.h"
typedef struct BDRVBlkdebugState {
int state;
@@ -449,6 +453,10 @@ static void error_callback_bh(void *opaque)
static void blkdebug_aio_cancel(BlockDriverAIOCB *blockacb)
{
BlkdebugAIOCB *acb = container_of(blockacb, BlkdebugAIOCB, common);
if (acb->bh) {
qemu_bh_delete(acb->bh);
acb->bh = NULL;
}
qemu_aio_release(acb);
}
@@ -522,6 +530,25 @@ static BlockDriverAIOCB *blkdebug_aio_writev(BlockDriverState *bs,
return bdrv_aio_writev(bs->file, sector_num, qiov, nb_sectors, cb, opaque);
}
static BlockDriverAIOCB *blkdebug_aio_flush(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque)
{
BDRVBlkdebugState *s = bs->opaque;
BlkdebugRule *rule = NULL;
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
if (rule->options.inject.sector == -1) {
break;
}
}
if (rule && rule->options.inject.error) {
return inject_error(bs, cb, opaque, rule);
}
return bdrv_aio_flush(bs->file, cb, opaque);
}
static void blkdebug_close(BlockDriverState *bs)
{
@@ -687,6 +714,98 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
return bdrv_getlength(bs->file);
}
static void blkdebug_refresh_filename(BlockDriverState *bs)
{
BDRVBlkdebugState *s = bs->opaque;
struct BlkdebugRule *rule;
QDict *opts;
QList *inject_error_list = NULL, *set_state_list = NULL;
QList *suspend_list = NULL;
int event;
if (!bs->file->full_open_options) {
/* The config file cannot be recreated, so creating a plain filename
* is impossible */
return;
}
opts = qdict_new();
qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("blkdebug")));
QINCREF(bs->file->full_open_options);
qdict_put_obj(opts, "image", QOBJECT(bs->file->full_open_options));
for (event = 0; event < BLKDBG_EVENT_MAX; event++) {
QLIST_FOREACH(rule, &s->rules[event], next) {
if (rule->action == ACTION_INJECT_ERROR) {
QDict *inject_error = qdict_new();
qdict_put_obj(inject_error, "event", QOBJECT(qstring_from_str(
BlkdebugEvent_lookup[rule->event])));
qdict_put_obj(inject_error, "state",
QOBJECT(qint_from_int(rule->state)));
qdict_put_obj(inject_error, "errno", QOBJECT(qint_from_int(
rule->options.inject.error)));
qdict_put_obj(inject_error, "sector", QOBJECT(qint_from_int(
rule->options.inject.sector)));
qdict_put_obj(inject_error, "once", QOBJECT(qbool_from_int(
rule->options.inject.once)));
qdict_put_obj(inject_error, "immediately",
QOBJECT(qbool_from_int(
rule->options.inject.immediately)));
if (!inject_error_list) {
inject_error_list = qlist_new();
}
qlist_append_obj(inject_error_list, QOBJECT(inject_error));
} else if (rule->action == ACTION_SET_STATE) {
QDict *set_state = qdict_new();
qdict_put_obj(set_state, "event", QOBJECT(qstring_from_str(
BlkdebugEvent_lookup[rule->event])));
qdict_put_obj(set_state, "state",
QOBJECT(qint_from_int(rule->state)));
qdict_put_obj(set_state, "new_state", QOBJECT(qint_from_int(
rule->options.set_state.new_state)));
if (!set_state_list) {
set_state_list = qlist_new();
}
qlist_append_obj(set_state_list, QOBJECT(set_state));
} else if (rule->action == ACTION_SUSPEND) {
QDict *suspend = qdict_new();
qdict_put_obj(suspend, "event", QOBJECT(qstring_from_str(
BlkdebugEvent_lookup[rule->event])));
qdict_put_obj(suspend, "state",
QOBJECT(qint_from_int(rule->state)));
qdict_put_obj(suspend, "tag", QOBJECT(qstring_from_str(
rule->options.suspend.tag)));
if (!suspend_list) {
suspend_list = qlist_new();
}
qlist_append_obj(suspend_list, QOBJECT(suspend));
}
}
}
if (inject_error_list) {
qdict_put_obj(opts, "inject-error", QOBJECT(inject_error_list));
}
if (set_state_list) {
qdict_put_obj(opts, "set-state", QOBJECT(set_state_list));
}
if (suspend_list) {
qdict_put_obj(opts, "suspend", QOBJECT(suspend_list));
}
bs->full_open_options = opts;
}
static BlockDriver bdrv_blkdebug = {
.format_name = "blkdebug",
.protocol_name = "blkdebug",
@@ -696,9 +815,11 @@ static BlockDriver bdrv_blkdebug = {
.bdrv_file_open = blkdebug_open,
.bdrv_close = blkdebug_close,
.bdrv_getlength = blkdebug_getlength,
.bdrv_refresh_filename = blkdebug_refresh_filename,
.bdrv_aio_readv = blkdebug_aio_readv,
.bdrv_aio_writev = blkdebug_aio_writev,
.bdrv_aio_flush = blkdebug_aio_flush,
.bdrv_debug_event = blkdebug_debug_event,
.bdrv_debug_breakpoint = blkdebug_debug_breakpoint,

View File

@@ -10,6 +10,8 @@
#include <stdarg.h>
#include "qemu/sockets.h" /* for EINPROGRESS on Windows */
#include "block/block_int.h"
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qstring.h"
typedef struct {
BlockDriverState *test_file;
@@ -156,6 +158,7 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
ret = 0;
fail:
qemu_opts_del(opts);
return ret;
}
@@ -320,6 +323,32 @@ static void blkverify_attach_aio_context(BlockDriverState *bs,
bdrv_attach_aio_context(s->test_file, new_context);
}
static void blkverify_refresh_filename(BlockDriverState *bs)
{
BDRVBlkverifyState *s = bs->opaque;
/* bs->file has already been refreshed */
bdrv_refresh_filename(s->test_file);
if (bs->file->full_open_options && s->test_file->full_open_options) {
QDict *opts = qdict_new();
qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("blkverify")));
QINCREF(bs->file->full_open_options);
qdict_put_obj(opts, "raw", QOBJECT(bs->file->full_open_options));
QINCREF(s->test_file->full_open_options);
qdict_put_obj(opts, "test", QOBJECT(s->test_file->full_open_options));
bs->full_open_options = opts;
}
if (bs->file->exact_filename[0] && s->test_file->exact_filename[0]) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"blkverify:%s:%s",
bs->file->exact_filename, s->test_file->exact_filename);
}
}
static BlockDriver bdrv_blkverify = {
.format_name = "blkverify",
.protocol_name = "blkverify",
@@ -329,6 +358,7 @@ static BlockDriver bdrv_blkverify = {
.bdrv_file_open = blkverify_open,
.bdrv_close = blkverify_close,
.bdrv_getlength = blkverify_getlength,
.bdrv_refresh_filename = blkverify_refresh_filename,
.bdrv_aio_readv = blkverify_aio_readv,
.bdrv_aio_writev = blkverify_aio_writev,

View File

@@ -131,7 +131,11 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
return -EFBIG;
}
s->catalog_bitmap = g_malloc(s->catalog_size * 4);
s->catalog_bitmap = g_try_new(uint32_t, s->catalog_size);
if (s->catalog_size && s->catalog_bitmap == NULL) {
error_setg(errp, "Could not allocate memory for catalog");
return -ENOMEM;
}
ret = bdrv_pread(bs->file, le32_to_cpu(bochs.header), s->catalog_bitmap,
s->catalog_size * 4);

View File

@@ -116,7 +116,12 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
"try increasing block size");
return -EINVAL;
}
s->offsets = g_malloc(offsets_size);
s->offsets = g_try_malloc(offsets_size);
if (s->offsets == NULL) {
error_setg(errp, "Could not allocate offsets table");
return -ENOMEM;
}
ret = bdrv_pread(bs->file, 128 + 4 + 4, s->offsets, offsets_size);
if (ret < 0) {
@@ -158,8 +163,20 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
}
/* initialize zlib engine */
s->compressed_block = g_malloc(max_compressed_block_size + 1);
s->uncompressed_block = g_malloc(s->block_size);
s->compressed_block = g_try_malloc(max_compressed_block_size + 1);
if (s->compressed_block == NULL) {
error_setg(errp, "Could not allocate compressed_block");
ret = -ENOMEM;
goto fail;
}
s->uncompressed_block = g_try_malloc(s->block_size);
if (s->uncompressed_block == NULL) {
error_setg(errp, "Could not allocate uncompressed_block");
ret = -ENOMEM;
goto fail;
}
if (inflateInit(&s->zstream) != Z_OK) {
ret = -EINVAL;
goto fail;

View File

@@ -26,7 +26,7 @@
#include "qapi/qmp/qbool.h"
#include <curl/curl.h>
// #define DEBUG
// #define DEBUG_CURL
// #define DEBUG_VERBOSE
#ifdef DEBUG_CURL
@@ -63,6 +63,7 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle,
#define CURL_NUM_ACB 8
#define SECTOR_SIZE 512
#define READ_AHEAD_DEFAULT (256 * 1024)
#define CURL_TIMEOUT_DEFAULT 5
#define FIND_RET_NONE 0
#define FIND_RET_OK 1
@@ -71,6 +72,8 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle,
#define CURL_BLOCK_OPT_URL "url"
#define CURL_BLOCK_OPT_READAHEAD "readahead"
#define CURL_BLOCK_OPT_SSLVERIFY "sslverify"
#define CURL_BLOCK_OPT_TIMEOUT "timeout"
#define CURL_BLOCK_OPT_COOKIE "cookie"
struct BDRVCURLState;
@@ -109,6 +112,8 @@ typedef struct BDRVCURLState {
char *url;
size_t readahead_size;
bool sslverify;
int timeout;
char *cookie;
bool accept_range;
AioContext *aio_context;
} BDRVCURLState;
@@ -352,7 +357,7 @@ static void curl_multi_timeout_do(void *arg)
#endif
}
static CURLState *curl_init_state(BDRVCURLState *s)
static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
{
CURLState *state = NULL;
int i, j;
@@ -370,7 +375,7 @@ static CURLState *curl_init_state(BDRVCURLState *s)
break;
}
if (!state) {
aio_poll(state->s->aio_context, true);
aio_poll(bdrv_get_aio_context(bs), true);
}
} while(!state);
@@ -382,7 +387,10 @@ static CURLState *curl_init_state(BDRVCURLState *s)
curl_easy_setopt(state->curl, CURLOPT_URL, s->url);
curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYPEER,
(long) s->sslverify);
curl_easy_setopt(state->curl, CURLOPT_TIMEOUT, 5);
if (s->cookie) {
curl_easy_setopt(state->curl, CURLOPT_COOKIE, s->cookie);
}
curl_easy_setopt(state->curl, CURLOPT_TIMEOUT, s->timeout);
curl_easy_setopt(state->curl, CURLOPT_WRITEFUNCTION,
(void *)curl_read_cb);
curl_easy_setopt(state->curl, CURLOPT_WRITEDATA, (void *)state);
@@ -489,6 +497,16 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_BOOL,
.help = "Verify SSL certificate"
},
{
.name = CURL_BLOCK_OPT_TIMEOUT,
.type = QEMU_OPT_NUMBER,
.help = "Curl timeout"
},
{
.name = CURL_BLOCK_OPT_COOKIE,
.type = QEMU_OPT_STRING,
.help = "Pass the cookie or list of cookies with each request"
},
{ /* end of list */ }
},
};
@@ -501,6 +519,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
QemuOpts *opts;
Error *local_err = NULL;
const char *file;
const char *cookie;
double d;
static int inited = 0;
@@ -525,8 +544,14 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
goto out_noclean;
}
s->timeout = qemu_opt_get_number(opts, CURL_BLOCK_OPT_TIMEOUT,
CURL_TIMEOUT_DEFAULT);
s->sslverify = qemu_opt_get_bool(opts, CURL_BLOCK_OPT_SSLVERIFY, true);
cookie = qemu_opt_get(opts, CURL_BLOCK_OPT_COOKIE);
s->cookie = g_strdup(cookie);
file = qemu_opt_get(opts, CURL_BLOCK_OPT_URL);
if (file == NULL) {
error_setg(errp, "curl block driver requires an 'url' option");
@@ -541,7 +566,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
DPRINTF("CURL: Opening %s\n", file);
s->aio_context = bdrv_get_aio_context(bs);
s->url = g_strdup(file);
state = curl_init_state(s);
state = curl_init_state(bs, s);
if (!state)
goto out_noclean;
@@ -582,6 +607,7 @@ out:
curl_easy_cleanup(state->curl);
state->curl = NULL;
out_noclean:
g_free(s->cookie);
g_free(s->url);
qemu_opts_del(opts);
return -EINVAL;
@@ -625,7 +651,7 @@ static void curl_readv_bh_cb(void *p)
}
// No cache found, so let's start a new request
state = curl_init_state(s);
state = curl_init_state(acb->common.bs, s);
if (!state) {
acb->common.cb(acb->common.opaque, -EIO);
qemu_aio_release(acb);
@@ -640,7 +666,13 @@ static void curl_readv_bh_cb(void *p)
state->buf_start = start;
state->buf_len = acb->end + s->readahead_size;
end = MIN(start + state->buf_len, s->len) - 1;
state->orig_buf = g_malloc(state->buf_len);
state->orig_buf = g_try_malloc(state->buf_len);
if (state->buf_len && state->orig_buf == NULL) {
curl_clean_state(state);
acb->common.cb(acb->common.opaque, -ENOMEM);
qemu_aio_release(acb);
return;
}
state->acb[0] = acb;
snprintf(state->range, 127, "%zd-%zd", start, end);
@@ -678,6 +710,7 @@ static void curl_close(BlockDriverState *bs)
DPRINTF("CURL: Close\n");
curl_detach_aio_context(bs);
g_free(s->cookie);
g_free(s->url);
}

View File

@@ -284,8 +284,15 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
}
/* initialize zlib engine */
s->compressed_chunk = g_malloc(max_compressed_size + 1);
s->uncompressed_chunk = g_malloc(512 * max_sectors_per_chunk);
s->compressed_chunk = qemu_try_blockalign(bs->file,
max_compressed_size + 1);
s->uncompressed_chunk = qemu_try_blockalign(bs->file,
512 * max_sectors_per_chunk);
if (s->compressed_chunk == NULL || s->uncompressed_chunk == NULL) {
ret = -ENOMEM;
goto fail;
}
if (inflateInit(&s->zstream) != Z_OK) {
ret = -EINVAL;
goto fail;
@@ -302,8 +309,8 @@ fail:
g_free(s->lengths);
g_free(s->sectors);
g_free(s->sectorcounts);
g_free(s->compressed_chunk);
g_free(s->uncompressed_chunk);
qemu_vfree(s->compressed_chunk);
qemu_vfree(s->uncompressed_chunk);
return ret;
}
@@ -426,8 +433,8 @@ static void dmg_close(BlockDriverState *bs)
g_free(s->lengths);
g_free(s->sectors);
g_free(s->sectorcounts);
g_free(s->compressed_chunk);
g_free(s->uncompressed_chunk);
qemu_vfree(s->compressed_chunk);
qemu_vfree(s->uncompressed_chunk);
inflateEnd(&s->zstream);
}

View File

@@ -291,7 +291,7 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
BDRVGlusterState *s = bs->opaque;
int open_flags = 0;
int ret = 0;
GlusterConf *gconf = g_malloc0(sizeof(GlusterConf));
GlusterConf *gconf = g_new0(GlusterConf, 1);
QemuOpts *opts;
Error *local_err = NULL;
const char *filename;
@@ -351,12 +351,12 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
assert(state != NULL);
assert(state->bs != NULL);
state->opaque = g_malloc0(sizeof(BDRVGlusterReopenState));
state->opaque = g_new0(BDRVGlusterReopenState, 1);
reop_s = state->opaque;
qemu_gluster_parse_flags(state->flags, &open_flags);
gconf = g_malloc0(sizeof(GlusterConf));
gconf = g_new0(GlusterConf, 1);
reop_s->glfs = qemu_gluster_init(gconf, state->bs->filename, errp);
if (reop_s->glfs == NULL) {
@@ -486,7 +486,7 @@ static int qemu_gluster_create(const char *filename,
int prealloc = 0;
int64_t total_size = 0;
char *tmp = NULL;
GlusterConf *gconf = g_malloc0(sizeof(GlusterConf));
GlusterConf *gconf = g_new0(GlusterConf, 1);
glfs = qemu_gluster_init(gconf, filename, errp);
if (!glfs) {

View File

@@ -893,7 +893,10 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
nb_blocks = sector_qemu2lun(nb_sectors, iscsilun);
if (iscsilun->zeroblock == NULL) {
iscsilun->zeroblock = g_malloc0(iscsilun->block_size);
iscsilun->zeroblock = g_try_malloc0(iscsilun->block_size);
if (iscsilun->zeroblock == NULL) {
return -ENOMEM;
}
}
iscsi_co_init_iscsitask(iscsilun, &iTask);
@@ -1509,7 +1512,8 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
if (iscsilun->allocationmap != NULL) {
g_free(iscsilun->allocationmap);
iscsilun->allocationmap =
bitmap_new(DIV_ROUND_UP(bs->total_sectors,
bitmap_new(DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks,
iscsilun),
iscsilun->cluster_sectors));
}
@@ -1529,7 +1533,7 @@ static int iscsi_create(const char *filename, QemuOpts *opts, Error **errp)
/* Read out options */
total_size =
qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE;
bs->opaque = g_malloc0(sizeof(struct IscsiLun));
bs->opaque = g_new0(struct IscsiLun, 1);
iscsilun = bs->opaque;
bs_options = qdict_new();

View File

@@ -51,6 +51,12 @@ struct qemu_laio_state {
/* io queue for submit at batch */
LaioQueue io_q;
/* I/O completion processing */
QEMUBH *completion_bh;
struct io_event events[MAX_EVENTS];
int event_idx;
int event_max;
};
static inline ssize_t io_event_ret(struct io_event *ev)
@@ -86,27 +92,58 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
qemu_aio_release(laiocb);
}
/* The completion BH fetches completed I/O requests and invokes their
* callbacks.
*
* The function is somewhat tricky because it supports nested event loops, for
* example when a request callback invokes aio_poll(). In order to do this,
* the completion events array and index are kept in qemu_laio_state. The BH
* reschedules itself as long as there are completions pending so it will
* either be called again in a nested event loop or will be called after all
* events have been completed. When there are no events left to complete, the
* BH returns without rescheduling.
*/
static void qemu_laio_completion_bh(void *opaque)
{
struct qemu_laio_state *s = opaque;
/* Fetch more completion events when empty */
if (s->event_idx == s->event_max) {
do {
struct timespec ts = { 0 };
s->event_max = io_getevents(s->ctx, MAX_EVENTS, MAX_EVENTS,
s->events, &ts);
} while (s->event_max == -EINTR);
s->event_idx = 0;
if (s->event_max <= 0) {
s->event_max = 0;
return; /* no more events */
}
}
/* Reschedule so nested event loops see currently pending completions */
qemu_bh_schedule(s->completion_bh);
/* Process completion events */
while (s->event_idx < s->event_max) {
struct iocb *iocb = s->events[s->event_idx].obj;
struct qemu_laiocb *laiocb =
container_of(iocb, struct qemu_laiocb, iocb);
laiocb->ret = io_event_ret(&s->events[s->event_idx]);
s->event_idx++;
qemu_laio_process_completion(s, laiocb);
}
}
static void qemu_laio_completion_cb(EventNotifier *e)
{
struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
while (event_notifier_test_and_clear(&s->e)) {
struct io_event events[MAX_EVENTS];
struct timespec ts = { 0 };
int nevents, i;
do {
nevents = io_getevents(s->ctx, MAX_EVENTS, MAX_EVENTS, events, &ts);
} while (nevents == -EINTR);
for (i = 0; i < nevents; i++) {
struct iocb *iocb = events[i].obj;
struct qemu_laiocb *laiocb =
container_of(iocb, struct qemu_laiocb, iocb);
laiocb->ret = io_event_ret(&events[i]);
qemu_laio_process_completion(s, laiocb);
}
if (event_notifier_test_and_clear(&s->e)) {
qemu_bh_schedule(s->completion_bh);
}
}
@@ -272,12 +309,14 @@ void laio_detach_aio_context(void *s_, AioContext *old_context)
struct qemu_laio_state *s = s_;
aio_set_event_notifier(old_context, &s->e, NULL);
qemu_bh_delete(s->completion_bh);
}
void laio_attach_aio_context(void *s_, AioContext *new_context)
{
struct qemu_laio_state *s = s_;
s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
}

View File

@@ -157,7 +157,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
BlockDriverState *source = s->common.bs;
int nb_sectors, sectors_per_chunk, nb_chunks;
int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
uint64_t delay_ns;
uint64_t delay_ns = 0;
MirrorOp *op;
s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -247,8 +247,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
next_chunk += added_chunks;
if (!s->synced && s->common.speed) {
delay_ns = ratelimit_calculate_delay(&s->limit, added_sectors);
} else {
delay_ns = 0;
}
} while (delay_ns == 0 && next_sector < end);
@@ -367,7 +365,12 @@ static void coroutine_fn mirror_run(void *opaque)
}
end = s->common.len >> BDRV_SECTOR_BITS;
s->buf = qemu_blockalign(bs, s->buf_size);
s->buf = qemu_try_blockalign(bs, s->buf_size);
if (s->buf == NULL) {
ret = -ENOMEM;
goto immediate_exit;
}
sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
mirror_free_init(s);

View File

@@ -31,8 +31,10 @@
#include "block/block_int.h"
#include "qemu/module.h"
#include "qemu/sockets.h"
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qjson.h"
#include "qapi/qmp/qint.h"
#include "qapi/qmp/qstring.h"
#include <sys/types.h>
#include <unistd.h>
@@ -338,6 +340,37 @@ static void nbd_attach_aio_context(BlockDriverState *bs,
nbd_client_session_attach_aio_context(&s->client, new_context);
}
static void nbd_refresh_filename(BlockDriverState *bs)
{
BDRVNBDState *s = bs->opaque;
QDict *opts = qdict_new();
const char *path = qemu_opt_get(s->socket_opts, "path");
const char *host = qemu_opt_get(s->socket_opts, "host");
const char *port = qemu_opt_get(s->socket_opts, "port");
const char *export = qemu_opt_get(s->socket_opts, "export");
qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("nbd")));
if (path) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd+unix:%s", path);
qdict_put_obj(opts, "path", QOBJECT(qstring_from_str(path)));
} else if (export) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd:%s:%s/%s", host, port, export);
qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(host)));
qdict_put_obj(opts, "port", QOBJECT(qstring_from_str(port)));
qdict_put_obj(opts, "export", QOBJECT(qstring_from_str(export)));
} else {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd:%s:%s", host, port);
qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(host)));
qdict_put_obj(opts, "port", QOBJECT(qstring_from_str(port)));
}
bs->full_open_options = opts;
}
static BlockDriver bdrv_nbd = {
.format_name = "nbd",
.protocol_name = "nbd",
@@ -352,6 +385,7 @@ static BlockDriver bdrv_nbd = {
.bdrv_getlength = nbd_getlength,
.bdrv_detach_aio_context = nbd_detach_aio_context,
.bdrv_attach_aio_context = nbd_attach_aio_context,
.bdrv_refresh_filename = nbd_refresh_filename,
};
static BlockDriver bdrv_nbd_tcp = {
@@ -368,6 +402,7 @@ static BlockDriver bdrv_nbd_tcp = {
.bdrv_getlength = nbd_getlength,
.bdrv_detach_aio_context = nbd_detach_aio_context,
.bdrv_attach_aio_context = nbd_attach_aio_context,
.bdrv_refresh_filename = nbd_refresh_filename,
};
static BlockDriver bdrv_nbd_unix = {
@@ -384,6 +419,7 @@ static BlockDriver bdrv_nbd_unix = {
.bdrv_getlength = nbd_getlength,
.bdrv_detach_aio_context = nbd_detach_aio_context,
.bdrv_attach_aio_context = nbd_attach_aio_context,
.bdrv_refresh_filename = nbd_refresh_filename,
};
static void bdrv_nbd_init(void)

View File

@@ -172,7 +172,11 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
nfs_co_init_task(client, &task);
buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE);
if (nb_sectors && buf == NULL) {
return -ENOMEM;
}
qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
if (nfs_pwrite_async(client->context, client->fh,
@@ -389,23 +393,27 @@ static int nfs_file_open(BlockDriverState *bs, QDict *options, int flags,
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return -EINVAL;
ret = -EINVAL;
goto out;
}
ret = nfs_client_open(client, qemu_opt_get(opts, "filename"),
(flags & BDRV_O_RDWR) ? O_RDWR : O_RDONLY,
errp);
if (ret < 0) {
return ret;
goto out;
}
bs->total_sectors = ret;
return 0;
ret = 0;
out:
qemu_opts_del(opts);
return ret;
}
static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
{
int ret = 0;
int64_t total_size = 0;
NFSClient *client = g_malloc0(sizeof(NFSClient));
NFSClient *client = g_new0(NFSClient, 1);
client->aio_context = qemu_get_aio_context();

View File

@@ -30,6 +30,7 @@
/**************************************************************/
#define HEADER_MAGIC "WithoutFreeSpace"
#define HEADER_MAGIC2 "WithouFreSpacExt"
#define HEADER_VERSION 2
#define HEADER_SIZE 64
@@ -41,8 +42,10 @@ struct parallels_header {
uint32_t cylinders;
uint32_t tracks;
uint32_t catalog_entries;
uint32_t nb_sectors;
char padding[24];
uint64_t nb_sectors;
uint32_t inuse;
uint32_t data_off;
char padding[12];
} QEMU_PACKED;
typedef struct BDRVParallelsState {
@@ -52,6 +55,8 @@ typedef struct BDRVParallelsState {
unsigned int catalog_size;
unsigned int tracks;
unsigned int off_multiplier;
} BDRVParallelsState;
static int parallels_probe(const uint8_t *buf, int buf_size, const char *filename)
@@ -59,11 +64,12 @@ static int parallels_probe(const uint8_t *buf, int buf_size, const char *filenam
const struct parallels_header *ph = (const void *)buf;
if (buf_size < HEADER_SIZE)
return 0;
return 0;
if (!memcmp(ph->magic, HEADER_MAGIC, 16) &&
(le32_to_cpu(ph->version) == HEADER_VERSION))
return 100;
if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
(le32_to_cpu(ph->version) == HEADER_VERSION))
return 100;
return 0;
}
@@ -83,14 +89,19 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
if (memcmp(ph.magic, HEADER_MAGIC, 16) ||
(le32_to_cpu(ph.version) != HEADER_VERSION)) {
error_setg(errp, "Image not in Parallels format");
ret = -EINVAL;
goto fail;
}
bs->total_sectors = le64_to_cpu(ph.nb_sectors);
bs->total_sectors = le32_to_cpu(ph.nb_sectors);
if (le32_to_cpu(ph.version) != HEADER_VERSION) {
goto fail_format;
}
if (!memcmp(ph.magic, HEADER_MAGIC, 16)) {
s->off_multiplier = 1;
bs->total_sectors = 0xffffffff & bs->total_sectors;
} else if (!memcmp(ph.magic, HEADER_MAGIC2, 16)) {
s->off_multiplier = le32_to_cpu(ph.tracks);
} else {
goto fail_format;
}
s->tracks = le32_to_cpu(ph.tracks);
if (s->tracks == 0) {
@@ -98,6 +109,11 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
ret = -EINVAL;
goto fail;
}
if (s->tracks > INT32_MAX/513) {
error_setg(errp, "Invalid image: Too big cluster");
ret = -EFBIG;
goto fail;
}
s->catalog_size = le32_to_cpu(ph.catalog_entries);
if (s->catalog_size > INT_MAX / 4) {
@@ -105,7 +121,11 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
ret = -EFBIG;
goto fail;
}
s->catalog_bitmap = g_malloc(s->catalog_size * 4);
s->catalog_bitmap = g_try_new(uint32_t, s->catalog_size);
if (s->catalog_size && s->catalog_bitmap == NULL) {
ret = -ENOMEM;
goto fail;
}
ret = bdrv_pread(bs->file, 64, s->catalog_bitmap, s->catalog_size * 4);
if (ret < 0) {
@@ -113,11 +133,14 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
}
for (i = 0; i < s->catalog_size; i++)
le32_to_cpus(&s->catalog_bitmap[i]);
le32_to_cpus(&s->catalog_bitmap[i]);
qemu_co_mutex_init(&s->lock);
return 0;
fail_format:
error_setg(errp, "Image not in Parallels format");
ret = -EINVAL;
fail:
g_free(s->catalog_bitmap);
return ret;
@@ -133,8 +156,9 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
/* not allocated */
if ((index > s->catalog_size) || (s->catalog_bitmap[index] == 0))
return -1;
return (uint64_t)(s->catalog_bitmap[index] + offset) * 512;
return -1;
return
((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 512;
}
static int parallels_read(BlockDriverState *bs, int64_t sector_num,

View File

@@ -28,6 +28,13 @@
#include "qapi-visit.h"
#include "qapi/qmp-output-visitor.h"
#include "qapi/qmp/types.h"
#ifdef __linux__
#include <linux/fs.h>
#include <sys/ioctl.h>
#ifndef FS_NOCOW_FL
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
#endif
#endif
BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs)
{
@@ -165,19 +172,28 @@ void bdrv_query_image_info(BlockDriverState *bs,
ImageInfo **p_info,
Error **errp)
{
uint64_t total_sectors;
int64_t size;
const char *backing_filename;
char backing_filename2[1024];
BlockDriverInfo bdi;
int ret;
Error *err = NULL;
ImageInfo *info = g_new0(ImageInfo, 1);
ImageInfo *info;
#ifdef __linux__
int fd, attr;
#endif
bdrv_get_geometry(bs, &total_sectors);
size = bdrv_getlength(bs);
if (size < 0) {
error_setg_errno(errp, -size, "Can't get size of device '%s'",
bdrv_get_device_name(bs));
return;
}
info = g_new0(ImageInfo, 1);
info->filename = g_strdup(bs->filename);
info->format = g_strdup(bdrv_get_format_name(bs));
info->virtual_size = total_sectors * 512;
info->virtual_size = size;
info->actual_size = bdrv_get_allocated_file_size(bs);
info->has_actual_size = info->actual_size >= 0;
if (bdrv_is_encrypted(bs)) {
@@ -195,6 +211,18 @@ void bdrv_query_image_info(BlockDriverState *bs,
info->format_specific = bdrv_get_specific_info(bs);
info->has_format_specific = info->format_specific != NULL;
#ifdef __linux__
/* get NOCOW info */
fd = qemu_open(bs->filename, O_RDONLY | O_NONBLOCK);
if (fd >= 0) {
if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0 && (attr & FS_NOCOW_FL)) {
info->has_nocow = true;
info->nocow = true;
}
qemu_close(fd);
}
#endif
backing_filename = bs->backing_file;
if (backing_filename[0] != '\0') {
info->backing_filename = g_strdup(backing_filename);
@@ -625,4 +653,8 @@ void bdrv_image_info_dump(fprintf_function func_fprintf, void *f,
func_fprintf(f, "Format specific information:\n");
bdrv_image_info_specific_dump(func_fprintf, f, info->format_specific);
}
if (info->has_nocow && info->nocow) {
func_fprintf(f, "NOCOW flag: set\n");
}
}

View File

@@ -182,7 +182,12 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
}
s->l1_table_offset = header.l1_table_offset;
s->l1_table = g_malloc(s->l1_size * sizeof(uint64_t));
s->l1_table = g_try_new(uint64_t, s->l1_size);
if (s->l1_table == NULL) {
error_setg(errp, "Could not allocate memory for L1 table");
ret = -ENOMEM;
goto fail;
}
ret = bdrv_pread(bs->file, s->l1_table_offset, s->l1_table,
s->l1_size * sizeof(uint64_t));
@@ -193,8 +198,16 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
for(i = 0;i < s->l1_size; i++) {
be64_to_cpus(&s->l1_table[i]);
}
/* alloc L2 cache */
s->l2_cache = g_malloc(s->l2_size * L2_CACHE_SIZE * sizeof(uint64_t));
/* alloc L2 cache (max. 64k * 16 * 8 = 8 MB) */
s->l2_cache =
qemu_try_blockalign(bs->file,
s->l2_size * L2_CACHE_SIZE * sizeof(uint64_t));
if (s->l2_cache == NULL) {
error_setg(errp, "Could not allocate L2 table cache");
ret = -ENOMEM;
goto fail;
}
s->cluster_cache = g_malloc(s->cluster_size);
s->cluster_data = g_malloc(s->cluster_size);
s->cluster_cache_offset = -1;
@@ -226,7 +239,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
fail:
g_free(s->l1_table);
g_free(s->l2_cache);
qemu_vfree(s->l2_cache);
g_free(s->cluster_cache);
g_free(s->cluster_data);
return ret;
@@ -517,7 +530,10 @@ static coroutine_fn int qcow_co_readv(BlockDriverState *bs, int64_t sector_num,
void *orig_buf;
if (qiov->niov > 1) {
buf = orig_buf = qemu_blockalign(bs, qiov->size);
buf = orig_buf = qemu_try_blockalign(bs, qiov->size);
if (buf == NULL) {
return -ENOMEM;
}
} else {
orig_buf = NULL;
buf = (uint8_t *)qiov->iov->iov_base;
@@ -619,7 +635,10 @@ static coroutine_fn int qcow_co_writev(BlockDriverState *bs, int64_t sector_num,
s->cluster_cache_offset = -1; /* disable compressed cache */
if (qiov->niov > 1) {
buf = orig_buf = qemu_blockalign(bs, qiov->size);
buf = orig_buf = qemu_try_blockalign(bs, qiov->size);
if (buf == NULL) {
return -ENOMEM;
}
qemu_iovec_to_buf(qiov, 0, buf, qiov->size);
} else {
orig_buf = NULL;
@@ -685,7 +704,7 @@ static void qcow_close(BlockDriverState *bs)
BDRVQcowState *s = bs->opaque;
g_free(s->l1_table);
g_free(s->l2_cache);
qemu_vfree(s->l2_cache);
g_free(s->cluster_cache);
g_free(s->cluster_data);

View File

@@ -48,15 +48,31 @@ Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
Qcow2Cache *c;
int i;
c = g_malloc0(sizeof(*c));
c = g_new0(Qcow2Cache, 1);
c->size = num_tables;
c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
c->entries = g_try_new0(Qcow2CachedTable, num_tables);
if (!c->entries) {
goto fail;
}
for (i = 0; i < c->size; i++) {
c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
c->entries[i].table = qemu_try_blockalign(bs->file, s->cluster_size);
if (c->entries[i].table == NULL) {
goto fail;
}
}
return c;
fail:
if (c->entries) {
for (i = 0; i < c->size; i++) {
qemu_vfree(c->entries[i].table);
}
}
g_free(c->entries);
g_free(c);
return NULL;
}
int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)

View File

@@ -72,14 +72,20 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
#endif
new_l1_size2 = sizeof(uint64_t) * new_l1_size;
new_l1_table = g_malloc0(align_offset(new_l1_size2, 512));
new_l1_table = qemu_try_blockalign(bs->file,
align_offset(new_l1_size2, 512));
if (new_l1_table == NULL) {
return -ENOMEM;
}
memset(new_l1_table, 0, align_offset(new_l1_size2, 512));
memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
/* write new table (align to cluster) */
BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
new_l1_table_offset = qcow2_alloc_clusters(bs, new_l1_size2);
if (new_l1_table_offset < 0) {
g_free(new_l1_table);
qemu_vfree(new_l1_table);
return new_l1_table_offset;
}
@@ -113,7 +119,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
if (ret < 0) {
goto fail;
}
g_free(s->l1_table);
qemu_vfree(s->l1_table);
old_l1_table_offset = s->l1_table_offset;
s->l1_table_offset = new_l1_table_offset;
s->l1_table = new_l1_table;
@@ -123,7 +129,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
QCOW2_DISCARD_OTHER);
return 0;
fail:
g_free(new_l1_table);
qemu_vfree(new_l1_table);
qcow2_free_clusters(bs, new_l1_table_offset, new_l1_size2,
QCOW2_DISCARD_OTHER);
return ret;
@@ -372,7 +378,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
}
iov.iov_len = n * BDRV_SECTOR_SIZE;
iov.iov_base = qemu_blockalign(bs, iov.iov_len);
iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
if (iov.iov_base == NULL) {
return -ENOMEM;
}
qemu_iovec_init_external(&qiov, &iov, 1);
@@ -702,7 +711,11 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
trace_qcow2_cluster_link_l2(qemu_coroutine_self(), m->nb_clusters);
assert(m->nb_clusters > 0);
old_cluster = g_malloc(m->nb_clusters * sizeof(uint64_t));
old_cluster = g_try_new(uint64_t, m->nb_clusters);
if (old_cluster == NULL) {
ret = -ENOMEM;
goto err;
}
/* copy content of unmodified sectors */
ret = perform_cow(bs, m, &m->cow_start);
@@ -1106,6 +1119,17 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
return 0;
}
/* !*host_offset would overwrite the image header and is reserved for "no
* host offset preferred". If 0 was a valid host offset, it'd trigger the
* following overlap check; do that now to avoid having an invalid value in
* *host_offset. */
if (!alloc_cluster_offset) {
ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
nb_clusters * s->cluster_size);
assert(ret < 0);
goto fail;
}
/*
* Save info needed for meta data update.
*
@@ -1562,7 +1586,10 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
if (!is_active_l1) {
/* inactive L2 tables require a buffer to be stored in when loading
* them from disk */
l2_table = qemu_blockalign(bs, s->cluster_size);
l2_table = qemu_try_blockalign(bs->file, s->cluster_size);
if (l2_table == NULL) {
return -ENOMEM;
}
}
for (i = 0; i < l1_size; i++) {
@@ -1740,7 +1767,11 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs)
nb_clusters = size_to_clusters(s, bs->file->total_sectors *
BDRV_SECTOR_SIZE);
expanded_clusters = g_malloc0((nb_clusters + 7) / 8);
expanded_clusters = g_try_malloc0((nb_clusters + 7) / 8);
if (expanded_clusters == NULL) {
ret = -ENOMEM;
goto fail;
}
ret = expand_zero_clusters_in_l1(bs, s->l1_table, s->l1_size,
&expanded_clusters, &nb_clusters);

View File

@@ -46,19 +46,25 @@ int qcow2_refcount_init(BlockDriverState *bs)
assert(s->refcount_table_size <= INT_MAX / sizeof(uint64_t));
refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
s->refcount_table = g_malloc(refcount_table_size2);
s->refcount_table = g_try_malloc(refcount_table_size2);
if (s->refcount_table_size > 0) {
if (s->refcount_table == NULL) {
ret = -ENOMEM;
goto fail;
}
BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
ret = bdrv_pread(bs->file, s->refcount_table_offset,
s->refcount_table, refcount_table_size2);
if (ret != refcount_table_size2)
if (ret < 0) {
goto fail;
}
for(i = 0; i < s->refcount_table_size; i++)
be64_to_cpus(&s->refcount_table[i]);
}
return 0;
fail:
return -ENOMEM;
return ret;
}
void qcow2_refcount_close(BlockDriverState *bs)
@@ -344,8 +350,14 @@ static int alloc_refcount_block(BlockDriverState *bs,
uint64_t meta_offset = (blocks_used * refcount_block_clusters) *
s->cluster_size;
uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
uint16_t *new_blocks = g_malloc0(blocks_clusters * s->cluster_size);
uint64_t *new_table = g_malloc0(table_size * sizeof(uint64_t));
uint64_t *new_table = g_try_new0(uint64_t, table_size);
uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
assert(table_size > 0 && blocks_clusters > 0);
if (new_table == NULL || new_blocks == NULL) {
ret = -ENOMEM;
goto fail_table;
}
/* Fill the new refcount table */
memcpy(new_table, s->refcount_table,
@@ -369,6 +381,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
ret = bdrv_pwrite_sync(bs->file, meta_offset, new_blocks,
blocks_clusters * s->cluster_size);
g_free(new_blocks);
new_blocks = NULL;
if (ret < 0) {
goto fail_table;
}
@@ -424,6 +437,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
return -EAGAIN;
fail_table:
g_free(new_blocks);
g_free(new_table);
fail_block:
if (*refcount_block != NULL) {
@@ -847,7 +861,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
int64_t l1_table_offset, int l1_size, int addend)
{
BDRVQcowState *s = bs->opaque;
uint64_t *l1_table, *l2_table, l2_offset, offset, l1_size2, l1_allocated;
uint64_t *l1_table, *l2_table, l2_offset, offset, l1_size2;
bool l1_allocated = false;
int64_t old_offset, old_l2_offset;
int i, j, l1_modified = 0, nb_csectors, refcount;
int ret;
@@ -862,8 +877,12 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
* l1_table_offset when it is the current s->l1_table_offset! Be careful
* when changing this! */
if (l1_table_offset != s->l1_table_offset) {
l1_table = g_malloc0(align_offset(l1_size2, 512));
l1_allocated = 1;
l1_table = g_try_malloc0(align_offset(l1_size2, 512));
if (l1_size2 && l1_table == NULL) {
ret = -ENOMEM;
goto fail;
}
l1_allocated = true;
ret = bdrv_pread(bs->file, l1_table_offset, l1_table, l1_size2);
if (ret < 0) {
@@ -875,7 +894,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
} else {
assert(l1_size == s->l1_size);
l1_table = s->l1_table;
l1_allocated = 0;
l1_allocated = false;
}
for(i = 0; i < l1_size; i++) {
@@ -1197,7 +1216,11 @@ static int check_refcounts_l1(BlockDriverState *bs,
if (l1_size2 == 0) {
l1_table = NULL;
} else {
l1_table = g_malloc(l1_size2);
l1_table = g_try_malloc(l1_size2);
if (l1_table == NULL) {
ret = -ENOMEM;
goto fail;
}
if (bdrv_pread(bs->file, l1_table_offset,
l1_table, l1_size2) != l1_size2)
goto fail;
@@ -1501,7 +1524,11 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
return -EFBIG;
}
refcount_table = g_malloc0(nb_clusters * sizeof(uint16_t));
refcount_table = g_try_new0(uint16_t, nb_clusters);
if (nb_clusters && refcount_table == NULL) {
res->check_errors++;
return -ENOMEM;
}
res->bfi.total_clusters =
size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
@@ -1578,8 +1605,8 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
/* increase refcount_table size if necessary */
int old_nb_clusters = nb_clusters;
nb_clusters = (new_offset >> s->cluster_bits) + 1;
refcount_table = g_realloc(refcount_table,
nb_clusters * sizeof(uint16_t));
refcount_table = g_renew(uint16_t, refcount_table,
nb_clusters);
memset(&refcount_table[old_nb_clusters], 0, (nb_clusters
- old_nb_clusters) * sizeof(uint16_t));
}
@@ -1753,9 +1780,13 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
uint64_t l1_ofs = s->snapshots[i].l1_table_offset;
uint32_t l1_sz = s->snapshots[i].l1_size;
uint64_t l1_sz2 = l1_sz * sizeof(uint64_t);
uint64_t *l1 = g_malloc(l1_sz2);
uint64_t *l1 = g_try_malloc(l1_sz2);
int ret;
if (l1_sz2 && l1 == NULL) {
return -ENOMEM;
}
ret = bdrv_pread(bs->file, l1_ofs, l1, l1_sz2);
if (ret < 0) {
g_free(l1);

View File

@@ -58,7 +58,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
}
offset = s->snapshots_offset;
s->snapshots = g_malloc0(s->nb_snapshots * sizeof(QCowSnapshot));
s->snapshots = g_new0(QCowSnapshot, s->nb_snapshots);
for(i = 0; i < s->nb_snapshots; i++) {
/* Read statically sized part of the snapshot header */
@@ -381,7 +381,12 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
sn->l1_table_offset = l1_table_offset;
sn->l1_size = s->l1_size;
l1_table = g_malloc(s->l1_size * sizeof(uint64_t));
l1_table = g_try_new(uint64_t, s->l1_size);
if (s->l1_size && l1_table == NULL) {
ret = -ENOMEM;
goto fail;
}
for(i = 0; i < s->l1_size; i++) {
l1_table[i] = cpu_to_be64(s->l1_table[i]);
}
@@ -412,7 +417,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
}
/* Append the new snapshot to the snapshot list */
new_snapshot_list = g_malloc((s->nb_snapshots + 1) * sizeof(QCowSnapshot));
new_snapshot_list = g_new(QCowSnapshot, s->nb_snapshots + 1);
if (s->snapshots) {
memcpy(new_snapshot_list, s->snapshots,
s->nb_snapshots * sizeof(QCowSnapshot));
@@ -499,7 +504,11 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
* Decrease the refcount referenced by the old one only when the L1
* table is overwritten.
*/
sn_l1_table = g_malloc0(cur_l1_bytes);
sn_l1_table = g_try_malloc0(cur_l1_bytes);
if (cur_l1_bytes && sn_l1_table == NULL) {
ret = -ENOMEM;
goto fail;
}
ret = bdrv_pread(bs->file, sn->l1_table_offset, sn_l1_table, sn_l1_bytes);
if (ret < 0) {
@@ -652,7 +661,7 @@ int qcow2_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
return s->nb_snapshots;
}
sn_tab = g_malloc0(s->nb_snapshots * sizeof(QEMUSnapshotInfo));
sn_tab = g_new0(QEMUSnapshotInfo, s->nb_snapshots);
for(i = 0; i < s->nb_snapshots; i++) {
sn_info = sn_tab + i;
sn = s->snapshots + i;
@@ -698,17 +707,21 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
return -EFBIG;
}
new_l1_bytes = sn->l1_size * sizeof(uint64_t);
new_l1_table = g_malloc0(align_offset(new_l1_bytes, 512));
new_l1_table = qemu_try_blockalign(bs->file,
align_offset(new_l1_bytes, 512));
if (new_l1_table == NULL) {
return -ENOMEM;
}
ret = bdrv_pread(bs->file, sn->l1_table_offset, new_l1_table, new_l1_bytes);
if (ret < 0) {
error_setg(errp, "Failed to read l1 table for snapshot");
g_free(new_l1_table);
qemu_vfree(new_l1_table);
return ret;
}
/* Switch the L1 table */
g_free(s->l1_table);
qemu_vfree(s->l1_table);
s->l1_size = sn->l1_size;
s->l1_table_offset = sn->l1_table_offset;

View File

@@ -442,6 +442,22 @@ static QemuOptsList qcow2_runtime_opts = {
.type = QEMU_OPT_BOOL,
.help = "Check for unintended writes into an inactive L2 table",
},
{
.name = QCOW2_OPT_CACHE_SIZE,
.type = QEMU_OPT_SIZE,
.help = "Maximum combined metadata (L2 tables and refcount blocks) "
"cache size",
},
{
.name = QCOW2_OPT_L2_CACHE_SIZE,
.type = QEMU_OPT_SIZE,
.help = "Maximum L2 table cache size",
},
{
.name = QCOW2_OPT_REFCOUNT_CACHE_SIZE,
.type = QEMU_OPT_SIZE,
.help = "Maximum refcount block cache size",
},
{ /* end of list */ }
},
};
@@ -457,6 +473,61 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
[QCOW2_OL_INACTIVE_L2_BITNR] = QCOW2_OPT_OVERLAP_INACTIVE_L2,
};
static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
uint64_t *refcount_cache_size, Error **errp)
{
uint64_t combined_cache_size;
bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
refcount_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE);
combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0);
*l2_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, 0);
*refcount_cache_size = qemu_opt_get_size(opts,
QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0);
if (combined_cache_size_set) {
if (l2_cache_size_set && refcount_cache_size_set) {
error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
" and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set "
"the same time");
return;
} else if (*l2_cache_size > combined_cache_size) {
error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
QCOW2_OPT_CACHE_SIZE);
return;
} else if (*refcount_cache_size > combined_cache_size) {
error_setg(errp, QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not exceed "
QCOW2_OPT_CACHE_SIZE);
return;
}
if (l2_cache_size_set) {
*refcount_cache_size = combined_cache_size - *l2_cache_size;
} else if (refcount_cache_size_set) {
*l2_cache_size = combined_cache_size - *refcount_cache_size;
} else {
*refcount_cache_size = combined_cache_size
/ (DEFAULT_L2_REFCOUNT_SIZE_RATIO + 1);
*l2_cache_size = combined_cache_size - *refcount_cache_size;
}
} else {
if (!l2_cache_size_set && !refcount_cache_size_set) {
*l2_cache_size = DEFAULT_L2_CACHE_BYTE_SIZE;
*refcount_cache_size = *l2_cache_size
/ DEFAULT_L2_REFCOUNT_SIZE_RATIO;
} else if (!l2_cache_size_set) {
*l2_cache_size = *refcount_cache_size
* DEFAULT_L2_REFCOUNT_SIZE_RATIO;
} else if (!refcount_cache_size_set) {
*refcount_cache_size = *l2_cache_size
/ DEFAULT_L2_REFCOUNT_SIZE_RATIO;
}
}
}
static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
@@ -470,6 +541,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
uint64_t l1_vm_state_index;
const char *opt_overlap_check;
int overlap_check_template = 0;
uint64_t l2_cache_size, refcount_cache_size;
ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
if (ret < 0) {
@@ -688,8 +760,13 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
if (s->l1_size > 0) {
s->l1_table = g_malloc0(
s->l1_table = qemu_try_blockalign(bs->file,
align_offset(s->l1_size * sizeof(uint64_t), 512));
if (s->l1_table == NULL) {
error_setg(errp, "Could not allocate L1 table");
ret = -ENOMEM;
goto fail;
}
ret = bdrv_pread(bs->file, s->l1_table_offset, s->l1_table,
s->l1_size * sizeof(uint64_t));
if (ret < 0) {
@@ -701,14 +778,61 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
}
}
/* get L2 table/refcount block cache size from command line options */
opts = qemu_opts_create(&qcow2_runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
read_cache_sizes(opts, &l2_cache_size, &refcount_cache_size, &local_err);
if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
l2_cache_size /= s->cluster_size;
if (l2_cache_size < MIN_L2_CACHE_SIZE) {
l2_cache_size = MIN_L2_CACHE_SIZE;
}
if (l2_cache_size > INT_MAX) {
error_setg(errp, "L2 cache size too big");
ret = -EINVAL;
goto fail;
}
refcount_cache_size /= s->cluster_size;
if (refcount_cache_size < MIN_REFCOUNT_CACHE_SIZE) {
refcount_cache_size = MIN_REFCOUNT_CACHE_SIZE;
}
if (refcount_cache_size > INT_MAX) {
error_setg(errp, "Refcount cache size too big");
ret = -EINVAL;
goto fail;
}
/* alloc L2 table/refcount block cache */
s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
s->l2_table_cache = qcow2_cache_create(bs, l2_cache_size);
s->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size);
if (s->l2_table_cache == NULL || s->refcount_block_cache == NULL) {
error_setg(errp, "Could not allocate metadata caches");
ret = -ENOMEM;
goto fail;
}
s->cluster_cache = g_malloc(s->cluster_size);
/* one more sector for decompressed data alignment */
s->cluster_data = qemu_blockalign(bs, QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size
+ 512);
s->cluster_data = qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
* s->cluster_size + 512);
if (s->cluster_data == NULL) {
error_setg(errp, "Could not allocate temporary cluster buffer");
ret = -ENOMEM;
goto fail;
}
s->cluster_cache_offset = -1;
s->flags = flags;
@@ -782,14 +906,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
}
/* Enable lazy_refcounts according to image and command line options */
opts = qemu_opts_create(&qcow2_runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
s->use_lazy_refcounts = qemu_opt_get_bool(opts, QCOW2_OPT_LAZY_REFCOUNTS,
(s->compatible_features & QCOW2_COMPAT_LAZY_REFCOUNTS));
@@ -852,7 +968,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
cleanup_unknown_header_ext(bs);
qcow2_free_snapshots(bs);
qcow2_refcount_close(bs);
g_free(s->l1_table);
qemu_vfree(s->l1_table);
/* else pre-write overlap checks in cache_destroy may crash */
s->l1_table = NULL;
if (s->l2_table_cache) {
@@ -1082,7 +1198,12 @@ static coroutine_fn int qcow2_co_readv(BlockDriverState *bs, int64_t sector_num,
*/
if (!cluster_data) {
cluster_data =
qemu_blockalign(bs, QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
* s->cluster_size);
if (cluster_data == NULL) {
ret = -ENOMEM;
goto fail;
}
}
assert(cur_nr_sectors <=
@@ -1182,8 +1303,13 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
if (s->crypt_method) {
if (!cluster_data) {
cluster_data = qemu_blockalign(bs, QCOW_MAX_CRYPT_CLUSTERS *
s->cluster_size);
cluster_data = qemu_try_blockalign(bs->file,
QCOW_MAX_CRYPT_CLUSTERS
* s->cluster_size);
if (cluster_data == NULL) {
ret = -ENOMEM;
goto fail;
}
}
assert(hd_qiov.size <=
@@ -1270,7 +1396,7 @@ fail:
static void qcow2_close(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
g_free(s->l1_table);
qemu_vfree(s->l1_table);
/* else pre-write overlap checks in cache_destroy may crash */
s->l1_table = NULL;
@@ -1557,7 +1683,7 @@ static int preallocate(BlockDriverState *bs)
int ret;
QCowL2Meta *meta;
nb_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
nb_sectors = bdrv_nb_sectors(bs);
offset = 0;
while (nb_sectors) {
@@ -1947,7 +2073,6 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
/* align end of file to a sector boundary to ease reading with
sector based I/Os */
cluster_offset = bdrv_getlength(bs->file);
cluster_offset = (cluster_offset + 511) & ~511;
bdrv_truncate(bs->file, cluster_offset);
return 0;
}

View File

@@ -64,10 +64,16 @@
#define MIN_CLUSTER_BITS 9
#define MAX_CLUSTER_BITS 21
#define L2_CACHE_SIZE 16
#define MIN_L2_CACHE_SIZE 1 /* cluster */
/* Must be at least 4 to cover all cases of refcount table growth */
#define REFCOUNT_CACHE_SIZE 4
#define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */
/* The refblock cache needs only a fourth of the L2 cache size to cover as many
* clusters */
#define DEFAULT_L2_REFCOUNT_SIZE_RATIO 4
#define DEFAULT_CLUSTER_SIZE 65536
@@ -85,6 +91,9 @@
#define QCOW2_OPT_OVERLAP_SNAPSHOT_TABLE "overlap-check.snapshot-table"
#define QCOW2_OPT_OVERLAP_INACTIVE_L1 "overlap-check.inactive-l1"
#define QCOW2_OPT_OVERLAP_INACTIVE_L2 "overlap-check.inactive-l2"
#define QCOW2_OPT_CACHE_SIZE "cache-size"
#define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
#define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
typedef struct QCowHeader {
uint32_t magic;

View File

@@ -227,8 +227,10 @@ int qed_check(BDRVQEDState *s, BdrvCheckResult *result, bool fix)
};
int ret;
check.used_clusters = g_malloc0(((check.nclusters + 31) / 32) *
sizeof(check.used_clusters[0]));
check.used_clusters = g_try_new0(uint32_t, (check.nclusters + 31) / 32);
if (check.nclusters && check.used_clusters == NULL) {
return -ENOMEM;
}
check.result->bfi.total_clusters =
(s->header.image_size + s->header.cluster_size - 1) /

View File

@@ -1240,7 +1240,11 @@ static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
struct iovec *iov = acb->qiov->iov;
if (!iov->iov_base) {
iov->iov_base = qemu_blockalign(acb->common.bs, iov->iov_len);
iov->iov_base = qemu_try_blockalign(acb->common.bs, iov->iov_len);
if (iov->iov_base == NULL) {
qed_aio_complete(acb, -ENOMEM);
return;
}
memset(iov->iov_base, 0, iov->iov_len);
}
}

View File

@@ -16,7 +16,12 @@
#include <gnutls/gnutls.h>
#include <gnutls/crypto.h>
#include "block/block_int.h"
#include "qapi/qmp/qbool.h"
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qint.h"
#include "qapi/qmp/qjson.h"
#include "qapi/qmp/qlist.h"
#include "qapi/qmp/qstring.h"
#include "qapi-event.h"
#define HASH_LENGTH 32
@@ -24,6 +29,7 @@
#define QUORUM_OPT_VOTE_THRESHOLD "vote-threshold"
#define QUORUM_OPT_BLKVERIFY "blkverify"
#define QUORUM_OPT_REWRITE "rewrite-corrupted"
#define QUORUM_OPT_READ_PATTERN "read-pattern"
/* This union holds a vote hash value */
typedef union QuorumVoteValue {
@@ -74,6 +80,8 @@ typedef struct BDRVQuorumState {
bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
* block if Quorum is reached.
*/
QuorumReadPattern read_pattern;
} BDRVQuorumState;
typedef struct QuorumAIOCB QuorumAIOCB;
@@ -117,6 +125,7 @@ struct QuorumAIOCB {
bool is_read;
int vote_ret;
int child_iter; /* which child to read in fifo pattern */
};
static bool quorum_vote(QuorumAIOCB *acb);
@@ -143,7 +152,6 @@ static AIOCBInfo quorum_aiocb_info = {
static void quorum_aio_finalize(QuorumAIOCB *acb)
{
BDRVQuorumState *s = acb->common.bs->opaque;
int i, ret = 0;
if (acb->vote_ret) {
@@ -153,7 +161,8 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
acb->common.cb(acb->common.opaque, ret);
if (acb->is_read) {
for (i = 0; i < s->num_children; i++) {
/* on the quorum case acb->child_iter == s->num_children - 1 */
for (i = 0; i <= acb->child_iter; i++) {
qemu_vfree(acb->qcrs[i].buf);
qemu_iovec_destroy(&acb->qcrs[i].qiov);
}
@@ -256,6 +265,21 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
quorum_aio_finalize(acb);
}
static BlockDriverAIOCB *read_fifo_child(QuorumAIOCB *acb);
static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
{
int i;
assert(dest->niov == source->niov);
assert(dest->size == source->size);
for (i = 0; i < source->niov; i++) {
assert(dest->iov[i].iov_len == source->iov[i].iov_len);
memcpy(dest->iov[i].iov_base,
source->iov[i].iov_base,
source->iov[i].iov_len);
}
}
static void quorum_aio_cb(void *opaque, int ret)
{
QuorumChildRequest *sacb = opaque;
@@ -263,6 +287,21 @@ static void quorum_aio_cb(void *opaque, int ret)
BDRVQuorumState *s = acb->common.bs->opaque;
bool rewrite = false;
if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
/* We try to read next child in FIFO order if we fail to read */
if (ret < 0 && ++acb->child_iter < s->num_children) {
read_fifo_child(acb);
return;
}
if (ret == 0) {
quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->child_iter].qiov);
}
acb->vote_ret = ret;
quorum_aio_finalize(acb);
return;
}
sacb->ret = ret;
acb->count++;
if (ret == 0) {
@@ -343,19 +382,6 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
return count;
}
static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
{
int i;
assert(dest->niov == source->niov);
assert(dest->size == source->size);
for (i = 0; i < source->niov; i++) {
assert(dest->iov[i].iov_len == source->iov[i].iov_len);
memcpy(dest->iov[i].iov_base,
source->iov[i].iov_base,
source->iov[i].iov_len);
}
}
static void quorum_count_vote(QuorumVotes *votes,
QuorumVoteValue *value,
int index)
@@ -615,32 +641,60 @@ free_exit:
return rewrite;
}
static BlockDriverAIOCB *read_quorum_children(QuorumAIOCB *acb)
{
BDRVQuorumState *s = acb->common.bs->opaque;
int i;
for (i = 0; i < s->num_children; i++) {
acb->qcrs[i].buf = qemu_blockalign(s->bs[i], acb->qiov->size);
qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
qemu_iovec_clone(&acb->qcrs[i].qiov, acb->qiov, acb->qcrs[i].buf);
}
for (i = 0; i < s->num_children; i++) {
bdrv_aio_readv(s->bs[i], acb->sector_num, &acb->qcrs[i].qiov,
acb->nb_sectors, quorum_aio_cb, &acb->qcrs[i]);
}
return &acb->common;
}
static BlockDriverAIOCB *read_fifo_child(QuorumAIOCB *acb)
{
BDRVQuorumState *s = acb->common.bs->opaque;
acb->qcrs[acb->child_iter].buf = qemu_blockalign(s->bs[acb->child_iter],
acb->qiov->size);
qemu_iovec_init(&acb->qcrs[acb->child_iter].qiov, acb->qiov->niov);
qemu_iovec_clone(&acb->qcrs[acb->child_iter].qiov, acb->qiov,
acb->qcrs[acb->child_iter].buf);
bdrv_aio_readv(s->bs[acb->child_iter], acb->sector_num,
&acb->qcrs[acb->child_iter].qiov, acb->nb_sectors,
quorum_aio_cb, &acb->qcrs[acb->child_iter]);
return &acb->common;
}
static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
int nb_sectors,
BlockDriverCompletionFunc *cb,
void *opaque)
int64_t sector_num,
QEMUIOVector *qiov,
int nb_sectors,
BlockDriverCompletionFunc *cb,
void *opaque)
{
BDRVQuorumState *s = bs->opaque;
QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
nb_sectors, cb, opaque);
int i;
acb->is_read = true;
for (i = 0; i < s->num_children; i++) {
acb->qcrs[i].buf = qemu_blockalign(s->bs[i], qiov->size);
qemu_iovec_init(&acb->qcrs[i].qiov, qiov->niov);
qemu_iovec_clone(&acb->qcrs[i].qiov, qiov, acb->qcrs[i].buf);
if (s->read_pattern == QUORUM_READ_PATTERN_QUORUM) {
acb->child_iter = s->num_children - 1;
return read_quorum_children(acb);
}
for (i = 0; i < s->num_children; i++) {
bdrv_aio_readv(s->bs[i], sector_num, &acb->qcrs[i].qiov, nb_sectors,
quorum_aio_cb, &acb->qcrs[i]);
}
return &acb->common;
acb->child_iter = 0;
return read_fifo_child(acb);
}
static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
@@ -782,16 +836,39 @@ static QemuOptsList quorum_runtime_opts = {
.type = QEMU_OPT_BOOL,
.help = "Rewrite corrupted block on read quorum",
},
{
.name = QUORUM_OPT_READ_PATTERN,
.type = QEMU_OPT_STRING,
.help = "Allowed pattern: quorum, fifo. Quorum is default",
},
{ /* end of list */ }
},
};
static int parse_read_pattern(const char *opt)
{
int i;
if (!opt) {
/* Set quorum as default */
return QUORUM_READ_PATTERN_QUORUM;
}
for (i = 0; i < QUORUM_READ_PATTERN_MAX; i++) {
if (!strcmp(opt, QuorumReadPattern_lookup[i])) {
return i;
}
}
return -EINVAL;
}
static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVQuorumState *s = bs->opaque;
Error *local_err = NULL;
QemuOpts *opts;
QemuOpts *opts = NULL;
bool *opened;
QDict *sub = NULL;
QList *list = NULL;
@@ -827,28 +904,37 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
}
s->threshold = qemu_opt_get_number(opts, QUORUM_OPT_VOTE_THRESHOLD, 0);
/* and validate it against s->num_children */
ret = quorum_valid_threshold(s->threshold, s->num_children, &local_err);
ret = parse_read_pattern(qemu_opt_get(opts, QUORUM_OPT_READ_PATTERN));
if (ret < 0) {
error_setg(&local_err, "Please set read-pattern as fifo or quorum");
goto exit;
}
s->read_pattern = ret;
/* is the driver in blkverify mode */
if (qemu_opt_get_bool(opts, QUORUM_OPT_BLKVERIFY, false) &&
s->num_children == 2 && s->threshold == 2) {
s->is_blkverify = true;
} else if (qemu_opt_get_bool(opts, QUORUM_OPT_BLKVERIFY, false)) {
fprintf(stderr, "blkverify mode is set by setting blkverify=on "
"and using two files with vote_threshold=2\n");
}
if (s->read_pattern == QUORUM_READ_PATTERN_QUORUM) {
/* and validate it against s->num_children */
ret = quorum_valid_threshold(s->threshold, s->num_children, &local_err);
if (ret < 0) {
goto exit;
}
s->rewrite_corrupted = qemu_opt_get_bool(opts, QUORUM_OPT_REWRITE, false);
if (s->rewrite_corrupted && s->is_blkverify) {
error_setg(&local_err,
"rewrite-corrupted=on cannot be used with blkverify=on");
ret = -EINVAL;
goto exit;
/* is the driver in blkverify mode */
if (qemu_opt_get_bool(opts, QUORUM_OPT_BLKVERIFY, false) &&
s->num_children == 2 && s->threshold == 2) {
s->is_blkverify = true;
} else if (qemu_opt_get_bool(opts, QUORUM_OPT_BLKVERIFY, false)) {
fprintf(stderr, "blkverify mode is set by setting blkverify=on "
"and using two files with vote_threshold=2\n");
}
s->rewrite_corrupted = qemu_opt_get_bool(opts, QUORUM_OPT_REWRITE,
false);
if (s->rewrite_corrupted && s->is_blkverify) {
error_setg(&local_err,
"rewrite-corrupted=on cannot be used with blkverify=on");
ret = -EINVAL;
goto exit;
}
}
/* allocate the children BlockDriverState array */
@@ -903,6 +989,7 @@ close_exit:
g_free(s->bs);
g_free(opened);
exit:
qemu_opts_del(opts);
/* propagate error */
if (local_err) {
error_propagate(errp, local_err);
@@ -945,6 +1032,39 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
}
}
static void quorum_refresh_filename(BlockDriverState *bs)
{
BDRVQuorumState *s = bs->opaque;
QDict *opts;
QList *children;
int i;
for (i = 0; i < s->num_children; i++) {
bdrv_refresh_filename(s->bs[i]);
if (!s->bs[i]->full_open_options) {
return;
}
}
children = qlist_new();
for (i = 0; i < s->num_children; i++) {
QINCREF(s->bs[i]->full_open_options);
qlist_append_obj(children, QOBJECT(s->bs[i]->full_open_options));
}
opts = qdict_new();
qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("quorum")));
qdict_put_obj(opts, QUORUM_OPT_VOTE_THRESHOLD,
QOBJECT(qint_from_int(s->threshold)));
qdict_put_obj(opts, QUORUM_OPT_BLKVERIFY,
QOBJECT(qbool_from_int(s->is_blkverify)));
qdict_put_obj(opts, QUORUM_OPT_REWRITE,
QOBJECT(qbool_from_int(s->rewrite_corrupted)));
qdict_put_obj(opts, "children", QOBJECT(children));
bs->full_open_options = opts;
}
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -953,6 +1073,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
.bdrv_refresh_filename = quorum_refresh_filename,
.bdrv_co_flush_to_disk = quorum_co_flush,

View File

@@ -517,7 +517,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
s = state->bs->opaque;
state->opaque = g_malloc0(sizeof(BDRVRawReopenState));
state->opaque = g_new0(BDRVRawReopenState, 1);
raw_s = state->opaque;
#ifdef CONFIG_LINUX_AIO
@@ -747,6 +747,15 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
}
if (len == -1 && errno == EINTR) {
continue;
} else if (len == -1 && errno == EINVAL &&
(aiocb->bs->open_flags & BDRV_O_NOCACHE) &&
!(aiocb->aio_type & QEMU_AIO_WRITE) &&
offset > 0) {
/* O_DIRECT pread() may fail with EINVAL when offset is unaligned
* after a short read. Assume that O_DIRECT short reads only occur
* at EOF. Therefore this is a short read, not an I/O error.
*/
break;
} else if (len == -1) {
offset = -errno;
break;
@@ -798,7 +807,11 @@ static ssize_t handle_aiocb_rw(RawPosixAIOData *aiocb)
* Ok, we have to do it the hard way, copy all segments into
* a single aligned buffer.
*/
buf = qemu_blockalign(aiocb->bs, aiocb->aio_nbytes);
buf = qemu_try_blockalign(aiocb->bs, aiocb->aio_nbytes);
if (buf == NULL) {
return -ENOMEM;
}
if (aiocb->aio_type & QEMU_AIO_WRITE) {
char *p = buf;
int i;

View File

@@ -617,7 +617,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
RBDAIOCmd cmd)
{
RBDAIOCB *acb;
RADOSCB *rcb;
RADOSCB *rcb = NULL;
rbd_completion_t c;
int64_t off, size;
char *buf;
@@ -631,7 +631,10 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) {
acb->bounce = NULL;
} else {
acb->bounce = qemu_blockalign(bs, qiov->size);
acb->bounce = qemu_try_blockalign(bs, qiov->size);
if (acb->bounce == NULL) {
goto failed;
}
}
acb->ret = 0;
acb->error = 0;
@@ -649,7 +652,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
off = sector_num * BDRV_SECTOR_SIZE;
size = nb_sectors * BDRV_SECTOR_SIZE;
rcb = g_malloc(sizeof(RADOSCB));
rcb = g_new(RADOSCB, 1);
rcb->done = 0;
rcb->acb = acb;
rcb->buf = buf;
@@ -859,7 +862,7 @@ static int qemu_rbd_snap_list(BlockDriverState *bs,
int max_snaps = RBD_MAX_SNAPS;
do {
snaps = g_malloc(sizeof(*snaps) * max_snaps);
snaps = g_new(rbd_snap_info_t, max_snaps);
snap_count = rbd_snap_list(s->image, snaps, &max_snaps);
if (snap_count <= 0) {
g_free(snaps);
@@ -870,7 +873,7 @@ static int qemu_rbd_snap_list(BlockDriverState *bs,
goto done;
}
sn_tab = g_malloc0(snap_count * sizeof(QEMUSnapshotInfo));
sn_tab = g_new0(QEMUSnapshotInfo, snap_count);
for (i = 0; i < snap_count; i++) {
const char *snap_name = snaps[i].name;

View File

@@ -103,6 +103,9 @@
#define SD_INODE_SIZE (sizeof(SheepdogInode))
#define CURRENT_VDI_ID 0
#define LOCK_TYPE_NORMAL 0
#define LOCK_TYPE_SHARED 1 /* for iSCSI multipath */
typedef struct SheepdogReq {
uint8_t proto_ver;
uint8_t opcode;
@@ -166,7 +169,8 @@ typedef struct SheepdogVdiReq {
uint8_t copy_policy;
uint8_t reserved[2];
uint32_t snapid;
uint32_t pad[3];
uint32_t type;
uint32_t pad[2];
} SheepdogVdiReq;
typedef struct SheepdogVdiRsp {
@@ -712,7 +716,6 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
static coroutine_fn void reconnect_to_sdog(void *opaque)
{
Error *local_err = NULL;
BDRVSheepdogState *s = opaque;
AIOReq *aio_req, *next;
@@ -727,6 +730,7 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
/* Try to reconnect the sheepdog server every one second. */
while (s->fd < 0) {
Error *local_err = NULL;
s->fd = get_sheep_fd(s, &local_err);
if (s->fd < 0) {
DPRINTF("Wait for connection to be established\n");
@@ -1090,6 +1094,7 @@ static int find_vdi_name(BDRVSheepdogState *s, const char *filename,
memset(&hdr, 0, sizeof(hdr));
if (lock) {
hdr.opcode = SD_OP_LOCK_VDI;
hdr.type = LOCK_TYPE_NORMAL;
} else {
hdr.opcode = SD_OP_GET_VDI_INFO;
}
@@ -1110,6 +1115,8 @@ static int find_vdi_name(BDRVSheepdogState *s, const char *filename,
sd_strerror(rsp->result), filename, snapid, tag);
if (rsp->result == SD_RES_NO_VDI) {
ret = -ENOENT;
} else if (rsp->result == SD_RES_VDI_LOCKED) {
ret = -EBUSY;
} else {
ret = -EIO;
}
@@ -1682,7 +1689,7 @@ static int sd_create(const char *filename, QemuOpts *opts,
uint32_t snapid;
bool prealloc = false;
s = g_malloc0(sizeof(BDRVSheepdogState));
s = g_new0(BDRVSheepdogState, 1);
memset(tag, 0, sizeof(tag));
if (strstr(filename, "://")) {
@@ -1793,6 +1800,7 @@ static void sd_close(BlockDriverState *bs)
memset(&hdr, 0, sizeof(hdr));
hdr.opcode = SD_OP_RELEASE_VDI;
hdr.type = LOCK_TYPE_NORMAL;
hdr.base_vdi_id = s->inode.vdi_id;
wlen = strlen(s->name) + 1;
hdr.data_length = wlen;
@@ -2273,7 +2281,7 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
uint32_t snapid = 0;
int ret = 0;
old_s = g_malloc(sizeof(BDRVSheepdogState));
old_s = g_new(BDRVSheepdogState, 1);
memcpy(old_s, s, sizeof(BDRVSheepdogState));
@@ -2357,7 +2365,7 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
goto out;
}
sn_tab = g_malloc0(nr * sizeof(*sn_tab));
sn_tab = g_new0(QEMUSnapshotInfo, nr);
/* calculate a vdi id with hash function */
hval = fnv_64a_buf(s->name, strlen(s->name), FNV1A_64_INIT);

View File

@@ -53,13 +53,6 @@
#include "block/block_int.h"
#include "qemu/module.h"
#include "migration/migration.h"
#ifdef __linux__
#include <linux/fs.h>
#include <sys/ioctl.h>
#ifndef FS_NOCOW_FL
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
#endif
#endif
#if defined(CONFIG_UUID)
#include <uuid/uuid.h>
@@ -299,7 +292,12 @@ static int vdi_check(BlockDriverState *bs, BdrvCheckResult *res,
return -ENOTSUP;
}
bmap = g_malloc(s->header.blocks_in_image * sizeof(uint32_t));
bmap = g_try_new(uint32_t, s->header.blocks_in_image);
if (s->header.blocks_in_image && bmap == NULL) {
res->check_errors++;
return -ENOMEM;
}
memset(bmap, 0xff, s->header.blocks_in_image * sizeof(uint32_t));
/* Check block map and value of blocks_allocated. */
@@ -357,23 +355,23 @@ static int vdi_make_empty(BlockDriverState *bs)
static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename)
{
const VdiHeader *header = (const VdiHeader *)buf;
int result = 0;
int ret = 0;
logout("\n");
if (buf_size < sizeof(*header)) {
/* Header too small, no VDI. */
} else if (le32_to_cpu(header->signature) == VDI_SIGNATURE) {
result = 100;
ret = 100;
}
if (result == 0) {
if (ret == 0) {
logout("no vdi image\n");
} else {
logout("%s", header->text);
}
return result;
return ret;
}
static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
@@ -478,7 +476,12 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
bmap_size = header.blocks_in_image * sizeof(uint32_t);
bmap_size = (bmap_size + SECTOR_SIZE - 1) / SECTOR_SIZE;
s->bmap = g_malloc(bmap_size * SECTOR_SIZE);
s->bmap = qemu_try_blockalign(bs->file, bmap_size * SECTOR_SIZE);
if (s->bmap == NULL) {
ret = -ENOMEM;
goto fail;
}
ret = bdrv_read(bs->file, s->bmap_sector, (uint8_t *)s->bmap, bmap_size);
if (ret < 0) {
goto fail_free_bmap;
@@ -493,7 +496,7 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
return 0;
fail_free_bmap:
g_free(s->bmap);
qemu_vfree(s->bmap);
fail:
return ret;
@@ -681,8 +684,7 @@ static int vdi_co_write(BlockDriverState *bs,
static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
{
int fd;
int result = 0;
int ret = 0;
uint64_t bytes = 0;
uint32_t blocks;
size_t block_size = DEFAULT_CLUSTER_SIZE;
@@ -690,7 +692,10 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
VdiHeader header;
size_t i;
size_t bmap_size;
bool nocow = false;
int64_t offset = 0;
Error *local_err = NULL;
BlockDriverState *bs = NULL;
uint32_t *bmap = NULL;
logout("\n");
@@ -707,37 +712,25 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
image_type = VDI_TYPE_STATIC;
}
#endif
nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false);
if (bytes > VDI_DISK_SIZE_MAX) {
result = -ENOTSUP;
ret = -ENOTSUP;
error_setg(errp, "Unsupported VDI image size (size is 0x%" PRIx64
", max supported is 0x%" PRIx64 ")",
bytes, VDI_DISK_SIZE_MAX);
goto exit;
}
fd = qemu_open(filename,
O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
0644);
if (fd < 0) {
result = -errno;
ret = bdrv_create_file(filename, opts, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
goto exit;
}
if (nocow) {
#ifdef __linux__
/* Set NOCOW flag to solve performance issue on fs like btrfs.
* This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
* be ignored since any failure of this operation should not block the
* left work.
*/
int attr;
if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
attr |= FS_NOCOW_FL;
ioctl(fd, FS_IOC_SETFLAGS, &attr);
}
#endif
ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
NULL, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
goto exit;
}
/* We need enough blocks to store the given disk size,
@@ -769,13 +762,20 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
vdi_header_print(&header);
#endif
vdi_header_to_le(&header);
if (write(fd, &header, sizeof(header)) < 0) {
result = -errno;
goto close_and_exit;
ret = bdrv_pwrite_sync(bs, offset, &header, sizeof(header));
if (ret < 0) {
error_setg(errp, "Error writing header to %s", filename);
goto exit;
}
offset += sizeof(header);
if (bmap_size > 0) {
uint32_t *bmap = g_malloc0(bmap_size);
bmap = g_try_malloc0(bmap_size);
if (bmap == NULL) {
ret = -ENOMEM;
error_setg(errp, "Could not allocate bmap");
goto exit;
}
for (i = 0; i < blocks; i++) {
if (image_type == VDI_TYPE_STATIC) {
bmap[i] = i;
@@ -783,35 +783,33 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
bmap[i] = VDI_UNALLOCATED;
}
}
if (write(fd, bmap, bmap_size) < 0) {
result = -errno;
g_free(bmap);
goto close_and_exit;
ret = bdrv_pwrite_sync(bs, offset, bmap, bmap_size);
if (ret < 0) {
error_setg(errp, "Error writing bmap to %s", filename);
goto exit;
}
g_free(bmap);
offset += bmap_size;
}
if (image_type == VDI_TYPE_STATIC) {
if (ftruncate(fd, sizeof(header) + bmap_size + blocks * block_size)) {
result = -errno;
goto close_and_exit;
ret = bdrv_truncate(bs, offset + blocks * block_size);
if (ret < 0) {
error_setg(errp, "Failed to statically allocate %s", filename);
goto exit;
}
}
close_and_exit:
if ((close(fd) < 0) && !result) {
result = -errno;
}
exit:
return result;
bdrv_unref(bs);
g_free(bmap);
return ret;
}
static void vdi_close(BlockDriverState *bs)
{
BDRVVdiState *s = bs->opaque;
g_free(s->bmap);
qemu_vfree(s->bmap);
migrate_del_blocker(s->migration_blocker);
error_free(s->migration_blocker);

View File

@@ -82,8 +82,6 @@ void vhdx_log_desc_le_import(VHDXLogDescriptor *d)
assert(d != NULL);
le32_to_cpus(&d->signature);
le32_to_cpus(&d->trailing_bytes);
le64_to_cpus(&d->leading_bytes);
le64_to_cpus(&d->file_offset);
le64_to_cpus(&d->sequence_number);
}
@@ -99,6 +97,15 @@ void vhdx_log_desc_le_export(VHDXLogDescriptor *d)
cpu_to_le64s(&d->sequence_number);
}
void vhdx_log_data_le_import(VHDXLogDataSector *d)
{
assert(d != NULL);
le32_to_cpus(&d->data_signature);
le32_to_cpus(&d->sequence_high);
le32_to_cpus(&d->sequence_low);
}
void vhdx_log_data_le_export(VHDXLogDataSector *d)
{
assert(d != NULL);

View File

@@ -84,6 +84,7 @@ static int vhdx_log_peek_hdr(BlockDriverState *bs, VHDXLogEntries *log,
if (ret < 0) {
goto exit;
}
vhdx_log_entry_hdr_le_import(hdr);
exit:
return ret;
@@ -211,7 +212,7 @@ static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr,
{
int valid = false;
if (memcmp(&hdr->signature, "loge", 4)) {
if (hdr->signature != VHDX_LOG_SIGNATURE) {
goto exit;
}
@@ -275,12 +276,12 @@ static bool vhdx_log_desc_is_valid(VHDXLogDescriptor *desc,
goto exit;
}
if (!memcmp(&desc->signature, "zero", 4)) {
if (desc->signature == VHDX_LOG_ZERO_SIGNATURE) {
if (desc->zero_length % VHDX_LOG_SECTOR_SIZE == 0) {
/* valid */
ret = true;
}
} else if (!memcmp(&desc->signature, "desc", 4)) {
} else if (desc->signature == VHDX_LOG_DESC_SIGNATURE) {
/* valid */
ret = true;
}
@@ -327,13 +328,15 @@ static int vhdx_compute_desc_sectors(uint32_t desc_cnt)
* passed into this function. Each descriptor will also be validated,
* and error returned if any are invalid. */
static int vhdx_log_read_desc(BlockDriverState *bs, BDRVVHDXState *s,
VHDXLogEntries *log, VHDXLogDescEntries **buffer)
VHDXLogEntries *log, VHDXLogDescEntries **buffer,
bool convert_endian)
{
int ret = 0;
uint32_t desc_sectors;
uint32_t sectors_read;
VHDXLogEntryHeader hdr;
VHDXLogDescEntries *desc_entries = NULL;
VHDXLogDescriptor desc;
int i;
assert(*buffer == NULL);
@@ -342,14 +345,19 @@ static int vhdx_log_read_desc(BlockDriverState *bs, BDRVVHDXState *s,
if (ret < 0) {
goto exit;
}
vhdx_log_entry_hdr_le_import(&hdr);
if (vhdx_log_hdr_is_valid(log, &hdr, s) == false) {
ret = -EINVAL;
goto exit;
}
desc_sectors = vhdx_compute_desc_sectors(hdr.descriptor_count);
desc_entries = qemu_blockalign(bs, desc_sectors * VHDX_LOG_SECTOR_SIZE);
desc_entries = qemu_try_blockalign(bs->file,
desc_sectors * VHDX_LOG_SECTOR_SIZE);
if (desc_entries == NULL) {
ret = -ENOMEM;
goto exit;
}
ret = vhdx_log_read_sectors(bs, log, &sectors_read, desc_entries,
desc_sectors, false);
@@ -363,12 +371,19 @@ static int vhdx_log_read_desc(BlockDriverState *bs, BDRVVHDXState *s,
/* put in proper endianness, and validate each desc */
for (i = 0; i < hdr.descriptor_count; i++) {
vhdx_log_desc_le_import(&desc_entries->desc[i]);
if (vhdx_log_desc_is_valid(&desc_entries->desc[i], &hdr) == false) {
desc = desc_entries->desc[i];
vhdx_log_desc_le_import(&desc);
if (convert_endian) {
desc_entries->desc[i] = desc;
}
if (vhdx_log_desc_is_valid(&desc, &hdr) == false) {
ret = -EINVAL;
goto free_and_exit;
}
}
if (convert_endian) {
desc_entries->hdr = hdr;
}
*buffer = desc_entries;
goto exit;
@@ -403,7 +418,7 @@ static int vhdx_log_flush_desc(BlockDriverState *bs, VHDXLogDescriptor *desc,
buffer = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
if (!memcmp(&desc->signature, "desc", 4)) {
if (desc->signature == VHDX_LOG_DESC_SIGNATURE) {
/* data sector */
if (data == NULL) {
ret = -EFAULT;
@@ -431,10 +446,15 @@ static int vhdx_log_flush_desc(BlockDriverState *bs, VHDXLogDescriptor *desc,
memcpy(buffer+offset, &desc->trailing_bytes, 4);
} else if (!memcmp(&desc->signature, "zero", 4)) {
} else if (desc->signature == VHDX_LOG_ZERO_SIGNATURE) {
/* write 'count' sectors of sector */
memset(buffer, 0, VHDX_LOG_SECTOR_SIZE);
count = desc->zero_length / VHDX_LOG_SECTOR_SIZE;
} else {
error_report("Invalid VHDX log descriptor entry signature 0x%" PRIx32,
desc->signature);
ret = -EINVAL;
goto exit;
}
file_offset = desc->file_offset;
@@ -493,13 +513,13 @@ static int vhdx_log_flush(BlockDriverState *bs, BDRVVHDXState *s,
goto exit;
}
ret = vhdx_log_read_desc(bs, s, &logs->log, &desc_entries);
ret = vhdx_log_read_desc(bs, s, &logs->log, &desc_entries, true);
if (ret < 0) {
goto exit;
}
for (i = 0; i < desc_entries->hdr.descriptor_count; i++) {
if (!memcmp(&desc_entries->desc[i].signature, "desc", 4)) {
if (desc_entries->desc[i].signature == VHDX_LOG_DESC_SIGNATURE) {
/* data sector, so read a sector to flush */
ret = vhdx_log_read_sectors(bs, &logs->log, &sectors_read,
data, 1, false);
@@ -510,6 +530,7 @@ static int vhdx_log_flush(BlockDriverState *bs, BDRVVHDXState *s,
ret = -EINVAL;
goto exit;
}
vhdx_log_data_le_import(data);
}
ret = vhdx_log_flush_desc(bs, &desc_entries->desc[i], data);
@@ -558,9 +579,6 @@ static int vhdx_validate_log_entry(BlockDriverState *bs, BDRVVHDXState *s,
goto inc_and_exit;
}
vhdx_log_entry_hdr_le_import(&hdr);
if (vhdx_log_hdr_is_valid(log, &hdr, s) == false) {
goto inc_and_exit;
}
@@ -573,13 +591,13 @@ static int vhdx_validate_log_entry(BlockDriverState *bs, BDRVVHDXState *s,
desc_sectors = vhdx_compute_desc_sectors(hdr.descriptor_count);
/* Read desc sectors, and calculate log checksum */
/* Read all log sectors, and calculate log checksum */
total_sectors = hdr.entry_length / VHDX_LOG_SECTOR_SIZE;
/* read_desc() will increment the read idx */
ret = vhdx_log_read_desc(bs, s, log, &desc_buffer);
ret = vhdx_log_read_desc(bs, s, log, &desc_buffer, false);
if (ret < 0) {
goto free_and_exit;
}
@@ -602,7 +620,7 @@ static int vhdx_validate_log_entry(BlockDriverState *bs, BDRVVHDXState *s,
}
}
crc ^= 0xffffffff;
if (crc != desc_buffer->hdr.checksum) {
if (crc != hdr.checksum) {
goto free_and_exit;
}
@@ -905,7 +923,7 @@ static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
buffer = qemu_blockalign(bs, total_length);
memcpy(buffer, &new_hdr, sizeof(new_hdr));
new_desc = (VHDXLogDescriptor *) (buffer + sizeof(new_hdr));
new_desc = buffer + sizeof(new_hdr);
data_sector = buffer + (desc_sectors * VHDX_LOG_SECTOR_SIZE);
data_tmp = data;
@@ -962,7 +980,6 @@ static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
* last data sector */
vhdx_update_checksum(buffer, total_length,
offsetof(VHDXLogEntryHeader, checksum));
cpu_to_le32s((uint32_t *)(buffer + 4));
/* now write to the log */
ret = vhdx_log_write_sectors(bs, &s->log, &sectors_written, buffer,

View File

@@ -135,10 +135,8 @@ typedef struct VHDXSectorInfo {
* buf: buffer pointer
* size: size of buffer (must be > crc_offset+4)
*
* Note: The resulting checksum is in the CPU endianness, not necessarily
* in the file format endianness (LE). Any header export to disk should
* make sure that vhdx_header_le_export() is used to convert to the
* correct endianness
* Note: The buffer should have all multi-byte data in little-endian format,
* and the resulting checksum is in little endian format.
*/
uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset)
{
@@ -149,6 +147,7 @@ uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset)
memset(buf + crc_offset, 0, sizeof(crc));
crc = crc32c(0xffffffff, buf, size);
cpu_to_le32s(&crc);
memcpy(buf + crc_offset, &crc, sizeof(crc));
return crc;
@@ -300,7 +299,7 @@ static int vhdx_write_header(BlockDriverState *bs_file, VHDXHeader *hdr,
{
uint8_t *buffer = NULL;
int ret;
VHDXHeader header_le;
VHDXHeader *header_le;
assert(bs_file != NULL);
assert(hdr != NULL);
@@ -321,11 +320,12 @@ static int vhdx_write_header(BlockDriverState *bs_file, VHDXHeader *hdr,
}
/* overwrite the actual VHDXHeader portion */
memcpy(buffer, hdr, sizeof(VHDXHeader));
hdr->checksum = vhdx_update_checksum(buffer, VHDX_HEADER_SIZE,
offsetof(VHDXHeader, checksum));
vhdx_header_le_export(hdr, &header_le);
ret = bdrv_pwrite_sync(bs_file, offset, &header_le, sizeof(VHDXHeader));
header_le = (VHDXHeader *)buffer;
memcpy(header_le, hdr, sizeof(VHDXHeader));
vhdx_header_le_export(hdr, header_le);
vhdx_update_checksum(buffer, VHDX_HEADER_SIZE,
offsetof(VHDXHeader, checksum));
ret = bdrv_pwrite_sync(bs_file, offset, header_le, sizeof(VHDXHeader));
exit:
qemu_vfree(buffer);
@@ -432,13 +432,14 @@ static void vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s,
}
/* copy over just the relevant portion that we need */
memcpy(header1, buffer, sizeof(VHDXHeader));
vhdx_header_le_import(header1);
if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
!memcmp(&header1->signature, "head", 4) &&
header1->version == 1) {
h1_seq = header1->sequence_number;
h1_valid = true;
if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4)) {
vhdx_header_le_import(header1);
if (header1->signature == VHDX_HEADER_SIGNATURE &&
header1->version == 1) {
h1_seq = header1->sequence_number;
h1_valid = true;
}
}
ret = bdrv_pread(bs->file, VHDX_HEADER2_OFFSET, buffer, VHDX_HEADER_SIZE);
@@ -447,13 +448,14 @@ static void vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s,
}
/* copy over just the relevant portion that we need */
memcpy(header2, buffer, sizeof(VHDXHeader));
vhdx_header_le_import(header2);
if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4) &&
!memcmp(&header2->signature, "head", 4) &&
header2->version == 1) {
h2_seq = header2->sequence_number;
h2_valid = true;
if (vhdx_checksum_is_valid(buffer, VHDX_HEADER_SIZE, 4)) {
vhdx_header_le_import(header2);
if (header2->signature == VHDX_HEADER_SIGNATURE &&
header2->version == 1) {
h2_seq = header2->sequence_number;
h2_valid = true;
}
}
/* If there is only 1 valid header (or no valid headers), we
@@ -519,15 +521,21 @@ static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s)
goto fail;
}
memcpy(&s->rt, buffer, sizeof(s->rt));
vhdx_region_header_le_import(&s->rt);
offset += sizeof(s->rt);
if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
memcmp(&s->rt.signature, "regi", 4)) {
if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4)) {
ret = -EINVAL;
goto fail;
}
vhdx_region_header_le_import(&s->rt);
if (s->rt.signature != VHDX_REGION_SIGNATURE) {
ret = -EINVAL;
goto fail;
}
/* Per spec, maximum region table entry count is 2047 */
if (s->rt.entry_count > 2047) {
ret = -EINVAL;
@@ -630,7 +638,7 @@ static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
vhdx_metadata_header_le_import(&s->metadata_hdr);
if (memcmp(&s->metadata_hdr.signature, "metadata", 8)) {
if (s->metadata_hdr.signature != VHDX_METADATA_SIGNATURE) {
ret = -EINVAL;
goto exit;
}
@@ -950,7 +958,11 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags,
}
/* s->bat is freed in vhdx_close() */
s->bat = qemu_blockalign(bs, s->bat_rt.length);
s->bat = qemu_try_blockalign(bs->file, s->bat_rt.length);
if (s->bat == NULL) {
ret = -ENOMEM;
goto fail;
}
ret = bdrv_pread(bs->file, s->bat_offset, s->bat, s->bat_rt.length);
if (ret < 0) {
@@ -1369,7 +1381,7 @@ static int vhdx_create_new_headers(BlockDriverState *bs, uint64_t image_size,
int ret = 0;
VHDXHeader *hdr = NULL;
hdr = g_malloc0(sizeof(VHDXHeader));
hdr = g_new0(VHDXHeader, 1);
hdr->signature = VHDX_HEADER_SIGNATURE;
hdr->sequence_number = g_random_int();
@@ -1540,7 +1552,8 @@ exit:
*/
static int vhdx_create_bat(BlockDriverState *bs, BDRVVHDXState *s,
uint64_t image_size, VHDXImageType type,
bool use_zero_blocks, VHDXRegionTableEntry *rt_bat)
bool use_zero_blocks, uint64_t file_offset,
uint32_t length)
{
int ret = 0;
uint64_t data_file_offset;
@@ -1555,7 +1568,7 @@ static int vhdx_create_bat(BlockDriverState *bs, BDRVVHDXState *s,
/* this gives a data start after BAT/bitmap entries, and well
* past any metadata entries (with a 4 MB buffer for future
* expansion */
data_file_offset = rt_bat->file_offset + rt_bat->length + 5 * MiB;
data_file_offset = file_offset + length + 5 * MiB;
total_sectors = image_size >> s->logical_sector_size_bits;
if (type == VHDX_TYPE_DYNAMIC) {
@@ -1579,7 +1592,11 @@ static int vhdx_create_bat(BlockDriverState *bs, BDRVVHDXState *s,
use_zero_blocks ||
bdrv_has_zero_init(bs) == 0) {
/* for a fixed file, the default BAT entry is not zero */
s->bat = g_malloc0(rt_bat->length);
s->bat = g_try_malloc0(length);
if (length && s->bat != NULL) {
ret = -ENOMEM;
goto exit;
}
block_state = type == VHDX_TYPE_FIXED ? PAYLOAD_BLOCK_FULLY_PRESENT :
PAYLOAD_BLOCK_NOT_PRESENT;
block_state = use_zero_blocks ? PAYLOAD_BLOCK_ZERO : block_state;
@@ -1594,7 +1611,7 @@ static int vhdx_create_bat(BlockDriverState *bs, BDRVVHDXState *s,
cpu_to_le64s(&s->bat[sinfo.bat_idx]);
sector_num += s->sectors_per_block;
}
ret = bdrv_pwrite(bs, rt_bat->file_offset, s->bat, rt_bat->length);
ret = bdrv_pwrite(bs, file_offset, s->bat, length);
if (ret < 0) {
goto exit;
}
@@ -1626,6 +1643,8 @@ static int vhdx_create_new_region_table(BlockDriverState *bs,
int ret = 0;
uint32_t offset = 0;
void *buffer = NULL;
uint64_t bat_file_offset;
uint32_t bat_length;
BDRVVHDXState *s = NULL;
VHDXRegionTableHeader *region_table;
VHDXRegionTableEntry *rt_bat;
@@ -1635,7 +1654,7 @@ static int vhdx_create_new_region_table(BlockDriverState *bs,
/* Populate enough of the BDRVVHDXState to be able to use the
* pre-existing BAT calculation, translation, and update functions */
s = g_malloc0(sizeof(BDRVVHDXState));
s = g_new0(BDRVVHDXState, 1);
s->chunk_ratio = (VHDX_MAX_SECTORS_PER_BLOCK) *
(uint64_t) sector_size / (uint64_t) block_size;
@@ -1674,19 +1693,26 @@ static int vhdx_create_new_region_table(BlockDriverState *bs,
rt_metadata->length = 1 * MiB; /* min size, and more than enough */
*metadata_offset = rt_metadata->file_offset;
bat_file_offset = rt_bat->file_offset;
bat_length = rt_bat->length;
vhdx_region_header_le_export(region_table);
vhdx_region_entry_le_export(rt_bat);
vhdx_region_entry_le_export(rt_metadata);
vhdx_update_checksum(buffer, VHDX_HEADER_BLOCK_SIZE,
offsetof(VHDXRegionTableHeader, checksum));
/* The region table gives us the data we need to create the BAT,
* so do that now */
ret = vhdx_create_bat(bs, s, image_size, type, use_zero_blocks, rt_bat);
ret = vhdx_create_bat(bs, s, image_size, type, use_zero_blocks,
bat_file_offset, bat_length);
if (ret < 0) {
goto exit;
}
/* Now write out the region headers to disk */
vhdx_region_header_le_export(region_table);
vhdx_region_entry_le_export(rt_bat);
vhdx_region_entry_le_export(rt_metadata);
ret = bdrv_pwrite(bs, VHDX_REGION_TABLE_OFFSET, buffer,
VHDX_HEADER_BLOCK_SIZE);
if (ret < 0) {

View File

@@ -435,6 +435,7 @@ void vhdx_header_le_import(VHDXHeader *h);
void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h);
void vhdx_log_desc_le_import(VHDXLogDescriptor *d);
void vhdx_log_desc_le_export(VHDXLogDescriptor *d);
void vhdx_log_data_le_import(VHDXLogDataSector *d);
void vhdx_log_data_le_export(VHDXLogDataSector *d);
void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr);
void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr);

View File

@@ -106,6 +106,7 @@ typedef struct VmdkExtent {
uint32_t l2_cache_counts[L2_CACHE_SIZE];
int64_t cluster_sectors;
int64_t next_cluster_sector;
char *type;
} VmdkExtent;
@@ -124,7 +125,6 @@ typedef struct BDRVVmdkState {
} BDRVVmdkState;
typedef struct VmdkMetaData {
uint32_t offset;
unsigned int l1_index;
unsigned int l2_index;
unsigned int l2_offset;
@@ -233,7 +233,7 @@ static void vmdk_free_last_extent(BlockDriverState *bs)
return;
}
s->num_extents--;
s->extents = g_realloc(s->extents, s->num_extents * sizeof(VmdkExtent));
s->extents = g_renew(VmdkExtent, s->extents, s->num_extents);
}
static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
@@ -397,6 +397,7 @@ static int vmdk_add_extent(BlockDriverState *bs,
{
VmdkExtent *extent;
BDRVVmdkState *s = bs->opaque;
int64_t nb_sectors;
if (cluster_sectors > 0x200000) {
/* 0x200000 * 512Bytes = 1GB for one cluster is unrealistic */
@@ -412,8 +413,12 @@ static int vmdk_add_extent(BlockDriverState *bs,
return -EFBIG;
}
s->extents = g_realloc(s->extents,
(s->num_extents + 1) * sizeof(VmdkExtent));
nb_sectors = bdrv_nb_sectors(file);
if (nb_sectors < 0) {
return nb_sectors;
}
s->extents = g_renew(VmdkExtent, s->extents, s->num_extents + 1);
extent = &s->extents[s->num_extents];
s->num_extents++;
@@ -427,6 +432,7 @@ static int vmdk_add_extent(BlockDriverState *bs,
extent->l1_entry_sectors = l2_size * cluster_sectors;
extent->l2_size = l2_size;
extent->cluster_sectors = flat ? sectors : cluster_sectors;
extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
if (s->num_extents > 1) {
extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
@@ -448,7 +454,11 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
/* read the L1 table */
l1_size = extent->l1_size * sizeof(uint32_t);
extent->l1_table = g_malloc(l1_size);
extent->l1_table = g_try_malloc(l1_size);
if (l1_size && extent->l1_table == NULL) {
return -ENOMEM;
}
ret = bdrv_pread(extent->file,
extent->l1_table_offset,
extent->l1_table,
@@ -464,7 +474,11 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
}
if (extent->l1_backup_table_offset) {
extent->l1_backup_table = g_malloc(l1_size);
extent->l1_backup_table = g_try_malloc(l1_size);
if (l1_size && extent->l1_backup_table == NULL) {
ret = -ENOMEM;
goto fail_l1;
}
ret = bdrv_pread(extent->file,
extent->l1_backup_table_offset,
extent->l1_backup_table,
@@ -481,7 +495,7 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
}
extent->l2_cache =
g_malloc(extent->l2_size * L2_CACHE_SIZE * sizeof(uint32_t));
g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
return 0;
fail_l1b:
g_free(extent->l1_backup_table);
@@ -669,8 +683,7 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
if (le32_to_cpu(header.flags) & VMDK4_FLAG_RGD) {
l1_backup_offset = le64_to_cpu(header.rgd_offset) << 9;
}
if (bdrv_getlength(file) <
le64_to_cpu(header.grain_offset) * BDRV_SECTOR_SIZE) {
if (bdrv_nb_sectors(file) < le64_to_cpu(header.grain_offset)) {
error_setg(errp, "File truncated, expecting at least %" PRId64 " bytes",
(int64_t)(le64_to_cpu(header.grain_offset)
* BDRV_SECTOR_SIZE));
@@ -952,57 +965,97 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
}
}
/**
* get_whole_cluster
*
* Copy backing file's cluster that covers @sector_num, otherwise write zero,
* to the cluster at @cluster_sector_num.
*
* If @skip_start_sector < @skip_end_sector, the relative range
* [@skip_start_sector, @skip_end_sector) is not copied or written, and leave
* it for call to write user data in the request.
*/
static int get_whole_cluster(BlockDriverState *bs,
VmdkExtent *extent,
uint64_t cluster_offset,
uint64_t offset,
bool allocate)
VmdkExtent *extent,
uint64_t cluster_sector_num,
uint64_t sector_num,
uint64_t skip_start_sector,
uint64_t skip_end_sector)
{
int ret = VMDK_OK;
uint8_t *whole_grain = NULL;
int64_t cluster_bytes;
uint8_t *whole_grain;
/* For COW, align request sector_num to cluster start */
sector_num = QEMU_ALIGN_DOWN(sector_num, extent->cluster_sectors);
cluster_bytes = extent->cluster_sectors << BDRV_SECTOR_BITS;
whole_grain = qemu_blockalign(bs, cluster_bytes);
if (!bs->backing_hd) {
memset(whole_grain, 0, skip_start_sector << BDRV_SECTOR_BITS);
memset(whole_grain + (skip_end_sector << BDRV_SECTOR_BITS), 0,
cluster_bytes - (skip_end_sector << BDRV_SECTOR_BITS));
}
assert(skip_end_sector <= extent->cluster_sectors);
/* we will be here if it's first write on non-exist grain(cluster).
* try to read from parent image, if exist */
if (bs->backing_hd) {
whole_grain =
qemu_blockalign(bs, extent->cluster_sectors << BDRV_SECTOR_BITS);
if (!vmdk_is_cid_valid(bs)) {
ret = VMDK_ERROR;
goto exit;
}
if (bs->backing_hd && !vmdk_is_cid_valid(bs)) {
ret = VMDK_ERROR;
goto exit;
}
/* floor offset to cluster */
offset -= offset % (extent->cluster_sectors * 512);
ret = bdrv_read(bs->backing_hd, offset >> 9, whole_grain,
extent->cluster_sectors);
if (ret < 0) {
ret = VMDK_ERROR;
goto exit;
/* Read backing data before skip range */
if (skip_start_sector > 0) {
if (bs->backing_hd) {
ret = bdrv_read(bs->backing_hd, sector_num,
whole_grain, skip_start_sector);
if (ret < 0) {
ret = VMDK_ERROR;
goto exit;
}
}
/* Write grain only into the active image */
ret = bdrv_write(extent->file, cluster_offset, whole_grain,
extent->cluster_sectors);
ret = bdrv_write(extent->file, cluster_sector_num, whole_grain,
skip_start_sector);
if (ret < 0) {
ret = VMDK_ERROR;
goto exit;
}
}
/* Read backing data after skip range */
if (skip_end_sector < extent->cluster_sectors) {
if (bs->backing_hd) {
ret = bdrv_read(bs->backing_hd, sector_num + skip_end_sector,
whole_grain + (skip_end_sector << BDRV_SECTOR_BITS),
extent->cluster_sectors - skip_end_sector);
if (ret < 0) {
ret = VMDK_ERROR;
goto exit;
}
}
ret = bdrv_write(extent->file, cluster_sector_num + skip_end_sector,
whole_grain + (skip_end_sector << BDRV_SECTOR_BITS),
extent->cluster_sectors - skip_end_sector);
if (ret < 0) {
ret = VMDK_ERROR;
goto exit;
}
}
exit:
qemu_vfree(whole_grain);
return ret;
}
static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data)
static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
uint32_t offset)
{
uint32_t offset;
QEMU_BUILD_BUG_ON(sizeof(offset) != sizeof(m_data->offset));
offset = cpu_to_le32(m_data->offset);
offset = cpu_to_le32(offset);
/* update L2 table */
if (bdrv_pwrite_sync(
extent->file,
((int64_t)m_data->l2_offset * 512)
+ (m_data->l2_index * sizeof(m_data->offset)),
+ (m_data->l2_index * sizeof(offset)),
&offset, sizeof(offset)) < 0) {
return VMDK_ERROR;
}
@@ -1012,7 +1065,7 @@ static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data)
if (bdrv_pwrite_sync(
extent->file,
((int64_t)m_data->l2_offset * 512)
+ (m_data->l2_index * sizeof(m_data->offset)),
+ (m_data->l2_index * sizeof(offset)),
&offset, sizeof(offset)) < 0) {
return VMDK_ERROR;
}
@@ -1024,17 +1077,41 @@ static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data)
return VMDK_OK;
}
/**
* get_cluster_offset
*
* Look up cluster offset in extent file by sector number, and store in
* @cluster_offset.
*
* For flat extents, the start offset as parsed from the description file is
* returned.
*
* For sparse extents, look up in L1, L2 table. If allocate is true, return an
* offset for a new cluster and update L2 cache. If there is a backing file,
* COW is done before returning; otherwise, zeroes are written to the allocated
* cluster. Both COW and zero writing skips the sector range
* [@skip_start_sector, @skip_end_sector) passed in by caller, because caller
* has new data to write there.
*
* Returns: VMDK_OK if cluster exists and mapped in the image.
* VMDK_UNALLOC if cluster is not mapped and @allocate is false.
* VMDK_ERROR if failed.
*/
static int get_cluster_offset(BlockDriverState *bs,
VmdkExtent *extent,
VmdkMetaData *m_data,
uint64_t offset,
int allocate,
uint64_t *cluster_offset)
VmdkExtent *extent,
VmdkMetaData *m_data,
uint64_t offset,
bool allocate,
uint64_t *cluster_offset,
uint64_t skip_start_sector,
uint64_t skip_end_sector)
{
unsigned int l1_index, l2_offset, l2_index;
int min_index, i, j;
uint32_t min_count, *l2_table;
bool zeroed = false;
int64_t ret;
int32_t cluster_sector;
if (m_data) {
m_data->valid = 0;
@@ -1088,52 +1165,41 @@ static int get_cluster_offset(BlockDriverState *bs,
extent->l2_cache_counts[min_index] = 1;
found:
l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
*cluster_offset = le32_to_cpu(l2_table[l2_index]);
cluster_sector = le32_to_cpu(l2_table[l2_index]);
if (m_data) {
m_data->valid = 1;
m_data->l1_index = l1_index;
m_data->l2_index = l2_index;
m_data->offset = *cluster_offset;
m_data->l2_offset = l2_offset;
m_data->l2_cache_entry = &l2_table[l2_index];
}
if (extent->has_zero_grain && *cluster_offset == VMDK_GTE_ZEROED) {
if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
zeroed = true;
}
if (!*cluster_offset || zeroed) {
if (!cluster_sector || zeroed) {
if (!allocate) {
return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
}
/* Avoid the L2 tables update for the images that have snapshots. */
*cluster_offset = bdrv_getlength(extent->file);
if (!extent->compressed) {
bdrv_truncate(
extent->file,
*cluster_offset + (extent->cluster_sectors << 9)
);
}
*cluster_offset >>= 9;
l2_table[l2_index] = cpu_to_le32(*cluster_offset);
cluster_sector = extent->next_cluster_sector;
extent->next_cluster_sector += extent->cluster_sectors;
/* First of all we write grain itself, to avoid race condition
* that may to corrupt the image.
* This problem may occur because of insufficient space on host disk
* or inappropriate VM shutdown.
*/
if (get_whole_cluster(
bs, extent, *cluster_offset, offset, allocate) == -1) {
return VMDK_ERROR;
}
if (m_data) {
m_data->offset = *cluster_offset;
ret = get_whole_cluster(bs, extent,
cluster_sector,
offset >> BDRV_SECTOR_BITS,
skip_start_sector, skip_end_sector);
if (ret) {
return ret;
}
}
*cluster_offset <<= 9;
*cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
return VMDK_OK;
}
@@ -1168,7 +1234,8 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
}
qemu_co_mutex_lock(&s->lock);
ret = get_cluster_offset(bs, extent, NULL,
sector_num * 512, 0, &offset);
sector_num * 512, false, &offset,
0, 0);
qemu_co_mutex_unlock(&s->lock);
switch (ret) {
@@ -1321,9 +1388,9 @@ static int vmdk_read(BlockDriverState *bs, int64_t sector_num,
if (!extent) {
return -EIO;
}
ret = get_cluster_offset(
bs, extent, NULL,
sector_num << 9, 0, &cluster_offset);
ret = get_cluster_offset(bs, extent, NULL,
sector_num << 9, false, &cluster_offset,
0, 0);
extent_begin_sector = extent->end_sector - extent->sectors;
extent_relative_sector_num = sector_num - extent_begin_sector;
index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
@@ -1404,12 +1471,17 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
if (!extent) {
return -EIO;
}
ret = get_cluster_offset(
bs,
extent,
&m_data,
sector_num << 9, !extent->compressed,
&cluster_offset);
extent_begin_sector = extent->end_sector - extent->sectors;
extent_relative_sector_num = sector_num - extent_begin_sector;
index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
n = extent->cluster_sectors - index_in_cluster;
if (n > nb_sectors) {
n = nb_sectors;
}
ret = get_cluster_offset(bs, extent, &m_data, sector_num << 9,
!(extent->compressed || zeroed),
&cluster_offset,
index_in_cluster, index_in_cluster + n);
if (extent->compressed) {
if (ret == VMDK_OK) {
/* Refuse write to allocated cluster for streamOptimized */
@@ -1418,24 +1490,13 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
return -EIO;
} else {
/* allocate */
ret = get_cluster_offset(
bs,
extent,
&m_data,
sector_num << 9, 1,
&cluster_offset);
ret = get_cluster_offset(bs, extent, &m_data, sector_num << 9,
true, &cluster_offset, 0, 0);
}
}
if (ret == VMDK_ERROR) {
return -EINVAL;
}
extent_begin_sector = extent->end_sector - extent->sectors;
extent_relative_sector_num = sector_num - extent_begin_sector;
index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
n = extent->cluster_sectors - index_in_cluster;
if (n > nb_sectors) {
n = nb_sectors;
}
if (zeroed) {
/* Do zeroed write, buf is ignored */
if (extent->has_zero_grain &&
@@ -1443,9 +1504,9 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
n >= extent->cluster_sectors) {
n = extent->cluster_sectors;
if (!zero_dry_run) {
m_data.offset = VMDK_GTE_ZEROED;
/* update L2 tables */
if (vmdk_L2update(extent, &m_data) != VMDK_OK) {
if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
!= VMDK_OK) {
return -EIO;
}
}
@@ -1461,7 +1522,9 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
}
if (m_data.valid) {
/* update L2 tables */
if (vmdk_L2update(extent, &m_data) != VMDK_OK) {
if (vmdk_L2update(extent, &m_data,
cluster_offset >> BDRV_SECTOR_BITS)
!= VMDK_OK) {
return -EIO;
}
}
@@ -1999,7 +2062,7 @@ static int vmdk_check(BlockDriverState *bs, BdrvCheckResult *result,
BDRVVmdkState *s = bs->opaque;
VmdkExtent *extent = NULL;
int64_t sector_num = 0;
int64_t total_sectors = bdrv_getlength(bs) / BDRV_SECTOR_SIZE;
int64_t total_sectors = bdrv_nb_sectors(bs);
int ret;
uint64_t cluster_offset;
@@ -2020,7 +2083,7 @@ static int vmdk_check(BlockDriverState *bs, BdrvCheckResult *result,
}
ret = get_cluster_offset(bs, extent, NULL,
sector_num << BDRV_SECTOR_BITS,
0, &cluster_offset);
false, &cluster_offset, 0, 0);
if (ret == VMDK_ERROR) {
fprintf(stderr,
"ERROR: could not get cluster_offset for sector %"

View File

@@ -29,13 +29,6 @@
#if defined(CONFIG_UUID)
#include <uuid/uuid.h>
#endif
#ifdef __linux__
#include <linux/fs.h>
#include <sys/ioctl.h>
#ifndef FS_NOCOW_FL
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
#endif
#endif
/**************************************************************/
@@ -276,7 +269,11 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
s->pagetable = qemu_blockalign(bs, s->max_table_entries * 4);
s->pagetable = qemu_try_blockalign(bs->file, s->max_table_entries * 4);
if (s->pagetable == NULL) {
ret = -ENOMEM;
goto fail;
}
s->bat_offset = be64_to_cpu(dyndisk_header->table_offset);
@@ -656,39 +653,41 @@ static int calculate_geometry(int64_t total_sectors, uint16_t* cyls,
return 0;
}
static int create_dynamic_disk(int fd, uint8_t *buf, int64_t total_sectors)
static int create_dynamic_disk(BlockDriverState *bs, uint8_t *buf,
int64_t total_sectors)
{
VHDDynDiskHeader *dyndisk_header =
(VHDDynDiskHeader *) buf;
size_t block_size, num_bat_entries;
int i;
int ret = -EIO;
int ret;
int64_t offset = 0;
// Write the footer (twice: at the beginning and at the end)
block_size = 0x200000;
num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512);
if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) {
ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE);
if (ret) {
goto fail;
}
if (lseek(fd, 1536 + ((num_bat_entries * 4 + 511) & ~511), SEEK_SET) < 0) {
goto fail;
}
if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) {
offset = 1536 + ((num_bat_entries * 4 + 511) & ~511);
ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE);
if (ret < 0) {
goto fail;
}
// Write the initial BAT
if (lseek(fd, 3 * 512, SEEK_SET) < 0) {
goto fail;
}
offset = 3 * 512;
memset(buf, 0xFF, 512);
for (i = 0; i < (num_bat_entries * 4 + 511) / 512; i++) {
if (write(fd, buf, 512) != 512) {
ret = bdrv_pwrite_sync(bs, offset, buf, 512);
if (ret < 0) {
goto fail;
}
offset += 512;
}
// Prepare the Dynamic Disk Header
@@ -709,39 +708,35 @@ static int create_dynamic_disk(int fd, uint8_t *buf, int64_t total_sectors)
dyndisk_header->checksum = be32_to_cpu(vpc_checksum(buf, 1024));
// Write the header
if (lseek(fd, 512, SEEK_SET) < 0) {
goto fail;
}
offset = 512;
if (write(fd, buf, 1024) != 1024) {
ret = bdrv_pwrite_sync(bs, offset, buf, 1024);
if (ret < 0) {
goto fail;
}
ret = 0;
fail:
return ret;
}
static int create_fixed_disk(int fd, uint8_t *buf, int64_t total_size)
static int create_fixed_disk(BlockDriverState *bs, uint8_t *buf,
int64_t total_size)
{
int ret = -EIO;
int ret;
/* Add footer to total size */
total_size += 512;
if (ftruncate(fd, total_size) != 0) {
ret = -errno;
goto fail;
}
if (lseek(fd, -512, SEEK_END) < 0) {
goto fail;
}
if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) {
goto fail;
total_size += HEADER_SIZE;
ret = bdrv_truncate(bs, total_size);
if (ret < 0) {
return ret;
}
ret = 0;
ret = bdrv_pwrite_sync(bs, total_size - HEADER_SIZE, buf, HEADER_SIZE);
if (ret < 0) {
return ret;
}
fail:
return ret;
}
@@ -750,7 +745,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
uint8_t buf[1024];
VHDFooter *footer = (VHDFooter *) buf;
char *disk_type_param;
int fd, i;
int i;
uint16_t cyls = 0;
uint8_t heads = 0;
uint8_t secs_per_cyl = 0;
@@ -758,7 +753,8 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
int64_t total_size;
int disk_type;
int ret = -EIO;
bool nocow = false;
Error *local_err = NULL;
BlockDriverState *bs = NULL;
/* Read out options */
total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
@@ -775,28 +771,17 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
} else {
disk_type = VHD_DYNAMIC;
}
nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false);
/* Create the file */
fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
if (fd < 0) {
ret = -EIO;
ret = bdrv_create_file(filename, opts, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
goto out;
}
if (nocow) {
#ifdef __linux__
/* Set NOCOW flag to solve performance issue on fs like btrfs.
* This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
* be ignored since any failure of this operation should not block the
* left work.
*/
int attr;
if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
attr |= FS_NOCOW_FL;
ioctl(fd, FS_IOC_SETFLAGS, &attr);
}
#endif
ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
NULL, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
goto out;
}
/*
@@ -810,7 +795,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
&secs_per_cyl))
{
ret = -EFBIG;
goto fail;
goto out;
}
}
@@ -856,14 +841,13 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
footer->checksum = be32_to_cpu(vpc_checksum(buf, HEADER_SIZE));
if (disk_type == VHD_DYNAMIC) {
ret = create_dynamic_disk(fd, buf, total_sectors);
ret = create_dynamic_disk(bs, buf, total_sectors);
} else {
ret = create_fixed_disk(fd, buf, total_size);
ret = create_fixed_disk(bs, buf, total_size);
}
fail:
qemu_close(fd);
out:
bdrv_unref(bs);
g_free(disk_type_param);
return ret;
}

View File

@@ -52,10 +52,6 @@
#define DLOG(a) a
#undef stderr
#define stderr STDERR
FILE* stderr = NULL;
static void checkpoint(void);
#ifdef __MINGW32__
@@ -732,7 +728,7 @@ static int read_directory(BDRVVVFATState* s, int mapping_index)
if(first_cluster == 0 && (is_dotdot || is_dot))
continue;
buffer=(char*)g_malloc(length);
buffer = g_malloc(length);
snprintf(buffer,length,"%s/%s",dirname,entry->d_name);
if(stat(buffer,&st)<0) {
@@ -767,7 +763,7 @@ static int read_directory(BDRVVVFATState* s, int mapping_index)
/* create mapping for this file */
if(!is_dot && !is_dotdot && (S_ISDIR(st.st_mode) || st.st_size)) {
s->current_mapping=(mapping_t*)array_get_next(&(s->mapping));
s->current_mapping = array_get_next(&(s->mapping));
s->current_mapping->begin=0;
s->current_mapping->end=st.st_size;
/*
@@ -811,12 +807,12 @@ static int read_directory(BDRVVVFATState* s, int mapping_index)
}
/* reget the mapping, since s->mapping was possibly realloc()ed */
mapping = (mapping_t*)array_get(&(s->mapping), mapping_index);
mapping = array_get(&(s->mapping), mapping_index);
first_cluster += (s->directory.next - mapping->info.dir.first_dir_index)
* 0x20 / s->cluster_size;
mapping->end = first_cluster;
direntry = (direntry_t*)array_get(&(s->directory), mapping->dir_index);
direntry = array_get(&(s->directory), mapping->dir_index);
set_begin_of_direntry(direntry, mapping->begin);
return 0;
@@ -1082,11 +1078,6 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
vvv = s;
#endif
DLOG(if (stderr == NULL) {
stderr = fopen("vvfat.log", "a");
setbuf(stderr, NULL);
})
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
@@ -2950,7 +2941,7 @@ static int enable_write_target(BDRVVVFATState *s, Error **errp)
bdrv_set_backing_hd(s->bs, bdrv_new("", &error_abort));
s->bs->backing_hd->drv = &vvfat_write_target;
s->bs->backing_hd->opaque = g_malloc(sizeof(void*));
s->bs->backing_hd->opaque = g_new(void *, 1);
*(void**)s->bs->backing_hd->opaque = s;
return 0;

View File

@@ -139,7 +139,10 @@ BlockDriverAIOCB *win32_aio_submit(BlockDriverState *bs,
waiocb->is_read = (type == QEMU_AIO_READ);
if (qiov->niov > 1) {
waiocb->buf = qemu_blockalign(bs, qiov->size);
waiocb->buf = qemu_try_blockalign(bs, qiov->size);
if (waiocb->buf == NULL) {
goto out;
}
if (type & QEMU_AIO_WRITE) {
iov_to_buf(qiov->iov, qiov->niov, 0, waiocb->buf, qiov->size);
}
@@ -168,6 +171,7 @@ BlockDriverAIOCB *win32_aio_submit(BlockDriverState *bs,
out_dec_count:
aio->count--;
out:
qemu_aio_release(waiocb);
return NULL;
}

View File

@@ -108,7 +108,7 @@ void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
nbd_export_set_name(exp, device);
n = g_malloc0(sizeof(NBDCloseNotifier));
n = g_new0(NBDCloseNotifier, 1);
n->n.notify = nbd_close_notifier;
n->exp = exp;
bdrv_add_close_notifier(bs, &n->n);

View File

@@ -1094,7 +1094,7 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
return NULL;
}
info = g_malloc0(sizeof(SnapshotInfo));
info = g_new0(SnapshotInfo, 1);
info->id = g_strdup(sn.id_str);
info->name = g_strdup(sn.name);
info->date_nsec = sn.date_nsec;
@@ -1757,6 +1757,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
{
const char *id = qdict_get_str(qdict, "id");
BlockDriverState *bs;
AioContext *aio_context;
Error *local_err = NULL;
bs = bdrv_find(id);
@@ -1764,9 +1765,14 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
error_report("Device '%s' not found", id);
return -1;
}
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_DRIVE_DEL, &local_err)) {
error_report("%s", error_get_pretty(local_err));
error_free(local_err);
aio_context_release(aio_context);
return -1;
}
@@ -1790,6 +1796,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
drive_del(drive_get_by_blockdev(bs));
}
aio_context_release(aio_context);
return 0;
}
@@ -1799,6 +1806,7 @@ void qmp_block_resize(bool has_device, const char *device,
{
Error *local_err = NULL;
BlockDriverState *bs;
AioContext *aio_context;
int ret;
bs = bdrv_lookup_bs(has_device ? device : NULL,
@@ -1809,19 +1817,22 @@ void qmp_block_resize(bool has_device, const char *device,
return;
}
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
if (!bdrv_is_first_non_filter(bs)) {
error_set(errp, QERR_FEATURE_DISABLED, "resize");
return;
goto out;
}
if (size < 0) {
error_set(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
return;
goto out;
}
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_RESIZE, NULL)) {
error_set(errp, QERR_DEVICE_IN_USE, device);
return;
goto out;
}
/* complete all in-flight operations before resizing the device */
@@ -1847,6 +1858,9 @@ void qmp_block_resize(bool has_device, const char *device,
error_setg_errno(errp, -ret, "Could not resize");
break;
}
out:
aio_context_release(aio_context);
}
static void block_job_cb(void *opaque, int ret)
@@ -2172,11 +2186,12 @@ void qmp_drive_mirror(const char *device, const char *target,
}
if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
error_set(errp, QERR_INVALID_PARAMETER, device);
error_set(errp, QERR_INVALID_PARAMETER_VALUE, "granularity",
"a value in range [512B, 64MB]");
return;
}
if (granularity & (granularity - 1)) {
error_set(errp, QERR_INVALID_PARAMETER, device);
error_set(errp, QERR_INVALID_PARAMETER_VALUE, "granularity", "power of 2");
return;
}

View File

@@ -205,7 +205,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
if (block_job_is_paused(job)) {
qemu_coroutine_yield();
} else {
co_sleep_ns(type, ns);
co_aio_sleep_ns(bdrv_get_aio_context(job->bs), type, ns);
}
job->busy = true;
}

97
configure vendored
View File

@@ -326,6 +326,7 @@ seccomp=""
glusterfs=""
glusterfs_discard="no"
glusterfs_zerofill="no"
archipelago=""
virtio_blk_data_plane=""
gtk=""
gtkabi=""
@@ -1087,6 +1088,10 @@ for opt do
;;
--enable-glusterfs) glusterfs="yes"
;;
--disable-archipelago) archipelago="no"
;;
--enable-archipelago) archipelago="yes"
;;
--disable-virtio-blk-data-plane) virtio_blk_data_plane="no"
;;
--enable-virtio-blk-data-plane) virtio_blk_data_plane="yes"
@@ -1382,6 +1387,8 @@ Advanced options (experts only):
--enable-coroutine-pool enable coroutine freelist (better performance)
--enable-glusterfs enable GlusterFS backend
--disable-glusterfs disable GlusterFS backend
--enable-archipelago enable Archipelago backend
--disable-archipelago disable Archipelago backend
--enable-gcov enable test coverage analysis with gcov
--gcov=GCOV use specified gcov [$gcov_tool]
--disable-tpm disable TPM support
@@ -1723,6 +1730,7 @@ fi
cat > $TMPC <<EOF
#include <sys/socket.h>
#include <linux/ip.h>
int main(void) { return sizeof(struct mmsghdr); }
EOF
if compile_prog "" "" ; then
@@ -3071,6 +3079,33 @@ EOF
fi
fi
##########################################
# archipelago probe
if test "$archipelago" != "no" ; then
cat > $TMPC <<EOF
#include <stdio.h>
#include <xseg/xseg.h>
#include <xseg/protocol.h>
int main(void) {
xseg_initialize();
return 0;
}
EOF
archipelago_libs=-lxseg
if compile_prog "" "$archipelago_libs"; then
archipelago="yes"
libs_tools="$archipelago_libs $libs_tools"
libs_softmmu="$archipelago_libs $libs_softmmu"
else
if test "$archipelago" = "yes" ; then
feature_not_found "Archipelago backend support" "Install libxseg devel"
fi
archipelago="no"
fi
fi
##########################################
# glusterfs probe
if test "$glusterfs" != "no" ; then
@@ -3086,7 +3121,8 @@ if test "$glusterfs" != "no" ; then
fi
else
if test "$glusterfs" = "yes" ; then
feature_not_found "GlusterFS backend support" "Install glusterfs-api devel"
feature_not_found "GlusterFS backend support" \
"Install glusterfs-api devel >= 3"
fi
glusterfs="no"
fi
@@ -3420,6 +3456,37 @@ if compile_prog "" "" ; then
sendfile=yes
fi
# check for timerfd support (glibc 2.8 and newer)
timerfd=no
cat > $TMPC << EOF
#include <sys/timerfd.h>
int main(void)
{
return(timerfd_create(CLOCK_REALTIME, 0));
}
EOF
if compile_prog "" "" ; then
timerfd=yes
fi
# check for setns and unshare support
setns=no
cat > $TMPC << EOF
#include <sched.h>
int main(void)
{
int ret;
ret = setns(0, 0);
ret = unshare(0);
return ret;
}
EOF
if compile_prog "" "" ; then
setns=yes
fi
# Check if tools are available to build documentation.
if test "$docs" != "no" ; then
if has makeinfo && has pod2man; then
@@ -3531,7 +3598,8 @@ EOF
spice_server_version=$($pkg_config --modversion spice-server)
else
if test "$spice" = "yes" ; then
feature_not_found "spice" "Install spice-server and spice-protocol devel"
feature_not_found "spice" \
"Install spice-server(>=0.12.0) and spice-protocol(>=0.12.3) devel"
fi
spice="no"
fi
@@ -3562,7 +3630,7 @@ EOF
smartcard_nss="yes"
else
if test "$smartcard_nss" = "yes"; then
feature_not_found "nss"
feature_not_found "nss" "Install nss devel >= 3.12.8"
fi
smartcard_nss="no"
fi
@@ -3578,7 +3646,7 @@ if test "$libusb" != "no" ; then
libs_softmmu="$libs_softmmu $libusb_libs"
else
if test "$libusb" = "yes"; then
feature_not_found "libusb" "Install libusb devel"
feature_not_found "libusb" "Install libusb devel >= 1.0.13"
fi
libusb="no"
fi
@@ -4003,7 +4071,7 @@ if test "$libnfs" != "no" ; then
LIBS="$LIBS $libnfs_libs"
else
if test "$libnfs" = "yes" ; then
feature_not_found "libnfs"
feature_not_found "libnfs" "Install libnfs devel >= 1.9.3"
fi
libnfs="no"
fi
@@ -4250,6 +4318,7 @@ echo "seccomp support $seccomp"
echo "coroutine backend $coroutine"
echo "coroutine pool $coroutine_pool"
echo "GlusterFS support $glusterfs"
echo "Archipelago support $archipelago"
echo "virtio-blk-data-plane $virtio_blk_data_plane"
echo "gcov $gcov_tool"
echo "gcov enabled $gcov"
@@ -4486,6 +4555,12 @@ fi
if test "$sendfile" = "yes" ; then
echo "CONFIG_SENDFILE=y" >> $config_host_mak
fi
if test "$timerfd" = "yes" ; then
echo "CONFIG_TIMERFD=y" >> $config_host_mak
fi
if test "$setns" = "yes" ; then
echo "CONFIG_SETNS=y" >> $config_host_mak
fi
if test "$inotify" = "yes" ; then
echo "CONFIG_INOTIFY=y" >> $config_host_mak
fi
@@ -4688,6 +4763,11 @@ if test "$glusterfs_zerofill" = "yes" ; then
echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
fi
if test "$archipelago" = "yes" ; then
echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
fi
if test "$libssh2" = "yes" ; then
echo "CONFIG_LIBSSH2=m" >> $config_host_mak
echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
@@ -4965,6 +5045,8 @@ case "$target_name" in
TARGET_BASE_ARCH=mips
echo "TARGET_ABI_MIPSN64=y" >> $config_target_mak
;;
tricore)
;;
moxie)
;;
or32)
@@ -5013,6 +5095,7 @@ case "$target_name" in
echo "TARGET_ABI32=y" >> $config_target_mak
;;
s390x)
gdb_xml_files="s390x-core64.xml s390-acr.xml s390-fpr.xml"
;;
unicore32)
;;
@@ -5292,10 +5375,6 @@ for rom in seabios vgabios ; do
echo "LD=$ld" >> $config_mak
done
if test "$docs" = "yes" ; then
mkdir -p QMP
fi
# set up qemu-iotests in this build directory
iotests_common_env="tests/qemu-iotests/common.env"
iotests_check="tests/qemu-iotests/check"

View File

@@ -18,10 +18,114 @@
*/
#include "config.h"
#include "cpu.h"
#include "trace.h"
#include "disas/disas.h"
#include "tcg.h"
#include "qemu/atomic.h"
#include "sysemu/qtest.h"
#include "qemu/timer.h"
/* -icount align implementation. */
typedef struct SyncClocks {
int64_t diff_clk;
int64_t last_cpu_icount;
int64_t realtime_clock;
} SyncClocks;
#if !defined(CONFIG_USER_ONLY)
/* Allow the guest to have a max 3ms advance.
* The difference between the 2 clocks could therefore
* oscillate around 0.
*/
#define VM_CLOCK_ADVANCE 3000000
#define THRESHOLD_REDUCE 1.5
#define MAX_DELAY_PRINT_RATE 2000000000LL
#define MAX_NB_PRINTS 100
static void align_clocks(SyncClocks *sc, const CPUState *cpu)
{
int64_t cpu_icount;
if (!icount_align_option) {
return;
}
cpu_icount = cpu->icount_extra + cpu->icount_decr.u16.low;
sc->diff_clk += cpu_icount_to_ns(sc->last_cpu_icount - cpu_icount);
sc->last_cpu_icount = cpu_icount;
if (sc->diff_clk > VM_CLOCK_ADVANCE) {
#ifndef _WIN32
struct timespec sleep_delay, rem_delay;
sleep_delay.tv_sec = sc->diff_clk / 1000000000LL;
sleep_delay.tv_nsec = sc->diff_clk % 1000000000LL;
if (nanosleep(&sleep_delay, &rem_delay) < 0) {
sc->diff_clk -= (sleep_delay.tv_sec - rem_delay.tv_sec) * 1000000000LL;
sc->diff_clk -= sleep_delay.tv_nsec - rem_delay.tv_nsec;
} else {
sc->diff_clk = 0;
}
#else
Sleep(sc->diff_clk / SCALE_MS);
sc->diff_clk = 0;
#endif
}
}
static void print_delay(const SyncClocks *sc)
{
static float threshold_delay;
static int64_t last_realtime_clock;
static int nb_prints;
if (icount_align_option &&
sc->realtime_clock - last_realtime_clock >= MAX_DELAY_PRINT_RATE &&
nb_prints < MAX_NB_PRINTS) {
if ((-sc->diff_clk / (float)1000000000LL > threshold_delay) ||
(-sc->diff_clk / (float)1000000000LL <
(threshold_delay - THRESHOLD_REDUCE))) {
threshold_delay = (-sc->diff_clk / 1000000000LL) + 1;
printf("Warning: The guest is now late by %.1f to %.1f seconds\n",
threshold_delay - 1,
threshold_delay);
nb_prints++;
last_realtime_clock = sc->realtime_clock;
}
}
}
static void init_delay_params(SyncClocks *sc,
const CPUState *cpu)
{
if (!icount_align_option) {
return;
}
sc->realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
sc->diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
sc->realtime_clock +
cpu_get_clock_offset();
sc->last_cpu_icount = cpu->icount_extra + cpu->icount_decr.u16.low;
if (sc->diff_clk < max_delay) {
max_delay = sc->diff_clk;
}
if (sc->diff_clk > max_advance) {
max_advance = sc->diff_clk;
}
/* Print every 2s max if the guest is late. We limit the number
of printed messages to NB_PRINT_MAX(currently 100) */
print_delay(sc);
}
#else
static void align_clocks(SyncClocks *sc, const CPUState *cpu)
{
}
static void init_delay_params(SyncClocks *sc, const CPUState *cpu)
{
}
#endif /* CONFIG USER ONLY */
void cpu_loop_exit(CPUState *cpu)
{
@@ -65,6 +169,9 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, uint8_t *tb_ptr)
#endif /* DEBUG_DISAS */
next_tb = tcg_qemu_tb_exec(env, tb_ptr);
trace_exec_tb_exit((void *) (next_tb & ~TB_EXIT_MASK),
next_tb & TB_EXIT_MASK);
if ((next_tb & TB_EXIT_MASK) > TB_EXIT_IDX1) {
/* We didn't start executing this TB (eg because the instruction
* counter hit zero); we must restore the guest PC to the address
@@ -105,6 +212,7 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles,
max_cycles);
cpu->current_tb = tb;
/* execute the generated code */
trace_exec_tb_nocache(tb, tb->pc);
cpu_tb_exec(cpu, tb->tc_ptr);
cpu->current_tb = NULL;
tb_phys_invalidate(tb, -1);
@@ -227,6 +335,8 @@ int cpu_exec(CPUArchState *env)
TranslationBlock *tb;
uint8_t *tc_ptr;
uintptr_t next_tb;
SyncClocks sc;
/* This must be volatile so it is not trashed by longjmp() */
volatile bool have_tb_lock = false;
@@ -277,12 +387,20 @@ int cpu_exec(CPUArchState *env)
#elif defined(TARGET_CRIS)
#elif defined(TARGET_S390X)
#elif defined(TARGET_XTENSA)
#elif defined(TARGET_TRICORE)
/* XXXXX */
#else
#error unsupported target CPU
#endif
cpu->exception_index = -1;
/* Calculate difference between guest clock and host clock.
* This delay includes the delay of the last cycle, so
* what we have to do is sleep until it is 0. As for the
* advance/delay we gain here, we try to fix it next time.
*/
init_delay_params(&sc, cpu);
/* prepare setjmp context for exception handling */
for(;;) {
if (sigsetjmp(cpu->jmp_env, 0) == 0) {
@@ -327,7 +445,8 @@ int cpu_exec(CPUArchState *env)
}
#if defined(TARGET_ARM) || defined(TARGET_SPARC) || defined(TARGET_MIPS) || \
defined(TARGET_PPC) || defined(TARGET_ALPHA) || defined(TARGET_CRIS) || \
defined(TARGET_MICROBLAZE) || defined(TARGET_LM32) || defined(TARGET_UNICORE32)
defined(TARGET_MICROBLAZE) || defined(TARGET_LM32) || \
defined(TARGET_UNICORE32) || defined(TARGET_TRICORE)
if (interrupt_request & CPU_INTERRUPT_HALT) {
cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
cpu->halted = 1;
@@ -443,6 +562,12 @@ int cpu_exec(CPUArchState *env)
cc->do_interrupt(cpu);
next_tb = 0;
}
#elif defined(TARGET_TRICORE)
if ((interrupt_request & CPU_INTERRUPT_HARD)) {
cc->do_interrupt(cpu);
next_tb = 0;
}
#elif defined(TARGET_OPENRISC)
{
int idx = -1;
@@ -637,6 +762,7 @@ int cpu_exec(CPUArchState *env)
cpu->current_tb = tb;
barrier();
if (likely(!cpu->exit_request)) {
trace_exec_tb(tb, tb->pc);
tc_ptr = tb->tc_ptr;
/* execute the generated code */
next_tb = cpu_tb_exec(cpu, tc_ptr);
@@ -672,6 +798,7 @@ int cpu_exec(CPUArchState *env)
if (insns_left > 0) {
/* Execute remaining instructions. */
cpu_exec_nocache(env, insns_left, tb);
align_clocks(&sc, cpu);
}
cpu->exception_index = EXCP_INTERRUPT;
next_tb = 0;
@@ -684,6 +811,9 @@ int cpu_exec(CPUArchState *env)
}
}
cpu->current_tb = NULL;
/* Try to align the host and virtual clocks
if the guest is in advance */
align_clocks(&sc, cpu);
/* reset soft MMU for next block (it can currently
only be set by a memory fault) */
} /* for(;;) */
@@ -724,6 +854,7 @@ int cpu_exec(CPUArchState *env)
| env->cc_dest | (env->cc_x << 4);
#elif defined(TARGET_MICROBLAZE)
#elif defined(TARGET_MIPS)
#elif defined(TARGET_TRICORE)
#elif defined(TARGET_MOXIE)
#elif defined(TARGET_OPENRISC)
#elif defined(TARGET_SH4)

141
cpus.c
View File

@@ -40,6 +40,7 @@
#include "qemu/bitmap.h"
#include "qemu/seqlock.h"
#include "qapi-event.h"
#include "hw/nmi.h"
#ifndef _WIN32
#include "qemu/compatfd.h"
@@ -64,6 +65,8 @@
#endif /* CONFIG_LINUX */
static CPUState *next_cpu;
int64_t max_delay;
int64_t max_advance;
bool cpu_is_stopped(CPUState *cpu)
{
@@ -102,17 +105,12 @@ static bool all_cpu_threads_idle(void)
/* Protected by TimersState seqlock */
/* Compensate for varying guest execution speed. */
static int64_t qemu_icount_bias;
static int64_t vm_clock_warp_start;
static int64_t vm_clock_warp_start = -1;
/* Conversion factor from emulated instructions to virtual clock ticks. */
static int icount_time_shift;
/* Arbitrarily pick 1MIPS as the minimum allowable speed. */
#define MAX_ICOUNT_SHIFT 10
/* Only written by TCG thread */
static int64_t qemu_icount;
static QEMUTimer *icount_rt_timer;
static QEMUTimer *icount_vm_timer;
static QEMUTimer *icount_warp_timer;
@@ -129,6 +127,11 @@ typedef struct TimersState {
int64_t cpu_clock_offset;
int32_t cpu_ticks_enabled;
int64_t dummy;
/* Compensate for varying guest execution speed. */
int64_t qemu_icount_bias;
/* Only written by TCG thread */
int64_t qemu_icount;
} TimersState;
static TimersState timers_state;
@@ -139,14 +142,14 @@ static int64_t cpu_get_icount_locked(void)
int64_t icount;
CPUState *cpu = current_cpu;
icount = qemu_icount;
icount = timers_state.qemu_icount;
if (cpu) {
if (!cpu_can_do_io(cpu)) {
fprintf(stderr, "Bad clock read\n");
}
icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);
}
return qemu_icount_bias + (icount << icount_time_shift);
return timers_state.qemu_icount_bias + cpu_icount_to_ns(icount);
}
int64_t cpu_get_icount(void)
@@ -162,6 +165,11 @@ int64_t cpu_get_icount(void)
return icount;
}
int64_t cpu_icount_to_ns(int64_t icount)
{
return icount << icount_time_shift;
}
/* return the host CPU cycle counter and handle stop/restart */
/* Caller must hold the BQL */
int64_t cpu_get_ticks(void)
@@ -214,6 +222,23 @@ int64_t cpu_get_clock(void)
return ti;
}
/* return the offset between the host clock and virtual CPU clock */
int64_t cpu_get_clock_offset(void)
{
int64_t ti;
unsigned start;
do {
start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
ti = timers_state.cpu_clock_offset;
if (!timers_state.cpu_ticks_enabled) {
ti -= get_clock();
}
} while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
return -ti;
}
/* enable cpu_get_ticks()
* Caller must hold BQL which server as mutex for vm_clock_seqlock.
*/
@@ -284,7 +309,8 @@ static void icount_adjust(void)
icount_time_shift++;
}
last_delta = delta;
qemu_icount_bias = cur_icount - (qemu_icount << icount_time_shift);
timers_state.qemu_icount_bias = cur_icount
- (timers_state.qemu_icount << icount_time_shift);
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
}
@@ -333,7 +359,7 @@ static void icount_warp_rt(void *opaque)
int64_t delta = cur_time - cur_icount;
warp_delta = MIN(warp_delta, delta);
}
qemu_icount_bias += warp_delta;
timers_state.qemu_icount_bias += warp_delta;
}
vm_clock_warp_start = -1;
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
@@ -351,7 +377,7 @@ void qtest_clock_warp(int64_t dest)
int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
seqlock_write_lock(&timers_state.vm_clock_seqlock);
qemu_icount_bias += warp;
timers_state.qemu_icount_bias += warp;
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
@@ -428,6 +454,25 @@ void qemu_clock_warp(QEMUClockType type)
}
}
static bool icount_state_needed(void *opaque)
{
return use_icount;
}
/*
* This is a subsection for icount migration.
*/
static const VMStateDescription icount_vmstate_timers = {
.name = "timer/icount",
.version_id = 1,
.minimum_version_id = 1,
.fields = (VMStateField[]) {
VMSTATE_INT64(qemu_icount_bias, TimersState),
VMSTATE_INT64(qemu_icount, TimersState),
VMSTATE_END_OF_LIST()
}
};
static const VMStateDescription vmstate_timers = {
.name = "timer",
.version_id = 2,
@@ -437,23 +482,44 @@ static const VMStateDescription vmstate_timers = {
VMSTATE_INT64(dummy, TimersState),
VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2),
VMSTATE_END_OF_LIST()
},
.subsections = (VMStateSubsection[]) {
{
.vmsd = &icount_vmstate_timers,
.needed = icount_state_needed,
}, {
/* empty */
}
}
};
void configure_icount(const char *option)
void configure_icount(QemuOpts *opts, Error **errp)
{
const char *option;
char *rem_str = NULL;
seqlock_init(&timers_state.vm_clock_seqlock, NULL);
vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
option = qemu_opt_get(opts, "shift");
if (!option) {
if (qemu_opt_get(opts, "align") != NULL) {
error_setg(errp, "Please specify shift option when using align");
}
return;
}
icount_align_option = qemu_opt_get_bool(opts, "align", false);
icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME,
icount_warp_rt, NULL);
if (strcmp(option, "auto") != 0) {
icount_time_shift = strtol(option, NULL, 0);
errno = 0;
icount_time_shift = strtol(option, &rem_str, 0);
if (errno != 0 || *rem_str != '\0' || !strlen(option)) {
error_setg(errp, "icount: Invalid shift value");
}
use_icount = 1;
return;
} else if (icount_align_option) {
error_setg(errp, "shift=auto and align=on are incompatible");
}
use_icount = 2;
@@ -1250,7 +1316,8 @@ static int tcg_cpu_exec(CPUArchState *env)
int64_t count;
int64_t deadline;
int decr;
qemu_icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);
timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+ cpu->icount_extra);
cpu->icount_decr.u16.low = 0;
cpu->icount_extra = 0;
deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
@@ -1265,7 +1332,7 @@ static int tcg_cpu_exec(CPUArchState *env)
}
count = qemu_icount_round(deadline);
qemu_icount += count;
timers_state.qemu_icount += count;
decr = (count > 0xffff) ? 0xffff : count;
count -= decr;
cpu->icount_decr.u16.low = decr;
@@ -1278,7 +1345,8 @@ static int tcg_cpu_exec(CPUArchState *env)
if (use_icount) {
/* Fold pending instructions back into the
instruction counter, and clear the interrupt flag. */
qemu_icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);
timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+ cpu->icount_extra);
cpu->icount_decr.u32 = 0;
cpu->icount_extra = 0;
}
@@ -1342,6 +1410,9 @@ CpuInfoList *qmp_query_cpus(Error **errp)
#elif defined(TARGET_MIPS)
MIPSCPU *mips_cpu = MIPS_CPU(cpu);
CPUMIPSState *env = &mips_cpu->env;
#elif defined(TARGET_TRICORE)
TriCoreCPU *tricore_cpu = TRICORE_CPU(cpu);
CPUTriCoreState *env = &tricore_cpu->env;
#endif
cpu_synchronize_state(cpu);
@@ -1366,6 +1437,9 @@ CpuInfoList *qmp_query_cpus(Error **errp)
#elif defined(TARGET_MIPS)
info->value->has_PC = true;
info->value->PC = env->active_tc.PC;
#elif defined(TARGET_TRICORE)
info->value->has_PC = true;
info->value->PC = env->PC;
#endif
/* XXX: waiting for the qapi to support GSList */
@@ -1469,21 +1543,24 @@ void qmp_inject_nmi(Error **errp)
apic_deliver_nmi(cpu->apic_state);
}
}
#elif defined(TARGET_S390X)
CPUState *cs;
S390CPU *cpu;
CPU_FOREACH(cs) {
cpu = S390_CPU(cs);
if (cpu->env.cpu_num == monitor_get_cpu_index()) {
if (s390_cpu_restart(S390_CPU(cs)) == -1) {
error_set(errp, QERR_UNSUPPORTED);
return;
}
break;
}
}
#else
error_set(errp, QERR_UNSUPPORTED);
nmi_monitor_handle(monitor_get_cpu_index(), errp);
#endif
}
void dump_drift_info(FILE *f, fprintf_function cpu_fprintf)
{
if (!use_icount) {
return;
}
cpu_fprintf(f, "Host - Guest clock %"PRIi64" ms\n",
(cpu_get_clock() - cpu_get_icount())/SCALE_MS);
if (icount_align_option) {
cpu_fprintf(f, "Max guest delay %"PRIi64" ms\n", -max_delay/SCALE_MS);
cpu_fprintf(f, "Max guest advance %"PRIi64" ms\n", max_advance/SCALE_MS);
} else {
cpu_fprintf(f, "Max guest delay NA\n");
cpu_fprintf(f, "Max guest advance NA\n");
}
}

View File

@@ -60,8 +60,10 @@ void tlb_flush(CPUState *cpu, int flush_global)
cpu->current_tb = NULL;
memset(env->tlb_table, -1, sizeof(env->tlb_table));
memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
env->vtlb_index = 0;
env->tlb_flush_addr = -1;
env->tlb_flush_mask = 0;
tlb_flush_count++;
@@ -108,6 +110,14 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
}
/* check whether there are entries that need to be flushed in the vtlb */
for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
int k;
for (k = 0; k < CPU_VTLB_SIZE; k++) {
tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
}
}
tb_flush_jmp_cache(cpu, addr);
}
@@ -172,6 +182,11 @@ void cpu_tlb_reset_dirty_all(ram_addr_t start1, ram_addr_t length)
tlb_reset_dirty_range(&env->tlb_table[mmu_idx][i],
start1, length);
}
for (i = 0; i < CPU_VTLB_SIZE; i++) {
tlb_reset_dirty_range(&env->tlb_v_table[mmu_idx][i],
start1, length);
}
}
}
}
@@ -195,6 +210,13 @@ void tlb_set_dirty(CPUArchState *env, target_ulong vaddr)
for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
tlb_set_dirty1(&env->tlb_table[mmu_idx][i], vaddr);
}
for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
int k;
for (k = 0; k < CPU_VTLB_SIZE; k++) {
tlb_set_dirty1(&env->tlb_v_table[mmu_idx][k], vaddr);
}
}
}
/* Our TLB does not support large pages, so remember the area covered by
@@ -235,6 +257,7 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
uintptr_t addend;
CPUTLBEntry *te;
hwaddr iotlb, xlat, sz;
unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
assert(size >= TARGET_PAGE_SIZE);
if (size != TARGET_PAGE_SIZE) {
@@ -267,8 +290,14 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
prot, &address);
index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
env->iotlb[mmu_idx][index] = iotlb - vaddr;
te = &env->tlb_table[mmu_idx][index];
/* do not discard the translation in te, evict it into a victim tlb */
env->tlb_v_table[mmu_idx][vidx] = *te;
env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
/* refill the tlb */
env->iotlb[mmu_idx][index] = iotlb - vaddr;
te->addend = addend - vaddr;
if (prot & PAGE_READ) {
te->addr_read = address;

View File

View File

@@ -20,6 +20,7 @@
#include "config.h"
#include "qemu-common.h"
#include "qemu/error-report.h"
#include "sysemu/device_tree.h"
#include "sysemu/sysemu.h"
#include "hw/loader.h"
@@ -59,13 +60,13 @@ void *create_device_tree(int *sizep)
}
ret = fdt_open_into(fdt, fdt, *sizep);
if (ret) {
fprintf(stderr, "Unable to copy device tree in memory\n");
error_report("Unable to copy device tree in memory");
exit(1);
}
return fdt;
fail:
fprintf(stderr, "%s Couldn't create dt: %s\n", __func__, fdt_strerror(ret));
error_report("%s Couldn't create dt: %s", __func__, fdt_strerror(ret));
exit(1);
}
@@ -79,8 +80,8 @@ void *load_device_tree(const char *filename_path, int *sizep)
*sizep = 0;
dt_size = get_image_size(filename_path);
if (dt_size < 0) {
printf("Unable to get size of device tree file '%s'\n",
filename_path);
error_report("Unable to get size of device tree file '%s'",
filename_path);
goto fail;
}
@@ -92,21 +93,21 @@ void *load_device_tree(const char *filename_path, int *sizep)
dt_file_load_size = load_image(filename_path, fdt);
if (dt_file_load_size < 0) {
printf("Unable to open device tree file '%s'\n",
filename_path);
error_report("Unable to open device tree file '%s'",
filename_path);
goto fail;
}
ret = fdt_open_into(fdt, fdt, dt_size);
if (ret) {
printf("Unable to copy device tree in memory\n");
error_report("Unable to copy device tree in memory");
goto fail;
}
/* Check sanity of device tree */
if (fdt_check_header(fdt)) {
printf ("Device tree file loaded into memory is invalid: %s\n",
filename_path);
error_report("Device tree file loaded into memory is invalid: %s",
filename_path);
goto fail;
}
*sizep = dt_size;
@@ -123,8 +124,8 @@ static int findnode_nofail(void *fdt, const char *node_path)
offset = fdt_path_offset(fdt, node_path);
if (offset < 0) {
fprintf(stderr, "%s Couldn't find node %s: %s\n", __func__, node_path,
fdt_strerror(offset));
error_report("%s Couldn't find node %s: %s", __func__, node_path,
fdt_strerror(offset));
exit(1);
}
@@ -138,8 +139,8 @@ int qemu_fdt_setprop(void *fdt, const char *node_path,
r = fdt_setprop(fdt, findnode_nofail(fdt, node_path), property, val, size);
if (r < 0) {
fprintf(stderr, "%s: Couldn't set %s/%s: %s\n", __func__, node_path,
property, fdt_strerror(r));
error_report("%s: Couldn't set %s/%s: %s", __func__, node_path,
property, fdt_strerror(r));
exit(1);
}
@@ -153,8 +154,8 @@ int qemu_fdt_setprop_cell(void *fdt, const char *node_path,
r = fdt_setprop_cell(fdt, findnode_nofail(fdt, node_path), property, val);
if (r < 0) {
fprintf(stderr, "%s: Couldn't set %s/%s = %#08x: %s\n", __func__,
node_path, property, val, fdt_strerror(r));
error_report("%s: Couldn't set %s/%s = %#08x: %s", __func__,
node_path, property, val, fdt_strerror(r));
exit(1);
}
@@ -175,8 +176,8 @@ int qemu_fdt_setprop_string(void *fdt, const char *node_path,
r = fdt_setprop_string(fdt, findnode_nofail(fdt, node_path), property, string);
if (r < 0) {
fprintf(stderr, "%s: Couldn't set %s/%s = %s: %s\n", __func__,
node_path, property, string, fdt_strerror(r));
error_report("%s: Couldn't set %s/%s = %s: %s", __func__,
node_path, property, string, fdt_strerror(r));
exit(1);
}
@@ -193,8 +194,8 @@ const void *qemu_fdt_getprop(void *fdt, const char *node_path,
}
r = fdt_getprop(fdt, findnode_nofail(fdt, node_path), property, lenp);
if (!r) {
fprintf(stderr, "%s: Couldn't get %s/%s: %s\n", __func__,
node_path, property, fdt_strerror(*lenp));
error_report("%s: Couldn't get %s/%s: %s", __func__,
node_path, property, fdt_strerror(*lenp));
exit(1);
}
return r;
@@ -206,8 +207,8 @@ uint32_t qemu_fdt_getprop_cell(void *fdt, const char *node_path,
int len;
const uint32_t *p = qemu_fdt_getprop(fdt, node_path, property, &len);
if (len != 4) {
fprintf(stderr, "%s: %s/%s not 4 bytes long (not a cell?)\n",
__func__, node_path, property);
error_report("%s: %s/%s not 4 bytes long (not a cell?)",
__func__, node_path, property);
exit(1);
}
return be32_to_cpu(*p);
@@ -219,8 +220,8 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path)
r = fdt_get_phandle(fdt, findnode_nofail(fdt, path));
if (r == 0) {
fprintf(stderr, "%s: Couldn't get phandle for %s: %s\n", __func__,
path, fdt_strerror(r));
error_report("%s: Couldn't get phandle for %s: %s", __func__,
path, fdt_strerror(r));
exit(1);
}
@@ -265,8 +266,8 @@ int qemu_fdt_nop_node(void *fdt, const char *node_path)
r = fdt_nop_node(fdt, findnode_nofail(fdt, node_path));
if (r < 0) {
fprintf(stderr, "%s: Couldn't nop node %s: %s\n", __func__, node_path,
fdt_strerror(r));
error_report("%s: Couldn't nop node %s: %s", __func__, node_path,
fdt_strerror(r));
exit(1);
}
@@ -294,8 +295,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
retval = fdt_add_subnode(fdt, parent, basename);
if (retval < 0) {
fprintf(stderr, "FDT: Failed to create subnode %s: %s\n", name,
fdt_strerror(retval));
error_report("FDT: Failed to create subnode %s: %s", name,
fdt_strerror(retval));
exit(1);
}

View File

@@ -2,7 +2,7 @@
The code in this directory is a subset of libvixl:
https://github.com/armvixl/vixl
(specifically, it is the set of files needed for disassembly only,
taken from libvixl 1.4).
taken from libvixl 1.5).
Bugfixes should preferably be sent upstream initially.
The disassembler does not currently support the entire A64 instruction

View File

@@ -28,6 +28,7 @@
#define VIXL_A64_ASSEMBLER_A64_H_
#include <list>
#include <stack>
#include "globals.h"
#include "utils.h"
@@ -574,34 +575,107 @@ class MemOperand {
class Label {
public:
Label() : is_bound_(false), link_(NULL), target_(NULL) {}
Label() : location_(kLocationUnbound) {}
~Label() {
// If the label has been linked to, it needs to be bound to a target.
VIXL_ASSERT(!IsLinked() || IsBound());
}
inline Instruction* link() const { return link_; }
inline Instruction* target() const { return target_; }
inline bool IsBound() const { return is_bound_; }
inline bool IsLinked() const { return link_ != NULL; }
inline void set_link(Instruction* new_link) { link_ = new_link; }
static const int kEndOfChain = 0;
inline bool IsBound() const { return location_ >= 0; }
inline bool IsLinked() const { return !links_.empty(); }
private:
// Indicates if the label has been bound, ie its location is fixed.
bool is_bound_;
// Branches instructions branching to this label form a chained list, with
// their offset indicating where the next instruction is located.
// link_ points to the latest branch instruction generated branching to this
// branch.
// If link_ is not NULL, the label has been linked to.
Instruction* link_;
// The label location.
Instruction* target_;
// The list of linked instructions is stored in a stack-like structure. We
// don't use std::stack directly because it's slow for the common case where
// only one or two instructions refer to a label, and labels themselves are
// short-lived. This class behaves like std::stack, but the first few links
// are preallocated (configured by kPreallocatedLinks).
//
// If more than N links are required, this falls back to std::stack.
class LinksStack {
public:
LinksStack() : size_(0), links_extended_(NULL) {}
~LinksStack() {
delete links_extended_;
}
size_t size() const {
return size_;
}
bool empty() const {
return size_ == 0;
}
void push(ptrdiff_t value) {
if (size_ < kPreallocatedLinks) {
links_[size_] = value;
} else {
if (links_extended_ == NULL) {
links_extended_ = new std::stack<ptrdiff_t>();
}
VIXL_ASSERT(size_ == (links_extended_->size() + kPreallocatedLinks));
links_extended_->push(value);
}
size_++;
}
ptrdiff_t top() const {
return (size_ <= kPreallocatedLinks) ? links_[size_ - 1]
: links_extended_->top();
}
void pop() {
size_--;
if (size_ >= kPreallocatedLinks) {
links_extended_->pop();
VIXL_ASSERT(size_ == (links_extended_->size() + kPreallocatedLinks));
}
}
private:
static const size_t kPreallocatedLinks = 4;
size_t size_;
ptrdiff_t links_[kPreallocatedLinks];
std::stack<ptrdiff_t> * links_extended_;
};
inline ptrdiff_t location() const { return location_; }
inline void Bind(ptrdiff_t location) {
// Labels can only be bound once.
VIXL_ASSERT(!IsBound());
location_ = location;
}
inline void AddLink(ptrdiff_t instruction) {
// If a label is bound, the assembler already has the information it needs
// to write the instruction, so there is no need to add it to links_.
VIXL_ASSERT(!IsBound());
links_.push(instruction);
}
inline ptrdiff_t GetAndRemoveNextLink() {
VIXL_ASSERT(IsLinked());
ptrdiff_t link = links_.top();
links_.pop();
return link;
}
// The offsets of the instructions that have linked to this label.
LinksStack links_;
// The label location.
ptrdiff_t location_;
static const ptrdiff_t kLocationUnbound = -1;
// It is not safe to copy labels, so disable the copy constructor by declaring
// it private (without an implementation).
Label(const Label&);
// The Assembler class is responsible for binding and linking labels, since
// the stored offsets need to be consistent with the Assembler's buffer.
friend class Assembler;
};
@@ -635,10 +709,49 @@ class Literal {
};
// Control whether or not position-independent code should be emitted.
enum PositionIndependentCodeOption {
// All code generated will be position-independent; all branches and
// references to labels generated with the Label class will use PC-relative
// addressing.
PositionIndependentCode,
// Allow VIXL to generate code that refers to absolute addresses. With this
// option, it will not be possible to copy the code buffer and run it from a
// different address; code must be generated in its final location.
PositionDependentCode,
// Allow VIXL to assume that the bottom 12 bits of the address will be
// constant, but that the top 48 bits may change. This allows `adrp` to
// function in systems which copy code between pages, but otherwise maintain
// 4KB page alignment.
PageOffsetDependentCode
};
// Control how scaled- and unscaled-offset loads and stores are generated.
enum LoadStoreScalingOption {
// Prefer scaled-immediate-offset instructions, but emit unscaled-offset,
// register-offset, pre-index or post-index instructions if necessary.
PreferScaledOffset,
// Prefer unscaled-immediate-offset instructions, but emit scaled-offset,
// register-offset, pre-index or post-index instructions if necessary.
PreferUnscaledOffset,
// Require scaled-immediate-offset instructions.
RequireScaledOffset,
// Require unscaled-immediate-offset instructions.
RequireUnscaledOffset
};
// Assembler.
class Assembler {
public:
Assembler(byte* buffer, unsigned buffer_size);
Assembler(byte* buffer, unsigned buffer_size,
PositionIndependentCodeOption pic = PositionIndependentCode);
// The destructor asserts that one of the following is true:
// * The Assembler object has not been used.
@@ -662,12 +775,15 @@ class Assembler {
// Label.
// Bind a label to the current PC.
void bind(Label* label);
int UpdateAndGetByteOffsetTo(Label* label);
inline int UpdateAndGetInstructionOffsetTo(Label* label) {
VIXL_ASSERT(Label::kEndOfChain == 0);
return UpdateAndGetByteOffsetTo(label) >> kInstructionSizeLog2;
}
// Return the address of a bound label.
template <typename T>
inline T GetLabelAddress(const Label * label) {
VIXL_ASSERT(label->IsBound());
VIXL_STATIC_ASSERT(sizeof(T) >= sizeof(uintptr_t));
VIXL_STATIC_ASSERT(sizeof(*buffer_) == 1);
return reinterpret_cast<T>(buffer_ + label->location());
}
// Instruction set functions.
@@ -733,6 +849,12 @@ class Assembler {
// Calculate the address of a PC offset.
void adr(const Register& rd, int imm21);
// Calculate the page address of a label.
void adrp(const Register& rd, Label* label);
// Calculate the page address of a PC offset.
void adrp(const Register& rd, int imm21);
// Data Processing instructions.
// Add.
void add(const Register& rd,
@@ -1112,31 +1234,76 @@ class Assembler {
// Memory instructions.
// Load integer or FP register.
void ldr(const CPURegister& rt, const MemOperand& src);
void ldr(const CPURegister& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferScaledOffset);
// Store integer or FP register.
void str(const CPURegister& rt, const MemOperand& dst);
void str(const CPURegister& rt, const MemOperand& dst,
LoadStoreScalingOption option = PreferScaledOffset);
// Load word with sign extension.
void ldrsw(const Register& rt, const MemOperand& src);
void ldrsw(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferScaledOffset);
// Load byte.
void ldrb(const Register& rt, const MemOperand& src);
void ldrb(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferScaledOffset);
// Store byte.
void strb(const Register& rt, const MemOperand& dst);
void strb(const Register& rt, const MemOperand& dst,
LoadStoreScalingOption option = PreferScaledOffset);
// Load byte with sign extension.
void ldrsb(const Register& rt, const MemOperand& src);
void ldrsb(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferScaledOffset);
// Load half-word.
void ldrh(const Register& rt, const MemOperand& src);
void ldrh(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferScaledOffset);
// Store half-word.
void strh(const Register& rt, const MemOperand& dst);
void strh(const Register& rt, const MemOperand& dst,
LoadStoreScalingOption option = PreferScaledOffset);
// Load half-word with sign extension.
void ldrsh(const Register& rt, const MemOperand& src);
void ldrsh(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferScaledOffset);
// Load integer or FP register (with unscaled offset).
void ldur(const CPURegister& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Store integer or FP register (with unscaled offset).
void stur(const CPURegister& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Load word with sign extension.
void ldursw(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Load byte (with unscaled offset).
void ldurb(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Store byte (with unscaled offset).
void sturb(const Register& rt, const MemOperand& dst,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Load byte with sign extension (and unscaled offset).
void ldursb(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Load half-word (with unscaled offset).
void ldurh(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Store half-word (with unscaled offset).
void sturh(const Register& rt, const MemOperand& dst,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Load half-word with sign extension (and unscaled offset).
void ldursh(const Register& rt, const MemOperand& src,
LoadStoreScalingOption option = PreferUnscaledOffset);
// Load integer or FP register pair.
void ldp(const CPURegister& rt, const CPURegister& rt2,
@@ -1166,6 +1333,79 @@ class Assembler {
// Load single precision floating point literal to FP register.
void ldr(const FPRegister& ft, float imm);
// Store exclusive byte.
void stxrb(const Register& rs, const Register& rt, const MemOperand& dst);
// Store exclusive half-word.
void stxrh(const Register& rs, const Register& rt, const MemOperand& dst);
// Store exclusive register.
void stxr(const Register& rs, const Register& rt, const MemOperand& dst);
// Load exclusive byte.
void ldxrb(const Register& rt, const MemOperand& src);
// Load exclusive half-word.
void ldxrh(const Register& rt, const MemOperand& src);
// Load exclusive register.
void ldxr(const Register& rt, const MemOperand& src);
// Store exclusive register pair.
void stxp(const Register& rs,
const Register& rt,
const Register& rt2,
const MemOperand& dst);
// Load exclusive register pair.
void ldxp(const Register& rt, const Register& rt2, const MemOperand& src);
// Store-release exclusive byte.
void stlxrb(const Register& rs, const Register& rt, const MemOperand& dst);
// Store-release exclusive half-word.
void stlxrh(const Register& rs, const Register& rt, const MemOperand& dst);
// Store-release exclusive register.
void stlxr(const Register& rs, const Register& rt, const MemOperand& dst);
// Load-acquire exclusive byte.
void ldaxrb(const Register& rt, const MemOperand& src);
// Load-acquire exclusive half-word.
void ldaxrh(const Register& rt, const MemOperand& src);
// Load-acquire exclusive register.
void ldaxr(const Register& rt, const MemOperand& src);
// Store-release exclusive register pair.
void stlxp(const Register& rs,
const Register& rt,
const Register& rt2,
const MemOperand& dst);
// Load-acquire exclusive register pair.
void ldaxp(const Register& rt, const Register& rt2, const MemOperand& src);
// Store-release byte.
void stlrb(const Register& rt, const MemOperand& dst);
// Store-release half-word.
void stlrh(const Register& rt, const MemOperand& dst);
// Store-release register.
void stlr(const Register& rt, const MemOperand& dst);
// Load-acquire byte.
void ldarb(const Register& rt, const MemOperand& src);
// Load-acquire half-word.
void ldarh(const Register& rt, const MemOperand& src);
// Load-acquire register.
void ldar(const Register& rt, const MemOperand& src);
// Move instructions. The default shift of -1 indicates that the move
// instruction will calculate an appropriate 16-bit immediate and left shift
// that is equal to the 64-bit immediate argument. If an explicit left shift
@@ -1214,6 +1454,9 @@ class Assembler {
// System hint.
void hint(SystemHint code);
// Clear exclusive monitor.
void clrex(int imm4 = 0xf);
// Data memory barrier.
void dmb(BarrierDomain domain, BarrierType type);
@@ -1429,6 +1672,11 @@ class Assembler {
return rt2.code() << Rt2_offset;
}
static Instr Rs(CPURegister rs) {
VIXL_ASSERT(rs.code() != kSPRegInternalCode);
return rs.code() << Rs_offset;
}
// These encoding functions allow the stack pointer to be encoded, and
// disallow the zero register.
static Instr RdSP(Register rd) {
@@ -1619,6 +1867,11 @@ class Assembler {
return imm7 << ImmHint_offset;
}
static Instr CRm(int imm4) {
VIXL_ASSERT(is_uint4(imm4));
return imm4 << CRm_offset;
}
static Instr ImmBarrierDomain(int imm2) {
VIXL_ASSERT(is_uint2(imm2));
return imm2 << ImmBarrierDomain_offset;
@@ -1660,16 +1913,20 @@ class Assembler {
}
// Size of the code generated in bytes
uint64_t SizeOfCodeGenerated() const {
size_t SizeOfCodeGenerated() const {
VIXL_ASSERT((pc_ >= buffer_) && (pc_ < (buffer_ + buffer_size_)));
return pc_ - buffer_;
}
// Size of the code generated since label to the current position.
uint64_t SizeOfCodeGeneratedSince(Label* label) const {
size_t SizeOfCodeGeneratedSince(Label* label) const {
size_t pc_offset = SizeOfCodeGenerated();
VIXL_ASSERT(label->IsBound());
VIXL_ASSERT((pc_ >= label->target()) && (pc_ < (buffer_ + buffer_size_)));
return pc_ - label->target();
VIXL_ASSERT(pc_offset >= static_cast<size_t>(label->location()));
VIXL_ASSERT(pc_offset < buffer_size_);
return pc_offset - label->location();
}
@@ -1693,6 +1950,15 @@ class Assembler {
void EmitLiteralPool(LiteralPoolEmitOption option = NoJumpRequired);
size_t LiteralPoolSize();
inline PositionIndependentCodeOption pic() {
return pic_;
}
inline bool AllowPageOffsetDependentCode() {
return (pic() == PageOffsetDependentCode) ||
(pic() == PositionDependentCode);
}
protected:
inline const Register& AppropriateZeroRegFor(const CPURegister& reg) const {
return reg.Is64Bits() ? xzr : wzr;
@@ -1701,7 +1967,8 @@ class Assembler {
void LoadStore(const CPURegister& rt,
const MemOperand& addr,
LoadStoreOp op);
LoadStoreOp op,
LoadStoreScalingOption option = PreferScaledOffset);
static bool IsImmLSUnscaled(ptrdiff_t offset);
static bool IsImmLSScaled(ptrdiff_t offset, LSDataSize size);
@@ -1717,9 +1984,9 @@ class Assembler {
LogicalOp op);
static bool IsImmLogical(uint64_t value,
unsigned width,
unsigned* n,
unsigned* imm_s,
unsigned* imm_r);
unsigned* n = NULL,
unsigned* imm_s = NULL,
unsigned* imm_r = NULL);
void ConditionalCompare(const Register& rn,
const Operand& operand,
@@ -1823,6 +2090,17 @@ class Assembler {
void RecordLiteral(int64_t imm, unsigned size);
// Link the current (not-yet-emitted) instruction to the specified label, then
// return an offset to be encoded in the instruction. If the label is not yet
// bound, an offset of 0 is returned.
ptrdiff_t LinkAndGetByteOffsetTo(Label * label);
ptrdiff_t LinkAndGetInstructionOffsetTo(Label * label);
ptrdiff_t LinkAndGetPageOffsetTo(Label * label);
// A common implementation for the LinkAndGet<Type>OffsetTo helpers.
template <int element_size>
ptrdiff_t LinkAndGetOffsetTo(Label* label);
// Emit the instruction at pc_.
void Emit(Instr instruction) {
VIXL_STATIC_ASSERT(sizeof(*pc_) == 1);
@@ -1864,12 +2142,15 @@ class Assembler {
// The buffer into which code and relocation info are generated.
Instruction* buffer_;
// Buffer size, in bytes.
unsigned buffer_size_;
size_t buffer_size_;
Instruction* pc_;
std::list<Literal*> literals_;
Instruction* next_literal_pool_check_;
unsigned literal_pool_monitor_;
PositionIndependentCodeOption pic_;
friend class Label;
friend class BlockLiteralPoolScope;
#ifdef DEBUG

View File

@@ -46,13 +46,13 @@ R(24) R(25) R(26) R(27) R(28) R(29) R(30) R(31)
#define INSTRUCTION_FIELDS_LIST(V_) \
/* Register fields */ \
V_(Rd, 4, 0, Bits) /* Destination register. */ \
V_(Rn, 9, 5, Bits) /* First source register. */ \
V_(Rm, 20, 16, Bits) /* Second source register. */ \
V_(Ra, 14, 10, Bits) /* Third source register. */ \
V_(Rt, 4, 0, Bits) /* Load dest / store source. */ \
V_(Rt2, 14, 10, Bits) /* Load second dest / */ \
/* store second source. */ \
V_(Rd, 4, 0, Bits) /* Destination register. */ \
V_(Rn, 9, 5, Bits) /* First source register. */ \
V_(Rm, 20, 16, Bits) /* Second source register. */ \
V_(Ra, 14, 10, Bits) /* Third source register. */ \
V_(Rt, 4, 0, Bits) /* Load/store register. */ \
V_(Rt2, 14, 10, Bits) /* Load/store second register. */ \
V_(Rs, 20, 16, Bits) /* Exclusive access status. */ \
V_(PrefetchMode, 4, 0, Bits) \
\
/* Common bits */ \
@@ -126,6 +126,13 @@ V_(SysOp1, 18, 16, Bits) \
V_(SysOp2, 7, 5, Bits) \
V_(CRn, 15, 12, Bits) \
V_(CRm, 11, 8, Bits) \
\
/* Load-/store-exclusive */ \
V_(LdStXLoad, 22, 22, Bits) \
V_(LdStXNotExclusive, 23, 23, Bits) \
V_(LdStXAcquireRelease, 15, 15, Bits) \
V_(LdStXSizeLog2, 31, 30, Bits) \
V_(LdStXPair, 21, 21, Bits) \
#define SYSTEM_REGISTER_FIELDS_LIST(V_, M_) \
@@ -585,6 +592,13 @@ enum MemBarrierOp {
ISB = MemBarrierFixed | 0x00000040
};
enum SystemExclusiveMonitorOp {
SystemExclusiveMonitorFixed = 0xD503305F,
SystemExclusiveMonitorFMask = 0xFFFFF0FF,
SystemExclusiveMonitorMask = 0xFFFFF0FF,
CLREX = SystemExclusiveMonitorFixed
};
// Any load or store.
enum LoadStoreAnyOp {
LoadStoreAnyFMask = 0x0a000000,
@@ -702,7 +716,7 @@ enum LoadStoreUnscaledOffsetOp {
// Load/store (post, pre, offset and unsigned.)
enum LoadStoreOp {
LoadStoreOpMask = 0xC4C00000,
LoadStoreOpMask = 0xC4C00000,
#define LOAD_STORE(A, B, C, D) \
A##B##_##C = D
LOAD_STORE_OP_LIST(LOAD_STORE),
@@ -756,6 +770,44 @@ enum LoadStoreRegisterOffset {
#undef LOAD_STORE_REGISTER_OFFSET
};
enum LoadStoreExclusive {
LoadStoreExclusiveFixed = 0x08000000,
LoadStoreExclusiveFMask = 0x3F000000,
LoadStoreExclusiveMask = 0xFFE08000,
STXRB_w = LoadStoreExclusiveFixed | 0x00000000,
STXRH_w = LoadStoreExclusiveFixed | 0x40000000,
STXR_w = LoadStoreExclusiveFixed | 0x80000000,
STXR_x = LoadStoreExclusiveFixed | 0xC0000000,
LDXRB_w = LoadStoreExclusiveFixed | 0x00400000,
LDXRH_w = LoadStoreExclusiveFixed | 0x40400000,
LDXR_w = LoadStoreExclusiveFixed | 0x80400000,
LDXR_x = LoadStoreExclusiveFixed | 0xC0400000,
STXP_w = LoadStoreExclusiveFixed | 0x80200000,
STXP_x = LoadStoreExclusiveFixed | 0xC0200000,
LDXP_w = LoadStoreExclusiveFixed | 0x80600000,
LDXP_x = LoadStoreExclusiveFixed | 0xC0600000,
STLXRB_w = LoadStoreExclusiveFixed | 0x00008000,
STLXRH_w = LoadStoreExclusiveFixed | 0x40008000,
STLXR_w = LoadStoreExclusiveFixed | 0x80008000,
STLXR_x = LoadStoreExclusiveFixed | 0xC0008000,
LDAXRB_w = LoadStoreExclusiveFixed | 0x00408000,
LDAXRH_w = LoadStoreExclusiveFixed | 0x40408000,
LDAXR_w = LoadStoreExclusiveFixed | 0x80408000,
LDAXR_x = LoadStoreExclusiveFixed | 0xC0408000,
STLXP_w = LoadStoreExclusiveFixed | 0x80208000,
STLXP_x = LoadStoreExclusiveFixed | 0xC0208000,
LDAXP_w = LoadStoreExclusiveFixed | 0x80608000,
LDAXP_x = LoadStoreExclusiveFixed | 0xC0608000,
STLRB_w = LoadStoreExclusiveFixed | 0x00808000,
STLRH_w = LoadStoreExclusiveFixed | 0x40808000,
STLR_w = LoadStoreExclusiveFixed | 0x80808000,
STLR_x = LoadStoreExclusiveFixed | 0xC0808000,
LDARB_w = LoadStoreExclusiveFixed | 0x00C08000,
LDARH_w = LoadStoreExclusiveFixed | 0x40C08000,
LDAR_w = LoadStoreExclusiveFixed | 0x80C08000,
LDAR_x = LoadStoreExclusiveFixed | 0xC0C08000
};
// Conditional compare.
enum ConditionalCompareOp {
ConditionalCompareMask = 0x60000000,

View File

@@ -28,6 +28,7 @@
#define VIXL_CPU_A64_H
#include "globals.h"
#include "instructions-a64.h"
namespace vixl {
@@ -42,6 +43,32 @@ class CPU {
// safely run.
static void EnsureIAndDCacheCoherency(void *address, size_t length);
// Handle tagged pointers.
template <typename T>
static T SetPointerTag(T pointer, uint64_t tag) {
VIXL_ASSERT(is_uintn(kAddressTagWidth, tag));
// Use C-style casts to get static_cast behaviour for integral types (T),
// and reinterpret_cast behaviour for other types.
uint64_t raw = (uint64_t)pointer;
VIXL_STATIC_ASSERT(sizeof(pointer) == sizeof(raw));
raw = (raw & ~kAddressTagMask) | (tag << kAddressTagOffset);
return (T)raw;
}
template <typename T>
static uint64_t GetPointerTag(T pointer) {
// Use C-style casts to get static_cast behaviour for integral types (T),
// and reinterpret_cast behaviour for other types.
uint64_t raw = (uint64_t)pointer;
VIXL_STATIC_ASSERT(sizeof(pointer) == sizeof(raw));
return (raw & kAddressTagMask) >> kAddressTagOffset;
}
private:
// Return the content of the cache type register.
static uint32_t GetCacheType();

View File

@@ -171,9 +171,9 @@ void Decoder::DecodePCRelAddressing(Instruction* instr) {
void Decoder::DecodeBranchSystemException(Instruction* instr) {
VIXL_ASSERT((instr->Bits(27, 24) == 0x4) ||
(instr->Bits(27, 24) == 0x5) ||
(instr->Bits(27, 24) == 0x6) ||
(instr->Bits(27, 24) == 0x7) );
(instr->Bits(27, 24) == 0x5) ||
(instr->Bits(27, 24) == 0x6) ||
(instr->Bits(27, 24) == 0x7) );
switch (instr->Bits(31, 29)) {
case 0:
@@ -272,16 +272,15 @@ void Decoder::DecodeBranchSystemException(Instruction* instr) {
void Decoder::DecodeLoadStore(Instruction* instr) {
VIXL_ASSERT((instr->Bits(27, 24) == 0x8) ||
(instr->Bits(27, 24) == 0x9) ||
(instr->Bits(27, 24) == 0xC) ||
(instr->Bits(27, 24) == 0xD) );
(instr->Bits(27, 24) == 0x9) ||
(instr->Bits(27, 24) == 0xC) ||
(instr->Bits(27, 24) == 0xD) );
if (instr->Bit(24) == 0) {
if (instr->Bit(28) == 0) {
if (instr->Bit(29) == 0) {
if (instr->Bit(26) == 0) {
// TODO: VisitLoadStoreExclusive.
VisitUnimplemented(instr);
VisitLoadStoreExclusive(instr);
} else {
DecodeAdvSIMDLoadStore(instr);
}

View File

@@ -59,6 +59,7 @@
V(LoadStorePreIndex) \
V(LoadStoreRegisterOffset) \
V(LoadStoreUnsignedOffset) \
V(LoadStoreExclusive) \
V(LogicalShifted) \
V(AddSubShifted) \
V(AddSubExtended) \

View File

@@ -24,6 +24,7 @@
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <cstdlib>
#include "a64/disasm-a64.h"
namespace vixl {
@@ -529,7 +530,7 @@ void Disassembler::VisitExtract(Instruction* instr) {
void Disassembler::VisitPCRelAddressing(Instruction* instr) {
switch (instr->Mask(PCRelAddressingMask)) {
case ADR: Format(instr, "adr", "'Xd, 'AddrPCRelByte"); break;
// ADRP is not implemented.
case ADRP: Format(instr, "adrp", "'Xd, 'AddrPCRelPage"); break;
default: Format(instr, "unimplemented", "(PCRelAddressing)");
}
}
@@ -943,6 +944,49 @@ void Disassembler::VisitLoadStorePairNonTemporal(Instruction* instr) {
}
void Disassembler::VisitLoadStoreExclusive(Instruction* instr) {
const char *mnemonic = "unimplemented";
const char *form;
switch (instr->Mask(LoadStoreExclusiveMask)) {
case STXRB_w: mnemonic = "stxrb"; form = "'Ws, 'Wt, ['Xns]"; break;
case STXRH_w: mnemonic = "stxrh"; form = "'Ws, 'Wt, ['Xns]"; break;
case STXR_w: mnemonic = "stxr"; form = "'Ws, 'Wt, ['Xns]"; break;
case STXR_x: mnemonic = "stxr"; form = "'Ws, 'Xt, ['Xns]"; break;
case LDXRB_w: mnemonic = "ldxrb"; form = "'Wt, ['Xns]"; break;
case LDXRH_w: mnemonic = "ldxrh"; form = "'Wt, ['Xns]"; break;
case LDXR_w: mnemonic = "ldxr"; form = "'Wt, ['Xns]"; break;
case LDXR_x: mnemonic = "ldxr"; form = "'Xt, ['Xns]"; break;
case STXP_w: mnemonic = "stxp"; form = "'Ws, 'Wt, 'Wt2, ['Xns]"; break;
case STXP_x: mnemonic = "stxp"; form = "'Ws, 'Xt, 'Xt2, ['Xns]"; break;
case LDXP_w: mnemonic = "ldxp"; form = "'Wt, 'Wt2, ['Xns]"; break;
case LDXP_x: mnemonic = "ldxp"; form = "'Xt, 'Xt2, ['Xns]"; break;
case STLXRB_w: mnemonic = "stlxrb"; form = "'Ws, 'Wt, ['Xns]"; break;
case STLXRH_w: mnemonic = "stlxrh"; form = "'Ws, 'Wt, ['Xns]"; break;
case STLXR_w: mnemonic = "stlxr"; form = "'Ws, 'Wt, ['Xns]"; break;
case STLXR_x: mnemonic = "stlxr"; form = "'Ws, 'Xt, ['Xns]"; break;
case LDAXRB_w: mnemonic = "ldaxrb"; form = "'Wt, ['Xns]"; break;
case LDAXRH_w: mnemonic = "ldaxrh"; form = "'Wt, ['Xns]"; break;
case LDAXR_w: mnemonic = "ldaxr"; form = "'Wt, ['Xns]"; break;
case LDAXR_x: mnemonic = "ldaxr"; form = "'Xt, ['Xns]"; break;
case STLXP_w: mnemonic = "stlxp"; form = "'Ws, 'Wt, 'Wt2, ['Xns]"; break;
case STLXP_x: mnemonic = "stlxp"; form = "'Ws, 'Xt, 'Xt2, ['Xns]"; break;
case LDAXP_w: mnemonic = "ldaxp"; form = "'Wt, 'Wt2, ['Xns]"; break;
case LDAXP_x: mnemonic = "ldaxp"; form = "'Xt, 'Xt2, ['Xns]"; break;
case STLRB_w: mnemonic = "stlrb"; form = "'Wt, ['Xns]"; break;
case STLRH_w: mnemonic = "stlrh"; form = "'Wt, ['Xns]"; break;
case STLR_w: mnemonic = "stlr"; form = "'Wt, ['Xns]"; break;
case STLR_x: mnemonic = "stlr"; form = "'Xt, ['Xns]"; break;
case LDARB_w: mnemonic = "ldarb"; form = "'Wt, ['Xns]"; break;
case LDARH_w: mnemonic = "ldarh"; form = "'Wt, ['Xns]"; break;
case LDAR_w: mnemonic = "ldar"; form = "'Wt, ['Xns]"; break;
case LDAR_x: mnemonic = "ldar"; form = "'Xt, ['Xns]"; break;
default: form = "(LoadStoreExclusive)";
}
Format(instr, mnemonic, form);
}
void Disassembler::VisitFPCompare(Instruction* instr) {
const char *mnemonic = "unimplemented";
const char *form = "'Fn, 'Fm";
@@ -1162,7 +1206,15 @@ void Disassembler::VisitSystem(Instruction* instr) {
const char *mnemonic = "unimplemented";
const char *form = "(System)";
if (instr->Mask(SystemSysRegFMask) == SystemSysRegFixed) {
if (instr->Mask(SystemExclusiveMonitorFMask) == SystemExclusiveMonitorFixed) {
switch (instr->Mask(SystemExclusiveMonitorMask)) {
case CLREX: {
mnemonic = "clrex";
form = (instr->CRm() == 0xf) ? NULL : "'IX";
break;
}
}
} else if (instr->Mask(SystemSysRegFMask) == SystemSysRegFixed) {
switch (instr->Mask(SystemSysRegMask)) {
case MRS: {
mnemonic = "mrs";
@@ -1184,7 +1236,6 @@ void Disassembler::VisitSystem(Instruction* instr) {
}
}
} else if (instr->Mask(SystemHintFMask) == SystemHintFixed) {
VIXL_ASSERT(instr->Mask(SystemHintMask) == HINT);
switch (instr->ImmHint()) {
case NOP: {
mnemonic = "nop";
@@ -1312,6 +1363,7 @@ int Disassembler::SubstituteRegisterField(Instruction* instr,
case 'n': reg_num = instr->Rn(); break;
case 'm': reg_num = instr->Rm(); break;
case 'a': reg_num = instr->Ra(); break;
case 's': reg_num = instr->Rs(); break;
case 't': {
if (format[2] == '2') {
reg_num = instr->Rt2();
@@ -1458,6 +1510,10 @@ int Disassembler::SubstituteImmediateField(Instruction* instr,
AppendToOutput("#0x%" PRIx64, instr->ImmException());
return 6;
}
case 'X': { // IX - CLREX instruction.
AppendToOutput("#0x%" PRIx64, instr->CRm());
return 2;
}
default: {
VIXL_UNIMPLEMENTED();
return 0;
@@ -1564,21 +1620,20 @@ int Disassembler::SubstituteConditionField(Instruction* instr,
int Disassembler::SubstitutePCRelAddressField(Instruction* instr,
const char* format) {
USE(format);
VIXL_ASSERT(strncmp(format, "AddrPCRel", 9) == 0);
VIXL_ASSERT((strcmp(format, "AddrPCRelByte") == 0) || // Used by `adr`.
(strcmp(format, "AddrPCRelPage") == 0)); // Used by `adrp`.
int offset = instr->ImmPCRel();
int64_t offset = instr->ImmPCRel();
Instruction * base = instr;
// Only ADR (AddrPCRelByte) is supported.
VIXL_ASSERT(strcmp(format, "AddrPCRelByte") == 0);
char sign = '+';
if (offset < 0) {
offset = -offset;
sign = '-';
if (format[9] == 'P') {
offset *= kPageSize;
base = AlignDown(base, kPageSize);
}
VIXL_STATIC_ASSERT(sizeof(*instr) == 1);
AppendToOutput("#%c0x%x (addr %p)", sign, offset, instr + offset);
char sign = (offset < 0) ? '-' : '+';
void * target = reinterpret_cast<void *>(base + offset);
AppendToOutput("#%c0x%" PRIx64 " (addr %p)", sign, std::abs(offset), target);
return 13;
}
@@ -1606,7 +1661,8 @@ int Disassembler::SubstituteBranchTargetField(Instruction* instr,
sign = '-';
}
VIXL_STATIC_ASSERT(sizeof(*instr) == 1);
AppendToOutput("#%c0x%" PRIx64 " (addr %p)", sign, offset, instr + offset);
void * address = reinterpret_cast<void *>(instr + offset);
AppendToOutput("#%c0x%" PRIx64 " (addr %p)", sign, offset, address);
return 8;
}

View File

@@ -85,7 +85,7 @@ class Disassembler: public DecoderVisitor {
bool IsMovzMovnImm(unsigned reg_size, uint64_t value);
void ResetOutput();
void AppendToOutput(const char* string, ...);
void AppendToOutput(const char* string, ...) PRINTF_CHECK(2, 3);
char* buffer_;
uint32_t buffer_pos_;

View File

@@ -149,17 +149,24 @@ LSDataSize CalcLSPairDataSize(LoadStorePairOp op) {
Instruction* Instruction::ImmPCOffsetTarget() {
Instruction * base = this;
ptrdiff_t offset;
if (IsPCRelAddressing()) {
// PC-relative addressing. Only ADR is supported.
// ADR and ADRP.
offset = ImmPCRel();
if (Mask(PCRelAddressingMask) == ADRP) {
base = AlignDown(base, kPageSize);
offset *= kPageSize;
} else {
VIXL_ASSERT(Mask(PCRelAddressingMask) == ADR);
}
} else {
// All PC-relative branches.
VIXL_ASSERT(BranchType() != UnknownBranchType);
// Relative branch offsets are instruction-size-aligned.
offset = ImmBranch() << kInstructionSizeLog2;
}
return this + offset;
return base + offset;
}
@@ -185,10 +192,16 @@ void Instruction::SetImmPCOffsetTarget(Instruction* target) {
void Instruction::SetPCRelImmTarget(Instruction* target) {
// ADRP is not supported, so 'this' must point to an ADR instruction.
VIXL_ASSERT(Mask(PCRelAddressingMask) == ADR);
Instr imm = Assembler::ImmPCRelAddress(target - this);
int32_t imm21;
if ((Mask(PCRelAddressingMask) == ADR)) {
imm21 = target - this;
} else {
VIXL_ASSERT(Mask(PCRelAddressingMask) == ADRP);
uintptr_t this_page = reinterpret_cast<uintptr_t>(this) / kPageSize;
uintptr_t target_page = reinterpret_cast<uintptr_t>(target) / kPageSize;
imm21 = target_page - this_page;
}
Instr imm = Assembler::ImmPCRelAddress(imm21);
SetInstructionBits(Mask(~ImmPCRel_mask) | imm);
}

View File

@@ -41,6 +41,10 @@ const unsigned kLiteralEntrySize = 4;
const unsigned kLiteralEntrySizeLog2 = 2;
const unsigned kMaxLoadLiteralRange = 1 * MBytes;
// This is the nominal page size (as used by the adrp instruction); the actual
// size of the memory pages allocated by the kernel is likely to differ.
const unsigned kPageSize = 4 * KBytes;
const unsigned kWRegSize = 32;
const unsigned kWRegSizeLog2 = 5;
const unsigned kWRegSizeInBytes = kWRegSize / 8;
@@ -79,6 +83,12 @@ const unsigned kZeroRegCode = 31;
const unsigned kSPRegInternalCode = 63;
const unsigned kRegCodeMask = 0x1f;
const unsigned kAddressTagOffset = 56;
const unsigned kAddressTagWidth = 8;
const uint64_t kAddressTagMask =
((UINT64_C(1) << kAddressTagWidth) - 1) << kAddressTagOffset;
VIXL_STATIC_ASSERT(kAddressTagMask == UINT64_C(0xff00000000000000));
// AArch64 floating-point specifics. These match IEEE-754.
const unsigned kDoubleMantissaBits = 52;
const unsigned kDoubleExponentBits = 11;

View File

@@ -28,14 +28,10 @@
#define PLATFORM_H
// Define platform specific functionalities.
#include <signal.h>
namespace vixl {
#ifdef USE_SIMULATOR
// Currently we assume running the simulator implies running on x86 hardware.
inline void HostBreakpoint() { asm("int3"); }
#else
inline void HostBreakpoint() { asm("brk"); }
#endif
inline void HostBreakpoint() { raise(SIGINT); }
} // namespace vixl
#endif

View File

@@ -124,4 +124,14 @@ int CountSetBits(uint64_t value, int width) {
return value;
}
uint64_t LowestSetBit(uint64_t value) {
return value & -value;
}
bool IsPowerOf2(int64_t value) {
return (value != 0) && ((value & (value - 1)) == 0);
}
} // namespace vixl

View File

@@ -33,6 +33,14 @@
namespace vixl {
// Macros for compile-time format checking.
#if defined(__GNUC__)
#define PRINTF_CHECK(format_index, varargs_index) \
__attribute__((format(printf, format_index, varargs_index)))
#else
#define PRINTF_CHECK(format_index, varargs_index)
#endif
// Check number width.
inline bool is_intn(unsigned n, int64_t x) {
VIXL_ASSERT((0 < n) && (n < 64));
@@ -155,6 +163,8 @@ int CountLeadingZeros(uint64_t value, int width);
int CountLeadingSignBits(int64_t value, int width);
int CountTrailingZeros(uint64_t value, int width);
int CountSetBits(uint64_t value, int width);
uint64_t LowestSetBit(uint64_t value);
bool IsPowerOf2(int64_t value);
// Pointer alignment
// TODO: rename/refactor to make it specific to instructions.
@@ -167,21 +177,31 @@ bool IsWordAligned(T pointer) {
// Increment a pointer until it has the specified alignment.
template<class T>
T AlignUp(T pointer, size_t alignment) {
VIXL_STATIC_ASSERT(sizeof(pointer) == sizeof(uintptr_t));
uintptr_t pointer_raw = reinterpret_cast<uintptr_t>(pointer);
// Use C-style casts to get static_cast behaviour for integral types (T), and
// reinterpret_cast behaviour for other types.
uintptr_t pointer_raw = (uintptr_t)pointer;
VIXL_STATIC_ASSERT(sizeof(pointer) == sizeof(pointer_raw));
size_t align_step = (alignment - pointer_raw) % alignment;
VIXL_ASSERT((pointer_raw + align_step) % alignment == 0);
return reinterpret_cast<T>(pointer_raw + align_step);
return (T)(pointer_raw + align_step);
}
// Decrement a pointer until it has the specified alignment.
template<class T>
T AlignDown(T pointer, size_t alignment) {
VIXL_STATIC_ASSERT(sizeof(pointer) == sizeof(uintptr_t));
uintptr_t pointer_raw = reinterpret_cast<uintptr_t>(pointer);
// Use C-style casts to get static_cast behaviour for integral types (T), and
// reinterpret_cast behaviour for other types.
uintptr_t pointer_raw = (uintptr_t)pointer;
VIXL_STATIC_ASSERT(sizeof(pointer) == sizeof(pointer_raw));
size_t align_step = pointer_raw % alignment;
VIXL_ASSERT((pointer_raw - align_step) % alignment == 0);
return reinterpret_cast<T>(pointer_raw - align_step);
return (T)(pointer_raw - align_step);
}

238
docs/image-fuzzer.txt Normal file
View File

@@ -0,0 +1,238 @@
# Specification for the fuzz testing tool
#
# Copyright (C) 2014 Maria Kustova <maria.k@catit.be>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
Image fuzzer
============
Description
-----------
The goal of the image fuzzer is to catch crashes of qemu-io/qemu-img
by providing to them randomly corrupted images.
Test images are generated from scratch and have valid inner structure with some
elements, e.g. L1/L2 tables, having random invalid values.
Test runner
-----------
The test runner generates test images, executes tests utilizing generated
images, indicates their results and collects all test related artifacts (logs,
core dumps, test images, backing files).
The test means execution of all available commands under test with the same
generated test image.
By default, the test runner generates new tests and executes them until
keyboard interruption. But if a test seed is specified via the '--seed' runner
parameter, then only one test with this seed will be executed, after its finish
the runner will exit.
The runner uses an external image fuzzer to generate test images. An image
generator should be specified as a mandatory parameter of the test runner.
Details about interactions between the runner and fuzzers see "Module
interfaces".
The runner activates generation of core dumps during test executions, but it
assumes that core dumps will be generated in the current working directory.
For comprehensive test results, please, set up your test environment
properly.
Paths to binaries under test (SUTs) qemu-img and qemu-io are retrieved from
environment variables. If the environment check fails the runner will
use SUTs installed in system paths.
qemu-img is required for creation of backing files, so it's mandatory to set
the related environment variable if it's not installed in the system path.
For details about environment variables see qemu-iotests/check.
The runner accepts a JSON array of fields expected to be fuzzed via the
'--config' argument, e.g.
'[["feature_name_table"], ["header", "l1_table_offset"]]'
Each sublist can have one or two strings defining image structure elements.
In the latter case a parent element should be placed on the first position,
and a field name on the second one.
The runner accepts a list of commands under test as a JSON array via
the '--command' argument. Each command is a list containing a SUT and all its
arguments, e.g.
runner.py -c '[["qemu-io", "$test_img", "-c", "write $off $len"]]'
/tmp/test ../qcow2
For variable arguments next aliases can be used:
- $test_img for a fuzzed img
- $off for an offset in the fuzzed image
- $len for a data size
Values for last two aliases will be generated based on a size of a virtual
disk of the generated image.
In case when no commands are specified the runner will execute commands from
the default list:
- qemu-img check
- qemu-img info
- qemu-img convert
- qemu-io -c read
- qemu-io -c write
- qemu-io -c aio_read
- qemu-io -c aio_write
- qemu-io -c flush
- qemu-io -c discard
- qemu-io -c truncate
Qcow2 image generator
---------------------
The 'qcow2' generator is a Python package providing 'create_image' method as
a single public API. See details in 'Test runner/image fuzzer' chapter of
'Module interfaces'.
Qcow2 contains two submodules: fuzz.py and layout.py.
'fuzz.py' contains all fuzzing functions, one per image field. It's assumed
that after code analysis every field will have own constraints for its value.
For now only universal potentially dangerous values are used, e.g. type limits
for integers or unsafe symbols as '%s' for strings. For bitmasks random amount
of bits are set to ones. All fuzzed values are checked on non-equality to the
current valid value of the field. In case of equality the value will be
regenerated.
'layout.py' creates a random valid image, fuzzes a random subset of the image
fields by 'fuzz.py' module and writes a fuzzed image to the file specified.
If a fuzzer configuration is specified, then it has the next interpretation:
1. If a list contains a parent image element only, then some random portion
of fields of this element will be fuzzed every test.
The same behavior is applied for the entire image if no configuration is
used. This case is useful for the test specialization.
2. If a list contains a parent element and a field name, then a field
will be always fuzzed for every test. This case is useful for regression
testing.
For now only header fields, header extensions and L1/L2 tables are generated.
Module interfaces
-----------------
* Test runner/image fuzzer
The runner calls an image generator specifying the path to a test image file,
path to a backing file and its format and a fuzzer configuration.
An image generator is expected to provide a
'create_image(test_img_path, backing_file_path=None,
backing_file_format=None, fuzz_config=None)'
method that creates a test image, writes it to the specified file and returns
the size of the virtual disk.
The file should be created if it doesn't exist or overwritten otherwise.
fuzz_config has a form of a list of lists. Every sublist can have one
or two elements: first element is a name of a parent image element, second one
if exists is a name of a field in this element.
Example,
[['header', 'l1_table_offset'],
['header', 'nb_snapshots'],
['feature_name_table']]
Random seed is set by the runner at every test execution for the regression
purpose, so an image generator is not recommended to modify it internally.
Overall fuzzer requirements
===========================
Input data:
----------
- image template (generator)
- work directory
- action vector (optional)
- seed (optional)
- SUT and its arguments (optional)
Fuzzer requirements:
-------------------
1. Should be able to inject random data
2. Should be able to select a random value from the manually pregenerated
vector (boundary values, e.g. max/min cluster size)
3. Image template should describe a general structure invariant for all
test images (image format description)
4. Image template should be autonomous and other fuzzer parts should not
rely on it
5. Image template should contain reference rules (not only block+size
description)
6. Should generate the test image with the correct structure based on an image
template
7. Should accept a seed as an argument (for regression purpose)
8. Should generate a seed if it is not specified as an input parameter.
9. The same seed should generate the same image for the same action vector,
specified or generated.
10. Should accept a vector of actions as an argument (for test reproducing and
for test case specification, e.g. group of tests for header structure,
group of test for snapshots, etc)
11. Action vector should be randomly generated from the pool of available
actions, if it is not specified as an input parameter
12. Pool of actions should be defined automatically based on an image template
13. Should accept a SUT and its call parameters as an argument or select them
randomly otherwise. As far as it's expected to be rarely changed, the list
of all possible test commands can be available in the test runner
internally.
14. Should support an external cancellation of a test run
15. Seed should be logged (for regression purpose)
16. All files related to a test result should be collected: a test image,
SUT logs, fuzzer logs and crash dumps
17. Should be compatible with python version 2.4-2.7
18. Usage of external libraries should be limited as much as possible.
Image formats:
-------------
Main target image format is qcow2, but support of image templates should
provide an ability to add any other image format.
Effectiveness:
-------------
The fuzzer can be controlled via template, seed and action vector;
it makes the fuzzer itself invariant to an image format and test logic.
It should be able to perform rather complex and precise tests, that can be
specified via an action vector. Otherwise, knowledge about an image structure
allows the fuzzer to generate the pool of all available areas can be fuzzed
and randomly select some of them and so compose its own action vector.
Also complexity of a template defines complexity of the fuzzer, so its
functionality can be varied from simple model-independent fuzzing to smart
model-based one.
Glossary:
--------
Action vector is a sequence of structure elements retrieved from an image
format, each of them will be fuzzed for the test image. It's a subset of
elements of the action pool. Example: header, refcount table, etc.
Action pool is all available elements of an image structure that generated
automatically from an image template.
Image template is a formal description of an image structure and relations
between image blocks.
Test image is an output image of the fuzzer defined by the current seed and
action vector.

View File

@@ -74,11 +74,16 @@ Region lifecycle
----------------
A region is created by one of the constructor functions (memory_region_init*())
and destroyed by the destructor (memory_region_destroy()). In between,
a region can be added to an address space by using memory_region_add_subregion()
and removed using memory_region_del_subregion(). Region attributes may be
changed at any point; they take effect once the region becomes exposed to the
guest.
and attached to an object. It is then destroyed by object_unparent() or simply
when the parent object dies.
In between, a region can be added to an address space
by using memory_region_add_subregion() and removed using
memory_region_del_subregion(). Destroying the region implicitly
removes the region from the address space.
Region attributes may be changed at any point; they take effect once
the region becomes exposed to the guest.
Overlapping regions and priority
--------------------------------

134
docs/multiple-iothreads.txt Normal file
View File

@@ -0,0 +1,134 @@
Copyright (c) 2014 Red Hat Inc.
This work is licensed under the terms of the GNU GPL, version 2 or later. See
the COPYING file in the top-level directory.
This document explains the IOThread feature and how to write code that runs
outside the QEMU global mutex.
The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop. The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.
The default event loop is called the main loop (see main-loop.c). It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.
Side note: The main loop and IOThread are both event loops but their code is
not shared completely. Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.
Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work. The main loop is a
scalability bottleneck on hosts with many CPUs. Work can be spread across
several IOThreads instead of just one main loop. When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.
The main loop is also deeply associated with the QEMU global mutex, which is a
scalability bottleneck in itself. vCPU threads and the main loop use the QEMU
global mutex to serialize execution of QEMU code. This mutex is necessary
because a lot of QEMU's code historically was not thread-safe.
The fact that all I/O processing is done in a single main loop and that the
QEMU global mutex is contended by all vCPU threads and the main loop explain
why it is desirable to place work into IOThreads.
The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h). Code that only works in the main loop
implicitly uses the main loop's AioContext. Code that supports running
in IOThreads must be aware of its AioContext.
AioContext supports the following services:
* File descriptor monitoring (read/write/error on POSIX hosts)
* Event notifiers (inter-thread signalling)
* Timers
* Bottom Halves (BH) deferred callbacks
There are several old APIs that use the main loop AioContext:
* LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
* LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
* LEGACY timer_new_ms() - create a timer
* LEGACY qemu_bh_new() - create a BH
* LEGACY qemu_aio_wait() - run an event loop iteration
Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread. They might cause a crash or deadlock if called from an
IOThread since the QEMU global mutex is not held.
Instead, use the AioContext functions directly (see include/block/aio.h):
* aio_set_fd_handler() - monitor a file descriptor
* aio_set_event_notifier() - monitor an event notifier
* aio_timer_new() - create a timer
* aio_bh_new() - create a BH
* aio_poll() - run an event loop iteration
The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works both in IOThreads or the main
loop, depending on which AioContext instance the caller passes in.
How to synchronize with an IOThread
-----------------------------------
AioContext is not thread-safe so some rules must be followed when using file
descriptors, event notifiers, timers, or BHs across threads:
1. AioContext functions can be called safely from file descriptor, event
notifier, timer, or BH callbacks invoked by the AioContext. No locking is
necessary.
2. Other threads wishing to access the AioContext must use
aio_context_acquire()/aio_context_release() for mutual exclusion. Once the
context is acquired no other thread can access it or run event loop iterations
in this AioContext.
aio_context_acquire()/aio_context_release() calls may be nested. This
means you can call them if you're not sure whether #1 applies.
There is currently no lock ordering rule if a thread needs to acquire multiple
AioContexts simultaneously. Therefore, it is only safe for code holding the
QEMU global mutex to acquire other AioContexts.
Side note: the best way to schedule a function call across threads is to create
a BH in the target AioContext beforehand and then call qemu_bh_schedule(). No
acquire/release or locking is needed for the qemu_bh_schedule() call. But be
sure to acquire the AioContext for aio_bh_new() if necessary.
The relationship between AioContext and the block layer
-------------------------------------------------------
The AioContext originates from the QEMU block layer because it provides a
scoped way of running event loop iterations until all work is done. This
feature is used to complete all in-flight block I/O requests (see
bdrv_drain_all()). Nowadays AioContext is a generic event loop that can be
used by any QEMU subsystem.
The block layer has support for AioContext integrated. Each BlockDriverState
is associated with an AioContext using bdrv_set_aio_context() and
bdrv_get_aio_context(). This allows block layer code to process I/O inside the
right AioContext. Other subsystems may wish to follow a similar approach.
Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop. See the "How to program for
IOThreads" above for information on how to do that.
If main loop code such as a QMP function wishes to access a BlockDriverState it
must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
IOThread does not run in parallel.
Long-running jobs (usually in the form of coroutines) are best scheduled in the
BlockDriverState's AioContext to avoid the need to acquire/release around each
bdrv_*() call. Be aware that there is currently no mechanism to get notified
when bdrv_set_aio_context() moves this BlockDriverState to a different
AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
may need to add this if you want to support long-running jobs.

View File

@@ -135,12 +135,12 @@ be stored. Each extension has a structure like the following:
Unless stated otherwise, each header extension type shall appear at most once
in the same image.
The remaining space between the end of the header extension area and the end of
the first cluster can be used for the backing file name. It is not allowed to
store other data here, so that an implementation can safely modify the header
and add extensions without harming data of compatible features that it
doesn't support. Compatible features that need space for additional data can
use a header extension.
If the image has a backing file then the backing file name should be stored in
the remaining space between the end of the header extension area and the end of
the first cluster. It is not allowed to store other data here, so that an
implementation can safely modify the header and add extensions without harming
data of compatible features that it doesn't support. Compatible features that
need space for additional data can use a header extension.
== Feature name table ==

View File

@@ -307,3 +307,43 @@ guard such computations and avoid its compilation when the event is disabled:
You can check both if the event has been disabled and is dynamically enabled at
the same time using the 'trace_event_get_state' routine (see header
"trace/control.h" for more information).
=== "tcg" ===
Guest code generated by TCG can be traced by defining an event with the "tcg"
event property. Internally, this property generates two events:
"<eventname>_trans" to trace the event at translation time, and
"<eventname>_exec" to trace the event at execution time.
Instead of using these two events, you should instead use the function
"trace_<eventname>_tcg" during translation (TCG code generation). This function
will automatically call "trace_<eventname>_trans", and will generate the
necessary TCG code to call "trace_<eventname>_exec" during guest code execution.
Events with the "tcg" property can be declared in the "trace-events" file with a
mix of native and TCG types, and "trace_<eventname>_tcg" will gracefully forward
them to the "<eventname>_trans" and "<eventname>_exec" events. Since TCG values
are not known at translation time, these are ignored by the "<eventname>_trans"
event. Because of this, the entry in the "trace-events" file needs two printing
formats (separated by a comma):
tcg foo(uint8_t a1, TCGv_i32 a2) "a1=%d", "a1=%d a2=%d"
For example:
#include "trace-tcg.h"
void some_disassembly_func (...)
{
uint8_t a1 = ...;
TCGv_i32 a2 = ...;
trace_foo_tcg(a1, a2);
}
This will immediately call:
void trace_foo_trans(uint8_t a1);
and will generate the TCG code to call:
void trace_foo(uint8_t a1, uint32_t a2);

18
dump.c
View File

@@ -71,18 +71,14 @@ uint64_t cpu_to_dump64(DumpState *s, uint64_t val)
static int dump_cleanup(DumpState *s)
{
int ret = 0;
guest_phys_blocks_free(&s->guest_phys_blocks);
memory_mapping_list_free(&s->list);
if (s->fd != -1) {
close(s->fd);
}
close(s->fd);
if (s->resume) {
vm_start();
}
return ret;
return 0;
}
static void dump_error(DumpState *s, const char *reason)
@@ -1499,6 +1495,8 @@ static int dump_init(DumpState *s, int fd, bool has_format,
s->begin = begin;
s->length = length;
memory_mapping_list_init(&s->list);
guest_phys_blocks_init(&s->guest_phys_blocks);
guest_phys_blocks_append(&s->guest_phys_blocks);
@@ -1526,7 +1524,6 @@ static int dump_init(DumpState *s, int fd, bool has_format,
}
/* get memory mapping */
memory_mapping_list_init(&s->list);
if (paging) {
qemu_get_guest_memory_mapping(&s->list, &s->guest_phys_blocks, &err);
if (err != NULL) {
@@ -1622,12 +1619,7 @@ static int dump_init(DumpState *s, int fd, bool has_format,
return 0;
cleanup:
guest_phys_blocks_free(&s->guest_phys_blocks);
if (s->resume) {
vm_start();
}
dump_cleanup(s);
return -1;
}

6
exec.c
View File

@@ -373,7 +373,7 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
break;
}
iotlb = mr->iommu_ops->translate(mr, addr);
iotlb = mr->iommu_ops->translate(mr, addr, is_write);
addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
| (addr & iotlb.addr_mask));
len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
@@ -1044,7 +1044,7 @@ static void *file_ram_alloc(RAMBlock *block,
}
/* Make name safe to use with mkstemp by replacing '/' with '_'. */
sanitized_name = g_strdup(block->mr->name);
sanitized_name = g_strdup(memory_region_name(block->mr));
for (c = sanitized_name; *c != '\0'; c++) {
if (*c == '/')
*c = '_';
@@ -1242,7 +1242,7 @@ static ram_addr_t ram_block_add(RAMBlock *new_block)
new_block->host = phys_mem_alloc(new_block->length);
if (!new_block->host) {
fprintf(stderr, "Cannot set up guest memory '%s': %s\n",
new_block->mr->name, strerror(errno));
memory_region_name(new_block->mr), strerror(errno));
exit(1);
}
memory_try_enable_merging(new_block->host, new_block->length);

26
gdb-xml/s390-acr.xml Normal file
View File

@@ -0,0 +1,26 @@
<?xml version="1.0"?>
<!-- Copyright (C) 2010-2014 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. -->
<!DOCTYPE feature SYSTEM "gdb-target.dtd">
<feature name="org.gnu.gdb.s390.acr">
<reg name="acr0" bitsize="32" type="uint32" group="access"/>
<reg name="acr1" bitsize="32" type="uint32" group="access"/>
<reg name="acr2" bitsize="32" type="uint32" group="access"/>
<reg name="acr3" bitsize="32" type="uint32" group="access"/>
<reg name="acr4" bitsize="32" type="uint32" group="access"/>
<reg name="acr5" bitsize="32" type="uint32" group="access"/>
<reg name="acr6" bitsize="32" type="uint32" group="access"/>
<reg name="acr7" bitsize="32" type="uint32" group="access"/>
<reg name="acr8" bitsize="32" type="uint32" group="access"/>
<reg name="acr9" bitsize="32" type="uint32" group="access"/>
<reg name="acr10" bitsize="32" type="uint32" group="access"/>
<reg name="acr11" bitsize="32" type="uint32" group="access"/>
<reg name="acr12" bitsize="32" type="uint32" group="access"/>
<reg name="acr13" bitsize="32" type="uint32" group="access"/>
<reg name="acr14" bitsize="32" type="uint32" group="access"/>
<reg name="acr15" bitsize="32" type="uint32" group="access"/>
</feature>

27
gdb-xml/s390-fpr.xml Normal file
View File

@@ -0,0 +1,27 @@
<?xml version="1.0"?>
<!-- Copyright (C) 2010-2014 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. -->
<!DOCTYPE feature SYSTEM "gdb-target.dtd">
<feature name="org.gnu.gdb.s390.fpr">
<reg name="fpc" bitsize="32" type="uint32" group="float"/>
<reg name="f0" bitsize="64" type="ieee_double" group="float"/>
<reg name="f1" bitsize="64" type="ieee_double" group="float"/>
<reg name="f2" bitsize="64" type="ieee_double" group="float"/>
<reg name="f3" bitsize="64" type="ieee_double" group="float"/>
<reg name="f4" bitsize="64" type="ieee_double" group="float"/>
<reg name="f5" bitsize="64" type="ieee_double" group="float"/>
<reg name="f6" bitsize="64" type="ieee_double" group="float"/>
<reg name="f7" bitsize="64" type="ieee_double" group="float"/>
<reg name="f8" bitsize="64" type="ieee_double" group="float"/>
<reg name="f9" bitsize="64" type="ieee_double" group="float"/>
<reg name="f10" bitsize="64" type="ieee_double" group="float"/>
<reg name="f11" bitsize="64" type="ieee_double" group="float"/>
<reg name="f12" bitsize="64" type="ieee_double" group="float"/>
<reg name="f13" bitsize="64" type="ieee_double" group="float"/>
<reg name="f14" bitsize="64" type="ieee_double" group="float"/>
<reg name="f15" bitsize="64" type="ieee_double" group="float"/>
</feature>

28
gdb-xml/s390x-core64.xml Normal file
View File

@@ -0,0 +1,28 @@
<?xml version="1.0"?>
<!-- Copyright (C) 2010-2014 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. -->
<!DOCTYPE feature SYSTEM "gdb-target.dtd">
<feature name="org.gnu.gdb.s390.core">
<reg name="pswm" bitsize="64" type="uint64" group="psw"/>
<reg name="pswa" bitsize="64" type="uint64" group="psw"/>
<reg name="r0" bitsize="64" type="uint64" group="general"/>
<reg name="r1" bitsize="64" type="uint64" group="general"/>
<reg name="r2" bitsize="64" type="uint64" group="general"/>
<reg name="r3" bitsize="64" type="uint64" group="general"/>
<reg name="r4" bitsize="64" type="uint64" group="general"/>
<reg name="r5" bitsize="64" type="uint64" group="general"/>
<reg name="r6" bitsize="64" type="uint64" group="general"/>
<reg name="r7" bitsize="64" type="uint64" group="general"/>
<reg name="r8" bitsize="64" type="uint64" group="general"/>
<reg name="r9" bitsize="64" type="uint64" group="general"/>
<reg name="r10" bitsize="64" type="uint64" group="general"/>
<reg name="r11" bitsize="64" type="uint64" group="general"/>
<reg name="r12" bitsize="64" type="uint64" group="general"/>
<reg name="r13" bitsize="64" type="uint64" group="general"/>
<reg name="r14" bitsize="64" type="uint64" group="general"/>
<reg name="r15" bitsize="64" type="uint64" group="general"/>
</feature>

View File

@@ -832,19 +832,17 @@ The values that can be specified here depend on the machine type, but are
the same that can be specified in the @code{-boot} command line option.
ETEXI
#if defined(TARGET_I386) || defined(TARGET_S390X)
{
.name = "nmi",
.args_type = "",
.params = "",
.help = "inject an NMI on all guest's CPUs",
.help = "inject an NMI",
.mhandler.cmd = hmp_inject_nmi,
},
#endif
STEXI
@item nmi @var{cpu}
@findex nmi
Inject an NMI (x86) or RESTART (s390x) on the given CPU.
Inject an NMI on the default CPU (x86/s390) or all CPUs (ppc64).
ETEXI

8
hmp.c
View File

@@ -1687,6 +1687,7 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
MemdevList *memdev_list = qmp_query_memdev(&err);
MemdevList *m = memdev_list;
StringOutputVisitor *ov;
char *str;
int i = 0;
@@ -1704,13 +1705,16 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
m->value->prealloc ? "true" : "false");
monitor_printf(mon, " policy: %s\n",
HostMemPolicy_lookup[m->value->policy]);
monitor_printf(mon, " host nodes: %s\n",
string_output_get_string(ov));
str = string_output_get_string(ov);
monitor_printf(mon, " host nodes: %s\n", str);
g_free(str);
string_output_visitor_cleanup(ov);
m = m->next;
i++;
}
monitor_printf(mon, "\n");
qapi_free_MemdevList(memdev_list);
}

View File

@@ -135,17 +135,17 @@ static int local_lstat(FsContext *fs_ctx, V9fsPath *fs_path, struct stat *stbuf)
mode_t tmp_mode;
dev_t tmp_dev;
if (getxattr(buffer, "user.virtfs.uid", &tmp_uid, sizeof(uid_t)) > 0) {
stbuf->st_uid = tmp_uid;
stbuf->st_uid = le32_to_cpu(tmp_uid);
}
if (getxattr(buffer, "user.virtfs.gid", &tmp_gid, sizeof(gid_t)) > 0) {
stbuf->st_gid = tmp_gid;
stbuf->st_gid = le32_to_cpu(tmp_gid);
}
if (getxattr(buffer, "user.virtfs.mode",
&tmp_mode, sizeof(mode_t)) > 0) {
stbuf->st_mode = tmp_mode;
stbuf->st_mode = le32_to_cpu(tmp_mode);
}
if (getxattr(buffer, "user.virtfs.rdev", &tmp_dev, sizeof(dev_t)) > 0) {
stbuf->st_rdev = tmp_dev;
stbuf->st_rdev = le64_to_cpu(tmp_dev);
}
} else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
local_mapped_file_attr(fs_ctx, path, stbuf);
@@ -255,29 +255,29 @@ static int local_set_xattr(const char *path, FsCred *credp)
int err;
if (credp->fc_uid != -1) {
err = setxattr(path, "user.virtfs.uid", &credp->fc_uid, sizeof(uid_t),
0);
uint32_t tmp_uid = cpu_to_le32(credp->fc_uid);
err = setxattr(path, "user.virtfs.uid", &tmp_uid, sizeof(uid_t), 0);
if (err) {
return err;
}
}
if (credp->fc_gid != -1) {
err = setxattr(path, "user.virtfs.gid", &credp->fc_gid, sizeof(gid_t),
0);
uint32_t tmp_gid = cpu_to_le32(credp->fc_gid);
err = setxattr(path, "user.virtfs.gid", &tmp_gid, sizeof(gid_t), 0);
if (err) {
return err;
}
}
if (credp->fc_mode != -1) {
err = setxattr(path, "user.virtfs.mode", &credp->fc_mode,
sizeof(mode_t), 0);
uint32_t tmp_mode = cpu_to_le32(credp->fc_mode);
err = setxattr(path, "user.virtfs.mode", &tmp_mode, sizeof(mode_t), 0);
if (err) {
return err;
}
}
if (credp->fc_rdev != -1) {
err = setxattr(path, "user.virtfs.rdev", &credp->fc_rdev,
sizeof(dev_t), 0);
uint64_t tmp_rdev = cpu_to_le64(credp->fc_rdev);
err = setxattr(path, "user.virtfs.rdev", &tmp_rdev, sizeof(dev_t), 0);
if (err) {
return err;
}
@@ -397,12 +397,15 @@ static int local_readdir_r(FsContext *ctx, V9fsFidOpenState *fs,
again:
ret = readdir_r(fs->dir, entry, result);
if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
if (ctx->export_flags & V9FS_SM_MAPPED) {
entry->d_type = DT_UNKNOWN;
} else if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
if (!ret && *result != NULL &&
!strcmp(entry->d_name, VIRTFS_META_DIR)) {
/* skp the meta data directory */
goto again;
}
entry->d_type = DT_UNKNOWN;
}
return ret;
}
@@ -630,21 +633,17 @@ static int local_fstat(FsContext *fs_ctx, int fid_type,
mode_t tmp_mode;
dev_t tmp_dev;
if (fgetxattr(fd, "user.virtfs.uid",
&tmp_uid, sizeof(uid_t)) > 0) {
stbuf->st_uid = tmp_uid;
if (fgetxattr(fd, "user.virtfs.uid", &tmp_uid, sizeof(uid_t)) > 0) {
stbuf->st_uid = le32_to_cpu(tmp_uid);
}
if (fgetxattr(fd, "user.virtfs.gid",
&tmp_gid, sizeof(gid_t)) > 0) {
stbuf->st_gid = tmp_gid;
if (fgetxattr(fd, "user.virtfs.gid", &tmp_gid, sizeof(gid_t)) > 0) {
stbuf->st_gid = le32_to_cpu(tmp_gid);
}
if (fgetxattr(fd, "user.virtfs.mode",
&tmp_mode, sizeof(mode_t)) > 0) {
stbuf->st_mode = tmp_mode;
if (fgetxattr(fd, "user.virtfs.mode", &tmp_mode, sizeof(mode_t)) > 0) {
stbuf->st_mode = le32_to_cpu(tmp_mode);
}
if (fgetxattr(fd, "user.virtfs.rdev",
&tmp_dev, sizeof(dev_t)) > 0) {
stbuf->st_rdev = tmp_dev;
if (fgetxattr(fd, "user.virtfs.rdev", &tmp_dev, sizeof(dev_t)) > 0) {
stbuf->st_rdev = le64_to_cpu(tmp_dev);
}
} else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
errno = EOPNOTSUPP;

View File

@@ -231,7 +231,7 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size)
uint32_t val = 0;
int bsel = s->hotplug_select;
if (bsel < 0 || bsel > ACPI_PCIHP_MAX_HOTPLUG_BUS) {
if (bsel < 0 || bsel >= ACPI_PCIHP_MAX_HOTPLUG_BUS) {
return 0;
}

View File

@@ -660,7 +660,8 @@ static bool window_translate(TyphoonWindow *win, hwaddr addr,
/* Handle PCI-to-system address translation. */
/* TODO: A translation failure here ought to set PCI error codes on the
Pchip and generate a machine check interrupt. */
static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr addr)
static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr addr,
bool is_write)
{
TyphoonPchip *pchip = container_of(iommu, TyphoonPchip, iommu);
IOMMUTLBEntry ret;

View File

@@ -166,7 +166,7 @@ static void armv7m_reset(void *opaque)
flash_size and sram_size are in kb.
Returns the NVIC array. */
qemu_irq *armv7m_init(MemoryRegion *address_space_mem,
qemu_irq *armv7m_init(MemoryRegion *system_memory,
int flash_size, int sram_size,
const char *kernel_filename, const char *cpu_model)
{
@@ -213,10 +213,10 @@ qemu_irq *armv7m_init(MemoryRegion *address_space_mem,
memory_region_init_ram(flash, NULL, "armv7m.flash", flash_size);
vmstate_register_ram_global(flash);
memory_region_set_readonly(flash, true);
memory_region_add_subregion(address_space_mem, 0, flash);
memory_region_add_subregion(system_memory, 0, flash);
memory_region_init_ram(sram, NULL, "armv7m.sram", sram_size);
vmstate_register_ram_global(sram);
memory_region_add_subregion(address_space_mem, 0x20000000, sram);
memory_region_add_subregion(system_memory, 0x20000000, sram);
armv7m_bitband_init();
nvic = qdev_create(NULL, "armv7m_nvic");
@@ -257,7 +257,7 @@ qemu_irq *armv7m_init(MemoryRegion *address_space_mem,
when returning from an exception. */
memory_region_init_ram(hack, NULL, "armv7m.hack", 0x1000);
vmstate_register_ram_global(hack);
memory_region_add_subregion(address_space_mem, 0xfffff000, hack);
memory_region_add_subregion(system_memory, 0xfffff000, hack);
qemu_register_reset(armv7m_reset, cpu);
return pic;

View File

@@ -417,8 +417,12 @@ static void do_cpu_reset(void *opaque)
if (info) {
if (!info->is_linux) {
/* Jump to the entry point. */
env->regs[15] = info->entry & 0xfffffffe;
env->thumb = info->entry & 1;
if (env->aarch64) {
env->pc = info->entry;
} else {
env->regs[15] = info->entry & 0xfffffffe;
env->thumb = info->entry & 1;
}
} else {
if (CPU(cpu) == first_cpu) {
if (env->aarch64) {
@@ -510,6 +514,13 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
kernel_size = load_uimage(info->kernel_filename, &entry, NULL,
&is_linux);
}
/* On aarch64, it's the bootloader's job to uncompress the kernel. */
if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64) && kernel_size < 0) {
entry = info->loader_start + kernel_load_offset;
kernel_size = load_image_gzipped(info->kernel_filename, entry,
info->ram_size - kernel_load_offset);
is_linux = 1;
}
if (kernel_size < 0) {
entry = info->loader_start + kernel_load_offset;
kernel_size = load_image_targphys(info->kernel_filename, entry,

View File

@@ -1208,7 +1208,6 @@ static void stellaris_init(const char *kernel_filename, const char *cpu_model,
0x40024000, 0x40025000, 0x40026000};
static const int gpio_irq[7] = {0, 1, 2, 3, 4, 30, 31};
MemoryRegion *address_space_mem = get_system_memory();
qemu_irq *pic;
DeviceState *gpio_dev[7];
qemu_irq gpio_in[7][8];
@@ -1223,7 +1222,7 @@ static void stellaris_init(const char *kernel_filename, const char *cpu_model,
flash_size = ((board->dc0 & 0xffff) + 1) << 1;
sram_size = (board->dc0 >> 18) + 1;
pic = armv7m_init(address_space_mem,
pic = armv7m_init(get_system_memory(),
flash_size, sram_size, kernel_filename, cpu_model);
if (board->dc1 & (1 << 16)) {

View File

@@ -98,17 +98,17 @@ typedef struct VirtBoardInfo {
*/
static const MemMapEntry a15memmap[] = {
/* Space up to 0x8000000 is reserved for a boot ROM */
[VIRT_FLASH] = { 0, 0x8000000 },
[VIRT_CPUPERIPHS] = { 0x8000000, 0x20000 },
[VIRT_FLASH] = { 0, 0x08000000 },
[VIRT_CPUPERIPHS] = { 0x08000000, 0x00020000 },
/* GIC distributor and CPU interfaces sit inside the CPU peripheral space */
[VIRT_GIC_DIST] = { 0x8000000, 0x10000 },
[VIRT_GIC_CPU] = { 0x8010000, 0x10000 },
[VIRT_UART] = { 0x9000000, 0x1000 },
[VIRT_RTC] = { 0x9010000, 0x1000 },
[VIRT_MMIO] = { 0xa000000, 0x200 },
[VIRT_GIC_DIST] = { 0x08000000, 0x00010000 },
[VIRT_GIC_CPU] = { 0x08010000, 0x00010000 },
[VIRT_UART] = { 0x09000000, 0x00001000 },
[VIRT_RTC] = { 0x09010000, 0x00001000 },
[VIRT_MMIO] = { 0x0a000000, 0x00000200 },
/* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
/* 0x10000000 .. 0x40000000 reserved for PCI */
[VIRT_MEM] = { 0x40000000, 30ULL * 1024 * 1024 * 1024 },
[VIRT_MEM] = { 0x40000000, 30ULL * 1024 * 1024 * 1024 },
};
static const int a15irqmap[] = {
@@ -194,20 +194,41 @@ static void fdt_add_psci_node(const VirtBoardInfo *vbi)
/* No PSCI for TCG yet */
if (kvm_enabled()) {
uint32_t cpu_suspend_fn;
uint32_t cpu_off_fn;
uint32_t cpu_on_fn;
uint32_t migrate_fn;
qemu_fdt_add_subnode(fdt, "/psci");
if (armcpu->psci_version == 2) {
const char comp[] = "arm,psci-0.2\0arm,psci";
qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
cpu_off_fn = QEMU_PSCI_0_2_FN_CPU_OFF;
if (arm_feature(&armcpu->env, ARM_FEATURE_AARCH64)) {
cpu_suspend_fn = QEMU_PSCI_0_2_FN64_CPU_SUSPEND;
cpu_on_fn = QEMU_PSCI_0_2_FN64_CPU_ON;
migrate_fn = QEMU_PSCI_0_2_FN64_MIGRATE;
} else {
cpu_suspend_fn = QEMU_PSCI_0_2_FN_CPU_SUSPEND;
cpu_on_fn = QEMU_PSCI_0_2_FN_CPU_ON;
migrate_fn = QEMU_PSCI_0_2_FN_MIGRATE;
}
} else {
qemu_fdt_setprop_string(fdt, "/psci", "compatible", "arm,psci");
cpu_suspend_fn = QEMU_PSCI_0_1_FN_CPU_SUSPEND;
cpu_off_fn = QEMU_PSCI_0_1_FN_CPU_OFF;
cpu_on_fn = QEMU_PSCI_0_1_FN_CPU_ON;
migrate_fn = QEMU_PSCI_0_1_FN_MIGRATE;
}
qemu_fdt_setprop_string(fdt, "/psci", "method", "hvc");
qemu_fdt_setprop_cell(fdt, "/psci", "cpu_suspend",
PSCI_FN_CPU_SUSPEND);
qemu_fdt_setprop_cell(fdt, "/psci", "cpu_off", PSCI_FN_CPU_OFF);
qemu_fdt_setprop_cell(fdt, "/psci", "cpu_on", PSCI_FN_CPU_ON);
qemu_fdt_setprop_cell(fdt, "/psci", "migrate", PSCI_FN_MIGRATE);
qemu_fdt_setprop_cell(fdt, "/psci", "cpu_suspend", cpu_suspend_fn);
qemu_fdt_setprop_cell(fdt, "/psci", "cpu_off", cpu_off_fn);
qemu_fdt_setprop_cell(fdt, "/psci", "cpu_on", cpu_on_fn);
qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
}
}
@@ -520,7 +541,7 @@ static QEMUMachine machvirt_a15_machine = {
.name = "virt",
.desc = "ARM Virtual Machine",
.init = machvirt_init,
.max_cpus = 4,
.max_cpus = 8,
};
static void machvirt_machine_init(void)

View File

@@ -1388,14 +1388,6 @@ static int ac97_initfn (PCIDevice *dev)
return 0;
}
static void ac97_exitfn (PCIDevice *dev)
{
AC97LinkState *s = DO_UPCAST (AC97LinkState, dev, dev);
memory_region_destroy (&s->io_nam);
memory_region_destroy (&s->io_nabm);
}
static int ac97_init (PCIBus *bus)
{
pci_create_simple (bus, -1, "AC97");
@@ -1413,7 +1405,6 @@ static void ac97_class_init (ObjectClass *klass, void *data)
PCIDeviceClass *k = PCI_DEVICE_CLASS (klass);
k->init = ac97_initfn;
k->exit = ac97_exitfn;
k->vendor_id = PCI_VENDOR_ID_INTEL;
k->device_id = PCI_DEVICE_ID_INTEL_82801AA_5;
k->revision = 0x01;

View File

@@ -1042,13 +1042,6 @@ static int es1370_initfn (PCIDevice *dev)
return 0;
}
static void es1370_exitfn (PCIDevice *dev)
{
ES1370State *s = DO_UPCAST (ES1370State, dev, dev);
memory_region_destroy (&s->io);
}
static int es1370_init (PCIBus *bus)
{
pci_create_simple (bus, -1, "ES1370");
@@ -1061,7 +1054,6 @@ static void es1370_class_init (ObjectClass *klass, void *data)
PCIDeviceClass *k = PCI_DEVICE_CLASS (klass);
k->init = es1370_initfn;
k->exit = es1370_exitfn;
k->vendor_id = PCI_VENDOR_ID_ENSONIQ;
k->device_id = PCI_DEVICE_ID_ENSONIQ_ES1370;
k->class_id = PCI_CLASS_MULTIMEDIA_AUDIO;

View File

@@ -212,7 +212,7 @@ static int GUS_read_DMA (void *opaque, int nchan, int dma_pos, int dma_len)
pos += copied;
}
if (0 == ((mode >> 4) & 1)) {
if (((mode >> 4) & 1) == 0) {
DMA_release_DREQ (s->emu.gusdma);
}
return dma_len;

View File

@@ -489,8 +489,9 @@ static int hda_audio_init(HDACodecDevice *hda, const struct desc_codec *desc)
for (i = 0; i < a->desc->nnodes; i++) {
node = a->desc->nodes + i;
param = hda_codec_find_param(node, AC_PAR_AUDIO_WIDGET_CAP);
if (NULL == param)
if (param == NULL) {
continue;
}
type = (param->val & AC_WCAP_TYPE) >> AC_WCAP_TYPE_SHIFT;
switch (type) {
case AC_WID_AUD_OUT:

View File

@@ -187,6 +187,7 @@ struct IntelHDAState {
/* properties */
uint32_t debug;
uint32_t msi;
bool old_msi_addr;
};
#define TYPE_INTEL_HDA_GENERIC "intel-hda-generic"
@@ -1141,7 +1142,7 @@ static int intel_hda_init(PCIDevice *pci)
"intel-hda", 0x4000);
pci_register_bar(&d->pci, 0, 0, &d->mmio);
if (d->msi) {
msi_init(&d->pci, 0x50, 1, true, false);
msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true, false);
}
hda_codec_bus_init(DEVICE(pci), &d->codecs, sizeof(d->codecs),
@@ -1155,7 +1156,6 @@ static void intel_hda_exit(PCIDevice *pci)
IntelHDAState *d = INTEL_HDA(pci);
msi_uninit(&d->pci);
memory_region_destroy(&d->mmio);
}
static int intel_hda_post_load(void *opaque, int version)
@@ -1236,6 +1236,7 @@ static const VMStateDescription vmstate_intel_hda = {
static Property intel_hda_properties[] = {
DEFINE_PROP_UINT32("debug", IntelHDAState, debug, 0),
DEFINE_PROP_UINT32("msi", IntelHDAState, msi, 1),
DEFINE_PROP_BOOL("old_msi_addr", IntelHDAState, old_msi_addr, false),
DEFINE_PROP_END_OF_LIST(),
};

Some files were not shown because too many files have changed in this diff Show More